# mxnet.gluon.loss.KLDivLoss

class mxnet.gluon.loss.KLDivLoss(from_logits=True, axis=-1, weight=None, batch_axis=0, **kwargs)

The Kullback-Leibler divergence loss.

KL divergence measures how much one probability distribution diverges from another. It is not a true distance metric, because it is asymmetric. It can be used to minimize information loss when approximating a distribution. If from_logits is True (default), the loss is defined as:

$L = \sum_i {label}_i * \big[\log({label}_i) - {pred}_i\big]$
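For intuition, the formula above can be evaluated directly for a single discrete distribution. This is a minimal NumPy sketch, not the Gluon implementation: it assumes `log_pred` already holds log-probabilities (the from_logits=True case), and `kl_from_log_probs` is a hypothetical helper name.

```python
import numpy as np


def kl_from_log_probs(log_pred, label):
    """Hypothetical helper: L = sum_i label_i * (log(label_i) - log_pred_i)."""
    log_pred = np.asarray(log_pred, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    # 0 * log(0) is treated as 0 by substituting log(1) where label == 0.
    safe_log_label = np.log(np.where(label > 0, label, 1.0))
    return float(np.sum(label * (safe_log_label - log_pred)))


label = np.array([0.5, 0.25, 0.25])
print(kl_from_log_probs(np.log(label), label))              # 0.0 (identical distributions)
print(kl_from_log_probs(np.log(np.full(3, 1 / 3)), label))  # ≈ 0.0589
```

The divergence is zero only when the prediction matches the label distribution, and positive otherwise.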

If from_logits is False, pred is first normalized with a softmax along axis, and the loss is defined as:

$prob = \operatorname{softmax}({pred})$

$L = \sum_i {label}_i * \big[\log({label}_i) - \log({prob}_i)\big]$
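When from_logits is False, the raw scores must be normalized before the KL formula applies. A minimal NumPy sketch of that path follows; `log_softmax` and `kl_from_scores` are hypothetical helper names, and computing the log-softmax directly (rather than taking the log of a softmax) is an assumption made here for numerical stability.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax (hypothetical helper)."""
    # Subtracting the max before exponentiating avoids overflow.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))


def kl_from_scores(pred, label, axis=-1):
    """from_logits=False path: normalize raw scores, then apply the KL formula."""
    log_prob = log_softmax(np.asarray(pred, dtype=np.float64), axis=axis)
    label = np.asarray(label, dtype=np.float64)
    # 0 * log(0) is treated as 0 by substituting log(1) where label == 0.
    safe_log_label = np.log(np.where(label > 0, label, 1.0))
    return np.sum(label * (safe_log_label - log_prob), axis=axis)


scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
label = np.array([[0.7, 0.2, 0.1],
                  [1 / 3, 1 / 3, 1 / 3]])
print(kl_from_scores(scores, label))  # one divergence per row; row 2 is ~0
```

Because softmax is unchanged when the same constant is added to every score, `kl_from_scores(scores + 5.0, label)` gives the same values up to floating-point rounding.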

pred and label can have arbitrary shape as long as they have the same number of elements.

Parameters
• from_logits (bool, default is True) – Whether pred already contains log-probabilities (usually the output of log_softmax) rather than unnormalized scores.

• axis (int, default -1) – The dimension along which to compute the softmax. Only used when from_logits is False.

• weight (float or None) – Global scalar weight for loss.

• batch_axis (int, default 0) – The axis that represents mini-batch.
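To see how the four parameters interact, here is a hypothetical NumPy reference for the forward pass. It mirrors our reading of the Gluon implementation (log-softmax along axis only when from_logits is False, element-wise KL terms, optional weighting, then a mean over every axis except batch_axis), but the function name `kl_div_loss_ref` and the 1e-12 epsilon guarding log(0) are assumptions, not documented API.

```python
import numpy as np


def kl_div_loss_ref(pred, label, sample_weight=None, from_logits=True,
                    axis=-1, weight=None, batch_axis=0):
    """Hypothetical NumPy reference for KLDivLoss's forward pass (not the MXNet API)."""
    pred = np.asarray(pred, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    if not from_logits:
        # axis is only consulted here: normalize scores with a stable log-softmax.
        shifted = pred - pred.max(axis=axis, keepdims=True)
        pred = shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))
    # Element-wise KL terms; the epsilon (assumed to be 1e-12) guards log(0).
    loss = label * (np.log(label + 1e-12) - pred)
    if sample_weight is not None:
        loss = loss * sample_weight  # element-wise weighting, broadcast
    if weight is not None:
        loss = loss * weight  # global scalar weight
    # Average over every axis except batch_axis, giving shape (batch_size,).
    other_axes = tuple(a for a in range(loss.ndim) if a != batch_axis % loss.ndim)
    return loss.mean(axis=other_axes)


scores = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
label = np.array([[0.1, 0.2, 0.7], [0.3, 0.3, 0.4]])
print(kl_div_loss_ref(scores, label, from_logits=False).shape)  # (2,)
print(kl_div_loss_ref(scores, label, from_logits=False, weight=2.0))
```

Moving the batch to another axis only changes which axis survives the mean: transposing both inputs and passing `axis=0, batch_axis=1` yields the same per-sample losses.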

Inputs:
• pred: prediction tensor with arbitrary shape. If from_logits is True, pred should contain log-probabilities. Otherwise, it should contain unnormalized predictions, e.g. the output of a dense layer.

• label: truth tensor with values in range (0, 1). Must have the same size as pred.

• sample_weight: element-wise weighting tensor. Must be broadcastable to the same shape as pred. For example, if pred has shape (64, 10) and you want to weigh each sample in the batch separately, sample_weight should have shape (64, 1).
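The shape rule for sample_weight can be illustrated with NumPy arrays, assuming MXNet broadcasts these shapes the same way NumPy does. Random values stand in for the element-wise loss; they are placeholders, not real loss values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_classes = 64, 10

# Placeholder for the element-wise KL terms; shape (64, 10).
elementwise_loss = rng.random((batch_size, num_classes))

# One weight per sample: shape (64, 1) broadcasts across the 10 classes.
sample_weight = np.ones((batch_size, 1))
sample_weight[:32] = 0.0  # e.g. ignore the first half of the batch

weighted = elementwise_loss * sample_weight
print(weighted.shape)       # (64, 10)
print(weighted[:32].sum())  # 0.0

# A flat (64,) vector aligns with the class axis instead and fails to broadcast.
try:
    elementwise_loss * np.ones(batch_size)
except ValueError as err:
    print("shape (64,) does not broadcast:", err)
```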

Outputs:
• loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
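Because non-batch dimensions are averaged rather than summed, the reported value differs from the sum in the formula above: for each sample it equals that sum divided by the number of non-batch elements. A small NumPy illustration with made-up distributions (the arrays are illustrative only):

```python
import numpy as np

label = np.array([[0.7, 0.2, 0.1],
                  [0.25, 0.25, 0.5]])
log_pred = np.log(np.array([[0.6, 0.3, 0.1],
                            [1 / 3, 1 / 3, 1 / 3]]))

terms = label * (np.log(label) - log_pred)  # element-wise KL terms, shape (2, 3)
kl_per_sample = terms.sum(axis=1)           # the sum in the formula
loss_per_sample = terms.mean(axis=1)        # the averaging described under Outputs

print(loss_per_sample.shape)                            # (2,)
print(np.allclose(loss_per_sample, kl_per_sample / 3))  # True
```

When no weighting is applied, multiplying the output by the number of elements per sample (3 here) recovers the summed KL divergence.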


__init__(from_logits=True, axis=-1, weight=None, batch_axis=0, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

• __init__([from_logits, axis, weight, batch_axis]) – Initialize self.

• apply(fn) – Applies fn recursively to every child block as well as self.

• cast(dtype) – Cast this Block to use another data type.

• collect_params([select]) – Returns a ParameterDict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the regular expressions given in select.

• export(path[, epoch]) – Export HybridBlock to json format that can be loaded by SymbolBlock.imports, mxnet.mod.Module or the C++ interface.

• forward(x, *args) – Defines the forward computation.

• hybrid_forward(F, pred, label[, sample_weight]) – Overrides to construct symbolic graph for this Block.

• hybridize([active]) – Activates or deactivates HybridBlocks recursively.

• infer_shape(*args) – Infers shape of Parameters from inputs.

• infer_type(*args) – Infers data type of Parameters from inputs.

• initialize([init, ctx, verbose, force_reinit]) – Initializes Parameters of this Block and its children.

• load_parameters(filename[, ctx, …]) – Load parameters from file previously saved by save_parameters.

• load_params(filename[, ctx, allow_missing, …]) – [Deprecated] Please use load_parameters.

• name_scope() – Returns a name space object managing a child Block and parameter names.

• register_child(block[, name]) – Registers block as a child of self.

• register_forward_hook(hook) – Registers a forward hook on the block.

• register_forward_pre_hook(hook) – Registers a forward pre-hook on the block.

• save_parameters(filename) – Save parameters to file.

• save_params(filename) – [Deprecated] Please use save_parameters.

• summary(*inputs) – Print the summary of the model’s output and parameters.

Attributes

• name – Name of this Block, without the trailing ‘_’.

• params – Returns this Block’s parameter dictionary (does not include its children’s parameters).

• prefix – Prefix of this Block.