class mxnet.gluon.loss.SoftmaxCrossEntropyLoss(axis=-1, sparse_label=True, from_logits=False, weight=None, batch_axis=0, **kwargs)[source]

Computes the softmax cross entropy loss. (alias: SoftmaxCELoss)

If sparse_label is True (default), label should contain integer category indicators:

\[p = \operatorname{softmax}({pred})\]
\[L = -\sum_i \log p_{i,{label}_i}\]

label’s shape should be pred’s shape with the axis dimension removed. For example, for pred with shape (1,2,3,4) and axis = 2, label’s shape should be (1,2,4).
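The sparse-label formula above can be sketched in plain NumPy (this is only an illustration of the math, not MXNet’s implementation; the helper names are invented for the example):

```python
import numpy as np

def softmax(pred, axis=-1):
    # Shift by the max along `axis` for numerical stability.
    z = pred - pred.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_softmax_ce(pred, label, axis=-1):
    # `label` holds integer class indices; its shape is `pred`'s
    # shape with the `axis` dimension removed.
    p = softmax(pred, axis=axis)
    idx = np.expand_dims(label, axis=axis)
    picked = np.take_along_axis(p, idx, axis=axis)
    return -np.log(np.squeeze(picked, axis=axis))

pred = np.array([[2.0, 1.0, 0.1],
                 [0.5, 2.5, 0.3]])
label = np.array([0, 1])               # one class index per batch row
loss = sparse_softmax_ce(pred, label)  # shape (2,), one loss per sample
```

Note how label has shape (2,) while pred has shape (2, 3): the class axis is gone from label, exactly as described above.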

If sparse_label is False, label should contain a probability distribution, and label’s shape should be the same as pred’s:

\[p = \operatorname{softmax}({pred})\]
\[L = -\sum_i \sum_j {label}_{ij} \log p_{ij}\]
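The dense-label case can likewise be sketched in NumPy (again only an illustration of the formula, with invented helper names, not MXNet’s code). A one-hot dense label reproduces the sparse case:

```python
import numpy as np

def softmax(pred, axis=-1):
    z = pred - pred.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dense_softmax_ce(pred, label, axis=-1):
    # `label` is a probability distribution with the same shape as `pred`.
    p = softmax(pred, axis=axis)
    return -(label * np.log(p)).sum(axis=axis)

pred = np.array([[2.0, 1.0, 0.1]])
onehot = np.array([[1.0, 0.0, 0.0]])   # all mass on class 0
soft = np.array([[0.7, 0.2, 0.1]])     # mass spread over classes
loss_onehot = dense_softmax_ce(pred, onehot)
loss_soft = dense_softmax_ce(pred, soft)
```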
Parameters

  • axis (int, default -1) – The axis to sum over when computing softmax and entropy.

  • sparse_label (bool, default True) – Whether label is an integer array instead of probability distribution.

  • from_logits (bool, default False) – Whether input is a log probability (usually from log_softmax) instead of unnormalized numbers.

  • weight (float or None) – Global scalar weight for loss.

  • batch_axis (int, default 0) – The axis that represents mini-batch.
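The effect of from_logits can be illustrated in NumPy (a sketch of the math, not MXNet’s implementation): when the input is already log-probabilities, the loss skips the softmax and reduces to a negative gather along the class axis, so the two paths below agree.

```python
import numpy as np

def log_softmax(pred, axis=-1):
    z = pred - pred.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def sparse_ce(pred, label, from_logits=False):
    # With from_logits=True the input is treated as log-probabilities,
    # so the loss is just -logp at the label index.
    logp = pred if from_logits else log_softmax(pred)
    return -np.take_along_axis(logp, label[:, None], axis=-1)[:, 0]

pred = np.array([[2.0, 1.0, 0.1],
                 [0.5, 2.5, 0.3]])
label = np.array([0, 1])
a = sparse_ce(pred, label)                                 # default path
b = sparse_ce(log_softmax(pred), label, from_logits=True)  # logits path
```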

Inputs:

  • pred: the prediction tensor, where the batch_axis dimension ranges over batch size and axis dimension ranges over the number of classes.

  • label: the truth tensor. When sparse_label is True, label’s shape should be pred’s shape with the axis dimension removed. For example, for pred with shape (1,2,3,4) and axis = 2, label’s shape should be (1,2,4) and values should be integers between 0 and 2. If sparse_label is False, label’s shape must be the same as pred and values should be floats in the range [0, 1].

  • sample_weight: element-wise weighting tensor. Must be broadcastable to the same shape as label. For example, if label has shape (64, 10) and you want to weigh each sample in the batch separately, sample_weight should have shape (64, 1).

Outputs:

  • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
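The sample_weight broadcast described above can be sketched in NumPy (shapes follow the (64, 10) example; the array contents are made up for illustration):

```python
import numpy as np

# label is (64, 10), so a per-sample weight must be (64, 1)
# to broadcast across the class axis (axis 1).
elementwise_loss = np.full((64, 10), 0.5)
sample_weight = np.zeros((64, 1))
sample_weight[0] = 2.0        # up-weight the first sample, zero the rest

# Broadcasting stretches the (64, 1) weight to (64, 10).
weighted = elementwise_loss * sample_weight
```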

__init__(axis=-1, sparse_label=True, from_logits=False, weight=None, batch_axis=0, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.


Methods

__init__([axis, sparse_label, from_logits, …])

Initialize self.


apply(fn)

Applies fn recursively to every child block as well as self.


cast(dtype)

Cast this Block to use another data type.


collect_params([select])

Returns a ParameterDict containing this Block’s and all of its children’s Parameters (default), or only those Parameters whose names match the given regular expressions.

export(path[, epoch])

Export HybridBlock to json format that can be loaded by SymbolBlock.imports, mxnet.mod.Module or the C++ interface.

forward(x, *args)

Defines the forward computation.

hybrid_forward(F, pred, label[, sample_weight])

Overrides to construct symbolic graph for this Block.


hybridize([active])

Activates or deactivates HybridBlocks recursively.


infer_shape(*args)

Infers shape of Parameters from inputs.


infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, ctx, verbose, force_reinit])

Initializes Parameter s of this Block and its children.

load_parameters(filename[, ctx, …])

Load parameters from file previously saved by save_parameters.

load_params(filename[, ctx, allow_missing, …])

[Deprecated] Please use load_parameters.


name_scope()

Returns a name space object managing a child Block and parameter names.

register_child(block[, name])

Registers block as a child of self.


register_forward_hook(hook)

Registers a forward hook on the block.


register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.


save_parameters(filename)

Save parameters to file.


save_params(filename)

[Deprecated] Please use save_parameters.


summary(*inputs)

Print the summary of the model’s output and parameters.



Attributes

name

Name of this Block, without the trailing ‘_’.


params

Returns this Block’s parameter dictionary (does not include its children’s parameters).


prefix

Prefix of this Block.