# mxnet.gluon.loss.SoftmaxCrossEntropyLoss

class mxnet.gluon.loss.SoftmaxCrossEntropyLoss(axis=-1, sparse_label=True, from_logits=False, weight=None, batch_axis=0, **kwargs)

Computes the softmax cross entropy loss. (alias: SoftmaxCELoss)

If sparse_label is True (default), label should contain integer category indicators:

\begin{align}\begin{aligned}\DeclareMathOperator{softmax}{softmax}\\p = \softmax({pred})\\L = -\sum_i \log p_{i,{label}_i}\end{aligned}\end{align}

label’s shape should be pred’s shape with the axis dimension removed. For example, for pred with shape (1,2,3,4) and axis = 2, label’s shape should be (1,2,4).
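The sparse-label formula can be sketched in plain Python for the simple case where pred has shape (batch, classes) and axis is the last dimension. This is an illustrative reference computation, not the library's implementation; the helper names `log_softmax` and `sparse_softmax_ce` are hypothetical and exist only for this example.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax of one list of class scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def sparse_softmax_ce(pred, label):
    """Per-sample loss -log p[i][label[i]] for pred of shape (batch, classes)."""
    return [-log_softmax(row)[k] for row, k in zip(pred, label)]

# Two samples, three classes; label holds one integer class index per sample.
pred = [[2.0, 1.0, 0.1],
        [0.0, 0.0, 0.0]]
label = [0, 2]
losses = sparse_softmax_ce(pred, label)
print(losses)
```

The second sample has uniform scores, so its loss equals log(3) regardless of which class the label selects.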

If sparse_label is False, label should contain a probability distribution over the classes, and label’s shape should be the same as pred’s:

\begin{align}\begin{aligned}p = \softmax({pred})\\L = -\sum_i \sum_j {label}_{ij} \log p_{ij}\end{aligned}\end{align}
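As a rough illustration of the dense-label case, the sketch below evaluates the per-sample inner sum over classes for pred and label of shape (batch, classes). It is a plain-Python reference computation under that assumption, and `log_softmax` and `dense_softmax_ce` are hypothetical helper names, not library functions.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax of one list of class scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def dense_softmax_ce(pred, label):
    """Per-sample loss -sum_j label[i][j] * log p[i][j]."""
    return [
        -sum(q * lp for q, lp in zip(dist, log_softmax(row)))
        for row, dist in zip(pred, label)
    ]

pred = [[2.0, 1.0, 0.1],
        [0.0, 0.0, 0.0]]
one_hot = [[1.0, 0.0, 0.0],   # same information as sparse labels [0, 2]
           [0.0, 0.0, 1.0]]
soft = [[0.9, 0.05, 0.05],    # soft targets; each row sums to 1
        [0.2, 0.3, 0.5]]

hard_losses = dense_softmax_ce(pred, one_hot)
soft_losses = dense_softmax_ce(pred, soft)
print(hard_losses, soft_losses)
```

With one-hot rows the result reduces to the sparse-label loss for the corresponding class index.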
Parameters
• axis (int, default -1) – The axis to sum over when computing softmax and entropy.

• sparse_label (bool, default True) – Whether label is an array of integer class indices instead of a probability distribution.

• from_logits (bool, default False) – Whether pred already contains log probabilities (usually from log_softmax) instead of unnormalized scores.

• weight (float or None) – Global scalar weight for loss.

• batch_axis (int, default 0) – The axis that represents mini-batch.
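The effect of from_logits and weight can be sketched in plain Python. The example assumes that from_logits=True means pred is used directly as log probabilities (the softmax step is skipped) and that the scalar weight simply multiplies the loss; `softmax_ce` and `log_softmax` are hypothetical helpers written for this illustration, not the library's code.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax of one list of class scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def softmax_ce(pred, label, from_logits=False, weight=None):
    """Sparse-label loss for pred of shape (batch, classes)."""
    losses = []
    for row, k in zip(pred, label):
        # With from_logits=True the row is assumed to hold log probabilities.
        log_p = row if from_logits else log_softmax(row)
        loss = -log_p[k]
        if weight is not None:
            loss *= weight  # assumed: global scalar multiplies every loss value
        losses.append(loss)
    return losses

scores = [[2.0, 1.0, 0.1],
          [0.5, 2.5, 0.0]]
label = [0, 1]

from_scores = softmax_ce(scores, label)
log_probs = [log_softmax(row) for row in scores]
from_log_probs = softmax_ce(log_probs, label, from_logits=True)
halved = softmax_ce(scores, label, weight=0.5)
print(from_scores, from_log_probs, halved)
```

Passing unnormalized scores together with from_logits=True would skip the normalization and give a different, generally meaningless value, so the flag should match what pred actually contains.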

Inputs:
• pred: the prediction tensor, where the batch_axis dimension ranges over batch size and axis dimension ranges over the number of classes.

• label: the truth tensor. When sparse_label is True, label’s shape should be pred’s shape with the axis dimension removed. For example, for pred with shape (1,2,3,4) and axis = 2, label’s shape should be (1,2,4) and its values should be integers from 0 to 2 inclusive (one index per class along axis). If sparse_label is False, label’s shape must be the same as pred’s and its values should be floats in the range [0, 1].

• sample_weight: element-wise weighting tensor. Must be broadcastable to the same shape as label. For example, if label has shape (64, 10) and you want to weigh each sample in the batch separately, sample_weight should have shape (64, 1).
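The shape rules for label can be checked with a small helper. The functions below are hypothetical, written only to restate the rule above in code; they are not part of the library.

```python
def expected_label_shape(pred_shape, axis=-1, sparse_label=True):
    """Shape that label should have for a given pred shape."""
    if not sparse_label:
        return tuple(pred_shape)       # dense labels mirror pred
    axis = axis % len(pred_shape)      # normalize negative axes
    return tuple(d for i, d in enumerate(pred_shape) if i != axis)

def valid_class_indices(pred_shape, axis=-1):
    """Integer values that a sparse label may take."""
    return list(range(pred_shape[axis]))

print(expected_label_shape((1, 2, 3, 4), axis=2))
print(valid_class_indices((1, 2, 3, 4), axis=2))
print(expected_label_shape((64, 10), sparse_label=False))
```

For the (1,2,3,4) example with axis = 2, the class dimension has size 3, which is why the valid sparse label values are 0, 1, and 2.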

Outputs:
• loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
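To illustrate how the output ends up with shape (batch_size,), the sketch below uses pred of shape (batch, positions, classes) with sparse labels of shape (batch, positions). It assumes that sample_weight of shape (batch, 1) is broadcast over positions and applied before averaging; `sequence_softmax_ce` is a hypothetical plain-Python helper, not the library's implementation.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax of one list of class scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def sequence_softmax_ce(pred, label, sample_weight=None):
    """Return one averaged loss per batch entry."""
    out = []
    for b, (rows, ks) in enumerate(zip(pred, label)):
        # Element-wise loss for every position of this sample.
        per_position = [-log_softmax(row)[k] for row, k in zip(rows, ks)]
        if sample_weight is not None:
            w = sample_weight[b][0]  # shape (batch, 1): one weight per sample
            per_position = [w * l for l in per_position]
        # Average out the non-batch dimension.
        out.append(sum(per_position) / len(per_position))
    return out

pred = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
]
label = [[0, 1, 0], [1, 1, 0]]

losses = sequence_softmax_ce(pred, label)
weighted = sequence_softmax_ce(pred, label, sample_weight=[[1.0], [0.0]])
print(losses, weighted)
```

Every position here has uniform scores over two classes, so each unweighted entry averages to log(2), and a sample weight of 0 removes the second sample's contribution.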

__init__(axis=-1, sparse_label=True, from_logits=False, weight=None, batch_axis=0, **kwargs)

