ctc_loss

mxnet.ndarray.contrib.ctc_loss(data=None, label=None, data_lengths=None, label_lengths=None, use_data_lengths=_Null, use_label_lengths=_Null, blank_label=_Null, out=None, name=None, **kwargs)

Connectionist Temporal Classification Loss.

The shapes of the inputs and outputs:

  • data: (sequence_length, batch_size, alphabet_size)
  • label: (batch_size, label_sequence_length)
  • out: (batch_size)

The data tensor consists of sequences of activation vectors (without softmax applied), with the i-th channel in the last dimension corresponding to the i-th label, for i between 0 and alphabet_size-1 (i.e. always 0-indexed). The alphabet size should include one additional value reserved for the blank label. When blank_label is "first", the 0-th channel is reserved for the activation of the blank label; when it is "last", the (alphabet_size-1)-th channel is reserved for the blank label.

label is an index matrix of integers. When blank_label is "first", the value 0 is reserved for the blank label and should not be used as a token label in this matrix. Otherwise, when blank_label is "last", the value (alphabet_size-1) is reserved for the blank label.

If a sequence of labels is shorter than label_sequence_length, use the special padding value at the end of the sequence to conform it to the correct length. The padding value is 0 when blank_label is "first", and -1 otherwise.

For example, suppose the vocabulary is [a, b, c], and in one batch we have three sequences ‘ba’, ‘cbb’, and ‘abac’. When blank_label is "first", we can index the labels as {‘a’: 1, ‘b’: 2, ‘c’: 3}, and we reserve the 0-th channel for blank label in data tensor. The resulting label tensor should be padded to be:

[[2, 1, 0, 0], [3, 2, 2, 0], [1, 2, 1, 3]]

When blank_label is "last", we can index the labels as {‘a’: 0, ‘b’: 1, ‘c’: 2}, and we reserve the channel index 3 for blank label in data tensor. The resulting label tensor should be padded to be:

[[1, 0, -1, -1], [2, 1, 1, -1], [0, 1, 0, 2]]
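The indexing and padding rules above can be sketched in plain Python. The helper name encode_labels is hypothetical (it is not part of MXNet); it only reproduces the two label matrices shown in the example, so the result would still need to be converted to an NDArray before being passed as label.

```python
def encode_labels(sequences, vocabulary, blank_label="first"):
    """Map token sequences to a padded integer label matrix (illustrative helper)."""
    if blank_label == "first":
        offset, pad = 1, 0    # index 0 is the blank, and 0 is also the padding value
    elif blank_label == "last":
        offset, pad = 0, -1   # index len(vocabulary) is the blank, -1 is the padding value
    else:
        raise ValueError("blank_label must be 'first' or 'last'")
    index = {token: i + offset for i, token in enumerate(vocabulary)}
    width = max(len(seq) for seq in sequences)
    # Encode each sequence, then pad it on the right up to the longest one.
    return [[index[tok] for tok in seq] + [pad] * (width - len(seq))
            for seq in sequences]


vocabulary = ["a", "b", "c"]
batch = ["ba", "cbb", "abac"]
labels_first = encode_labels(batch, vocabulary, "first")
labels_last = encode_labels(batch, vocabulary, "last")
```

Here labels_first and labels_last correspond to the two padded matrices given in the example for "first" and "last" respectively.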

out is a list of CTC loss values, one per example in the batch.

See Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A. Graves et al. for more information on the definition and the algorithm.
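To make the quantity being computed concrete, here is a minimal pure-Python sketch of the forward (alpha) recursion described in that paper, for a single sequence. It is not the operator's implementation: the function names are hypothetical, the blank index is assumed to be 0 (as with blank_label "first"), and it works in probability space, so it can underflow on long sequences where a real implementation would use log-space or rescaling.

```python
import math


def softmax(row):
    """Numerically stable softmax over one activation vector."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def ctc_loss_single(activations, label, blank=0):
    """Negative log-likelihood of `label` for one sequence (illustrative sketch).

    activations: list of T rows, each with alphabet_size raw scores (no softmax).
    label: list of non-blank label indices, without padding.
    """
    probs = [softmax(row) for row in activations]

    # Extended label sequence: a blank before, between, and after the labels.
    ext = [blank]
    for l in label:
        ext += [l, blank]
    S = len(ext)

    # Initialisation: a path may start with the leading blank or the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        prev, alpha = alpha, [0.0] * S
        for s in range(S):
            total = prev[s]                 # stay on the same symbol
            if s >= 1:
                total += prev[s - 1]        # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * probs[t][ext[s]]

    # Valid paths end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else float("inf")
```

With uniform activations over a two-symbol alphabet (blank plus one label), two time steps, and the label [1], three alignments are valid, each with probability 0.25, so the sketch returns -log(0.75). A label with a repeated symbol such as [1, 1] cannot be aligned to two time steps, because a blank is required between the repeats, and the sketch returns infinity.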

Defined in src/operator/contrib/ctc_loss.cc:L115

Parameters:
  • data (NDArray) – Input data to the ctc_loss op.
  • label (NDArray) – Ground-truth labels for the loss.
  • data_lengths (NDArray) – Lengths of data for each of the samples. Only required when use_data_lengths is true.
  • label_lengths (NDArray) – Lengths of labels for each of the samples. Only required when use_label_lengths is true.
  • use_data_lengths (boolean, optional, default=0) – Whether the data lengths are decided by data_lengths. If false, the lengths are equal to the max sequence length.
  • use_label_lengths (boolean, optional, default=0) – Whether the label lengths are decided by label_lengths, or derived from padding_mask. If false, the lengths are derived from the first occurrence of the value of padding_mask. The value of padding_mask is 0 when the first CTC label is reserved for blank, and -1 when the last label is reserved for blank. See blank_label.
  • blank_label ({'first', 'last'}, optional, default='first') – Set the label that is reserved for the blank label. If "first", the 0-th label is reserved, label values for tokens in the vocabulary are between 1 and alphabet_size-1, and the padding mask is 0. If "last", the last label value alphabet_size-1 is reserved for the blank label instead, label values for tokens in the vocabulary are between 0 and alphabet_size-2, and the padding mask is -1.
  • out (NDArray, optional) – The output NDArray to hold the result.
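
The use_label_lengths description says that, when the flag is false, each label length is taken from the first occurrence of the padding value. A rough pure-Python sketch of that rule follows; the helper name label_lengths_from_padding is hypothetical, and this only mirrors the documented behaviour rather than the operator's actual code.

```python
def label_lengths_from_padding(label, blank_label="first"):
    """Infer per-row label lengths from the padding value (illustrative helper)."""
    # Padding value as documented: 0 for "first", -1 for "last".
    padding = 0 if blank_label == "first" else -1
    lengths = []
    for row in label:
        # The length is the position of the first padding value,
        # or the full row width if the row contains no padding.
        lengths.append(row.index(padding) if padding in row else len(row))
    return lengths


lengths_first = label_lengths_from_padding(
    [[2, 1, 0, 0], [3, 2, 2, 0], [1, 2, 1, 3]], "first")
lengths_last = label_lengths_from_padding(
    [[1, 0, -1, -1], [2, 1, 1, -1], [0, 1, 0, 2]], "last")
```

Passing explicit label_lengths with use_label_lengths set to true avoids relying on this convention.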
Returns:

out – The output of this function.

Return type:

NDArray or list of NDArrays