Trainer.update(batch_size, ignore_stale_grad=False)

Makes one step of parameter update.

Should be called after autograd.backward(), outside of the record() scope, and after trainer.allreduce_grads().

For normal parameter updates, use step(), which internally calls allreduce_grads() and then update(). However, if you need the reduced gradients in order to transform them, as in gradient clipping, call allreduce_grads() and update() separately and apply the transformation between the two calls.
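The reduce, transform, update order can be pictured with a toy, pure-Python model. Everything in it (`ToyTrainer`, `clip_global_norm`, one scalar gradient per device) is hypothetical and only mimics the call order; it is not MXNet's implementation. With a real Gluon Trainer, the corresponding sequence would be `trainer.allreduce_grads()`, a clipping step on the reduced gradients, and then `trainer.update(batch_size)`.

```python
import math


class ToyTrainer:
    """Hypothetical data-parallel trainer that mimics the call order only."""

    def __init__(self, params, lr):
        self.params = dict(params)  # parameter name -> scalar value
        self.lr = lr
        self.device_grads = {}      # parameter name -> list of per-device gradients
        self.reduced = {}           # parameter name -> gradient summed over devices

    def allreduce_grads(self):
        # Sum each parameter's gradient across all devices.
        self.reduced = {n: sum(gs) for n, gs in self.device_grads.items()}

    def update(self, batch_size):
        # Plain SGD on the reduced gradients, normalized by 1/batch_size.
        for n, g in self.reduced.items():
            self.params[n] -= self.lr * g / batch_size

    def step(self, batch_size):
        # The normal path: reduce, then update, with nothing in between.
        self.allreduce_grads()
        self.update(batch_size)


def clip_global_norm(grads, max_norm):
    """Rescale grads in place so their joint L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads.values()))
    if total > max_norm:
        for n in grads:
            grads[n] *= max_norm / total
    return total


# Manual path: reduce, clip the reduced gradients, then update.
trainer = ToyTrainer({"w": 1.0, "b": 0.0}, lr=0.1)
trainer.device_grads = {"w": [3.0, 3.0], "b": [4.0, 4.0]}  # two devices
trainer.allreduce_grads()                      # reduced: w=6.0, b=8.0
norm = clip_global_norm(trainer.reduced, 5.0)  # norm 10.0, scaled to w=3.0, b=4.0
trainer.update(batch_size=1)                   # w = 1.0 - 0.1*3.0, b = -0.1*4.0
```

Calling step() in this model would go straight from reduction to update, leaving no point at which to clip; that gap is the reason for calling the two methods separately.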

  • batch_size (int) – Batch size of the data processed. Gradients are normalized by 1/batch_size. Set this to 1 if you have already normalized the loss manually, e.g. with loss = mean(loss).

  • ignore_stale_grad (bool, optional, default=False) – If True, ignores Parameters with a stale gradient (one that has not been updated by backward since the last step) and skips their update.
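Both parameters can be illustrated with another toy, pure-Python sketch. `ToyParam`, `backward`, `update`, and the `fresh` flag are hypothetical names, not MXNet APIs. The sketch also assumes that a stale gradient is treated as an error when `ignore_stale_grad` is False, which the description above does not state. It shows that a summed loss with the real batch size and a mean loss with `batch_size=1` give the same step, and that an untouched parameter is skipped when stale gradients are ignored.

```python
class ToyParam:
    """Hypothetical scalar parameter with a gradient and a freshness flag."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self.fresh = False  # set by backward(), cleared by update()


def backward(param, grad):
    # Stand-in for autograd: store a new gradient and mark it fresh.
    param.grad = grad
    param.fresh = True


def update(params, batch_size, lr, ignore_stale_grad=False):
    for name, p in params.items():
        if not p.fresh:
            if ignore_stale_grad:
                continue  # skip parameters that backward did not touch
            # Assumption: stale gradients are an error unless ignored.
            raise RuntimeError(f"gradient of {name!r} is stale")
        p.value -= lr * p.grad / batch_size  # normalize by 1/batch_size
        p.fresh = False


samples = [2.0, 4.0, 6.0]  # per-sample gradients for a batch of 3

# Summed loss: the gradient is the sum over samples, so pass the real batch size.
a = ToyParam(1.0)
backward(a, sum(samples))
update({"a": a}, batch_size=len(samples), lr=0.5)

# Mean loss: the gradient is already divided by N, so pass batch_size=1.
b = ToyParam(1.0)
backward(b, sum(samples) / len(samples))
update({"b": b}, batch_size=1, lr=0.5)

# A parameter that backward never touched is skipped when ignore_stale_grad=True.
c, d = ToyParam(1.0), ToyParam(1.0)
backward(c, 2.0)
update({"c": c, "d": d}, batch_size=1, lr=0.5, ignore_stale_grad=True)
```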