Table Of Contents


class mxnet.gluon.Parameter(name, grad_req='write', shape=None, dtype=<class 'numpy.float32'>, lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True, stype='default', grad_stype='default')[source]

A Container holding parameters (weights) of Blocks.

Parameter holds a copy of the parameter on each Context after it is initialized with Parameter.initialize(...). If grad_req is not 'null', it will also hold a gradient array on each Context:

ctx = mx.gpu(0)
x = mx.nd.zeros((16, 100), ctx=ctx)
w = mx.gluon.Parameter('fc_weight', shape=(64, 100), init=mx.init.Xavier())
b = mx.gluon.Parameter('fc_bias', shape=(64,), init=mx.init.Zero())
w.initialize(ctx=ctx)
b.initialize(ctx=ctx)
out = mx.nd.FullyConnected(x, w.data(ctx), b.data(ctx), num_hidden=64)

Parameters
  • name (str) – Name of this parameter.
  • grad_req ({'write', 'add', 'null'}, default 'write') –

    Specifies how to update gradient to grad arrays.

    • 'write' means the gradient is written to the grad NDArray every time.
    • 'add' means the gradient is added to the grad NDArray every time. You need to manually call zero_grad() to clear the gradient buffer before each iteration when using this option.
    • 'null' means gradient is not requested for this parameter; gradient arrays will not be allocated.
  • shape (int or tuple of int, default None) – Shape of this parameter. By default, the shape is not specified. A Parameter with unknown shape can be used with the Symbol API, but initialization will throw an error when using the NDArray API.
  • dtype (numpy.dtype or str, default 'float32') – Data type of this parameter. For example, numpy.float32 or 'float32'.
  • lr_mult (float, default 1.0) – Learning rate multiplier. Learning rate will be multiplied by lr_mult when updating this parameter with optimizer.
  • wd_mult (float, default 1.0) – Weight decay multiplier (L2 regularizer coefficient). Works similar to lr_mult.
  • init (Initializer, default None) – Initializer of this parameter. Will use the global initializer by default.
  • stype ({'default', 'row_sparse', 'csr'}, defaults to 'default'.) – The storage type of the parameter.
  • grad_stype ({'default', 'row_sparse', 'csr'}, defaults to 'default'.) – The storage type of the parameter’s gradient.
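The difference between the 'write', 'add', and 'null' modes can be sketched in plain Python. This is an illustration of the update semantics only, not MXNet internals:

```python
# Plain-Python sketch of how a gradient buffer is updated under each
# grad_req mode; MXNet performs the equivalent on NDArrays internally.

def apply_gradient(buffer, new_grad, grad_req):
    """Update a gradient buffer the way grad_req prescribes."""
    if grad_req == 'write':
        return new_grad           # buffer is overwritten every time
    if grad_req == 'add':
        return buffer + new_grad  # accumulates until zero_grad() clears it
    if grad_req == 'null':
        return None               # no gradient buffer is allocated at all
    raise ValueError('unknown grad_req: %s' % grad_req)

buf = 0.0
for g in (1.5, 2.0):
    buf = apply_gradient(buf, g, 'write')
print(buf)  # 2.0 -- only the most recent gradient survives

buf = 0.0
for g in (1.5, 2.0):
    buf = apply_gradient(buf, g, 'add')
print(buf)  # 3.5 -- gradients accumulate across iterations
```

With 'add', remember to call zero_grad() before each new iteration; otherwise stale gradients leak into the next update.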

grad_req – This can be set before or after initialization. Setting grad_req to 'null' with x.grad_req = 'null' saves memory and computation when you don't need the gradient w.r.t. x.
Type: {'write', 'add', 'null'}

lr_mult – Local learning rate multiplier for this Parameter. The actual learning rate is calculated with learning_rate * lr_mult. You can set it with param.lr_mult = 2.0.

wd_mult – Local weight decay multiplier for this Parameter.
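How the two multipliers interact with the optimizer's global settings is simple arithmetic; the numbers below are made up for illustration, not prescribed by MXNet:

```python
# The optimizer scales its global learning rate and weight decay by the
# per-parameter multipliers; these base values are illustrative only.
base_lr, base_wd = 0.1, 1e-4
lr_mult, wd_mult = 2.0, 0.0   # e.g. larger steps for this parameter,
                              # and weight decay disabled entirely

effective_lr = base_lr * lr_mult  # 0.2
effective_wd = base_wd * wd_mult  # 0.0
print(effective_lr, effective_wd)
```

Setting wd_mult = 0.0 is a common way to exclude bias terms from L2 regularization.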


Get and set parameters

Parameter.initialize([init, ctx, …]) Initializes parameter and gradient arrays.
Parameter.data([ctx]) Returns a copy of this parameter on one context.
Parameter.list_data() Returns copies of this parameter on all contexts, in the same order as creation.
Parameter.list_row_sparse_data(row_id) Returns copies of the ‘row_sparse’ parameter on all contexts, in the same order as creation.
Parameter.row_sparse_data(row_id) Returns a copy of the ‘row_sparse’ parameter on the same context as row_id’s.
Parameter.set_data(data) Sets this parameter’s value on all contexts.

Get and set gradients associated with parameters

Parameter.grad([ctx]) Returns a gradient buffer for this parameter on one context.
Parameter.list_grad() Returns gradient buffers on all contexts, in the same order as values().
Parameter.zero_grad() Sets gradient buffer on all contexts to 0.

Handle device contexts

Parameter.cast(dtype) Cast data and gradient of this Parameter to a new data type.
Parameter.list_ctx() Returns a list of contexts this parameter is initialized on.
Parameter.reset_ctx(ctx) Re-assign Parameter to other contexts.

Convert to symbol

Parameter.var() Returns a symbol representing this parameter.