Table Of Contents

Performance

The following tutorials show how to tune MXNet and use tools that improve training and inference performance.

Essential

Improving Performance (https://mxnet.incubator.apache.org/versions/master/faq/perf.html)

How to get the best performance from MXNet.
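One of the most common tips from that guide applies directly to Gluon code: hybridizing a network switches it from imperative execution to a cached, optimized graph. A minimal sketch (the small network here is just an illustration):

```python
import mxnet as mx
from mxnet import gluon, nd

# Any HybridBlock-based model works the same way.
net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(128, activation='relu'),
        gluon.nn.Dense(10))
net.initialize()

# hybridize() compiles the imperative graph into a cached symbolic one,
# which usually improves throughput; static_alloc/static_shape allow
# further memory-planning optimizations when shapes do not change.
net.hybridize(static_alloc=True, static_shape=True)

out = net(nd.ones((32, 100)))
print(out.shape)
```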

Profiler (https://mxnet.incubator.apache.org/versions/master/tutorials/python/profiler.html)

How to profile MXNet models.
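As a quick illustration of the profiler API covered there (a minimal sketch; the output filename is arbitrary):

```python
import mxnet as mx
from mxnet import profiler, nd

# Configure the profiler before running the workload to be measured.
profiler.set_config(profile_all=True, aggregate_stats=True,
                    filename='profile_output.json')

x = nd.random.uniform(shape=(1000, 1000))

profiler.set_state('run')          # start collecting
y = nd.dot(x, x)
mx.nd.waitall()                    # wait for async ops before stopping
profiler.set_state('stop')         # stop collecting

print(profiler.dumps())            # aggregated stats as text
```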

Tuning NumPy Operations (https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/gotchas_numpy_in_mxnet.html)

Gotchas using NumPy in MXNet.
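The central gotcha is that converting an NDArray to NumPy blocks until all pending asynchronous work has finished. A minimal sketch:

```python
from mxnet import nd

x = nd.ones((1000, 1000))

# MXNet operations are asynchronous: this line returns immediately
# while the computation is queued on the backend engine.
y = nd.dot(x, x)

# .asnumpy() is a synchronization point: it waits for y to be computed
# and copies it to a NumPy array. Calling it inside a training loop
# (e.g. for logging) can silently serialize your pipeline.
print(y.asnumpy().sum())

# Prefer staying in nd (e.g. y.sum()) and synchronizing rarely.
```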

Compression

Compression: float16 (compression/float16.html)

How to use float16 in your model to boost training speed.
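For Gluon models, the tutorial's approach boils down to casting the network and its inputs to float16. A minimal sketch (a GPU with fast float16 support, e.g. Volta or newer, is assumed for an actual speedup):

```python
import mxnet as mx
from mxnet import gluon, nd

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()

net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(128, activation='relu'),
        gluon.nn.Dense(10))
net.initialize(ctx=ctx)

# Cast all parameters to float16; inputs must be cast to match.
net.cast('float16')

data = nd.ones((32, 100), ctx=ctx).astype('float16')
print(net(data).dtype)  # float16
```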

Gradient Compression (compression/gradient_compression.html)

How to use gradient compression to reduce communication bandwidth and increase speed.
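Gradient compression is enabled on the KVStore. A minimal sketch of the 2-bit scheme described in the tutorial (the threshold value is just an example):

```python
import mxnet as mx
from mxnet import gluon

# 2-bit quantization: gradient elements above the threshold are sent
# as +/- threshold; the remainder accumulates locally as residual error.
kv = mx.kv.create('device')
kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})

# Equivalently, Gluon's Trainer accepts the same settings:
# trainer = gluon.Trainer(net.collect_params(), 'sgd',
#                         {'learning_rate': 0.1}, kvstore=kv,
#                         compression_params={'type': '2bit',
#                                             'threshold': 0.5})
```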

Accelerated Backend

TensorRT (backend/tensorRt.html)

How to use NVIDIA’s TensorRT to boost inference performance.
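The integration has changed across MXNet versions; the flow below is a sketch based on the master-era tutorial, assuming an MXNet build with TensorRT support and a pretrained symbolic checkpoint (the 'model' prefix is a placeholder):

```python
import mxnet as mx

# Placeholder checkpoint prefix; substitute your own model files.
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
batch_shape = (1, 3, 224, 224)

# Partition the graph so TensorRT-compatible subgraphs run in TensorRT.
trt_sym = sym.get_backend_symbol('TensorRT')
arg_params, aux_params = mx.contrib.tensorrt.init_tensorrt_params(
    trt_sym, arg_params, aux_params)

executor = trt_sym.simple_bind(ctx=mx.gpu(0), data=batch_shape,
                               grad_req='null')
executor.copy_params_from(arg_params, aux_params)
y = executor.forward(is_train=False, data=mx.nd.ones(batch_shape))
```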

Distributed Training

Distributed Training Using the KVStore API (https://mxnet.incubator.apache.org/versions/master/faq/distributed_training.html)

How to use the KVStore API to train a model across multiple GPUs or machines.
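In Gluon, the script-side change is small: create a distributed KVStore and pass it to the Trainer, then launch the script on the cluster with tools/launch.py. A minimal sketch:

```python
import mxnet as mx
from mxnet import gluon

# 'dist_sync' aggregates gradients across all workers every batch;
# 'dist_async' updates without waiting for other workers.
store = mx.kv.create('dist_sync')

net = gluon.nn.Dense(10)
net.initialize()

trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1}, kvstore=store)

# Launched with e.g.:
#   python tools/launch.py -n 2 --launcher ssh -H hosts python train.py
```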

Training with Multiple GPUs Using Model Parallelism (https://mxnet.incubator.apache.org/versions/master/faq/model_parallel_lstm.html)

An overview of using multiple GPUs when training an LSTM.
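The idea generalizes beyond LSTMs: place different layers on different devices and copy activations between them. A toy two-GPU sketch (the layer sizes and device ids are assumptions):

```python
import mxnet as mx
from mxnet import gluon, nd

# Each layer lives on its own GPU; activations move between devices.
layer1 = gluon.nn.Dense(256, activation='relu')
layer2 = gluon.nn.Dense(10)
layer1.initialize(ctx=mx.gpu(0))
layer2.initialize(ctx=mx.gpu(1))

x = nd.ones((32, 100), ctx=mx.gpu(0))
h = layer1(x)                      # computed on gpu(0)
h = h.as_in_context(mx.gpu(1))     # copy activation across devices
y = layer2(h)                      # computed on gpu(1)
print(y.shape)
```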

Data Parallelism in MXNet (https://mxnet.incubator.apache.org/versions/master/faq/multi_devices.html)

An overview of distributed training strategies.
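Data parallelism splits each batch across devices and sums the gradients. In Gluon, split_and_load does the splitting; a minimal sketch with two GPUs assumed:

```python
import mxnet as mx
from mxnet import gluon, autograd, nd

ctx = [mx.gpu(0), mx.gpu(1)]

net = gluon.nn.Dense(10)
net.initialize(ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

data = nd.ones((64, 100))
label = nd.zeros((64,))

# Split the batch evenly across devices.
data_parts = gluon.utils.split_and_load(data, ctx)
label_parts = gluon.utils.split_and_load(label, ctx)

with autograd.record():
    losses = [loss_fn(net(X), y)
              for X, y in zip(data_parts, label_parts)]
for l in losses:
    l.backward()
trainer.step(batch_size=64)  # gradients are summed across devices
```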

MXNet with Horovod (https://github.com/apache/incubator-mxnet/tree/master/example/distributed_training-horovod)

A set of example scripts demonstrating MNIST and ImageNet training with Horovod as the distributed training backend.
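The Horovod-side changes are small: initialize Horovod, pin each process to one GPU, broadcast the initial parameters, and wrap the optimizer in a DistributedTrainer. A minimal sketch based on those examples:

```python
import mxnet as mx
from mxnet import gluon
import horovod.mxnet as hvd

hvd.init()
ctx = mx.gpu(hvd.local_rank())     # one GPU per process

net = gluon.nn.Dense(10)
net.initialize(ctx=ctx)

# Make all workers start from identical parameters.
hvd.broadcast_parameters(net.collect_params(), root_rank=0)

# DistributedTrainer allreduces gradients across workers each step.
trainer = hvd.DistributedTrainer(net.collect_params(), 'sgd',
                                 {'learning_rate': 0.1})

# Launched with e.g.: horovodrun -np 4 python train.py
```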