mxnet.ndarray.contrib.ROIAlign

mxnet.ndarray.contrib.ROIAlign(data=None, rois=None, pooled_size=_Null, spatial_scale=_Null, sample_ratio=_Null, out=None, name=None, **kwargs)

This operator takes a 4D feature map as input and region proposals as rois, aligns the feature map over each region, and produces a fixed-size output array. It is typically used in Faster R-CNN and Mask R-CNN networks.

Unlike ROI pooling, ROI Align removes the harsh quantization and properly aligns the extracted features with the input. No quantization is performed on any coordinates involved in the ROI, its bins, or the sampling points. Instead, bilinear interpolation from the nearby grid points on the feature map is used to compute the exact values of the input features at four regularly sampled locations in each ROI bin. The sampled values in each bin are then aggregated by average pooling.
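The interpolate-then-average step described above can be illustrated with a small pure-Python sketch. This is not the operator's implementation: the helper names `bilinear_sample` and `average_bin` are hypothetical, the boundary handling (clamping to the grid) is a simplifying assumption, and the 2 x 2 sample layout simply mirrors the "four regularly sampled locations" mentioned in the text.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D grid (list of rows) at real-valued (y, x)."""
    height, width = len(feature), len(feature[0])
    # Assumption: clamp to the grid so the four neighbours always exist.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    ly, lx = y - y0, x - x0      # fractional offsets from the top-left neighbour
    hy, hx = 1.0 - ly, 1.0 - lx  # complementary weights
    return (hy * hx * feature[y0][x0] + hy * lx * feature[y0][x1]
            + ly * hx * feature[y1][x0] + ly * lx * feature[y1][x1])


def average_bin(feature, y_start, x_start, bin_h, bin_w, samples=2):
    """Average samples x samples regularly spaced points inside one bin."""
    total = 0.0
    for iy in range(samples):
        for ix in range(samples):
            # Sampling points sit at the centres of an even sub-grid of the bin;
            # none of these coordinates is rounded to an integer.
            y = y_start + (iy + 0.5) * bin_h / samples
            x = x_start + (ix + 0.5) * bin_w / samples
            total += bilinear_sample(feature, y, x)
    return total / (samples * samples)


feature = [[0.0, 1.0, 2.0],
           [3.0, 4.0, 5.0],
           [6.0, 7.0, 8.0]]

centre = bilinear_sample(feature, 0.5, 0.5)           # midway between four cells
bin_value = average_bin(feature, 0.0, 0.0, 2.0, 2.0)  # one 2 x 2 bin, four samples
```

Because every coordinate stays real-valued until the final average, a bin whose edges fall between grid points still receives a well-defined value, which is the property that distinguishes this scheme from quantized ROI pooling.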

References

He, Kaiming, et al. “Mask R-CNN.” ICCV, 2017

Defined in src/operator/contrib/roi_align.cc:L522

Parameters
• data (NDArray) – Input data to the pooling operator, a 4D feature map

• rois (NDArray) – Bounding box coordinates, a 2D array

• pooled_size (Shape(tuple), required) – Height and width of the output feature map for each ROI: (h, w)

• spatial_scale (float, required) – Ratio of the input feature map height (or width) to the raw image height (or width). Equals the reciprocal of the total stride of the convolutional layers

• sample_ratio (int, optional, default='-1') – Sampling ratio of ROI Align. The default of -1 uses an adaptive size.

• out (NDArray, optional) – The output NDArray to hold the result.
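As a worked example of how spatial_scale and pooled_size interact, the following sketch maps a box from image pixels onto the feature map and computes the resulting bin size. The helper names `roi_to_feature_coords` and `bin_size`, the stride of 16, and the sample box are illustrative assumptions; any offset or minimum-size handling inside the operator itself is not modelled here.

```python
def roi_to_feature_coords(roi, spatial_scale):
    """Scale an (x1, y1, x2, y2) box from image pixels to feature-map units."""
    return tuple(coord * spatial_scale for coord in roi)


def bin_size(roi_on_feature, pooled_size):
    """Return the (height, width) of one output bin, kept as real numbers."""
    x1, y1, x2, y2 = roi_on_feature
    pooled_h, pooled_w = pooled_size
    return ((y2 - y1) / pooled_h, (x2 - x1) / pooled_w)


total_stride = 16                    # assumed product of the backbone strides
spatial_scale = 1.0 / total_stride   # reciprocal of the total stride: 0.0625

roi_image = (32.0, 64.0, 160.0, 192.0)                          # box in image pixels
roi_feature = roi_to_feature_coords(roi_image, spatial_scale)   # (2.0, 4.0, 10.0, 12.0)
bins = bin_size(roi_feature, (7, 7))                            # each bin is 8/7 units wide
```

With a 7 x 7 output, each bin spans 8/7 of a feature-map cell. That fractional size is left unrounded, so the sampling points generally fall between grid points and are filled in by interpolation.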

Returns

out – The output of this function.

Return type

NDArray or list of NDArrays