
Support for LocalResponseNormalization (LRN) operation #228

Closed

MarkGHX opened this issue Nov 16, 2021 · 6 comments

MarkGHX commented Nov 16, 2021

Hi all,

I'm a student intern working on GSoC 2021 project "OpenCV.js: Accelerate OpenCV.js DNN via WebNN" (opencv/opencv#20406). Here is the proposal. In this project, I will improve the performance of OpenCV.js DNN Module using WebNN.

Here is a brief result of the improvements:

| Model | OpenCV.js wasm | OpenCV.js wasm+simd+threads | OpenCV native default | OpenCV OpenVINO | OpenCV WebNN | OpenCV.js WebNN-polyfill | OpenCV.js WebNN-Electron |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GoogleNet | 825.07ms | 51.55ms | 29.32ms | 10.35ms | 24.8ms | 69.15ms | 24.90ms |
| SqueezeNet | 462.12ms | 31.69ms | 17.4ms | 4.29ms | 4.56ms | 21.27ms | 4.07ms |

However, I found a performance gap for GoogleNet between OpenCV OpenVINO and OpenCV WebNN, while this gap doesn't exist for SqueezeNet. This is mainly because the LRN layer is not supported by WebNN, so GoogleNet is divided into four parts, which slows down inference. After further investigation, I found that both ONNX (link) and TFLite (link) support LRN, and both GoogleNet and AlexNet need an LRN layer. Thus, I think it would be useful for WebNN to support this frequently used LRN op.

@fdwr fdwr changed the title Support for LRN operation Support for LocalResponseNormalization operation Jul 23, 2024
@fdwr fdwr changed the title Support for LocalResponseNormalization operation Support for LocalResponseNormalization (LRN) operation Jul 23, 2024

fdwr commented Jul 25, 2024

I prototyped localResponseNormalization here, which worked for my limited use. However, between the various implementations (TensorFlow, CoreML, DirectML, Caffe, PyTorch), there are enough little differences (kernel symmetry, which axes are used, which dimensions are windowed...) that the final WebNN operator probably ought to take a more generic form, accepting axes rather than just axis and windowSize rather than a radius; it also warrants a table with a clear mapping to each backend.

Known Models

  • AlexNet
  • Inception v1.12

Behavior

Local response normalization produces an output the same size as the input, using a sliding window where each output element equals the corresponding input element divided by an adjusted averaged window around it. The shape of that sliding window can vary in size and rank, along a single axis or more. Although not obvious at first, the operator is really a variation of pooling, with the general form:

function localResponseNormalization(input, axes, windowLength, scale, bias, exponent)
{
    let leadingPadding = floor((windowLength - 1) / 2); // Center halfway around sliding window
    let trailingPadding = ceil((windowLength - 1) / 2); // Center halfway around sliding window
    let padding = new Array(axes.length).fill([leadingPadding, trailingPadding]).flat();
        // 1D padding = [leadingPadding, trailingPadding]
        // 2D padding = [leadingPadding, trailingPadding, leadingPadding, trailingPadding]
        // 3D padding = [leadingPadding, trailingPadding, leadingPadding, trailingPadding, ...]
    let windowDimensions = new Array(axes.length).fill(windowLength);
        // 1D windowDimensions = [windowLength]
        // 2D windowDimensions = [windowLength, windowLength]
        // 3D windowDimensions = [windowLength, windowLength, windowLength]

    let regionAverages = averagePoolND(pow(input, 2), axes, windowDimensions, padding);
    return input / pow((regionAverages * scale + bias), exponent);
}

Where averagePoolND is a more general pooling function that takes axes directly (like the related reduction functions), rather than implying the rightmost dimensions like averagePool2D (which is a subset: averagePool2D(input, ...) = averagePoolND(input, axes = [input.rank - 2, input.rank - 1], ...)). Conversely, averagePoolND with one or two axes can be implemented on top of an existing implementation's more limited averagePool2D via transposes.

function averagePoolND(input, axes, ...)
{
    // e.g. Given input rank=4 and axes=[1], returns [0,2,3,1].
    //      Given input rank=3 and axes=[0,1], returns [2,0,1].
    let permutation = GetPermutationToRightmostAxes(input.rank, axes);
    let inversePermutation = GetInversePermutation(permutation);
    let poolingOperator;
    switch (axes.length)
    {
    case 1: poolingOperator = averagePool1D; break;
    case 2: poolingOperator = averagePool2D; break;
    default: throw ...; // Unsupported axis count
    }
    return transpose(poolingOperator(transpose(input, permutation), ...), inversePermutation);
}

Note if you only have averagePool2D to work with (WebNN lacks an averagePool1D), then you can just set the padding to [0,0,*,*] (no padding along the first windowed dimension) and windowDimensions to [1,*], as sketched below.
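As a rough sketch in the same pseudocode style as above (the averagePool2D parameter order here is assumed for illustration only):

function averagePool1D(input, windowLength, leadingPadding, trailingPadding)
{
    // Emulate a 1D average pool with averagePool2D by making the first windowed
    // dimension degenerate: a window of length 1 with zero padding leaves it
    // untouched, so only the last dimension is actually averaged over.
    let windowDimensions = [1, windowLength];
    let padding = [0, 0, leadingPadding, trailingPadding];
    return averagePool2D(input, windowDimensions, padding);
}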

Implementations

Implementations consistently:

  • have a scaling parameter (alpha), an exponent (beta), and a bias (k).
  • evaluate the equation in the same order.

Implementations differ in:

  • their default values for the scaling, exponent, and bias.
  • how many axes they support, either 1 or 2. Although 3 axes would be a natural logical continuation, I haven't actually seen an implementation that directly accepts 3.
  • the exact sizes they support. Most support any positive window dimension length (1,2,3,4...), but TF only supports odd sizes (1,3,5,7...). None directly support a windowDimensions parameter like pooling, only a window length (effectively limiting the windowDimensions to squares for 2D cases).
  • the minimum input dimension count (1, 2, or 3).
  • how they treat edges, whether to repeat edge values or pad as zeros.
| API/Library | Input rank | Axes | Padding | Kernel size | Defaults |
| --- | --- | --- | --- | --- | --- |
| TensorFlow | ? | 1D [rank-1] | edge repeat | radius * 2 + 1 | s=1 e=0.5 b=1 |
| PyTorch | >=2D | 1D [1] | zeros | square length | s=.0001 e=0.75 b=1 |
| CoreML | >=3D | 1D [rank-3] | zeros | square length | s=.0001 e=0.75 b=1 |
| Caffe | >=3D | 1D [1] / 2D [rank-2, rank-1] | zeros | square length | s=1 e=0.75 b=NA |
| NCNN | ? | 1D [1] / 2D [rank-2, rank-1] | zeros? | square length | s=1 e=0.75 b=1 |
| ONNX | >=2D | 1D [1] | zeros | square length | s=.0001 e=0.75 b=1 |
| DirectML | 4D | 1D [1] / 2D [2,3] | zeros | square length | s=.0001 e=0.75 b=1 |

CoreML (1D normalization)

  • 2D [a,_,1] axes=[0] // rank - 3. Append ones for trailing dimensions since minimum rank 3 requirement.
  • 3D [a,_,_] axes=[0]
  • 4D [_,a,_,_] axes=[1]
  • 5D [_,_,a,_,_] axes=[2]

TensorFlow (1D normalization)

  • 2D [_,a] axes=[1] // rank - 1
  • 3D [_,_,a] axes=[2]
  • 4D [_,_,_,a] axes=[3]
  • 5D [_,_,_,_,a] axes=[4]

PyTorch or Caffe or NCNN or ONNX (1D normalization)

  • 2D [_,a] axes=[1]
  • 3D [_,a,_] axes=[1]
  • 4D [_,a,_,_] axes=[1]
  • 5D [_,a,_,_,_] axes=[1]

DirectML (1D normalization)

  • 2D [_,a,1,1] axes=[1] // Append ones for trailing dimensions.
  • 3D [_,a,_,1] axes=[1]
  • 4D [_,a,_,_] axes=[1]
  • 5D [*,a,_,_] axes=[1] // Flatten extra leading dimensions.

Caffe and NCNN (2D normalization)

  • 2D [a,a] axes=[0,1] // rank - 2, rank - 1
  • 3D [_,a,a] axes=[1,2]
  • 4D [_,_,a,a] axes=[2,3]
  • 5D [_,_,_,a,a] axes=[3,4]

DirectML (2D normalization)

  • 2D [1,1,a,a] axes=[2,3] // rank - 2, rank - 1. Append ones for leading dimensions.
  • 3D [1,_,a,a] axes=[2,3]
  • 4D [_,_,a,a] axes=[2,3]
  • 5D [*,_,a,a] axes=[2,3] // Flatten extra leading dimensions.

Possible IDL

partial interface MLGraphBuilder {
  ...
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance, optional MLBatchNormalizationOptions options = {});
  MLOperand instanceNormalization(MLOperand input, optional MLInstanceNormalizationOptions options = {});
  MLOperand layerNormalization(MLOperand input, optional MLLayerNormalizationOptions options = {});
+ MLOperand localResponseNormalization(MLOperand input, optional MLLocalResponseNormalizationOptions options = {});
  ...
};
+dictionary MLLocalResponseNormalizationOptions {
+  sequence<unsigned long> axes;
+  unsigned long windowLength; // 1 up to input size or more
+  float scale = 1.0;      // Sometimes labeled alpha.
+  float bias = 1.0;       // Sometimes labeled k.
+  float exponent = 0.5;   // Sometimes labeled beta.
+};
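For example, with this shape an ONNX LRN node (size, alpha, beta, k) on an NCHW input might map onto the proposed options roughly as follows (illustrative only, since the operator is not part of WebNN; builder and input are assumed to be an MLGraphBuilder and an MLOperand):

// Hypothetical usage of the proposed method for an NCHW input,
// mapping ONNX LRN attributes onto the proposed options.
const lrnOutput = builder.localResponseNormalization(input, {
  axes: [1],        // Normalize across the channel axis.
  windowLength: 5,  // ONNX "size".
  scale: 0.0001,    // ONNX "alpha".
  exponent: 0.75,   // ONNX "beta".
  bias: 1.0,        // ONNX "k".
});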

Data Types

float16 float32

@huningxin

@fdwr mentioned in #375 (comment)

Decomposition might also be fine rather than a dedicated operator

+1

There is an example of LRN decomposition in torch: https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#local_response_norm

Please note the decomposition requires the avg_pool2d to include the zero-padding in the averaging calculation (torch's avg_pool2d count_include_pad parameter defaults to True).

However, the default behavior of WebNN averagePool2d doesn't count the padding elements, as implemented in the Chromium prototype.

To support LRN decomposition, WebNN averagePool2d may need to support an includePadding option, for example:

dictionary MLAveragePool2dOptions : MLPool2dOptions {
  // Indicates whether to include the zero-padding in the averaging calculation.
  boolean includePadding = false;
};

MLOperand averagePool2d(MLOperand input, optional MLAveragePool2dOptions options = {});
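An LRN decomposition could then request the LRN-compatible averaging in a single call, for instance (illustrative sketch, assuming the option were adopted and that squaredInput has already been transposed so the normalized axis is the last dimension):

// Hypothetical: average the squared input over a window of `size` channels,
// counting the zero padding like torch's count_include_pad = True.
const averaged = builder.averagePool2d(squaredInput, {
  windowDimensions: [1, size],
  padding: [0, 0, Math.floor((size - 1) / 2), Math.ceil((size - 1) / 2)],
  includePadding: true,
});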

Any thoughts?


fdwr commented Oct 24, 2024

Please note the decomposition requires the avg_pool2d to include the zero-padding in the averaging calculation

Excluding padding in the averaging window (includePadding = false) is the WebNN default, and that's actually the much harder case to emulate. Supporting includePadding = true though is easy - you just add zero padding beforehand via pad and then call averagePool per normal. So adding includePadding may be useful (assuming all the backends support it), but not necessary for implementation.
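To make that concrete, here's a rough sketch of the full LRN decomposition using only existing WebNN builder methods, assuming an NCHW float32 input normalized across the channel axis with ONNX-style parameters (the helper and variable names are made up, and the exact constant() descriptor shape may differ by spec revision):

// Sketch: LRN across the channel axis of a 4D NCHW tensor with existing WebNN ops.
function lrnDecomposition(builder, x, size = 5, alpha = 0.0001, beta = 0.75, k = 1.0) {
  // Hypothetical helper for scalar float32 constants.
  const scalar = (value) =>
      builder.constant({ dataType: 'float32', shape: [] }, new Float32Array([value]));

  const squared = builder.mul(x, x);
  // Move channels into averagePool2d's windowed dimensions: NCHW -> NHWC.
  const nhwc = builder.transpose(squared, { permutation: [0, 2, 3, 1] });
  // Pad the channel axis with zeros *before* pooling so the padding is counted
  // in the average (i.e. count_include_pad = true behavior).
  const lead = Math.floor((size - 1) / 2);
  const trail = Math.ceil((size - 1) / 2);
  const padded = builder.pad(nhwc, [0, 0, 0, lead], [0, 0, 0, trail]);
  // A [1, size] window over the last two dimensions averages across channels only.
  const averaged = builder.averagePool2d(padded, { windowDimensions: [1, size] });
  const regionAverages = builder.transpose(averaged, { permutation: [0, 3, 1, 2] });
  // output = x / (k + alpha * regionAverages) ^ beta
  const denominator = builder.pow(
      builder.add(builder.mul(regionAverages, scalar(alpha)), scalar(k)),
      scalar(beta));
  return builder.div(x, denominator);
}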

@Honry's ONNX decomposition should work by adding a Pad here and calling AveragePool with pads as 0's and count_include_pad = 0:

[image: ONNX LRN decomposition graph]

@huningxin

@fdwr

Supporting includePadding = true though is easy - you just add zero padding beforehand via pad and then call averagePool per normal.

Great idea!

So adding includePadding may be useful (assuming all the backends support it),

AFAIK, TFLite average_pool_2d doesn't support includePadding = true; it needs to be emulated by adding zero padding beforehand.

but not necessary for implementation.

Agreed, it could be handled by framework.


a-sully commented Oct 28, 2024

Agreed, it could be handled by framework.

SGTM. Let's close this issue as not planned?


fdwr commented Oct 28, 2024

We'll use decomposition in higher layers (e.g. ORT's WebNN EP) for localResponseNormalization rather than a dedicated WebNN operator, due to the rarity of the operator in models and the awkward backend differences.
