
Support for LocalResponseNormalization (LRN) operation #228

Closed

MarkGHX opened this issue Nov 16, 2021 · 6 comments

MarkGHX commented Nov 16, 2021

Hi all,

I'm a student intern working on GSoC 2021 project "OpenCV.js: Accelerate OpenCV.js DNN via WebNN" (opencv/opencv#20406). Here is the proposal. In this project, I will improve the performance of OpenCV.js DNN Module using WebNN.

Here is a brief result of the improvements:

| Model | OpenCV.js wasm | OpenCV.js wasm+simd+threads | OpenCV native default | OpenCV OpenVINO | OpenCV WebNN | OpenCV.js WebNN-polyfill | OpenCV.js WebNN-Electron |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GoogleNet | 825.07ms | 51.55ms | 29.32ms | 10.35ms | 24.8ms | 69.15ms | 24.90ms |
| SqueezeNet | 462.12ms | 31.69ms | 17.4ms | 4.29ms | 4.56ms | 21.27ms | 4.07ms |

However, I found a performance gap for GoogleNet between OpenCV OpenVINO and OpenCV WebNN, while this gap doesn't exist for SqueezeNet. This is mainly because the LRN layer is not supported by WebNN, so GoogleNet is divided into four parts, which slows down inference. After further investigation, I found that both ONNX (link) and TFLite (link) support LRN, and both GoogleNet and AlexNet need an LRN layer. Thus, I think it would be useful for WebNN to support this frequently used LRN op.

@fdwr fdwr changed the title Support for LRN operation Support for LocalResponseNormalization operation Jul 23, 2024
@fdwr fdwr changed the title Support for LocalResponseNormalization operation Support for LocalResponseNormalization (LRN) operation Jul 23, 2024

fdwr commented Jul 25, 2024

I prototyped localResponseNormalization here, which worked for my limited use. However, between the various implementations (TensorFlow, CoreML, DirectML, Caffe, PyTorch), there are enough little differences (kernel symmetry, which axes are used, which dimensions are windowed...) that the final WebNN operator probably ought to take a more generic form, accepting axes rather than just axis and windowSize rather than a radius; it also warrants a table with a clear mapping to each backend.

Known Models

  • AlexNet
  • Inception v1.12

Behavior

Local response normalization produces an output the same size as the input, using a sliding window where each output element equals the corresponding input element divided by an adjusted averaged window around it. The shape of that sliding window can vary in size and rank, along a single axis or more. Although not obvious at first, the operator is really a variation of pooling, with the general form:

function localResponseNormalization(input, axes, windowLength, scale, bias, exponent)
{
    let leadingPadding = floor((windowLength - 1) / 2); // Center halfway around sliding window
    let trailingPadding = ceil((windowLength - 1) / 2); // Center halfway around sliding window
    let padding = new Array(axes.length).fill([leadingPadding, trailingPadding]).flat();
        // 1D padding = [leadingPadding, trailingPadding]
        // 2D padding = [leadingPadding, trailingPadding, leadingPadding, trailingPadding]
        // 3D padding = [leadingPadding, trailingPadding, leadingPadding, trailingPadding, ...]
    let windowDimensions = new Array(axes.length).fill(windowLength);
        // 1D windowDimensions = [windowLength]
        // 2D windowDimensions = [windowLength, windowLength]
        // 3D windowDimensions = [windowLength, windowLength, windowLength]

    let regionAverages = averagePoolND(pow(input, 2), axes, windowDimensions, padding);
    return input / pow((regionAverages * scale + bias), exponent);
}

Where averagePoolND is a more general pooling function that takes axes directly (like the related reduction functions), rather than implying the rightmost dimensions like averagePool2D (which is a subset: averagePool2D(input, ...) = averagePoolND(input, axes = [input.rank - 2, input.rank - 1], ...)). Conversely, averagePoolND with one or two axes can be implemented on top of an existing implementation's more limited averagePool2D via transposes.

function averagePoolND(input, axes, ...)
{
    // e.g. Given input rank=4 and axes=[1], returns [0,2,3,1].
    //      Given input rank=3 and axes=[0,1], returns [2,0,1].
    let permutation = GetPermutationToRightmostAxes(input.rank, axes);
    let inversePermutation = GetInversePermutation(permutation);
    let poolingOperator;
    switch (axes.length)
    {
    case 1: poolingOperator = averagePool1D; break;
    case 2: poolingOperator = averagePool2D; break;
    default: throw ...; // Unsupported axis count
    }
    return transpose(poolingOperator(transpose(input, permutation), ...), inversePermutation);
}

Note if you only have averagePool2D to work with (WebNN lacks an averagePool1D), then you can just set the padding to [0,0,*,*] (no padding along the first windowed dimension) and windowDimensions to [1,*], as sketched below.
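As a rough sketch in the same pseudocode style as above (the averagePool2D parameter order here is assumed for illustration only):

function averagePool1D(input, windowLength, leadingPadding, trailingPadding)
{
    // Emulate a 1D average pool with averagePool2D by making the first windowed
    // dimension degenerate: a window of length 1 with zero padding leaves it
    // untouched, so only the last dimension is actually averaged over.
    let windowDimensions = [1, windowLength];
    let padding = [0, 0, leadingPadding, trailingPadding];
    return averagePool2D(input, windowDimensions, padding);
}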

Implementations

Implementations consistently:

  • have a scaling parameter (alpha), an exponent (beta), and a bias (k).
  • evaluate the equation in the same order.

Implementations differ in:

  • their default values for the scaling, exponent, and bias.
  • how many axes they support, either 1 or 2. Although 3 axes would be a natural logical continuation, I haven't actually seen an implementation that directly accepts 3.
  • the exact sizes they support. Most support any positive window dimension length (1,2,3,4...), but TF only supports odd sizes (1,3,5,7...). None directly support a windowDimensions parameter like pooling, only a window length (effectively limiting the windowDimensions to squares for 2D cases).
  • the minimum input dimension count (1, 2, or 3).
  • how they treat edges, whether to repeat edge values or pad as zeros.
| API/Library | Input rank | Axes | Padding | Kernel size | Defaults |
| --- | --- | --- | --- | --- | --- |
| TensorFlow | ? | 1D [rank-1] | edge repeat | radius * 2 + 1 | s=1 e=0.5 b=1 |
| PyTorch | >=2D | 1D [1] | zeros | square length | s=.0001 e=0.75 b=1 |
| CoreML | >=3D | 1D [rank-3] | zeros | square length | s=.0001 e=0.75 b=1 |
| Caffe | >=3D | 1D [1] / 2D [rank-2, rank-1] | zeros | square length | s=1 e=0.75 b=NA |
| NCNN | ? | 1D [1] / 2D [rank-2, rank-1] | zeros? | square length | s=1 e=0.75 b=1 |
| ONNX | >=2D | 1D [1] | zeros | square length | s=.0001 e=0.75 b=1 |
| DirectML | 4D | 1D [1] / 2D [2,3] | zeros | square length | s=.0001 e=0.75 b=1 |

CoreML (1D normalization)

  • 2D [a,_,1] axes=[0] // rank - 3. Append ones for trailing dimensions since minimum rank 3 requirement.
  • 3D [a,_,_] axes=[0]
  • 4D [_,a,_,_] axes=[1]
  • 5D [_,_,a,_,_] axes=[2]

TensorFlow (1D normalization)

  • 2D [_,a] axes=[1] // rank - 1
  • 3D [_,_,a] axes=[2]
  • 4D [_,_,_,a] axes=[3]
  • 5D [_,_,_,_,a] axes=[4]

PyTorch or Caffe or NCNN or ONNX (1D normalization)

  • 2D [_,a] axes=[1]
  • 3D [_,a,_] axes=[1]
  • 4D [_,a,_,_] axes=[1]
  • 5D [_,a,_,_,_] axes=[1]

DirectML (1D normalization)

  • 2D [_,a,1,1] axes=[1] // Append ones for trailing dimensions.
  • 3D [_,a,_,1] axes=[1]
  • 4D [_,a,_,_] axes=[1]
  • 5D [*,a,_,_] axes=[1] // Flatten extra leading dimensions.

Caffe and NCNN (2D normalization)

  • 2D [a,a] axes=[0,1] // rank - 2, rank - 1
  • 3D [_,a,a] axes=[1,2]
  • 4D [_,_,a,a] axes=[2,3]
  • 5D [_,_,_,a,a] axes=[3,4]

DirectML (2D normalization)

  • 2D [1,1,a,a] axes=[2,3] // rank - 2, rank - 1. Append ones for leading dimensions.
  • 3D [1,_,a,a] axes=[2,3]
  • 4D [_,_,a,a] axes=[2,3]
  • 5D [*,_,a,a] axes=[2,3] // Flatten extra leading dimensions.

Possible IDL

partial interface MLGraphBuilder {
  ...
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance, optional MLBatchNormalizationOptions options = {});
  MLOperand instanceNormalization(MLOperand input, optional MLInstanceNormalizationOptions options = {});
  MLOperand layerNormalization(MLOperand input, optional MLLayerNormalizationOptions options = {});
+ MLOperand localResponseNormalization(MLOperand input, optional MLLocalResponseNormalizationOptions options = {});
  ...
};
+dictionary MLLocalResponseNormalizationOptions {
+  sequence<unsigned long> axes;
+  unsigned long windowLength; // 1 up to input size or more
+  float scale = 1.0;      // Sometimes labeled alpha.
+  float bias = 1.0;       // Sometimes labeled k.
+  float exponent = 0.5;   // Sometimes labeled beta.
+};
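For example, with this shape an ONNX LRN node (size, alpha, beta, k) on an NCHW input might map onto the proposed options roughly as follows (illustrative only, since the operator is not part of WebNN; builder and input are assumed to be an MLGraphBuilder and an MLOperand):

// Hypothetical usage of the proposed method for an NCHW input,
// mapping ONNX LRN attributes onto the proposed options.
const lrnOutput = builder.localResponseNormalization(input, {
  axes: [1],        // Normalize across the channel axis.
  windowLength: 5,  // ONNX "size".
  scale: 0.0001,    // ONNX "alpha".
  exponent: 0.75,   // ONNX "beta".
  bias: 1.0,        // ONNX "k".
});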

Data Types

float16 float32

@huningxin

@fdwr mentioned in #375 (comment)

Decomposition might also be fine rather than a dedicated operator

+1

There is an example of LRN decomposition in torch: https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#local_response_norm

Please note the decomposition requires the avg_pool2d to include the zero-padding in the averaging calculation (torch's avg_pool2d count_include_pad parameter defaults to True).

However, the default behavior of WebNN averagePool2d doesn't count the padding elements, as implemented in the Chromium prototype.

To support LRN decomposition, WebNN averagePool2d may need to support an includePadding option, for example:

dictionary MLAveragePool2dOptions : MLPool2dOptions {
  // Indicates whether to include the zero-padding in the averaging calculation.
  boolean includePadding = false;
};

MLOperand averagePool2d(MLOperand input, optional MLAveragePool2dOptions options = {});
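An LRN decomposition could then request the LRN-compatible averaging in a single call, for instance (illustrative sketch, assuming the option were adopted and that squaredInput has already been transposed so the normalized axis is the last dimension):

// Hypothetical: average the squared input over a window of `size` channels,
// counting the zero padding like torch's count_include_pad = True.
const averaged = builder.averagePool2d(squaredInput, {
  windowDimensions: [1, size],
  padding: [0, 0, Math.floor((size - 1) / 2), Math.ceil((size - 1) / 2)],
  includePadding: true,
});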

Any thoughts?


fdwr commented Oct 24, 2024

Please note the decomposition requires the avg_pool2d to include the zero-padding in the averaging calculation

Excluding padding in the averaging window (includePadding = false) is the WebNN default, and that's actually the much harder case to emulate. Supporting includePadding = true though is easy - you just add zero padding beforehand via pad and then call averagePool per normal. So adding includePadding may be useful (assuming all the backends support it), but not necessary for implementation.
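To make that concrete, here's a rough sketch of the full LRN decomposition using only existing WebNN builder methods, assuming an NCHW float32 input normalized across the channel axis with ONNX-style parameters (the helper and variable names are made up, and the exact constant() descriptor shape may differ by spec revision):

// Sketch: LRN across the channel axis of a 4D NCHW tensor with existing WebNN ops.
function lrnDecomposition(builder, x, size = 5, alpha = 0.0001, beta = 0.75, k = 1.0) {
  // Hypothetical helper for scalar float32 constants.
  const scalar = (value) =>
      builder.constant({ dataType: 'float32', shape: [] }, new Float32Array([value]));

  const squared = builder.mul(x, x);
  // Move channels into averagePool2d's windowed dimensions: NCHW -> NHWC.
  const nhwc = builder.transpose(squared, { permutation: [0, 2, 3, 1] });
  // Pad the channel axis with zeros *before* pooling so the padding is counted
  // in the average (i.e. count_include_pad = true behavior).
  const lead = Math.floor((size - 1) / 2);
  const trail = Math.ceil((size - 1) / 2);
  const padded = builder.pad(nhwc, [0, 0, 0, lead], [0, 0, 0, trail]);
  // A [1, size] window over the last two dimensions averages across channels only.
  const averaged = builder.averagePool2d(padded, { windowDimensions: [1, size] });
  const regionAverages = builder.transpose(averaged, { permutation: [0, 3, 1, 2] });
  // output = x / (k + alpha * regionAverages) ^ beta
  const denominator = builder.pow(
      builder.add(builder.mul(regionAverages, scalar(alpha)), scalar(k)),
      scalar(beta));
  return builder.div(x, denominator);
}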

@Honry's ONNX decomposition should work by adding a Pad here and calling AveragePool with pads as 0's and count_include_pad = 0:

[image: ONNX LRN decomposition graph]

@huningxin

@fdwr

Supporting includePadding = true though is easy - you just add zero padding beforehand via pad and then call averagePool per normal.

Great idea!

So adding includePadding may be useful (assuming all the backends support it),

AFAIK, TFLite average_pool_2d doesn't support includePadding = true; it needs to be emulated by adding zero padding beforehand.

but not necessary for implementation.

Agreed, it could be handled by framework.


a-sully commented Oct 28, 2024

Agreed, it could be handled by framework.

SGTM. Let's close this issue as not planned?


fdwr commented Oct 28, 2024

We'll use decomposition in higher layers (e.g. ORT's WebNN EP) for localResponseNormalization rather than a dedicated WebNN operator, due to the rarity of the operator in models and the awkward backend differences.
