[Bug] Mixing negative and positive paddings causes segfault/uninitialized memory values produced in reflected pad #11828

lazycal · 2022-06-13T02:50:29Z

Describe the bug
Say a=[0, 1, 2, ..., N-1] is passed to a 1-D Pad op with mode='reflect' and padding=[-(N-1), N-1]. ORT produces seg-fault or values that look like garbage memory (depending on N).

I am guessing that ORT performs the negative padding (i.e., slicing) before the reflected padding. So it will remove the first N-1 elements from a, at which point there are not enough elements to be "reflected".

I tried reading the code related to Pad. Though not able to fully understand it, the logic appears to verify my guess:

onnxruntime/onnxruntime/core/providers/cpu/tensor/pad.cc

Line 327 in 0869f4f

    
           SliceIterator<T> input(input_tensor, input_shape, input_starts, input_extents, {});

this line slices the input. Afterwards, in

onnxruntime/onnxruntime/core/providers/cpu/tensor/pad.cc

Lines 421 to 427 in 0869f4f

    
           output = input.CopyInnermostAxisSolitaryInnerStep(output); 
        
           int64_t prePad = reshaped_pad[inner_axis]; 
        
           int64_t postPad = reshaped_pad[inner_axis + new_dims_count]; 
        
           if (inner_no_pad_size == 1) { 
        
             PadInnermostAxis(axisStart - prePad, axisStart + prePad, -1 /* inputDelta */, prePad); 
        
             PadInnermostAxis(output, output - 2, -1 /* inputDelta */, postPad);

L421 copies the after-slicing data into a new and smaller buffer output and perform the reflected padding at L426-427. This smaller buffer has no enough elements to "reflect" in this case, and out-of-bound array accessing follows, causing the errors.

Urgency
None

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04):Ubuntu 18.04
ONNX Runtime installed from (source or binary): source
ONNX Runtime version: 0869f4f
Python version: 3.7.11
Visual Studio version (if applicable):
GCC/Compiler version (if compiling from source): clang version 11.1.0-++20211011094159+1fdec59bffc1-1~~exp1~~20211011214614.8
CUDA/cuDNN version: 11.2
GPU model and memory: RTX2080, 8GB

To Reproduce
Run this script:

import onnxruntime as ort
import onnx
from onnx import helper
from onnx import AttributeProto, TensorProto, GraphProto
import numpy as np
import sys
N = int(sys.argv[1])
print('testing N=', N)
x = helper.make_tensor_value_info('x', TensorProto.FLOAT, [N])
y = helper.make_tensor_value_info('y', TensorProto.FLOAT, [N])
pads = np.array([-(N - 1), (N - 1)], dtype=np.int64)

node = helper.make_node(
    'Pad',
    inputs=['x', 'pads'],
    outputs=['y'],
    mode='reflect'
)

graph_def = helper.make_graph(
    [node],        # nodes
    'test-model',      # name
    [x],  # inputs
    [y],               # outputs
    initializer=[helper.make_tensor('pads', TensorProto.INT64, [2], pads)]
)

model_def = helper.make_model(graph_def, producer_name='onnx-example')
onnx.checker.check_model(model_def, full_check=True)
onnx.save(model_def, "output.onnx")

x = np.arange(N, dtype=np.float32)
print('input=')
print(x)
# print(f'input=[0,1,...,{N-1}]')
print(f'padding=[-{N-1}, {N-1}]')
print('output=')
y_ort = ort.InferenceSession("output.onnx", provider_options=[
    "CPUExecutionProvider"]).run(["y"], {"x": x})[0]
print(y_ort)

Expected behavior
It appears that ORT supports Pad op using both negative and positive paddings at the same time (and ONNX spec also doesn't seem to prohibit this).

If that is the case, this sounds like a bug. Hope it can be an easy fix, e.g., passing the original buffer to the 2nd argument of PadInnermostAxis seems to do the job, but I unfortunately don't know the code well enough to make the patch
If not, would be great to have a check that rejects mixed positive and negative padding, instead of segfault or silently producing incorrect values.

Screenshots

and the stack trace for segfault run:

Additional context

The text was updated successfully, but these errors were encountered:

yuslepukhin · 2022-06-13T17:02:16Z

Thank you for a good report.

### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: #8265 #11828 #20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```

### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: microsoft#8265 microsoft#11828 microsoft#20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```

### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: #8265 #11828 #20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```

### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: microsoft#8265 microsoft#11828 microsoft#20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```

yuslepukhin added the component:operator label Jun 13, 2022

gramalingam mentioned this issue Jun 17, 2022

[Tracking] Spec clarifications onnx/onnx#3651

Open

sophies927 added core runtime issues related to core runtime and removed component:operator labels Aug 12, 2022

adrianlizarraga mentioned this issue Nov 11, 2024

[Quant Tool] Flaky test due to Pad reflect bug #22798

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Mixing negative and positive paddings causes segfault/uninitialized memory values produced in reflected pad #11828

[Bug] Mixing negative and positive paddings causes segfault/uninitialized memory values produced in reflected pad #11828

lazycal commented Jun 13, 2022 •

edited

Loading

yuslepukhin commented Jun 13, 2022

[Bug] Mixing negative and positive paddings causes segfault/uninitialized memory values produced in reflected pad #11828

[Bug] Mixing negative and positive paddings causes segfault/uninitialized memory values produced in reflected pad #11828

Comments

lazycal commented Jun 13, 2022 • edited Loading

yuslepukhin commented Jun 13, 2022

lazycal commented Jun 13, 2022 •

edited

Loading