-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] Mixing negative and positive paddings causes segfault/uninitialized memory values produced in reflected pad #11828
Labels
core runtime
issues related to core runtime
Comments
Thank you for a good report. |
sophies927
added
core runtime
issues related to core runtime
and removed
component:operator
labels
Aug 12, 2022
adrianlizarraga
added a commit
that referenced
this issue
Nov 12, 2024
### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: #8265 #11828 #20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```
yf711
pushed a commit
that referenced
this issue
Nov 12, 2024
### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: #8265 #11828 #20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```
ishwar-raut1
pushed a commit
to ishwar-raut1/onnxruntime
that referenced
this issue
Nov 19, 2024
### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: microsoft#8265 microsoft#11828 microsoft#20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```
guschmue
pushed a commit
that referenced
this issue
Dec 2, 2024
### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: #8265 #11828 #20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```
ankitm3k
pushed a commit
to intel/onnxruntime
that referenced
this issue
Dec 11, 2024
### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: microsoft#8265 microsoft#11828 microsoft#20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```
ankitm3k
pushed a commit
to intel/onnxruntime
that referenced
this issue
Dec 11, 2024
### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: microsoft#8265 microsoft#11828 microsoft#20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```
ankitm3k
pushed a commit
to intel/onnxruntime
that referenced
this issue
Dec 11, 2024
### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: microsoft#8265 microsoft#11828 microsoft#20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`) ``` ==864228== Invalid read of size 4 ==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370) ==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551) ==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725) ==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484) ==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73) ... ``` The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary): ```python data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] pads = [0, 2, 0, 0] mode = 'reflect' # Expected output by ONNX spec expected_output = [ [1.0, 1.2, 1.0, 1.2], [2.3, 3.4, 2.3, 3.4], [4.5, 5.7, 4.5, 5.7], ] # Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension # invalid data may be 0.0, inf, nan, etc. ort_output = [ [inf, 1.2, 1.0, 1.2], [inf, 3.4, 2.3, 3.4], [inf, 5.7, 4.5, 5.7], ] ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
Say
a=[0, 1, 2, ..., N-1]
is passed to a 1-D Pad op withmode='reflect'
andpadding=[-(N-1), N-1]
. ORT produces seg-fault or values that look like garbage memory (depending on N).I am guessing that ORT performs the negative padding (i.e., slicing) before the reflected padding. So it will remove the first
N-1
elements froma
, at which point there are not enough elements to be "reflected".I tried reading the code related to Pad. Though not able to fully understand it, the logic appears to verify my guess:
onnxruntime/onnxruntime/core/providers/cpu/tensor/pad.cc
Line 327 in 0869f4f
onnxruntime/onnxruntime/core/providers/cpu/tensor/pad.cc
Lines 421 to 427 in 0869f4f
L421 copies the after-slicing data into a new and smaller buffer
output
and perform the reflected padding at L426-427. This smaller buffer has no enough elements to "reflect" in this case, and out-of-bound array accessing follows, causing the errors.Urgency
None
System information
exp120211011214614.8To Reproduce
Run this script:
Expected behavior
It appears that ORT supports Pad op using both negative and positive paddings at the same time (and ONNX spec also doesn't seem to prohibit this).
PadInnermostAxis
seems to do the job, but I unfortunately don't know the code well enough to make the patchScreenshots
and the stack trace for segfault run:
Additional context
The text was updated successfully, but these errors were encountered: