Recognize NaN operands in Min and Max ops #19984

Merged (5 commits) on Mar 21, 2024

@tpboudreau (Contributor) commented Mar 19, 2024

Description

Update the Min and Max CUDA math operations on float/double types to propagate NaNs: if either operand is NaN, the result should be NaN.
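As a rough illustration of the semantics described above, here is a minimal host-side C++ sketch. The helper names `nan_propagating_min` and `nan_propagating_max` are hypothetical and are not taken from this PR's diff; the actual CUDA kernels may be structured differently. The sketch only shows the rule "if either operand is NaN, return NaN".

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not the PR's code): returns NaN if either operand is
// NaN, otherwise the smaller operand.
template <typename T>
T nan_propagating_min(T a, T b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<T>::quiet_NaN();
  }
  return a < b ? a : b;
}

// Hypothetical helper (not the PR's code): returns NaN if either operand is
// NaN, otherwise the larger operand.
template <typename T>
T nan_propagating_max(T a, T b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<T>::quiet_NaN();
  }
  return a > b ? a : b;
}
```

The explicit `std::isnan` check is what makes the result independent of operand order; a bare `a < b ? a : b` returns `b` whenever either operand is NaN, because every ordered comparison with NaN is false.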

TODO: float16/bfloat16 need a similar change.

Motivation

Currently, results differ between the CPU and CUDA implementations of the floating point Min and Max operators: the CPU operators correctly return NaN results if either operand is NaN. This PR updates the CUDA implementations to conform with this correct behavior.

See the issue and comments raised here (onnx/onnx#6003).

Context

Same behavior in numpy, torch and Java:

>>> numpy.min([numpy.nan, 1])
nan
>>> numpy.max([numpy.nan, 1])
nan

>>> torch.min(torch.tensor([1, float('nan')]))
tensor(nan)
>>> torch.max(torch.tensor([1, float('nan')]))
tensor(nan)

The C language fmin and fmax functions behave differently:

fmax(NaN,1) = 1
fmin(NaN,1) = 1
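The same behavior can be checked from C++ through `std::fmin` and `std::fmax` in `<cmath>`. The two helper functions below are hypothetical names introduced only for this illustration, and the sketch assumes an implementation that follows IEC 60559 and is built without fast-math style flags.

```cpp
#include <cmath>
#include <limits>

// Illustrative helper (hypothetical name): true when std::fmin and std::fmax
// return the numeric operand if exactly one operand is NaN.
inline bool fmin_fmax_skip_single_nan() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  return std::fmin(nan, 1.0) == 1.0 && std::fmin(1.0, nan) == 1.0 &&
         std::fmax(nan, 1.0) == 1.0 && std::fmax(1.0, nan) == 1.0;
}

// Illustrative helper (hypothetical name): true when std::fmin and std::fmax
// return NaN if both operands are NaN.
inline bool fmin_fmax_nan_when_both_nan() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  return std::isnan(std::fmin(nan, nan)) && std::isnan(std::fmax(nan, nan));
}
```

In other words, `fmin` and `fmax` treat a single NaN as missing data, which is why they cannot be used directly when the result is expected to propagate NaN.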

https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@tianleiwu
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

@tianleiwu
Contributor

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@tianleiwu
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

@tianleiwu
Contributor

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@baijumeswani
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

Azure Pipelines successfully started running 10 pipeline(s).

@baijumeswani
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

@baijumeswani
Contributor

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu
Contributor

Any idea of buffer overflow:

1: [ RUN ] MathOpTest.Min_12_MLFloat16_Nan
1: =================================================================
1: ==5581==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60900006c1d2 at pc 0x000004062aad bp 0x7ffe126023c0 sp 0x7ffe126023b0
1: READ of size 2 at 0x60900006c1d2 thread T0
1: #0 0x4062aac in Eigen::internal::mapbase_evaluator<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> >, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const>::coeff(long) const /build/Debug/_deps/eigen-src/Eigen/src/Core/CoreEvaluators.h:917
1: #1 0x40600a6 in Eigen::internal::binary_evaluator<Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const>, Eigen::internal::IndexBased, Eigen::internal::IndexBased, Eigen::half, Eigen::half>::coeff(long) const /build/Debug/_deps/eigen-src/Eigen/src/Core/CoreEvaluators.h:775
1: #2 0x405b37c in Eigen::internal::generic_dense_assignment_kernel<Eigen::internal::evaluator<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> > >, Eigen::internal::evaluator<Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> >, Eigen::internal::assign_op<Eigen::half, Eigen::half>, 0>::assignCoeff(long) /build/Debug/_deps/eigen-src/Eigen/src/Core/AssignEvaluator.h:660
1: #3 0x4054d8d in Eigen::internal::dense_assignment_loop<Eigen::internal::generic_dense_assignment_kernel<Eigen::internal::evaluator<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> > >, Eigen::internal::evaluator<Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> >, Eigen::internal::assign_op<Eigen::half, Eigen::half>, 0>, 1, 0>::run(Eigen::internal::generic_dense_assignment_kernel<Eigen::internal::evaluator<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> > >, Eigen::internal::evaluator<Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> >, Eigen::internal::assign_op<Eigen::half, Eigen::half>, 0>&) (/build/Debug/onnxruntime_test_all+0x4054d8d) (BuildId: f67bc24268f9fc76c40282cb1ebc3773948a7f2e)
1: #4 0x4049957 in void Eigen::internal::call_dense_assignment_loop<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const>, Eigen::internal::assign_op<Eigen::half, Eigen::half> >(Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >&, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> const&, Eigen::internal::assign_op<Eigen::half, Eigen::half> const&) (/build/Debug/onnxruntime_test_all+0x4049957) (BuildId: f67bc24268f9fc76c40282cb1ebc3773948a7f2e)
1: #5 0x4036752 in Eigen::internal::Assignment<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const>, Eigen::internal::assign_op<Eigen::half, Eigen::half>, Eigen::internal::Dense2Dense, void>::run(Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >&, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> const&, Eigen::internal::assign_op<Eigen::half, Eigen::half> const&) (/build/Debug/onnxruntime_test_all+0x4036752) (BuildId: f67bc24268f9fc76c40282cb1ebc3773948a7f2e)
1: #6 0x40115c3 in void Eigen::internal::call_assignment_no_alias<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const>, Eigen::internal::assign_op<Eigen::half, Eigen::half> >(Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >&, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> const&, Eigen::internal::assign_op<Eigen::half, Eigen::half> const&) (/build/Debug/onnxruntime_test_all+0x40115c3) (BuildId: f67bc24268f9fc76c40282cb1ebc3773948a7f2e)
1: #7 0x400a70e in void Eigen::internal::call_assignment<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const>, Eigen::internal::assign_op<Eigen::half, Eigen::half> >(Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >&, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> const&, Eigen::internal::assign_op<Eigen::half, Eigen::half> const&, Eigen::internal::enable_if<!Eigen::internal::evaluator_assume_aliasing<Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const>, Eigen::internal::evaluator_traits<Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> >::Shape>::value, void*>::type) /build/Debug/_deps/eigen-src/Eigen/src/Core/AssignEvaluator.h:858
1: #8 0x400105f in void Eigen::internal::call_assignment<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> >(Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >&, Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> const&) /build/Debug/_deps/eigen-src/Eigen/src/Core/AssignEvaluator.h:836
1: #9 0x3fe1290 in Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >& Eigen::DenseBase<Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> > >::operator=<Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> >(Eigen::DenseBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<Eigen::half, Eigen::half, 0>, Eigen::Map<Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> const, 0, Eigen::Stride<0, 0> > const, Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_opEigen::half, Eigen::Array<Eigen::half, -1, 1, 0, -1, 1> > const> > const&) (/build/Debug/onnxruntime_test_all+0x3fe1290) (BuildId: f67bc24268f9fc76c40282cb1ebc3773948a7f2e)
1: #10 0x3f979ea in onnxruntime::MinMaxMLFloat16(onnxruntime::OpKernel const&, onnxruntime::OpKernelContext*)::{lambda(onnxruntime::BroadcastHelper&)#3}::operator()(onnxruntime::BroadcastHelper&) const (/build/Debug/onnxruntime_test_all+0x3f979ea) (BuildId: f67bc24268f9fc76c40282cb1ebc3773948a7f2e)
1: #11 0x3f97a8d in onnxruntime::MinMaxMLFloat16(onnxruntime::OpKernel const&, onnxruntime::OpKernelContext*)::{lambda(onnxruntime::BroadcastHelper&)#3}::_FUN(onnxruntime::BroadcastHelper&) (/build/Debug/onnxruntime_test_all+0x3f97a8d) (BuildId: f67bc24268f9fc76c40282cb1ebc3773948a7f2e)
1: #12 0x3f9ec02 in void onnxruntime::BroadcastLooperonnxruntime::BroadcastHelper(onnxruntime::BroadcastHelper&, onnxruntime::ProcessBroadcastSpanFuncs const&) /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:1014
1: #13 0x3f8118a in UntypedBroadcastVariadic /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:2035
1: #14 0x3f982c6 in onnxruntime::common::Status onnxruntime::MinMaxMLFloat16(onnxruntime::OpKernel const&, onnxruntime::OpKernelContext*) (/build/Debug/onnxruntime_test_all+0x3f982c6) (BuildId: f67bc24268f9fc76c40282cb1ebc3773948a7f2e)
1: #15 0x3f6f081 in onnxruntime::Min_8::Compute(onnxruntime::OpKernelContext*) const /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:809
1: #16 0x5655b25 in onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) /onnxruntime_src/onnxruntime/core/framework/sequential_executor.cc:495
1: #17 0x5578d97 in onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) /onnxruntime_src/onnxruntime/core/framework/execution_steps.cc:73
1: #18 0x5706493 in onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) /onnxruntime_src/onnxruntime/core/framework/stream_execution_context.cc:222
1: #19 0x5656654 in operator() /onnxruntime_src/onnxruntime/core/framework/sequential_executor.cc:589
1: #20 0x565a7a7 in __invoke_impl<void, onnxruntime::ExecuteThePlan(const SessionState&, gsl::span, gsl::span, gsl::span, std::vector&, const std::unordered_map<long unsigned int, std::function<common::Status(const TensorShape&, const OrtDevice&, OrtValue&, bool&)> >&, const logging::Logger&, const DeviceStreamCollection*, bool const&, bool, bool)::<lambda()>&> /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/invoke.h:61
1: #21 0x565a5bc in __invoke_r<void, onnxruntime::ExecuteThePlan(const SessionState&, gsl::span, gsl::span, gsl::span, std::vector&, const std::unordered_map<long unsigned int, std::function<common::Status(const TensorShape&, const OrtDevice&, OrtValue&, bool&)> >&, const logging::Logger&, const DeviceStreamCollection*, bool const&, bool, bool)::<lambda()>&> /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/invoke.h:111
1: #22 0x5659ab5 in _M_invoke /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/std_function.h:290
1: #23 0x17ef75b in std::function<void ()>::operator()() const /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/std_function.h:591
1: #24 0x17df502 in onnxruntime::concurrency::ThreadPool::Schedule(onnxruntime::concurrency::ThreadPool*, std::function<void ()>) /onnxruntime_src/include/onnxruntime/core/platform/threadpool.h:233
1: #25 0x5656ec8 in onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) /onnxruntime_src/onnxruntime/core/framework/sequential_executor.cc:588
1: #26 0x575e9e8 in ExecuteGraphImpl /onnxruntime_src/onnxruntime/core/framework/utils.cc:633
1: #27 0x576057d in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) /onnxruntime_src/onnxruntime/core/framework/utils.cc:751
1: #28 0x57607d0 in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) /onnxruntime_src/onnxruntime/core/framework/utils.cc:778
1: #29 0x3896ccb in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >, std::vector<OrtDevice, std::allocator > const) /onnxruntime_src/onnxruntime/core/session/inference_session.cc:2508
1: #30 0x389bafd in onnxruntime::InferenceSession::Run(OrtRunOptions const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, OrtValue, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, OrtValue> > > const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >) /onnxruntime_src/onnxruntime/core/session/inference_session.cc:2716
1: #31 0x1d76c7d in void onnxruntime::test::BaseTester::ExecuteModelonnxruntime::InferenceSession(onnxruntime::Model&, onnxruntime::InferenceSession&, onnxruntime::test::BaseTester::ExpectResult, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, OrtRunOptions const*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, OrtValue, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, OrtValue> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool) /onnxruntime_src/onnxruntime/test/providers/base_tester.cc:332
1: #32 0x1d6f2e5 in onnxruntime::test::BaseTester::ExecuteModelForEps(std::vector<std::unique_ptr<onnxruntime::IExecutionProvider, std::default_deleteonnxruntime::IExecutionProvider >, std::allocator<std::unique_ptr<onnxruntime::IExecutionProvider, std::default_deleteonnxruntime::IExecutionProvider > > >&&, onnxruntime::Model&, onnxruntime::SessionOptions, onnxruntime::test::BaseTester::ExpectResult, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, OrtRunOptions const*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, OrtValue, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, OrtValue> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, std::vector<std::shared_ptronnxruntime::CustomRegistry, std::allocator<std::shared_ptronnxruntime::CustomRegistry > > const*, bool, bool, unsigned long*, unsigned long*) /onnxruntime_src/onnxruntime/test/providers/base_tester.cc:835

@tpboudreau
Contributor Author

> Any idea of buffer overflow:

I have some leads, but it needs further research. Maybe I should reduce this PR to correcting Min/Max for only the float and double operand types -- that would cover many cases, including the original bug report -- and open a follow-up PR with fixes for the remaining 16-bit types after sorting out this issue?

@tianleiwu
Contributor

> I have some leads, but it needs further research. Maybe I should reduce this PR to correcting Min/Max for only the float and double operand types -- that would cover many cases, including the original bug report -- and open a follow-up PR with fixes for the remaining 16-bit types after sorting out this issue?

That's fine.

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@tianleiwu
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

@tianleiwu
Contributor

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@tpboudreau
Contributor Author

@tianleiwu -- thanks for the help and for reviewing so quickly!

@tianleiwu merged commit 983fd83 into microsoft:main on Mar 21, 2024
80 of 82 checks passed
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request May 7, 2024
### Description
Update the Min and Max CUDA math operations on float/double types to
propagate NaNs: if either operand is NaN, the result should be NaN.

TODO: float16/bfloat16 need a similar change.

### Motivation
Currently, results differ between the CPU and CUDA implementations of
the floating point Min and Max operators: the CPU operators correctly
return NaN results if either operand is NaN. This PR updates the CUDA
implementations to conform with this correct behavior.

See the issue and comments raised
[here](onnx/onnx#6003).

### Context
Same behavior in numpy, torch and Java:
```
>>> numpy.min([numpy.nan, 1])
nan
>>> numpy.max([numpy.nan, 1])
nan

>>> torch.min(torch.tensor([1, float('nan')]))
tensor(nan)
>>> torch.max(torch.tensor([1, float('nan')]))
tensor(nan)
```

The C language [fmin](https://en.cppreference.com/w/c/numeric/math/fmin)
and [fmax](https://en.cppreference.com/w/c/numeric/math/fmax) functions
behave differently:
```
fmax(NaN,1) = 1
fmin(NaN,1) = 1
```

https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

![image](https://github.com/microsoft/onnxruntime/assets/30328909/62446cf1-f252-4ddc-8118-5ce605252331)

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf
tianleiwu pushed a commit that referenced this pull request Sep 24, 2024
This makes min and max with NaN for either operand always return NaN for
float16 data, matching the behaviour of float and double.

The behaviour for floats and doubles was previously fixed for the CPU
provider in #21492 and the CUDA provider in #19984, but these PRs didn't
fix the behaviour for float16 due to tests causing asan errors. The
memory access violations with float16 data have now been fixed in
#22135, so this PR is a follow-up to make float16 min and max behave the
same as float and double for both the CPU and CUDA providers now that we
can add tests for this.

### Motivation and Context

Relevant previous issues (not float16 specific):
* #21455
* onnx/onnx#6003