Memory leaks and valgrind errors when running with TensorRT #7286

Open
matthill opened this issue Apr 8, 2021 · 5 comments
Labels: ep:TensorRT (issues related to TensorRT execution provider), stale (issues that have not been addressed in a while; categorized by a bot)

@matthill
Contributor

matthill commented Apr 8, 2021

Describe the bug
Running inference with TensorRT, either with address sanitization enabled or under Valgrind, produces a number of errors: a few memory leaks on initialization/destruction, a number of mismatched delete/delete[] calls, and uses of uninitialized values.

Urgency
none

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04, 18.04, Jetson
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: tested on 1.5.2 and 1.7.1
  • Python version: none
  • Visual Studio version (if applicable): none
  • GCC/Compiler version (if compiling from source): 9.3
  • CUDA/cuDNN version: 11 / 7.1
  • GPU model and memory: RTX 2080, Jetson Tx-2

To Reproduce
Compile software that uses ONNX Runtime with the TensorRT execution provider. In this case, the model uses a dynamic batch size.
Either enable address sanitization at compile time, or run the binary under Valgrind:
valgrind --leak-check=full --show-possibly-lost=no [binary]

Expected behavior
The output should be clean, with no leaks or errors reported.

Additional context

==132002== LEAK SUMMARY:
==132002== definitely lost: 96 bytes in 2 blocks
==132002== indirectly lost: 48 bytes in 2 blocks

==132002== 72 (48 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 3,517 of 5,620
==132002== at 0x483CFE3: operator new(unsigned long) (vg_replace_malloc.c:417)
==132002== by 0x10C5E5BAB: createInferRuntime_INTERNAL (in /usr/lib/libnvinfer.so.7.1.3)
==132002== by 0xFF027C51: nvinfer1::(anonymous namespace)::createInferRuntime(nvinfer1::ILogger&) (NvInferRuntime.h:1906)
==132002== by 0xFF02A191: onnxruntime::TensorrtExecutionProvider::TensorrtExecutionProvider(onnxruntime::TensorrtExecutionProviderInfo const&) (tensorrt_execution_provider.cc:226)
==132002== by 0xFF074DBF: std::_MakeUniqonnxruntime::TensorrtExecutionProvider::__single_object std::make_unique<onnxruntime::TensorrtExecutionProvider, onnxruntime::TensorrtExecutionProviderInfo&>(onnxruntime::TensorrtExecutionProviderInfo&) (unique_ptr.h:857)
==132002== by 0xFF074C66: onnxruntime::TensorrtProviderFactory::CreateProvider() (tensorrt_provider_factory.cc:28)
==132002== by 0x6740220: onnxruntime::IExecutionProviderFactory_Translator::CreateProvider() (provider_bridge_ort.cc:709)
==132002== by 0x590B690: (anonymous namespace)::InitializeSession(OrtSessionOptions const*, std::unique_ptr<onnxruntime::InferenceSession, std::default_deleteonnxruntime::InferenceSession >&) (onnxruntime_c_api.cc:457)
==132002== by 0x590BCC0: OrtApis::CreateSessionFromArray(OrtEnv const*, void const*, unsigned long, OrtSessionOptions const*, OrtSession**) (onnxruntime_c_api.cc:508)

==132002== 72 (48 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 3,516 of 5,620
==132002== at 0x483CFE3: operator new(unsigned long) (vg_replace_malloc.c:417)
==132002== by 0x10C5E5BAB: createInferRuntime_INTERNAL (in /usr/lib/libnvinfer.so.7.1.3)
==132002== by 0xFF027C51: nvinfer1::(anonymous namespace)::createInferRuntime(nvinfer1::ILogger&) (NvInferRuntime.h:1906)
==132002== by 0xFF02A191: onnxruntime::TensorrtExecutionProvider::TensorrtExecutionProvider(onnxruntime::TensorrtExecutionProviderInfo const&) (tensorrt_execution_provider.cc:226)
==132002== by 0xFF074DBF: std::_MakeUniqonnxruntime::TensorrtExecutionProvider::__single_object std::make_unique<onnxruntime::TensorrtExecutionProvider, onnxruntime::TensorrtExecutionProviderInfo&>(onnxruntime::TensorrtExecutionProviderInfo&) (unique_ptr.h:857)
==132002== by 0xFF074C66: onnxruntime::TensorrtProviderFactory::CreateProvider() (tensorrt_provider_factory.cc:28)
==132002== by 0x6740220: onnxruntime::IExecutionProviderFactory_Translator::CreateProvider() (provider_bridge_ort.cc:709)
==132002== by 0x590B690: (anonymous namespace)::InitializeSession(OrtSessionOptions const*, std::unique_ptr<onnxruntime::InferenceSession, std::default_deleteonnxruntime::InferenceSession >&) (onnxruntime_c_api.cc:457)
==132002== by 0x590BCC0: OrtApis::CreateSessionFromArray(OrtEnv const*, void const*, unsigned long, OrtSessionOptions const*, OrtSession**) (onnxruntime_c_api.cc:508)
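
Both leak records above point at the object returned by createInferRuntime, which is allocated in the TensorrtExecutionProvider constructor (tensorrt_execution_provider.cc:226). TensorRT 7 objects are released by calling destroy() on them, so, assuming the leak comes from the runtime never being destroyed, one possible fix is to hold it in a std::unique_ptr whose deleter calls destroy(). The sketch below only illustrates that pattern. FakeRuntime, createFakeRuntime, DestroyDeleter, TrtUniquePtr and liveAfterScope are hypothetical names used so the snippet compiles without TensorRT; this is not the actual provider code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for nvinfer1::IRuntime (TensorRT 7 style):
// the object is heap-only and must be released through destroy().
struct FakeRuntime {
  static int live;  // number of objects created but not yet destroyed
  FakeRuntime() { ++live; }
  void destroy() {
    --live;
    delete this;
  }

 private:
  ~FakeRuntime() = default;  // forces callers to go through destroy()
};
int FakeRuntime::live = 0;

// Hypothetical stand-in for createInferRuntime(logger).
FakeRuntime* createFakeRuntime() { return new FakeRuntime(); }

// Deleter that releases an object via destroy() instead of delete.
struct DestroyDeleter {
  template <typename T>
  void operator()(T* obj) const {
    if (obj != nullptr) obj->destroy();
  }
};

template <typename T>
using TrtUniquePtr = std::unique_ptr<T, DestroyDeleter>;

// Creates a runtime in an inner scope and reports how many are still alive
// once the owning pointer has gone out of scope.
int liveAfterScope() {
  {
    TrtUniquePtr<FakeRuntime> runtime(createFakeRuntime());
    assert(FakeRuntime::live == 1);
  }  // DestroyDeleter runs here and calls destroy()
  return FakeRuntime::live;
}
```

With an owner like this as a member of the execution provider, the runtime would be released when the provider is destroyed, which is the point where Valgrind currently reports the blocks as lost.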

==132002== Conditional jump or move depends on uninitialised value(s)
==132002== at 0xD4347DC7: ??? (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0xD4337142: ??? (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0xD4208565: cudnnReduceTensor (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0x5AE3227: onnxruntime::common::Status onnxruntime::cuda::ReduceComputeCore<float, (cudnnReduceTensorIndices_t)1>(onnxruntime::CUDAExecutionProvider&, onnxruntime::Tensor const&, onnxruntime::cuda::PrepareReduceMetadata&, onnxruntime::Tensor&, cudnnReduceTensorOp_t, std::vector<long, std::allocator > const&, bool, bool, bool, bool, onnxruntime::TensorShape const*) (reduction_ops.cc:597)
==132002== by 0x5ADECF9: onnxruntime::common::Status onnxruntime::cuda::ReduceKernel::ComputeImpl<float, (cudnnReduceTensorIndices_t)1>(onnxruntime::OpKernelContext*, cudnnReduceTensorOp_t) const (reduction_ops.cc:636)
==132002== by 0x5ADDFF5: onnxruntime::cuda::ArgMax::ComputeInternal(onnxruntime::OpKernelContext*) const (reduction_ops.h:97)
==132002== by 0x59F20A0: onnxruntime::cuda::CudaKernel::Compute(onnxruntime::OpKernelContext*) const (cuda_common.h:66)
==132002== by 0x67A90D3: onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) (sequential_executor.cc:305)

==132002== Conditional jump or move depends on uninitialised value(s)
==132002== at 0xD4347D20: ??? (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0xD4337142: ??? (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0xD4208565: cudnnReduceTensor (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0x5AE3227: onnxruntime::common::Status onnxruntime::cuda::ReduceComputeCore<float, (cudnnReduceTensorIndices_t)1>(onnxruntime::CUDAExecutionProvider&, onnxruntime::Tensor const&, onnxruntime::cuda::PrepareReduceMetadata&, onnxruntime::Tensor&, cudnnReduceTensorOp_t, std::vector<long, std::allocator > const&, bool, bool, bool, bool, onnxruntime::TensorShape const*) (reduction_ops.cc:597)
==132002== by 0x5ADECF9: onnxruntime::common::Status onnxruntime::cuda::ReduceKernel::ComputeImpl<float, (cudnnReduceTensorIndices_t)1>(onnxruntime::OpKernelContext*, cudnnReduceTensorOp_t) const (reduction_ops.cc:636)
==132002== by 0x5ADDFF5: onnxruntime::cuda::ArgMax::ComputeInternal(onnxruntime::OpKernelContext*) const (reduction_ops.h:97)
==132002== by 0x59F20A0: onnxruntime::cuda::CudaKernel::Compute(onnxruntime::OpKernelContext*) const (cuda_common.h:66)
==132002== by 0x67A90D3: onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) (sequential_executor.cc:305)
==132002== by 0x678EE21: onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, std::vector<OrtValue, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) (utils.cc:492)
==132002== by 0x678F277: onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, std::vector<OrtValue, std::allocator > const&, std::vector<OrtValue, std::allocator >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) (utils.cc:516)
==132002== by 0x598BB3B: onnxruntime::InferenceSession::Run(OrtRunOptions const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, std::vector<OrtValue, std::allocator > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, std::vector<OrtValue, std::allocator >, std::vector<OrtDevice, std::allocator > const) (inference_session.cc:1464)

==132002== Conditional jump or move depends on uninitialised value(s)
==132002== at 0xD4347B1F: ??? (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0xD4337142: ??? (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0xD4208565: cudnnReduceTensor (in /usr/lib/libcudnn_ops_infer.so.8.0.4)
==132002== by 0x5AE3227: onnxruntime::common::Status onnxruntime::cuda::ReduceComputeCore<float, (cudnnReduceTensorIndices_t)1>(onnxruntime::CUDAExecutionProvider&, onnxruntime::Tensor const&, onnxruntime::cuda::PrepareReduceMetadata&, onnxruntime::Tensor&, cudnnReduceTensorOp_t, std::vector<long, std::allocator > const&, bool, bool, bool, bool, onnxruntime::TensorShape const*) (reduction_ops.cc:597)
==132002== by 0x5ADECF9: onnxruntime::common::Status onnxruntime::cuda::ReduceKernel::ComputeImpl<float, (cudnnReduceTensorIndices_t)1>(onnxruntime::OpKernelContext*, cudnnReduceTensorOp_t) const (reduction_ops.cc:636)
==132002== by 0x5ADDFF5: onnxruntime::cuda::ArgMax::ComputeInternal(onnxruntime::OpKernelContext*) const (reduction_ops.h:97)
==132002== by 0x59F20A0: onnxruntime::cuda::CudaKernel::Compute(onnxruntime::OpKernelContext*) const (cuda_common.h:66)
==132002== by 0x67A90D3: onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) (sequential_executor.cc:305)

==132002== Mismatched free() / delete / delete []
==132002== at 0x484065D: operator delete (vg_replace_malloc.c:938)
==132002== by 0x673D0B2: onnxruntime::ProviderHostImpl::HeapFree(void*) (provider_bridge_ort.cc:296)
==132002== by 0xFF01B3F5: operator delete(void*) (provider_bridge_provider.cc:49)
==132002== by 0xFF2BCC2F: std::experimental::filesystem::v1::__cxx11::path::_Cmpt& std::vector<std::experimental::filesystem::v1::__cxx11::path::_Cmpt, std::allocatorstd::experimental::filesystem::v1::__cxx11::path::_Cmpt >::emplace_back<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::experimental::filesystem::v1::__cxx11::path::_Type, unsigned long&>(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&&, std::experimental::filesystem::v1::__cxx11::path::_Type&&, unsigned long&) (in /storage/projects/alpr/modules/libonnxruntime/build/Linux/Debug/libonnxruntime_providers_tensorrt.so)
==132002== by 0xFF2BCE35: std::experimental::filesystem::v1::__cxx11::path::_M_add_filename(unsigned long, unsigned long) (in /storage/projects/alpr/modules/libonnxruntime/build/Linux/Debug/libonnxruntime_providers_tensorrt.so)
==132002== by 0xFF2BC211: std::experimental::filesystem::v1::__cxx11::path::_M_split_cmpts() (in /storage/projects/alpr/modules/libonnxruntime/build/Linux/Debug/libonnxruntime_providers_tensorrt.so)
==132002== by 0xFF02817F: std::experimental::filesystem::v1::__cxx11::path::_M_append(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) (fs_path.h:437)
==132002== by 0xFF039980: std::enable_if<std::_and<std::_not<std::is_same<std::remove_cv<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >::type, std::experimental::filesystem::v1::__cxx11::path> >, std::_not<std::is_void<std::remove_pointer<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >::type> >, std::experimental::filesystem::v1::__cxx11::path::__constructible_from<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, void> >::value, std::experimental::filesystem::v1::__cxx11::path>::type& std::experimental::filesystem::v1::__cxx11::path::append<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) (fs_path.h:273)
==132002== by 0xFF02833D: (anonymous namespace)::GetEnginePath(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) (tensorrt_execution_provider.cc:41)
==132002== by 0xFF032B0E: onnxruntime::TensorrtExecutionProvider::Provider_Compile(std::vector<onnxruntime::Provider_Node*, std::allocatoronnxruntime::Provider_Node* > const&, std::vector<onnxruntime::NodeComputeInfo, std::allocatoronnxruntime::NodeComputeInfo >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}::operator()(void*, OrtApi const*, OrtKernelContext*) const (tensorrt_execution_provider.cc:1052)
==132002== by 0xFF055D62: std::_Function_handler<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*), onnxruntime::TensorrtExecutionProvider::Provider_Compile(std::vector<onnxruntime::Provider_Node*, std::allocatoronnxruntime::Provider_Node* > const&, std::vector<onnxruntime::NodeComputeInfo, std::allocatoronnxruntime::NodeComputeInfo >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}>::_M_invoke(std::_Any_data const&, void*&&, OrtApi const*&&, OrtKernelContext*&&) (std_function.h:286)
==132002== by 0x6718F60: std::function<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*)>::operator()(void*, OrtApi const*, OrtKernelContext*) const (std_function.h:688)
==132002== Address 0x1305056e0 is 0 bytes inside a block of size 95 alloc'd
==132002== at 0x483CFE3: operator new(unsigned long) (vg_replace_malloc.c:417)
==132002== by 0xFF0563E2: void std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) (basic_string.tcc:219)
==132002== by 0xFF2BCE20: std::experimental::filesystem::v1::__cxx11::path::_M_add_filename(unsigned long, unsigned long) (in /storage/projects/alpr/modules/libonnxruntime/build/Linux/Debug/libonnxruntime_providers_tensorrt.so)
==132002== by 0xFF2BC211: std::experimental::filesystem::v1::__cxx11::path::_M_split_cmpts() (in /storage/projects/alpr/modules/libonnxruntime/build/Linux/Debug/libonnxruntime_providers_tensorrt.so)
==132002== by 0xFF02817F: std::experimental::filesystem::v1::__cxx11::path::_M_append(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) (fs_path.h:437)
==132002== by 0xFF039980: std::enable_if<std::_and<std::_not<std::is_same<std::remove_cv<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >::type, std::experimental::filesystem::v1::__cxx11::path> >, std::_not<std::is_void<std::remove_pointer<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >::type> >, std::experimental::filesystem::v1::__cxx11::path::__constructible_from<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, void> >::value, std::experimental::filesystem::v1::__cxx11::path>::type& std::experimental::filesystem::v1::__cxx11::path::append<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) (fs_path.h:273)
==132002== by 0xFF02833D: (anonymous namespace)::GetEnginePath(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) (tensorrt_execution_provider.cc:41)
==132002== by 0xFF032B0E: onnxruntime::TensorrtExecutionProvider::Provider_Compile(std::vector<onnxruntime::Provider_Node*, std::allocatoronnxruntime::Provider_Node* > const&, std::vector<onnxruntime::NodeComputeInfo, std::allocatoronnxruntime::NodeComputeInfo >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}::operator()(void*, OrtApi const*, OrtKernelContext*) const (tensorrt_execution_provider.cc:1052)
==132002== by 0xFF055D62: std::_Function_handler<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*), onnxruntime::TensorrtExecutionProvider::Provider_Compile(std::vector<onnxruntime::Provider_Node*, std::allocatoronnxruntime::Provider_Node* > const&, std::vector<onnxruntime::NodeComputeInfo, std::allocatoronnxruntime::NodeComputeInfo >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}>::_M_invoke(std::_Any_data const&, void*&&, OrtApi const*&&, OrtKernelContext*&&) (std_function.h:286)
==132002== by 0x6718F60: std::function<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*)>::operator()(void*, OrtApi const*, OrtKernelContext*) const (std_function.h:688)
==132002== by 0x6716B54: onnxruntime::FunctionKernel::Compute(onnxruntime::OpKernelContext*) const (func_kernel.h:41)
==132002== by 0x67A90D3: onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) (sequential_executor.cc:305)

==132002== Mismatched free() / delete / delete []
==132002== at 0x483F651: operator delete(void*) (vg_replace_malloc.c:802)
==132002== by 0x59EAA8D: __gnu_cxx::new_allocator<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability > >::deallocate(std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability >, unsigned long) (new_allocator.h:128)
==132002== by 0x59E70D7: std::allocator_traits<std::allocator<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability > > >::deallocate(std::allocator<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability > >&, std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability >*, unsigned long) (alloc_traits.h:470)
==132002== by 0x59E326D: std::_Vector_base<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability >, std::allocator<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability > > >::_M_deallocate(std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability >, unsigned long) (stl_vector.h:351)
==132002== by 0x59DF6AD: std::_Vector_base<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability >, std::allocator<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability > > >::~_Vector_base() (stl_vector.h:332)
==132002== by 0x59DF702: std::vector<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability >, std::allocator<std::unique_ptr<onnxruntime::ComputeCapability, std::default_deleteonnxruntime::ComputeCapability > > >::~vector() (stl_vector.h:680)
==132002== by 0x67180E9: onnxruntime::GraphPartitioner::Partition(onnxruntime::Graph&, bool, onnxruntime::FuncManager&) const (graph_partitioner.cc:179)
==132002== by 0x5985600: onnxruntime::InferenceSession::TransformGraph(onnxruntime::Graph&, onnxruntime::GraphTransformerManager const&, onnxruntime::ExecutionProviders const&, onnxruntime::KernelRegistryManager&, onnxruntime::InsertCastTransformer const&, onnxruntime::SessionState&) (inference_session.cc:803)
==132002== by 0x59885EB: onnxruntime::InferenceSession::Initialize() (inference_session.cc:1116)
==132002== by 0x590B7EF: (anonymous namespace)::InitializeSession(OrtSessionOptions const*, std::unique_ptr<onnxruntime::InferenceSession, std::default_deleteonnxruntime::InferenceSession >&) (onnxruntime_c_api.cc:469)
==132002== by 0x590BCC0: OrtApis::CreateSessionFromArray(OrtEnv const*, void const*, unsigned long, OrtSessionOptions const*, OrtSession**) (onnxruntime_c_api.cc:508)

@SherlockNoMad SherlockNoMad added ep:TensorRT issues related to TensorRT execution provider type:bug labels Apr 8, 2021
@stevenlix
Contributor

stevenlix commented Apr 8, 2021

Do you mind pointing out which model you are using? I tried the Faster R-CNN model, which has dynamic shapes and many partitions, but haven't seen any "definitely lost" or "indirectly lost" blocks in Valgrind.

@matthill
Contributor Author

I am able to reproduce using the C sample app and Squeezenet (files attached).

My TensorRT provider is linked as follows:
linux-vdso.so.1 (0x00007ffc337e0000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc8d4050000)
libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0 (0x00007fc8d3dd2000)
libonnxruntime_providers_shared.so => /storage/projects/alpr/modules/libonnxruntime/build/Linux/Debug/libonnxruntime_providers_shared.so (0x00007fc8d3dcd000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc8d3daa000)
libnvinfer.so.7 => /usr/lib/libnvinfer.so.7 (0x00007fc8baead000)
libnvinfer_plugin.so.7 => /usr/lib/libnvinfer_plugin.so.7 (0x00007fc8ba419000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc8ba238000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc8ba0e9000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc8ba0ce000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc8b9edc000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc8d44c5000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc8b9ed1000)
libcublas.so.11 => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcublas.so.11 (0x00007fc8b407f000)
libcudnn.so.8 => /usr/lib/libcudnn.so.8 (0x00007fc8b3e56000)
libmyelin.so.1 => /usr/lib/libmyelin.so.1 (0x00007fc8b362f000)
libnvrtc.so.11.0 => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libnvrtc.so.11.0 (0x00007fc8b1e44000)
libcublasLt.so.11 => /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007fc8a6cb5000)

onnxrt_trt_memsample.tar.gz

To reproduce:
mkdir build && cd build
wget https://github.com/microsoft/onnxruntime/raw/master/nodejs/test/testdata/squeezenet.onnx
cmake ..
make -j8
ASAN_OPTIONS=protect_shadow_gap=0 ./onnx_memtest

Output shows the delete/delete[] mismatch:

Using Onnxruntime C API

==1505436==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x6030007811b0
#0 0x7f77b02d98df in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x1108df)
#1 0x7f77a78ded5f in __gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >::deallocate(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128
#2 0x7f77a78d67b8 in std::allocator_traits<std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::deallocate(std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >*, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470
#3 0x7f77a78ca3eb in std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::_M_deallocate(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, unsigned long) /usr/include/c++/9/bits/stl_vector.h:351
#4 0x7f77a78c14c1 in std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::~_Vector_base() /usr/include/c++/9/bits/stl_vector.h:332
#5 0x7f77a78c1516 in std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::~vector() /usr/include/c++/9/bits/stl_vector.h:680
#6 0x7f77a799b279 in onnxruntime::IndexedSubGraph::MetaDef::~MetaDef() /storage/projects/alpr/modules/libonnxruntime/include/onnxruntime/core/graph/indexed_sub_graph.h:27
#7 0x7f77a799b2cf in std::default_deleteonnxruntime::IndexedSubGraph::MetaDef::operator()(onnxruntime::IndexedSubGraph::MetaDef*) const /usr/include/c++/9/bits/unique_ptr.h:81
#8 0x7f77a799aa69 in std::unique_ptr<onnxruntime::IndexedSubGraph::MetaDef, std::default_deleteonnxruntime::IndexedSubGraph::MetaDef >::~unique_ptr() /usr/include/c++/9/bits/unique_ptr.h:292
#9 0x7f77a79979d1 in onnxruntime::IndexedSubGraph::~IndexedSubGraph() /storage/projects/alpr/modules/libonnxruntime/include/onnxruntime/core/graph/indexed_sub_graph.h:26
#10 0x7f77a8840795 in onnxruntime::ProviderHostImpl::IndexedSubGraph__operator_delete(onnxruntime::IndexedSubGraph*) /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/framework/provider_bridge_ort.cc:392
#11 0x7f778823b78c in onnxruntime::IndexedSubGraph::operator delete(void*) /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/providers/shared_library/provider_interfaces.h:755
#12 0x7f778825d53c in std::default_deleteonnxruntime::IndexedSubGraph::operator()(onnxruntime::IndexedSubGraph*) const /usr/include/c++/9/bits/unique_ptr.h:81
#13 0x7f778825487f in std::unique_ptr<onnxruntime::IndexedSubGraph, std::default_deleteonnxruntime::IndexedSubGraph >::~unique_ptr() /usr/include/c++/9/bits/unique_ptr.h:292
#14 0x7f7788249279 in onnxruntime::TensorrtExecutionProvider::RemoveTensorRTGraphCycles(std::vector<std::pair<std::vector<unsigned long, std::allocator >, bool>, std::allocator<std::pair<std::vector<unsigned long, std::allocator >, bool> > >&, onnxruntime::GraphViewer const&) const (/usr/lib/libonnxruntime_providers_tensorrt.so+0x36279)
#15 0x7f778824a299 in onnxruntime::TensorrtExecutionProvider::GetCapability(onnxruntime::GraphViewer const&, std::vector<onnxruntime::KernelRegistry const*, std::allocator<onnxruntime::KernelRegistry const*> > const&) const (/usr/lib/libonnxruntime_providers_tensorrt.so+0x37299)
#16 0x7f77a88139fe in PartitionOnnxFormatModelImpl /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/framework/graph_partitioner.cc:201
#17 0x7f77a88152d3 in onnxruntime::GraphPartitioner::PartitionOnnxFormatModel(onnxruntime::Graph&, bool, onnxruntime::FuncManager&, onnxruntime::KernelRegistry&, onnxruntime::GraphPartitioner::Mode, int&) const /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/framework/graph_partitioner.cc:373
#18 0x7f77a8816951 in onnxruntime::GraphPartitioner::Partition(onnxruntime::Graph&, bool, onnxruntime::FuncManager&, onnxruntime::GraphPartitioner::Mode, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, unsigned long, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, unsigned long> > >) const /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/framework/graph_partitioner.cc:539
#19 0x7f77a7938aab in onnxruntime::InferenceSession::TransformGraph(onnxruntime::Graph&, onnxruntime::GraphTransformerManager const&, onnxruntime::ExecutionProviders const&, onnxruntime::KernelRegistryManager&, onnxruntime::InsertCastTransformer const&, onnxruntime::SessionState&, bool) /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/session/inference_session.cc:853
#20 0x7f77a793bd9c in onnxruntime::InferenceSession::Initialize() /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/session/inference_session.cc:1207
#21 0x7f77a78b9a35 in (anonymous namespace)::InitializeSession(OrtSessionOptions const*, std::unique_ptr<onnxruntime::InferenceSession, std::default_deleteonnxruntime::InferenceSession >&) (/usr/lib/libonnxruntime.so.1+0x85a35)
#22 0x7f77a78b9cbf in OrtApis::CreateSession(OrtEnv const*, char const*, OrtSessionOptions const*, OrtSession**) (/usr/lib/libonnxruntime.so.1+0x85cbf)
#23 0x55fad9b50af6 in main /tmp/onnxrt_trt_memsample/main.cpp:63
#24 0x7f77a746d0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#25 0x55fad9b5044d in _start (/tmp/onnxrt_trt_memsample/build/onnx_memtest+0x644d)

0x6030007811b0 is located 0 bytes inside of 32-byte region [0x6030007811b0,0x6030007811d0)
allocated by thread T0 here:
#0 0x7f77b02d8b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
#1 0x7f77a883f1d7 in onnxruntime::ProviderHostImpl::HeapAllocate(unsigned long) /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/framework/provider_bridge_ort.cc:191
#2 0x7f778822f469 in operator new(unsigned long) (/usr/lib/libonnxruntime_providers_tensorrt.so+0x1c469)
#3 0x7f7788233fa7 in __gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >::allocate(unsigned long, void const*) /usr/include/c++/9/ext/new_allocator.h:114
#4 0x7f7788233706 in std::allocator_traits<std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::allocate(std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >&, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:444
#5 0x7f7788232e3d in std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::_M_allocate(unsigned long) /usr/include/c++/9/bits/stl_vector.h:343
#6 0x7f7788263712 in void std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::_M_realloc_insert<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) /usr/include/c++/9/bits/vector.tcc:440
#7 0x7f77882586f3 in std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::push_back(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) /usr/include/c++/9/bits/stl_vector.h:1195
#8 0x7f7788246aed in onnxruntime::TensorrtExecutionProvider::GetSubGraph(std::pair<std::vector<unsigned long, std::allocator >, bool>, onnxruntime::GraphViewer const&) const /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:681
#9 0x7f7788248f23 in onnxruntime::TensorrtExecutionProvider::RemoveTensorRTGraphCycles(std::vector<std::pair<std::vector<unsigned long, std::allocator >, bool>, std::allocator<std::pair<std::vector<unsigned long, std::allocator >, bool> > >&, onnxruntime::GraphViewer const&) const (/usr/lib/libonnxruntime_providers_tensorrt.so+0x35f23)
#10 0x7f778824a299 in onnxruntime::TensorrtExecutionProvider::GetCapability(onnxruntime::GraphViewer const&, std::vector<onnxruntime::KernelRegistry const*, std::allocator<onnxruntime::KernelRegistry const*> > const&) const (/usr/lib/libonnxruntime_providers_tensorrt.so+0x37299)
#11 0x7f77a88139fe in PartitionOnnxFormatModelImpl /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/framework/graph_partitioner.cc:201
#12 0x7f77a88152d3 in onnxruntime::GraphPartitioner::PartitionOnnxFormatModel(onnxruntime::Graph&, bool, onnxruntime::FuncManager&, onnxruntime::KernelRegistry&, onnxruntime::GraphPartitioner::Mode, int&) const /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/framework/graph_partitioner.cc:373
#13 0x7f77a8816951 in onnxruntime::GraphPartitioner::Partition(onnxruntime::Graph&, bool, onnxruntime::FuncManager&, onnxruntime::GraphPartitioner::Mode, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, unsigned long, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, unsigned long> > >) const /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/framework/graph_partitioner.cc:539
#14 0x7f77a7938aab in onnxruntime::InferenceSession::TransformGraph(onnxruntime::Graph&, onnxruntime::GraphTransformerManager const&, onnxruntime::ExecutionProviders const&, onnxruntime::KernelRegistryManager&, onnxruntime::InsertCastTransformer const&, onnxruntime::SessionState&, bool) /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/session/inference_session.cc:853
#15 0x7f77a793bd9c in onnxruntime::InferenceSession::Initialize() /storage/projects/alpr/modules/libonnxruntime/onnxruntime/core/session/inference_session.cc:1207
#16 0x7f77a78b9a35 in (anonymous namespace)::InitializeSession(OrtSessionOptions const*, std::unique_ptr<onnxruntime::InferenceSession, std::default_delete<onnxruntime::InferenceSession> >&) (/usr/lib/libonnxruntime.so.1+0x85a35)
#17 0x7f77a78b9cbf in OrtApis::CreateSession(OrtEnv const*, char const*, OrtSessionOptions const*, OrtSession**) (/usr/lib/libonnxruntime.so.1+0x85cbf)
#18 0x55fad9b50af6 in main /tmp/onnxrt_trt_memsample/main.cpp:63
#19 0x7f77a746d0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

SUMMARY: AddressSanitizer: alloc-dealloc-mismatch (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x1108df) in operator delete(void*)
==1505436==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0
==1505436==ABORTING
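
One way to read the report above: the block was obtained through `operator new[]` (frames #0 and #1 of the allocation stack, reached via `HeapAllocate`), while the summary says it was released through plain `operator delete(void*)`. The following is a minimal sketch of a matched allocate/free pair, not ONNX Runtime's actual code. The `sketch` namespace, the `live_blocks` counter, and this `HeapFree` helper are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a host/provider allocation bridge. These names
// are illustrative assumptions and are not ONNX Runtime's real API.
namespace sketch {

// Number of blocks currently allocated through this bridge.
static int live_blocks = 0;

// Allocates raw bytes with the array form of new, mirroring what the
// allocation stack in the report suggests.
inline void* HeapAllocate(std::size_t size) {
  ++live_blocks;
  return new std::uint8_t[size];
}

// Releases a block with the matching array form of delete. Freeing the same
// pointer with plain `delete` instead is the kind of pairing that ASan flags
// as alloc-dealloc-mismatch.
inline void HeapFree(void* p) {
  if (p == nullptr) return;  // Freeing a null pointer is a no-op.
  assert(live_blocks > 0);
  --live_blocks;
  delete[] static_cast<std::uint8_t*>(p);
}

}  // namespace sketch
```

Calling `sketch::HeapAllocate(32)` followed by `sketch::HeapFree(p)` keeps both sides on the array forms, so a sanitizer build should have nothing to report for that pair. Whether the provider's replaced `operator delete` is wired this way is something only the actual source can confirm.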

@chilo-ms
Contributor

Hi Matthill, thanks for sharing the C sample code. I can reproduce the same result using Address Sanitizer, but not with Valgrind. Which ONNX Runtime version were you using?
Also, did you run Valgrind against the same C sample code you shared? (i.e. valgrind --leak-check=full --show-reachable=no --show-possibly-lost=no ./onnx_memtest)

We recently landed a fix for a TRT memory leak, which probably resolves some of these leaks. We still want to handle all of the remaining memory leak issues.

@faxu faxu removed the type:bug label Aug 18, 2021
@stale

stale bot commented Apr 19, 2022

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@stale stale bot added the stale issues that have not been addressed in a while; categorized by a bot label Apr 19, 2022
@nicktasios

I'm having the same issue. I was getting weird results from my network and traced them to onnxruntime's use of TensorRT. Then I found this issue and tried running with a fixed batch size, which fixed the problem.
