Memory leaks in Taichi AOT runtime (GFX/Vulkan) #6448

Open
k-ye opened this issue Oct 27, 2022 · 1 comment
Labels: potential bug (Something that looks like a bug but not yet confirmed)

Comments

k-ye (Member) commented Oct 27, 2022

Describe the bug

I ran Taichi AOT under valgrind, and it reported memory leaks in several places. The loss records are listed below the command.

To reproduce, run valgrind with this command line:

valgrind --leak-check=full --log-file='mem-leak-test.txt' --track-origins=yes -v ${CMD} ${ARGS}

==3169151== 868,128 (288 direct, 867,840 indirect) bytes in 1 blocks are definitely lost in loss record 3,570 of 3,575
==3169151==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3169151==    by 0x525E254: taichi::lang::gfx::GfxRuntime::register_taichi_kernel(taichi::lang::gfx::GfxRuntime::RegisterParams) (in /libtaichi_c_api.so)
==3169151==    by 0x4D6E8C8: taichi::lang::gfx::KernelImpl::KernelImpl(taichi::lang::gfx::GfxRuntime*, taichi::lang::gfx::GfxRuntime::RegisterParams&&) (in /libtaichi_c_api.so)
==3169151==    by 0x52788A3: taichi::lang::gfx::(anonymous namespace)::AotModuleImpl::make_new_kernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /libtaichi_c_api.so)
==3169151==    by 0x4B3A193: taichi::lang::aot::Module::get_kernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /libtaichi_c_api.so)
==3169151==    by 0x5277D09: taichi::lang::gfx::(anonymous namespace)::AotModuleImpl::get_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /libtaichi_c_api.so)
==3169151==    by 0x4AE2A0E: AotModule::get_cgraph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /libtaichi_c_api.so)
==3169151==    by 0x4AE4B62: ti_get_aot_module_compute_graph (in /libtaichi_c_api.so)

==3169151== 109,720 (104 direct, 109,616 indirect) bytes in 1 blocks are definitely lost in loss record 3,556 of 3,575
==3169151==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3169151==    by 0x52A3574: std::_Hashtable<unsigned int, std::pair<unsigned int const, taichi::lang::vulkan::VulkanResourceBinder::Set>, std::allocator<std::pair<unsigned int const, taichi::lang::vulkan::VulkanResourceBinder::Set> >, std::__detail::_Select1st, std::equal_to<unsigned int>, std::hash<unsigned int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_rehash_aux(unsigned long, std::integral_constant<bool, true>) (in /libtaichi_c_api.so)
==3169151==    by 0x52A341B: std::_Hashtable<unsigned int, std::pair<unsigned int const, taichi::lang::vulkan::VulkanResourceBinder::Set>, std::allocator<std::pair<unsigned int const, taichi::lang::vulkan::VulkanResourceBinder::Set> >, std::__detail::_Select1st, std::equal_to<unsigned int>, std::hash<unsigned int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned int const, taichi::lang::vulkan::VulkanResourceBinder::Set>, false>*, unsigned long) (in /libtaichi_c_api.so)
==3169151==    by 0x528EADE: taichi::lang::vulkan::VulkanResourceBinder::buffer(unsigned int, unsigned int, taichi::lang::DevicePtr, unsigned long) (in /libtaichi_c_api.so)
==3169151==    by 0x528B25E: taichi::lang::vulkan::VulkanPipeline::create_descriptor_set_layout(taichi::lang::vulkan::VulkanPipeline::Params const&) (in /libtaichi_c_api.so)
==3169151==    by 0x528AE6D: taichi::lang::vulkan::VulkanPipeline::VulkanPipeline(taichi::lang::vulkan::VulkanPipeline::Params const&) (in /libtaichi_c_api.so)
==3169151==    by 0x529A3D5: taichi::lang::vulkan::VulkanDevice::create_pipeline(taichi::lang::PipelineSourceDesc const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (in /libtaichi_c_api.so)
==3169151==    by 0x525D1EB: taichi::lang::gfx::CompiledTaichiKernel::CompiledTaichiKernel(taichi::lang::gfx::CompiledTaichiKernel::Params const&) (in /libtaichi_c_api.so)

==3169151== 20,020 (272 direct, 19,748 indirect) bytes in 1 blocks are definitely lost in loss record 3,533 of 3,575
==3169151==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3169151==    by 0x1739F0FC: ???
==3169151==    by 0x177D2B26: ???
==3169151==    by 0x177F15D6: ???
==3169151==    by 0x177D9094: ???
==3169151==    by 0x177D91AF: ???
==3169151==    by 0x52858FD: vkapi::create_compute_pipeline(VkDevice_T*, unsigned int, VkPipelineShaderStageCreateInfo&, std::shared_ptr<vkapi::DeviceObjVkPipelineLayout>, std::shared_ptr<vkapi::DeviceObjVkPipelineCache>, std::shared_ptr<vkapi::DeviceObjVkPipeline>) (in /libtaichi_c_api.so)
==3169151==    by 0x528C70D: taichi::lang::vulkan::VulkanPipeline::create_compute_pipeline(taichi::lang::vulkan::VulkanPipeline::Params const&) (in libtaichi_c_api.so)
==3169151==    by 0x528AE8B: taichi::lang::vulkan::VulkanPipeline::VulkanPipeline(taichi::lang::vulkan::VulkanPipeline::Params const&) (in libtaichi_c_api.so)
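
For anyone comparing logs across runs, the per-record totals can be pulled out of the memcheck log mechanically. The following is a minimal sketch, assuming the summary lines keep the format shown above ("N bytes in M blocks are definitely lost"). The function name definitely_lost_bytes is hypothetical and not part of valgrind or Taichi.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical helper (not part of valgrind or Taichi).
// Returns the total byte count from a memcheck "definitely lost" summary
// line, or -1 if the line is not such a summary (e.g. a stack frame).
std::int64_t definitely_lost_bytes(const std::string& line) {
  // Matches e.g. "==123== 868,128 (288 direct, ...) bytes in 1 blocks are definitely lost"
  static const std::regex pattern(
      R"(==[0-9]+==\s+([0-9,]+)\s.*bytes in [0-9,]+ blocks are definitely lost)");
  std::smatch match;
  if (!std::regex_search(line, match, pattern)) {
    return -1;
  }
  // Strip the thousands separators while accumulating the number.
  std::int64_t total = 0;
  for (char c : match[1].str()) {
    if (c >= '0' && c <= '9') {
      total = total * 10 + (c - '0');
    }
  }
  return total;
}
```

Feeding each line of mem-leak-test.txt through this function and summing the non-negative results gives a rough single number to track while fixing individual leaks.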
@k-ye k-ye added the potential bug Something that looks like a bug but not yet confirmed label Oct 27, 2022
@taichi-gardener taichi-gardener moved this to Untriaged in Taichi Lang Oct 27, 2022
k-ye added a commit that referenced this issue Oct 27, 2022
Issue: #6448

### Brief Summary

We need to explicitly destroy the SPIRV module
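
As an illustration of the kind of fix described in the commit message, here is a minimal sketch of tying a C-style handle's destruction to scope with std::unique_ptr and a custom deleter. Everything in it (FakeModule, fake_create_module, fake_destroy_module, use_module) is a hypothetical stand-in, not the actual Taichi or SPIR-V library API, and the referenced commit may well do this differently.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a C-style module handle.
// This is NOT the real Taichi or SPIR-V API.
struct FakeModule {
  int id;
};

// Number of handles created but not yet destroyed.
static int g_live_modules = 0;

FakeModule* fake_create_module(int id) {
  ++g_live_modules;
  return new FakeModule{id};
}

void fake_destroy_module(FakeModule* m) {
  if (m == nullptr) {
    return;
  }
  --g_live_modules;
  delete m;
}

// Custom deleter so std::unique_ptr calls the matching destroy function.
struct ModuleDeleter {
  void operator()(FakeModule* m) const { fake_destroy_module(m); }
};
using ModulePtr = std::unique_ptr<FakeModule, ModuleDeleter>;

// The handle is destroyed on every exit path from this function,
// so no explicit destroy call can be forgotten.
int use_module(int id) {
  ModulePtr m(fake_create_module(id));
  return m->id * 2;
}
```

With a wrapper like this, the destroy call runs even on early returns or exceptions, which is usually where a manually paired create/destroy goes missing.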
@neozhaoliang neozhaoliang moved this from Untriaged to In Progress in Taichi Lang Oct 28, 2022
@PENGUINLIONG PENGUINLIONG moved this from In Progress to Todo in Taichi Lang Nov 18, 2022
jim19930609 (Contributor) commented Nov 25, 2022

I tried the command valgrind --leak-check=full --log-file='mem-leak-test.txt' --track-origins=yes -v ${CMD} ${ARGS} on taichi-aot-demo. However, neither tutorial nor mpm88 seems to reproduce the memory leaks mentioned above.

I did notice a memory leak in the C-API, but it seems more related to the Vulkan implementation of vkCmdBindPipeline():

==2628974== 5,496 bytes in 1 blocks are definitely lost in loss record 1,569 of 1,587
==2628974==    at 0x484147B: calloc (vg_replace_malloc.c:1340)
==2628974==    by 0x8D4D58F: ??? 
==2628974==    by 0x90F9DB1: ??? 
==2628974==    by 0x90FB8C4: ??? 
==2628974==    by 0x91096BE: ??? 
==2628974==    by 0x54D75A0: taichi::lang::vulkan::VulkanCommandList::bind_pipeline(taichi::lang::Pipeline*) (taichi/rhi/vulkan/vulkan_device.cpp:831)
==2628974==    by 0x54744CA: taichi::lang::gfx::GfxRuntime::launch_kernel(taichi::lang::gfx::GfxRuntime::KernelHandle, taichi::lang::RuntimeContext*) (taichi/runtime/gfx/runtime.cpp:508)
==2628974==    by 0x54ADF49: taichi::lang::gfx::KernelImpl::launch(taichi::lang::RuntimeContext*) (taichi/runtime/gfx/aot_graph_data.h:14)
==2628974==    by 0x4ED1582: ti_launch_kernel (c_api/src/taichi_core_impl.cpp:607)
==2628974==    by 0x404030: launch (taichi4/_skbuild/linux-x86_64-3.8/cmake-install/c_api/include/taichi/cpp/taichi.hpp:533)
==2628974==    by 0x404030: launch (taichi4/_skbuild/linux-x86_64-3.8/cmake-install/c_api/include/taichi/cpp/taichi.hpp:536)
==2628974==    by 0x404030: App0_tutorial::run() (0_tutorial_kernel/app.cpp:42)
==2628974==    by 0x40387C: main (0_tutorial_kernel/app.cpp:68)

To further investigate the memory leak problem, we'll probably have to separate out a minimal AOT run from taco.

mpm88_mem_leak_test.txt
tutorial_mem_leak_test.txt

Projects
Status: Todo

3 participants