[BUG] Crash running parquet reader benchmarks. #13229

Closed
nvdbaranec opened this issue Apr 26, 2023 · 6 comments · Fixed by #16787
Labels
0 - Backlog In queue waiting for assignment bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code.

Comments

@nvdbaranec
Contributor

The PARQUET_READER_NVBENCH crashes (segfault) at exit on some machines. It doesn't seem to happen consistently for everyone, but it tends to be reproducible once it starts happening.

To reproduce, run PARQUET_READER_NVBENCH; you should get a segfault right at the end, after it has printed all of its results.

I've narrowed it down to something specific to the parquet_read_io_compression suite. In addition, compute-sanitizer does not turn anything up, so this seems to be purely CPU-side.

@nvdbaranec nvdbaranec added bug Something isn't working Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue labels Apr 26, 2023
@wence-
Contributor

wence- commented Apr 27, 2023

The first benchmark there doesn't appear to be valgrind-clean, which may give a hint:

==45341== Warning: set address range perms: large range [0x300200000, 0x8f41ff000) (noaccess)
Run:  [1/24] parquet_read_io_compression [Device=0 io=FILEPATH compression=SNAPPY cardinality=0 run_length=1]
==45341== Invalid read of size 4
==45341==    at 0x7CC2078: cudf::io::detail::parquet::writer::impl::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/libcudf.so)
==45341==    by 0x7CC56E0: cudf::io::detail::parquet::writer::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/libcudf.so)
==45341==    by 0x7B9F8FA: cudf::io::write_parquet(cudf::io::parquet_writer_options const&) (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/libcudf.so)
==45341==    by 0x1588E1: parquet_read_common(cudf::io::parquet_writer_options const&, cuio_source_sink_pair&, nvbench::state&) (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x15F8D5: void BM_parquet_read_io_compression<(cudf::io::io_type)0, (cudf::io::compression_type)2>(nvbench::state&, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type> >) [clone .isra.0] (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x15FACE: void nvbench::tl::detail::foreach<nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > >, nvbench::runner<nvbench::benchmark<BM_parquet_read_io_compression_line_142, nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > > > >::run_device(std::optional<nvbench::device_info> const&)::{lambda(auto:1)#1}, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>, nvbench::runner<nvbench::benchmark<BM_parquet_read_io_compression_line_142, nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)2, 
cudf::io::io_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > > > >::run_device(std::optional<nvbench::device_info> const&)::{lambda(auto:1)#1}&&) [clone .isra.0] (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x161D09: nvbench::runner<nvbench::benchmark<BM_parquet_read_io_compression_line_142, nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > > > >::run() (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x161DAC: nvbench::benchmark<BM_parquet_read_io_compression_line_142, nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > > >::do_run() (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x12837E: main (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==  Address 0x83739940 is 0 bytes after a block of size 272 alloc'd
==45341==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==45341==    by 0x7CC1665: cudf::io::detail::parquet::writer::impl::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/libcudf.so)
==45341==    by 0x7CC56E0: cudf::io::detail::parquet::writer::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/libcudf.so)
==45341==    by 0x7B9F8FA: cudf::io::write_parquet(cudf::io::parquet_writer_options const&) (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/libcudf.so)
==45341==    by 0x1588E1: parquet_read_common(cudf::io::parquet_writer_options const&, cuio_source_sink_pair&, nvbench::state&) (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x15F8D5: void BM_parquet_read_io_compression<(cudf::io::io_type)0, (cudf::io::compression_type)2>(nvbench::state&, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type> >) [clone .isra.0] (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x15FACE: void nvbench::tl::detail::foreach<nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > >, nvbench::runner<nvbench::benchmark<BM_parquet_read_io_compression_line_142, nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > > > >::run_device(std::optional<nvbench::device_info> const&)::{lambda(auto:1)#1}, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>, nvbench::runner<nvbench::benchmark<BM_parquet_read_io_compression_line_142, nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)2, 
cudf::io::io_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > > > >::run_device(std::optional<nvbench::device_info> const&)::{lambda(auto:1)#1}&&) [clone .isra.0] (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x161D09: nvbench::runner<nvbench::benchmark<BM_parquet_read_io_compression_line_142, nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > > > >::run() (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x161DAC: nvbench::benchmark<BM_parquet_read_io_compression_line_142, nvbench::type_list<nvbench::type_list<nvbench::enum_type<(cudf::io::io_type)0, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)1, cudf::io::io_type>, nvbench::enum_type<(cudf::io::io_type)2, cudf::io::io_type> >, nvbench::type_list<nvbench::enum_type<(cudf::io::compression_type)2, cudf::io::compression_type>, nvbench::enum_type<(cudf::io::compression_type)0, cudf::io::compression_type> > > >::do_run() (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341==    by 0x12837E: main (in /home/wence/Documents/src/rapids/cudf/cpp/build/cuda-11.8.0/branch-23.06/release/benchmarks/PARQUET_READER_NVBENCH)
==45341== 
[ 45341][17:55:37:116091][warning] Running benchmarks without dropping the L3 cache; results may not reflect file IO throughput

@GregoryKimball GregoryKimball added 0 - Backlog In queue waiting for assignment and removed Needs Triage Need team to review and classify labels Jun 7, 2023
@sdrp713
Contributor

sdrp713 commented Jul 30, 2024

I found that the segfaults and cudaContext errors occur when the io type is set to FILEPATH. I was wondering whether the issue is related to kvikio, since this is what I see when running valgrind. Is there any way to disable kvikio to check whether it is the source of the issue?

==179002== 576 bytes in 1 blocks are possibly lost in loss record 35,124 of 35,671
==179002==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==179002==    by 0x40147D9: calloc (rtld-malloc.h:44)
==179002==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==179002==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==179002==    by 0xD80B7B4: allocate_stack (allocatestack.c:430)
==179002==    by 0xD80B7B4: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==179002==    by 0xD57FDF3: __gthread_create (gthr-default.h:663)
==179002==    by 0xD57FDF3: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (thread.cc:172)
==179002==    by 0x5B455AB: kvikio::defaults::defaults() (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x5B4CAFE: cudf::io::file_sink::file_sink(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x5B4D593: cudf::io::data_sink::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x58D11DE: cudf::io::(anonymous namespace)::make_datasinks(cudf::io::sink_info const&) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x58D1940: cudf::io::write_parquet(cudf::io::parquet_writer_options const&, rmm::cuda_stream_view) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x1583BC: BM_parquet_read_io_compression(nvbench::state&) (in /home/raprabhu/cudf/cpp/build/benchmarks/PARQUET_READER_NVBENCH)
==179002==    by 0x165398: nvbench::runner<nvbench::benchmark<BM_parquet_read_io_compression_line_300, nvbench::type_list<> > >::run() (in /home/raprabhu/cudf/cpp/build/benchmarks/PARQUET_READER_NVBENCH)
==179002==    by 0x1655EC: nvbench::benchmark<BM_parquet_read_io_compression_line_300, nvbench::type_list<> >::do_run() (in /home/raprabhu/cudf/cpp/build/benchmarks/PARQUET_READER_NVBENCH)
==179002== 
==179002== 2,560 bytes in 4 blocks are possibly lost in loss record 35,368 of 35,671
==179002==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==179002==    by 0x40147D9: calloc (rtld-malloc.h:44)
==179002==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==179002==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==179002==    by 0xD80B7B4: allocate_stack (allocatestack.c:430)
==179002==    by 0xD80B7B4: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==179002==    by 0x57D42149: ??? (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x57D4521C: ??? (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x57D31280: ??? (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x57D54E4D: cuFileDriverOpen (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x5B42C03: kvikio::cuFileAPI::cuFileAPI() (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x5B45766: kvikio::defaults::defaults() (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x5B4CAFE: cudf::io::file_sink::file_sink(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x5B4D593: cudf::io::data_sink::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x58D11DE: cudf::io::(anonymous namespace)::make_datasinks(cudf::io::sink_info const&) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002== 12,288 bytes in 1 blocks are possibly lost in loss record 35,487 of 35,671
==179002==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==179002==    by 0x4013DD9: malloc (rtld-malloc.h:56)
==179002==    by 0x4013DD9: allocate_dtv_entry (dl-tls.c:696)
==179002==    by 0x4013DD9: allocate_and_init (dl-tls.c:709)
==179002==    by 0x4013DD9: tls_get_addr_tail (dl-tls.c:907)
==179002==    by 0x401820B: __tls_get_addr (tls_get_addr.S:55)
==179002==    by 0x57D3595F: ??? (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x57DF2C99: ??? (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x57D9089A: ??? (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x57D352A2: ??? (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x57D55582: cuFileWrite (in /home/raprabhu/miniconda3/envs/cudf_dev/targets/x86_64-linux/lib/libcufile.so.1.7.2)
==179002==    by 0x5B4B1CD: kvikio::FileHandle::write(void const*, unsigned long, unsigned long, unsigned long) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x5B4C184: std::_Function_handler<void (), BS::thread_pool::submit_task<kvikio::detail::submit_task<kvikio::FileHandle::pwrite(void const*, unsigned long, unsigned long, unsigned long, unsigned long)::{lambda(void const*, unsigned long, unsigned long, unsigned long)#3}, void*>(kvikio::FileHandle::pwrite(void const*, unsigned long, unsigned long, unsigned long, unsigned long)::{lambda(void const*, unsigned long, unsigned long, unsigned long)#3}, void*, unsigned long, unsigned long, unsigned long)::{lambda()#1}, unsigned long>(kvikio::detail::submit_task<kvikio::FileHandle::pwrite(void const*, unsigned long, unsigned long, unsigned long, unsigned long)::{lambda(void const*, unsigned long, unsigned long, unsigned long)#3}, void*>(kvikio::FileHandle::pwrite(void const*, unsigned long, unsigned long, unsigned long, unsigned long)::{lambda(void const*, unsigned long, unsigned long, unsigned long)#3}, void*, unsigned long, unsigned long, unsigned long)::{lambda()#1}&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0x5B38DC6: BS::thread_pool::worker(unsigned int, std::function<void ()> const&) (in /home/raprabhu/miniconda3/envs/cudf_dev/lib/libcudf.so)
==179002==    by 0xD57FE94: execute_native_thread_routine (thread.cc:104)

@GregoryKimball
Contributor

Thank you @sdrp713 for looking into this segfault and sharing some valgrind results. You might try setting LIBCUDF_CUFILE_POLICY=OFF (more info here). Also @madsbk would you please let us know if you have any ideas?

@madsbk
Member

madsbk commented Aug 6, 2024

Or try LIBCUDF_CUFILE_POLICY=KVIKIO KVIKIO_COMPAT_MODE=on, which makes KvikIO skip the use of libcufile.so.

However, I think the valgrind output looks like innocent memory leaks at exit, not something that would trigger segfaults?

@sdrp713
Contributor

sdrp713 commented Aug 6, 2024

OK, I tried setting LIBCUDF_CUFILE_POLICY=OFF and the segfault was gone; however, when I set LIBCUDF_CUFILE_POLICY=KVIKIO KVIKIO_COMPAT_MODE=on, the segfault came back.

@madsbk
Member

madsbk commented Aug 7, 2024

I am not able to reproduce :/
I get some memory leaks but no access errors:

$ LIBCUDF_CUFILE_POLICY=KVIKIO KVIKIO_COMPAT_MODE=on valgrind cpp/build/benchmarks/PARQUET_READER_NVBENCH

... 

==235376== Warning: set address range perms: large range [0x106000000, 0x118000000) (noaccess)
==235376== 
==235376== HEAP SUMMARY:
==235376==     in use at exit: 2,271,605,821 bytes in 650,420 blocks
==235376==   total heap usage: 62,614,049 allocs, 61,963,629 frees, 77,697,641,434 bytes allocated
==235376== 
==235376== LEAK SUMMARY:
==235376==    definitely lost: 40 bytes in 1 blocks
==235376==    indirectly lost: 1,536 bytes in 33 blocks
==235376==      possibly lost: 446,282,686 bytes in 3,312 blocks
==235376==    still reachable: 1,825,321,559 bytes in 647,074 blocks
==235376==         suppressed: 0 bytes in 0 blocks
==235376== Rerun with --leak-check=full to see details of leaked memory
==235376== 
==235376== For lists of detected and suppressed errors, rerun with: -s
==235376== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Can you try running it under valgrind until it segfaults?

@rapids-bot rapids-bot bot closed this as completed in 8e1345f Sep 19, 2024
rjzamora pushed a commit to rjzamora/cudf that referenced this issue Sep 24, 2024
rapidsai#16787)

The NVbench application `PARQUET_READER_NVBENCH` in libcudf currently crashes with a segmentation fault. To reproduce:

```
./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1
```
 
The root cause is that (1) some `thread_local` objects on the main thread in `libcudf` and (2) some `static` objects in `kvikio` are destroyed at program termination, after NVbench has already called `cudaDeviceReset()`. These objects should simply be leaked, since destructors that make CUDA calls during program termination are UB in CUDA.
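
The "leak instead of destroy" idea described above can be sketched in plain C++ without any CUDA dependency. This is only an illustration of the general pattern, not the actual change made in cuDF or KvikIO: `fake_runtime`, `io_resource`, and `leaked_instance` are hypothetical names, and `fake_runtime::shutdown()` merely stands in for `cudaDeviceReset()`.

```cpp
#include <cassert>

// Hypothetical stand-in for the CUDA runtime: after shutdown(), any further
// call into it is an error (with real CUDA it would be undefined behavior).
struct fake_runtime {
  static bool& alive() {
    static bool state = true;
    return state;
  }
  static void shutdown() { alive() = false; }  // plays the role of cudaDeviceReset()
  static void release_handle() {
    assert(alive() && "runtime call after shutdown");
  }
};

// Hypothetical resource whose destructor calls into the runtime.
class io_resource {
 public:
  io_resource() { ++live_count(); }
  ~io_resource() {
    fake_runtime::release_handle();
    --live_count();
  }
  io_resource(io_resource const&) = delete;
  io_resource& operator=(io_resource const&) = delete;

  // Number of io_resource objects currently constructed and not destroyed.
  static int& live_count() {
    static int count = 0;
    return count;
  }
};

// Leaky singleton: the object is heap-allocated once and never deleted, so
// ~io_resource() does not run during static destruction at program exit.
inline io_resource& leaked_instance() {
  static io_resource* instance = new io_resource{};
  return *instance;
}
```

Had `leaked_instance()` instead declared `static io_resource instance;`, the destructor would run during static destruction at exit, which in this sketch means after `fake_runtime::shutdown()`, the same ordering problem described above. The trade-off is that leak checkers such as valgrind will report the intentionally leaked object as still reachable.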

This simple PR is the cuDF side of the fix. The other part is in rapidsai/kvikio#462.

closes rapidsai#13229

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#16787
5 participants