[BUG] Memcheck error in ParquetDictionaryTest/ParquetSizedTest.DictionaryTest/22 #11916

Closed
davidwendt opened this issue Oct 13, 2022 · 2 comments · Fixed by #11962
Labels: bug (Something isn't working), cuIO (cuIO issue), libcudf (Affects libcudf (C++/CUDA) code)

Comments

@davidwendt

Found an out-of-bounds memory write in PARQUET_TEST at ParquetDictionaryTest/ParquetSizedTest.DictionaryTest/22.
The error does not occur for any of the other sizes (1 through 24), only for 22.

Partial output from compute-sanitizer:

========= COMPUTE-SANITIZER
Note: Google Test filter = ParquetDictionaryTest/ParquetSizedTest.DictionaryTest/22
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ParquetDictionaryTest/ParquetSizedTest
[ RUN      ] ParquetDictionaryTest/ParquetSizedTest.DictionaryTest/22
========= Invalid __global__ write of size 1 bytes
=========     at 0xdfb0 in void cudf::io::parquet::gpu::gpuEncodePages<(int)128>(cudf::device_span<cudf::io::parquet::gpu::EncPage, (unsigned long)18446744073709551615>, cudf::device_span<cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>)
=========     by thread (92,0,0) in block (158,0,0)
=========     Address 0x7fd632539b50 is out of bounds
=========     and is 1 bytes after the nearest allocation at 0x7fd62c000000 of size 106142544 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x305c18]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x1402c]
=========                in /conda/envs/rapids/lib/libcudart.so.11.0
=========     Host Frame:cudaLaunchKernel [0x68808]
=========                in /conda/envs/rapids/lib/libcudart.so.11.0
=========     Host Frame:cudf::io::parquet::gpu::EncodePages(cudf::device_span<cudf::io::parquet::gpu::EncPage, 18446744073709551615ul>, cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul>, 18446744073709551615ul>, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul>, 18446744073709551615ul>, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>, rmm::cuda_stream_view) [0x17117d3]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::detail::parquet::writer::impl::encode_pages(cudf::detail::hostdevice_2dvector<cudf::io::parquet::gpu::EncColumnChunk>&, cudf::device_span<cudf::io::parquet::gpu::EncPage, 18446744073709551615ul>, unsigned long, unsigned int, unsigned int, unsigned int, unsigned int, cudf::io::statistics_chunk const*, cudf::io::statistics_chunk const*, cudf::io::statistics_chunk const*) [0x172e4e5]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::detail::parquet::writer::impl::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) [0x173275b]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::detail::parquet::writer::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) [0x1734388]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:cudf::io::write_parquet(cudf::io::parquet_writer_options const&, rmm::mr::device_memory_resource*) [0x164bc5c]
=========                in /conda/envs/rapids/lib/libcudf.so
=========     Host Frame:ParquetSizedTest_DictionaryTest_Test::TestBody() [0xad84f]
=========                in /cudf/cpp/build/gtests/PARQUET_TEST
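
The addresses in the report can be cross-checked by hand. The sketch below uses only the numbers printed above; the helper names (`write_offset`, `in_bounds`) are hypothetical and exist only for illustration. It shows that the faulting address is exactly the allocation base plus the allocation size, i.e. the first byte past the end of the buffer, which suggests the buffer was sized slightly too small rather than the write being wildly misdirected.

```cpp
#include <cassert>
#include <cstdint>

// Values copied verbatim from the compute-sanitizer output.
constexpr std::uint64_t alloc_base = 0x7fd62c000000ULL;  // start of nearest allocation
constexpr std::uint64_t alloc_size = 106142544ULL;       // allocation size in bytes
constexpr std::uint64_t write_addr = 0x7fd632539b50ULL;  // address of the invalid write

// Hypothetical helper: offset of an address from the start of an allocation.
constexpr std::uint64_t write_offset(std::uint64_t base, std::uint64_t addr)
{
  return addr - base;
}

// Hypothetical helper: valid offsets are [0, size).
constexpr bool in_bounds(std::uint64_t offset, std::uint64_t size)
{
  return offset < size;
}

// The write lands at offset == size, one byte beyond the last valid offset.
static_assert(write_offset(alloc_base, write_addr) == alloc_size,
              "faulting address is the one-past-the-end address");
static_assert(!in_bounds(write_offset(alloc_base, write_addr), alloc_size),
              "the write is out of bounds");
static_assert(in_bounds(write_offset(alloc_base, write_addr) - 1, alloc_size),
              "the preceding byte is still inside the allocation");
```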

Use the following command to reproduce the bug:

compute-sanitizer --tool memcheck gtests/PARQUET_TEST --gtest_filter=ParquetDictionaryTest/ParquetSizedTest.DictionaryTest/22 --rmm_mode=cuda

This occurs on the current 22.12 branch. I have not verified whether it also occurs on earlier branches.

@davidwendt added the bug, Needs Triage, libcudf, and cuIO labels on Oct 13, 2022
@GregoryKimball removed the Needs Triage label on Oct 19, 2022
@GregoryKimball

@vuule would you please take a look at this memcheck failure?

rapids-bot bot pushed a commit that referenced this issue Oct 21, 2022
Closes #11916

CUDA memcheck reports an out-of-bounds (OOB) write in one of the tests. The root cause is an under-allocated buffer for encoded pages.
This PR fixes the computation of the maximum size of RLE-encoded data pages when dictionary encoding is used.
Other changes:
  - Refactored the max RLE page size computation to avoid code repetition.
  - Switched to the actual dictionary index width instead of the (outdated) worst case.
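
To illustrate why the bound on the encoded page size matters, here is a minimal sketch of how such an upper bound might be computed. It is not the code from the PR. It assumes a simplified encoder that writes one bit-width byte and then only bit-packed runs of at most 63 groups of 8 values, each preceded by a one-byte header; all function names and the run-length limit are hypothetical. A buffer sized from a formula that misses any of these terms for some index width can come up a byte short, which is the kind of error memcheck flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bits needed to store indices 0..num_entries-1.
constexpr std::uint32_t index_bit_width(std::uint64_t num_entries)
{
  std::uint32_t bits = 0;
  while (bits < 64 && num_entries > (std::uint64_t{1} << bits)) { ++bits; }
  return bits;
}

// Bytes needed to store v as a ULEB128 varint (the format of run headers).
constexpr std::size_t varint_size(std::uint64_t v)
{
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Assumed model: 1 bit-width byte, then bit-packed runs only.
// Each group holds 8 values and occupies exactly bit_width bytes.
constexpr std::size_t max_bitpacked_page_size(std::uint32_t bit_width, std::uint64_t num_values)
{
  constexpr std::uint64_t groups_per_run = 63;  // header (63 << 1) | 1 == 127 fits in one byte
  const std::uint64_t groups  = (num_values + 7) / 8;                         // round up to full groups
  const std::uint64_t runs    = (groups + groups_per_run - 1) / groups_per_run;
  const std::uint64_t payload = groups * bit_width;
  return 1 + runs * varint_size((groups_per_run << 1) | 1) + payload;
}

// Sizing from the actual index width rather than a fixed worst case.
static_assert(index_bit_width((std::uint64_t{1} << 21) + 1) == 22, "2^21 + 1 entries need 22 bits");
static_assert(max_bitpacked_page_size(22, 8) == 24, "1 width byte + 1 header + 22 payload bytes");
```

Under this model, a dictionary with 2^21 + 1 entries needs 22-bit indices, and 1000 such indices need at most 1 + 2 + 125 * 22 = 2753 bytes.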

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)

URL: #11962