[BUG] memcheck and racecheck errors in avro reader with codec="deflate" #15216

Closed
wence- opened this issue Mar 4, 2024 · 2 comments
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code.

Comments

@wence-
Contributor

wence- commented Mar 4, 2024

Describe the bug

import cudf
import fastavro
import io

total_rows = num_rows = rows_per_block = 2048
total_bytes_per_block = rows_per_block * 7

records = [{"0": f"{i:0>6}"} for i in range(total_rows)]
schema = {
    "name": "root",
    "type": "record",
    "fields": [
        {"name": "0", "type": "string"},
    ],
}

buffer = io.BytesIO()
fastavro.writer(buffer, schema, records, sync_interval=total_bytes_per_block, codec="deflate")
buffer.seek(0)

actual_df = cudf.read_avro(buffer, skiprows=0, num_rows=num_rows)

Extracted from test_avro_reader_fastavro_integration.py::test_avro_reader_multiblock.
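The factor of 7 in total_bytes_per_block is not explained in the report. One plausible reading, offered here as an assumption, is that it is the Avro binary size of one record: a six-character string is written as a one-byte zigzag varint length followed by six UTF-8 bytes. The sketch below uses only the standard library, and the helper names zigzag_varint and avro_string are made up for illustration.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer the way Avro encodes int/long values."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro binary encoding of a string: byte length as a long, then UTF-8 data."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


rows_per_block = 2048
encoded = avro_string(f"{123:0>6}")       # same formatting as the records above
bytes_per_record = len(encoded)           # 1 length byte + 6 characters
uncompressed_block_bytes = rows_per_block * bytes_per_record

print(encoded, bytes_per_record, uncompressed_block_bytes)
```

Under that reading, sync_interval is set to the uncompressed size of all 2048 records, so the deflate codec is applied to roughly 14 KiB of record data.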

Neither

compute-sanitizer --tool=memcheck python bug.py

nor

compute-sanitizer --tool=racecheck python bug.py

is clean.

Exemplar stack traces:

memcheck
========= COMPUTE-SANITIZER
========= Invalid __global__ read of size 1 bytes
=========     at 0x2080 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:807:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     by thread (32,0,0) in block (0,0,0)
=========     Address 0x7f6078604cb3 is out of bounds
=========     and is 2,356 bytes after the nearest allocation at 0x7f6078601600 of size 11,648 bytes
=========     Device Frame:/home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1109:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included) [0x6050]
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x332470]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x14fb4]
=========                in /home/coder/.conda/envs/rapids/lib/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x70aae]
=========                in /home/coder/.conda/envs/rapids/lib/libcudart.so.12
=========     Host Frame:/home/coder/.conda/envs/rapids/targets/x86_64-linux/include/cuda_runtime.h:216:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) [0x12a5605]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/tmp/tmpxft_0003da43_00000000-6_gpuinflate.compute_90.cudafe1.stub.c:1:__device_stub__ZN4cudf2io14inflate_kernelILi128EEEvNS_11device_spanIKNS2_IKhLm18446744073709551615EEELm18446744073709551615EEENS2_IKNS2_IhLm18446744073709551615EEELm18446744073709551615EEENS2_INS0_18compression_resultELm18446744073709551615EEENS0_20gzip_header_includedE(cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul> const, 18446744073709551615ul>&, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul> const, 18446744073709551615ul>&, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>&, cudf::io::gzip_header_included) [0x12a4de6]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/tmp/tmpxft_0003da43_00000000-6_gpuinflate.compute_90.cudafe1.stub.c:4:void cudf::io::__wrapper__device_stub_inflate_kernel<128>(cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul> const, 18446744073709551615ul>&, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul> const, 18446744073709551615ul>&, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>&, cudf::io::gzip_header_included&) [0x12a4e1e]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1145:void cudf::io::inflate_kernel<128>(cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul> const, 18446744073709551615ul>, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul> const, 18446744073709551615ul>, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>, cudf::io::gzip_header_included) [0x12a5598]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1214:cudf::io::gpuinflate(cudf::device_span<cudf::device_span<unsigned char const, 18446744073709551615ul> const, 18446744073709551615ul>, cudf::device_span<cudf::device_span<unsigned char, 18446744073709551615ul> const, 18446744073709551615ul>, cudf::device_span<cudf::io::compression_result, 18446744073709551615ul>, cudf::io::gzip_header_included, rmm::cuda_stream_view) [0x12a49ef]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/home/coder/cudf/cpp/src/io/avro/reader_impl.cu:227:cudf::io::detail::avro::decompress_data(cudf::io::datasource&, cudf::io::detail::avro::metadata&, rmm::device_buffer const&, rmm::cuda_stream_view) [0x123db3c]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:/home/coder/cudf/cpp/src/io/avro/reader_impl.cu:528:cudf::io::detail::avro::read_avro(std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >&&, cudf::io::avro_reader_options const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*) [0x123fa1f]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame:cudf::io::read_avro(cudf::io::avro_reader_options const&, rmm::mr::device_memory_resource*) [0x13019ee]
=========                in /home/coder/cudf/cpp/build/release/libcudf.so
=========     Host Frame: [0x2ba3c]
=========                in /home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/_lib/avro.cpython-310-x86_64-linux-gnu.so
=========     Host Frame: [0x2d29f]
=========                in /home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/_lib/avro.cpython-310-x86_64-linux-gnu.so
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/ceval.c:4181:_PyEval_EvalFrameDefault [0x139022]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Objects/call.c:342:_PyFunction_Vectorcall [0x1448cc]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/ceval.c:4231:_PyEval_EvalFrameDefault [0x1357dc]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:_PyEval_Vector [0x1d7870]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/ceval.c:1135:PyEval_EvalCode [0x1d77b7]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1292:run_eval_code_obj [0x207d1a]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1313:run_mod [0x203123]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1208:pyrun_file.cold [0x9a4d1]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:456:_PyRun_SimpleFileObject [0x1fd60e]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Python/pythonrun.c:90:_PyRun_AnyFileObject [0x1fd1a4]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Modules/main.c:670:Py_RunMain [0x1fa39b]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.10.13/Modules/main.c:1091:Py_BytesMain [0x1cae17]
=========                in /home/coder/.conda/envs/rapids/bin/python
=========     Host Frame: [0x29d90]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main [0x29e40]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame: [0x1cad11]
=========                in /home/coder/.conda/envs/rapids/bin/python
========= 
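The addresses in the memcheck report above are enough to work out how far past the allocation the one-byte read landed. The arithmetic below uses only the three numbers printed by compute-sanitizer; the variable names are illustrative.

```python
# Values copied from the memcheck report.
fault_addr = 0x7F6078604CB3   # "Address ... is out of bounds"
alloc_base = 0x7F6078601600   # "nearest allocation at ..."
alloc_size = 11_648           # "of size 11,648 bytes"

alloc_end = alloc_base + alloc_size              # one past the last valid byte
offset_in_alloc = fault_addr - alloc_base        # where the read would fall, from the base
past_end = fault_addr - alloc_end                # distance from one-past-the-end
past_last_byte = fault_addr - (alloc_end - 1)    # distance from the last valid byte

print(offset_in_alloc, past_end, past_last_byte)
```

The reported "2,356 bytes after" matches the distance measured from the last valid byte of the allocation. The read is therefore a couple of kilobytes beyond the buffer, not a one-element overrun at its edge.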
racecheck
========= COMPUTE-SANITIZER
========= Error: Race reported between Read access at 0xe00 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:789:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x1930 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:543:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [16132 hazards]
=========     and Write access at 0x5660 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:661:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [16156 hazards]
========= 
========= Error: Race reported between Write access at 0xd90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:957:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Read access at 0x33c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:590:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1144 hazards]
=========     and Read access at 0x5250 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:642:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [6592 hazards]
========= 
========= Error: Race reported between Read access at 0x810 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:954:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Write access at 0x59c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:665:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1032 hazards]
========= 
========= Error: Race reported between Read access at 0xa70 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:784:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x5930 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:663:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1028 hazards]
=========     and Write access at 0x5f90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:671:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [4 hazards]
========= 
========= Error: Race reported between Write access at 0x11c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:793:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Read access at 0xf90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:523:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [500 hazards]
=========     and Read access at 0x5dd0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:670:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [4 hazards]
========= 
========= Error: Race reported between Write access at 0xf60 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:962:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Read access at 0xdb0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:522:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [272 hazards]
========= 
========= Error: Race reported between Write access at 0x5d70 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1104:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included)
=========     and Read access at 0x5d0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:951:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int) [8 hazards]
========= 
========= Warning: Race reported between Read access at 0x3b0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:775:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x3000 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:826:cudf::io::process_symbols(cudf::io::inflate_state_s *, int) [8 hazards]
========= 
========= Warning: Race reported between Read access at 0x31a0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1068:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included)
=========     and Write access at 0x4900 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1081:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included) [4 hazards]
========= 
========= Error: Race reported between Read access at 0xe00 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:789:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x1930 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:543:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [16132 hazards]
=========     and Write access at 0x5660 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:661:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [16156 hazards]
========= 
========= Error: Race reported between Write access at 0xd90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:957:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Read access at 0x33c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:590:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1144 hazards]
=========     and Read access at 0x5250 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:642:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [6592 hazards]
========= 
========= Error: Race reported between Read access at 0x810 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:954:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Write access at 0x59c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:665:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1032 hazards]
========= 
========= Error: Race reported between Read access at 0xa70 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:784:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x5930 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:663:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1028 hazards]
=========     and Write access at 0x5f90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:671:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [4 hazards]
========= 
========= Error: Race reported between Write access at 0x11c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:793:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Read access at 0xf90 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:523:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [500 hazards]
=========     and Read access at 0x5dd0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:670:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [4 hazards]
========= 
========= Error: Race reported between Write access at 0xf60 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:962:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Read access at 0xdb0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:522:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [272 hazards]
========= 
========= Error: Race reported between Write access at 0x5d70 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1104:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included)
=========     and Read access at 0x5d0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:951:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int) [8 hazards]
========= 
========= Warning: Race reported between Read access at 0x3b0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:775:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x3000 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:826:cudf::io::process_symbols(cudf::io::inflate_state_s *, int) [8 hazards]
========= 
========= Warning: Race reported between Read access at 0x31a0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1068:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included)
=========     and Write access at 0x4900 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:1081:void cudf::io::inflate_kernel<(int)128>(cudf::device_span<const cudf::device_span<const unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::device_span<unsigned char, (unsigned long)18446744073709551615>, (unsigned long)18446744073709551615>, cudf::device_span<cudf::io::compression_result, (unsigned long)18446744073709551615>, cudf::io::gzip_header_included) [4 hazards]
========= 
========= RACECHECK SUMMARY: 18 hazards displayed (14 errors, 4 warnings)
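The summary line appears to count reports, not the bracketed per-report hazard counts: the log above contains 14 "Error" and 4 "Warning" entries, each set of 7 errors and 2 warnings appearing twice. A small standard-library sketch for tallying such a log is shown below. The regular expressions are assumptions based only on the line shapes visible in this output, and SAMPLE holds two reports copied from it.

```python
import re
from collections import Counter

SAMPLE = """\
========= Error: Race reported between Read access at 0x810 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:954:cudf::io::prefetch_warp(volatile cudf::io::inflate_state_s *, int)
=========     and Write access at 0x59c0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:665:cudf::io::decode_symbols(cudf::io::inflate_state_s *) [1032 hazards]
=========
========= Warning: Race reported between Read access at 0x3b0 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:775:cudf::io::process_symbols(cudf::io::inflate_state_s *, int)
=========     and Write access at 0x3000 in /home/coder/cudf/cpp/src/io/comp/gpuinflate.cu:826:cudf::io::process_symbols(cudf::io::inflate_state_s *, int) [8 hazards]
=========
"""

REPORT = re.compile(r"^=+\s+(Error|Warning): Race reported")
SITE = re.compile(r"gpuinflate\.cu:(\d+):(?:void )?cudf::io::(\w+)")
HAZARDS = re.compile(r"\[(\d+) hazards\]")


def tally(text):
    """Count reports by severity, access sites by function, and bracketed hazards."""
    severities, functions, hazards = Counter(), Counter(), 0
    for line in text.splitlines():
        report = REPORT.match(line)
        if report:
            severities[report.group(1)] += 1
        for _line_no, func in SITE.findall(line):
            functions[func] += 1
        count = HAZARDS.search(line)
        if count:
            hazards += int(count.group(1))
    return severities, functions, hazards


severities, functions, hazards = tally(SAMPLE)
print(dict(severities), dict(functions), hazards)
```

Run over the full log, a tally like this makes it easier to see which pairs of functions (decode_symbols, process_symbols, prefetch_warp, inflate_kernel) account for most of the reported accesses.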

I do not know whether the racecheck warnings are as problematic as the memcheck ones. gpuinflate.cu is littered with volatile accesses to the inter-warp communication queue without (AFAICT) any synchronisation, but perhaps there are enough spin-waits that it is "OK"?

@wence- wence- added bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue labels Mar 4, 2024
@wence- wence- added this to libcudf Mar 4, 2024
@wence- wence- moved this to Needs owner in libcudf Mar 4, 2024
@GregoryKimball GregoryKimball moved this from Needs owner to To be revisited in libcudf Mar 9, 2024
rapids-bot bot pushed a commit that referenced this issue Mar 15, 2024
Issue #15216

Avoids an OOB read; the read was not causing bugs as the read data was never used.

Addresses the memcheck part of #15216

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #15309
@vyasr
Contributor

vyasr commented May 17, 2024

I can't reproduce the memcheck error right now. I do see the same racecheck issues though.

@davidwendt
Contributor

Inclined to close this: the memcheck errors are gone, and I believe the racecheck warnings are not an issue.
