[BUG] row_bit_count produces illegal memory access on some data sets #8938

Closed
revans2 opened this issue Aug 3, 2021 · 2 comments · Fixed by #9076

revans2 commented Aug 3, 2021

Describe the bug
If I run row_bit_count on the attached file BIT_MAP_INPUT_1.parquet.gz (it has to be unzipped first), I get illegal memory access errors.

Steps/Code to reproduce bug
Apply the patch

diff --git a/cpp/tests/transform/row_bit_count_test.cu b/cpp/tests/transform/row_bit_count_test.cu
index 0081cf0d46..01850b0796 100644
--- a/cpp/tests/transform/row_bit_count_test.cu
+++ b/cpp/tests/transform/row_bit_count_test.cu
@@ -16,6 +16,7 @@
 
 #include <cudf/column/column.hpp>
 #include <cudf/column/column_view.hpp>
+#include <cudf/io/parquet.hpp>
 #include <cudf/transform.hpp>
 #include <cudf/types.hpp>
 #include <cudf_test/base_fixture.hpp>
@@ -192,6 +193,18 @@ TEST_F(RowBitCount, StringsWithNulls)
   CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, *result);
 }
 
+TEST_F(RowBitCount, MapOfStringToListOfStrings)
+{
+  cudf::io::parquet_reader_options opts =
+      cudf::io::parquet_reader_options::builder(cudf::io::source_info("./BIT_MAP_INPUT_1.parquet"))
+      .build();
+
+  cudf::io::table_with_metadata table_with_meta = cudf::io::read_parquet(opts);
+  auto result = cudf::row_bit_count(*table_with_meta.tbl);
+
+  CUDA_TRY(cudaDeviceSynchronize());
+}
+
 std::pair<std::unique_ptr<column>, std::unique_ptr<column>> build_struct_column()
 {
   std::vector<bool> struct_validity{0, 1, 1, 1, 1, 0};

Then build it (but don't run it yet).

Unzip the attached file and place it in your current working directory. Then run ./gtests/TRANSFORM_TEST from the test suite.

Expected behavior
No crashes and no illegal memory access errors.

@revans2 revans2 added bug Something isn't working Needs Triage Need team to review and classify labels Aug 3, 2021
@harrism harrism added the cuIO cuIO issue label Aug 12, 2021
@mythrocks mythrocks self-assigned this Aug 16, 2021
@mythrocks

Here's the result from cuda-memcheck:

========= Invalid __shared__ write of size 8
=========     at 0x000002f0 in cudf::detail::_GLOBAL__N__48_tmpxft_0014f314_00000000_7_row_bit_count_cpp1_ii_c54bd4d9::compute_row_sizes(cudf::device_span<cudf::column_device_view const , unsigned long=18446744073709551615>, cudf::device_span<cudf::
detail::_GLOBAL__N__48_tmpxft_0014f314_00000000_7_row_bit_count_cpp1_ii_c54bd4d9::column_info const , unsigned long=18446744073709551615>, cudf::device_span<int, unsigned long=18446744073709551615>, int)
=========     by thread (198,0,0) in block (1,0,0)
=========     Address 0x00000e30 is out of bounds
=========     Device Frame:cudf::detail::_GLOBAL__N__48_tmpxft_0014f314_00000000_7_row_bit_count_cpp1_ii_c54bd4d9::compute_row_sizes(cudf::device_span<cudf::column_device_view const , unsigned long=18446744073709551615>, cudf::device_span<cudf::deta
il::_GLOBAL__N__48_tmpxft_0014f314_00000000_7_row_bit_count_cpp1_ii_c54bd4d9::column_info const , unsigned long=18446744073709551615>, cudf::device_span<int, unsigned long=18446744073709551615>, int) (cudf::detail::_GLOBAL__N__48_tmpxft_0014f314_000
00000_7_row_bit_count_cpp1_ii_c54bd4d9::compute_row_sizes(cudf::device_span<cudf::column_device_view const , unsigned long=18446744073709551615>, cudf::device_span<cudf::detail::_GLOBAL__N__48_tmpxft_0014f314_00000000_7_row_bit_count_cpp1_ii_c54bd4d
9::column_info const , unsigned long=18446744073709551615>, cudf::device_span<int, unsigned long=18446744073709551615>, int) : 0x2f0)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x24b9a8]
=========     Host Frame:/home/mithunr/anaconda3/envs/cudf-dev-11.2/lib/libcudart.so.11.0 [0x1102b]
=========     Host Frame:/home/mithunr/anaconda3/envs/cudf-dev-11.2/lib/libcudart.so.11.0 (cudaLaunchKernel + 0x1c0) [0x5a820]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/libcudf.so [0x9fc177b]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/libcudf.so (_Z271__device_stub__ZN4cudf6detail72_GLOBAL__N__48_tmpxft_0014f314_00000000_7_row_bit_count_cpp1_ii_c54bd4d917compute_row_sizesENS_11device_spanIKNS_18column_device_vi
ewELm18446744073709551615EEENS2_IKNS1_11column_infoELm18446744073709551615EEENS2_IiLm18446744073709551615EEEiRN4cudf11device_spanIKNS_18column_device_viewELm18446744073709551615EEERNS0_IKNS_6detail72_GLOBAL__N__48_tmpxft_0014f314_00000000_7_row_bit_
count_cpp1_ii_c54bd4d911column_infoELm18446744073709551615EEERNS0_IiLm18446744073709551615EEEi + 0x16d) [0x9fc0f59]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/libcudf.so (_ZN4cudf6detail72_GLOBAL__N__48_tmpxft_0014f314_00000000_7_row_bit_count_cpp1_ii_c54bd4d917compute_row_sizesENS_11device_spanIKNS_18column_device_viewELm18446744073709
551615EEENS2_IKNS1_11column_infoELm18446744073709551615EEENS2_IiLm18446744073709551615EEEi + 0x4a) [0x9fc0fcb]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/libcudf.so (_ZN4cudf6detail13row_bit_countERKNS_10table_viewEN3rmm16cuda_stream_viewEPNS4_2mr22device_memory_resourceE + 0x881) [0x9fc0b9c]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/libcudf.so (_ZN4cudf13row_bit_countERKNS_10table_viewEPN3rmm2mr22device_memory_resourceE + 0x42) [0x9fc0d74]
=========     Host Frame:gtests/TRANSFORM_TEST [0x2b2b83]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc + 0x69) [0xbda99]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc + 0x5e) [0xb61b1]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing4Test3RunEv + 0xf2) [0x90556]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing8TestInfo3RunEv + 0x113) [0x90f41]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing9TestSuite3RunEv + 0x12d) [0x91699]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing8internal12UnitTestImpl11RunAllTestsEv + 0x41d) [0x9d843]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc + 0x69) [0xbefc2]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc + 0x5e) [0xb73ef]
=========     Host Frame:/home/mithunr/workspace/dev/cudf/2/cpp/build/lib/libgtestd.so (_ZN7testing8UnitTest3RunEv + 0xcd) [0x9c04d]
=========     Host Frame:gtests/TRANSFORM_TEST [0x26f902]
=========     Host Frame:gtests/TRANSFORM_TEST [0x26c662]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf2) [0x28cb2]
=========     Host Frame:gtests/TRANSFORM_TEST [0x1af55e]
=========
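One way to read the report above is sketched here. The write is 8 bytes wide and comes from thread 198 of block 1, at shared-memory address 0xe30. Assuming a block size of 256 threads and one 8-byte entry per thread (both are assumptions, as the report states neither), indexing a per-block shared array by the global thread index lands exactly on that address, which is past the end of the block's allocation. All names below are hypothetical and none of them come from cudf.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values: the report does not state the block size or the entry layout.
constexpr std::size_t assumed_block_size = 256;  // threads per block (assumption)
constexpr std::size_t entry_bytes        = 8;    // taken from "write of size 8"

// Byte offset into a per-block shared array when indexed by the GLOBAL thread index.
constexpr std::size_t offset_from_global_index(std::size_t block, std::size_t thread)
{
  return (block * assumed_block_size + thread) * entry_bytes;
}

// Byte offset into the same array when indexed by the block-LOCAL thread index.
constexpr std::size_t offset_from_local_index(std::size_t thread)
{
  return thread * entry_bytes;
}

// Bytes of shared memory one block would own under these assumptions.
constexpr std::size_t shared_bytes_per_block() { return assumed_block_size * entry_bytes; }

// Checks that the assumed layout reproduces the address in the report.
inline void check_reported_address()
{
  // Thread (198,0,0) in block (1,0,0), indexed globally: (256 + 198) * 8 = 0xe30.
  assert(offset_from_global_index(1, 198) == 0xe30);
  // That offset is beyond the 0x800 bytes the block owns.
  assert(offset_from_global_index(1, 198) >= shared_bytes_per_block());
  // Indexed by the local thread index, the same thread stays in bounds.
  assert(offset_from_local_index(198) < shared_bytes_per_block());
}
```

The match between the computed offset and the reported address is consistent with these assumptions but does not confirm them.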

@mythrocks

row_bit_count() fails on the list<struct<string, list<string>>>, and on the underlying struct<string, list<string>>, but not on either of the struct members.

mythrocks added a commit to mythrocks/cudf that referenced this issue Aug 19, 2021
For input with a number of rows exceeding `max_block_size`, `row_bit_count()` currently
reaches past the bounds of its shared-memory allocation, causing illegal memory access
errors like in [cudf/issues/8938](rapidsai#8938).

This commit corrects the calculation of the branch stack's base address, and adds a
test for this case.
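The failure mode described in this commit message can be illustrated with a small host-side sketch. It assumes that each block reserves `block_size * max_branch_depth` stack entries in shared memory and that each thread handles one row. This layout is an assumption made for illustration and is not taken from the cudf source. Under it, a base address derived from the global row index stays in bounds only while all rows fit in a single block, whereas a base derived from the block-local thread index always stays in bounds. Every name below is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative launch shape; the fields are assumptions, not cudf's actual parameters.
struct launch_shape {
  std::size_t block_size;        // threads per block
  std::size_t max_branch_depth;  // stack entries reserved for each thread
};

// Number of stack entries one block reserves in shared memory.
constexpr std::size_t entries_per_block(launch_shape s)
{
  return s.block_size * s.max_branch_depth;
}

// Base derived from the global row index: grows with the block index.
constexpr std::size_t stack_base_from_row(launch_shape s, std::size_t row)
{
  return row * s.max_branch_depth;
}

// Base derived from the block-local thread index: independent of the block index.
constexpr std::size_t stack_base_from_thread(launch_shape s, std::size_t row)
{
  return (row % s.block_size) * s.max_branch_depth;
}

// True when the thread's whole stack lies inside the block's allocation.
constexpr bool stack_in_bounds(launch_shape s, std::size_t base)
{
  return base + s.max_branch_depth <= entries_per_block(s);
}

// Counts rows whose stack would fall outside the block's shared-memory allocation.
inline std::size_t count_out_of_bounds_rows(launch_shape s, std::size_t num_rows, bool use_row_index)
{
  std::size_t bad = 0;
  for (std::size_t row = 0; row < num_rows; ++row) {
    std::size_t const base =
      use_row_index ? stack_base_from_row(s, row) : stack_base_from_thread(s, row);
    if (!stack_in_bounds(s, base)) { ++bad; }
  }
  return bad;
}
```

With a hypothetical shape of 256 threads per block and a depth of 2, `count_out_of_bounds_rows` reports no bad rows for 256 input rows under either scheme, but once the row count exceeds the block size the row-indexed variant starts to go out of bounds.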
rapids-bot bot pushed a commit that referenced this issue Aug 23, 2021
Fixes #8938.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - https://github.com/nvdbaranec
  - Nghia Truong (https://github.com/ttnghia)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #9076
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024