[BUG] Illegal memory access when writing an ORC file #13742

Closed
revans2 opened this issue Jul 24, 2023 · 2 comments · Fixed by #13745
Labels: bug (Something isn't working), cuIO (cuIO issue), Spark (Functionality that helps Spark RAPIDS)

Comments

revans2 (Contributor) commented Jul 24, 2023

Describe the bug
I was able to reproduce this recently on the latest 23.08 code. I am going to work on getting a repro case in C++, but for now this is what I have.

In Spark, when I read the following file and write it out again as ORC, I get an illegal memory access.

data.zip

When I run this under compute-sanitizer, I see errors like the following:

```
========= Invalid __global__ read of size 4 bytes
=========     at 0x240 in cudf::io::orc::gpu::rowgroup_char_counts_kernel(cudf::detail::base_2dspan<int, cudf::device_span>, cudf::device_span<const cudf::io::orc::orc_column_device_view, (unsigned long)18446744073709551615>, cudf::detail::base_2dspan<const cudf::io::orc::rowgroup_rows, cudf::device_span>, cudf::device_span<const unsigned int, (unsigned long)18446744073709551615>)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x0 is out of bounds
=========     and is 115024592896 bytes before the nearest allocation at 0x1ac8000000 of size 134217728 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x23adbc]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x317d0eb]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame: [0x31bcd38]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame:__device_stub__ZN4cudf2io3orc3gpu27rowgroup_char_counts_kernelENS_6detail11base_2dspanIiNS_11device_spanEEENS5_IKNS1_22orc_column_device_viewELm18446744073709551615EEENS4_IKNS1_13rowgroup_rowsES5_EENS5_IKjLm18446744073709551615EEE(cudf::detail::base_2dspan<int, cudf::device_span>&, cudf::device_span<cudf::io::orc::orc_column_device_view const, 18446744073709551615ul>&, cudf::detail::base_2dspan<cudf::io::orc::rowgroup_rows const, cudf::device_span>&, cudf::device_span<unsigned int const, 18446744073709551615ul>&) [0x1aa65c4]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame:cudf::io::orc::gpu::rowgroup_char_counts_kernel(cudf::detail::base_2dspan<int, cudf::device_span>, cudf::device_span<cudf::io::orc::orc_column_device_view const, 18446744073709551615ul>, cudf::detail::base_2dspan<cudf::io::orc::rowgroup_rows const, cudf::device_span>, cudf::device_span<unsigned int const, 18446744073709551615ul>) [0x1aa65fe]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame:cudf::io::detail::orc::(anonymous namespace)::set_rowgroup_char_counts(cudf::io::detail::orc::orc_table_view&, cudf::detail::base_2dspan<cudf::io::orc::rowgroup_rows const, cudf::device_span>, rmm::cuda_stream_view) [0x1ad4b31]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame:cudf::io::detail::orc::(anonymous namespace)::convert_table_to_orc_data(cudf::table_view const&, cudf::io::table_input_metadata const&, cudf::io::detail::orc::stripe_size_limits, int, bool, cudf::io::orc::CompressionKind, unsigned long, cudf::io::statistics_freq, bool, cudf::io::detail::single_write_mode, cudf::io::data_sink const&, rmm::cuda_stream_view) [0x1ae3567]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame:cudf::io::detail::orc::writer::impl::write(cudf::table_view const&) [0x1ae4a40]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame:cudf::io::orc_chunked_writer::write(cudf::table_view const&) [0x1a1d019]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame:Java_ai_rapids_cudf_Table_writeORCChunk [0x122099e]
=========                in /tmp/cudf5670468847362030375.so
=========     Host Frame: [0x17c2b9ba97]
=========                in 
```

I think it has something to do with all of the empty strings being written out, which `rowgroup_char_counts_kernel` does not handle correctly.
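To make the suspected failure mode concrete, here is a minimal Python sketch of computing per-row-group character counts from a strings column's offsets, with a guard for a column that has no rows. The function name `rowgroup_char_counts` and its arguments are hypothetical, borrowed only from the kernel's name; this is not the libcudf implementation. It assumes an Arrow-style layout in which string `i` spans `offsets[i]` to `offsets[i + 1]`, and that a zero-row column may have an empty offsets array, which would be consistent with the sanitizer reporting a read at address 0x0.

```python
from typing import List, Sequence


def rowgroup_char_counts(
    offsets: Sequence[int],
    rowgroup_starts: Sequence[int],
    rowgroup_sizes: Sequence[int],
) -> List[int]:
    """Count the characters in each row group of a strings column (sketch).

    Assumes an Arrow-style layout: string i spans offsets[i]..offsets[i + 1],
    so a column with n rows has n + 1 offsets. A zero-row column may have an
    empty offsets array (hypothetical model of the failing input).
    """
    num_rows = max(len(offsets) - 1, 0)
    counts: List[int] = []
    for start, size in zip(rowgroup_starts, rowgroup_sizes):
        # Clamp the row group to rows that actually exist in this column.
        begin = min(start, num_rows)
        end = min(start + size, num_rows)
        if begin == end:
            # Guard: never index `offsets` when the range is empty. Without
            # it, a zero-row column would read offsets[0] from an empty array,
            # loosely analogous to the device read at address 0x0.
            counts.append(0)
            continue
        counts.append(offsets[end] - offsets[begin])
    return counts
```

For example, `rowgroup_char_counts([], [0], [10])` returns `[0]` rather than indexing into an empty offsets array, and a column of only empty strings (`offsets == [0, 0, 0]`) also yields a count of 0.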

revans2 added the bug, Needs Triage, cuIO, and Spark labels on Jul 24, 2023
GregoryKimball (Contributor) commented Jul 24, 2023

Looking into this issue, it may actually be that something about the file is "invalid" and leads to incorrect output from the ORC reader.

If I read and write the attached file with cuDF, it crashes:

```python
import cudf

df = cudf.read_orc('part-00000-7f63f99e-2d40-41d9-9209-b468e5af4111-c000.snappy.orc')
df.to_orc('test.orc')
```

```
...
terminate called after throwing an instance of 'cudf::fatal_cuda_error'
  what():  Fatal CUDA error encountered at: /nfs/repo/cudf-23.08/cpp/include/cudf/detail/utilities/pinned_host_vector.hpp:172: 700 cudaErrorIllegalAddress an illegal memory access was encountered
Aborted (core dumped)
```

If I read and write the attached file with pandas, it also fails (!!):

```python
import pandas as pd

df = pd.read_orc('part-00000-7f63f99e-2d40-41d9-9209-b468e5af4111-c000.snappy.orc')
df.to_orc('test.orc')
```

```
...
pyarrow.lib.ArrowNotImplementedError: Unknown or unsupported Arrow type: null
```

If I round-trip the data through pandas, the write succeeds and the output can be read back correctly:

```python
import cudf

df = cudf.read_orc('part-00000-7f63f99e-2d40-41d9-9209-b468e5af4111-c000.snappy.orc')
df = cudf.DataFrame(df.to_pandas())
df.to_orc('test.orc')
```

Something about this file causes both cuDF-python and pyarrow to read it as an invalid table.

vuule (Contributor) commented Jul 24, 2023

Opened #13745 with a potential fix. @revans2, please verify whether it fixes the issue on your end.

rapids-bot pushed a commit that referenced this issue on Jul 25, 2023
Closes #13742
Fixes an OOB access in `rowgroup_char_counts_kernel` when the input column has no rows. This can happen with string columns that are children of a list column.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Karthikeyan (https://github.com/karthikeyann)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #13745
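The commit message notes that a zero-row strings column can arise when its parent is a list column. A minimal sketch of why, assuming the Arrow/cuDF list layout (parent offsets into a flattened child column): if every list in the parent is empty or null, the child strings column has zero rows even though the parent has several. The helpers `to_list_layout` and `list_child_row_count` are hypothetical and stdlib-only; the thread does not confirm that the attached file contains such a column.

```python
from typing import List, Optional, Sequence, Tuple


def to_list_layout(
    rows: Sequence[Optional[List[str]]],
) -> Tuple[List[int], List[str]]:
    """Flatten Python lists of strings into (list_offsets, child_strings).

    Row i of the list column covers child rows offsets[i]..offsets[i + 1].
    """
    offsets = [0]
    child: List[str] = []
    for row in rows:
        if row:  # null and empty lists contribute no child rows
            child.extend(row)
        offsets.append(len(child))
    return offsets, child


def list_child_row_count(list_offsets: Sequence[int]) -> int:
    """Number of rows in the child column referenced by a list column."""
    if not list_offsets:
        return 0
    return list_offsets[-1] - list_offsets[0]
```

Here `to_list_layout([[], None, []])` yields list offsets `[0, 0, 0, 0]` and an empty child, so `list_child_row_count` reports 0 child rows for a 3-row parent: exactly the "input column has no rows" case the fix addresses.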
GregoryKimball removed this from libcudf on Oct 26, 2023
bdice removed the Needs Triage label on Mar 4, 2024