[BUG] writing binary data in a struct causes out of index reads #11506

Closed
revans2 opened this issue Aug 10, 2022 · 0 comments · Fixed by #11526
Labels: bug (Something isn't working), Spark (Functionality that helps Spark RAPIDS)

revans2 commented Aug 10, 2022

Describe the bug
While trying to add binary reads to the RAPIDS Accelerator for Apache Spark, I found that writing binary data crashes when the data is inside a struct. I think it will also crash when the data is inside an array, but I am still debugging that.

Steps/Code to reproduce bug
In the Spark plugin, with my still-WIP patch applied, I just had to run:

spark.range(100).selectExpr("CAST(id AS String) as s").selectExpr("CAST(S AS BINARY) as b").selectExpr("struct(b) as st").write.mode("overwrite").parquet("./target/TEST")

It crashed. I ran compute-sanitizer on it and got back 129 errors; the first one was:

========= Invalid __global__ read of size 4 bytes
=========     at 0x15f0 in void cudf::io::parquet::gpu::gpuInitPageFragments<(int)512>(cudf::detail::base_2dspan<cudf::io::parquet::gpu::PageFragment, cudf::device_span>, cudf::device_span<const cudf::io::parquet::gpu::parquet_column_device_view, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::io::partition_info, (unsigned long)18446744073709551615>, cudf::device_span<const int, (unsigned long)18446744073709551615>, unsigned int)
=========     by thread (96,0,0) in block (0,0,0)
=========     Address 0x180 is out of bounds
=========     and is 139862455877248 bytes before the nearest allocation at 0x7f3444000000 of size 21474836480 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x23adbc]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x3d4702b]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame: [0x3d84778]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::parquet::gpu::InitPageFragments(cudf::detail::base_2dspan<cudf::io::parquet::gpu::PageFragment, cudf::device_span>, cudf::device_span<cudf::io::parquet::gpu::parquet_column_device_view const, 18446744073709551615ul>, cudf::device_span<cudf::io::partition_info const, 18446744073709551615ul>, cudf::device_span<int const, 18446744073709551615ul>, unsigned int, rmm::cuda_stream_view) [0x1c66217]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::detail::parquet::writer::impl::init_page_fragments(cudf::detail::hostdevice_2dvector<cudf::io::parquet::gpu::PageFragment>&, cudf::device_span<cudf::io::parquet::gpu::parquet_column_device_view const, 18446744073709551615ul>, cudf::host_span<cudf::io::partition_info const, 18446744073709551615ul>, cudf::device_span<int const, 18446744073709551615ul>, unsigned int) [0x1c93db9]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::detail::parquet::writer::impl::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) [0x1c9c31f]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::detail::parquet::writer::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) [0x1c9d008]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::parquet_chunked_writer::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) [0x1be74cc]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:Java_ai_rapids_cudf_Table_writeParquetChunk [0x147faa7]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame: [0x1524e7c867]
=========                in 

fails.zip holds a version of the file that I am trying to write, but it was written on the CPU.
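
A side note on the first error: the faulting address 0x180 equals 96 × 4, that is, the reporting thread's x index (96) times the 4-byte read size. That is what one would expect if a null base pointer were indexed by thread id, although the report alone does not prove the kernel computes the address that way. A minimal host-side sketch of the arithmetic follows; the helper name `faulting_address` is hypothetical and is not part of cudf.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a cudf function): the address touched when `base`
// is indexed by `thread_idx` with elements of `elem_size` bytes.
constexpr std::uintptr_t faulting_address(std::uintptr_t base,
                                          std::size_t thread_idx,
                                          std::size_t elem_size)
{
  return base + thread_idx * elem_size;
}

// Values from the sanitizer report: thread (96,0,0), read of size 4 bytes.
// Assumption: the base pointer is null (0), so the result is the raw offset.
static_assert(faulting_address(0, 96, 4) == 0x180,
              "null base + 96 * 4 bytes matches the reported address");
```

If this reading is right, the other threads in the block would fault at addresses that are also small multiples of 4, which can be checked against the remaining sanitizer errors.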

revans2 added the bug, Needs Triage, and Spark labels Aug 10, 2022
hyperbolic2346 self-assigned this Aug 10, 2022
sameerz changed the title from "[BUG] writeing binary data in a struct causes out of index reads" to "[BUG] writing binary data in a struct causes out of index reads" Aug 12, 2022
rapids-bot bot pushed a commit that referenced this issue Aug 15, 2022
This fixes the crash described in the bug related to writing nested data in parquet with the binary flag set to write binary data as byte_arrays. We were incorrectly selecting the top-most node instead of the list<int8>, which resulted in a crash down in the kernels when the data pointer was null for those upper list columns.

closes #11506

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #11526
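
To illustrate the class of bug the commit message describes, here is a minimal sketch. It is not cudf's actual code: the `Column` type and the `leaf_of` and `make_struct_of_binary` helpers are hypothetical stand-ins, and following the last child at each level is a simplification. The point is that a nested parent column owns no data buffer (its data pointer is null), so code that needs the bytes has to walk down to the leaf rather than use the top-most node.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a nested column: parents (struct/list) hold only
// children and have a null data pointer; only the leaf owns a data buffer.
struct Column {
  std::int8_t const* data = nullptr;
  std::vector<Column> children;
};

// Follow the last child at each level until a column with no children is
// reached. For a struct<list<int8>> shape this yields the int8 leaf.
Column const& leaf_of(Column const& col)
{
  Column const* cur = &col;
  while (!cur->children.empty()) { cur = &cur->children.back(); }
  return *cur;
}

// Build struct -> list -> int8 leaf, with `bytes` as the leaf's data buffer.
// The caller must keep `bytes` alive for as long as the columns are used.
Column make_struct_of_binary(std::vector<std::int8_t> const& bytes)
{
  Column leaf;
  leaf.data = bytes.data();
  Column list;
  list.children.push_back(leaf);
  Column st;
  st.children.push_back(list);
  return st;
}
```

With this shape, the top-most column's `data` is null while `leaf_of(...)` returns the column whose `data` points at the bytes, which is why reading through the top-most node's pointer in a kernel would fault.
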
bdice removed the Needs Triage label Mar 4, 2024