[BUG] writing binary data in a struct causes out of index reads #11506

revans2 · 2022-08-10T16:08:49Z

Describe the bug
As part of trying to add binary reads to the RAPIDS Accelerator for Apache Spark, I found that it would crash when the data was in a struct. I think it will also crash inside an array, but I am still debugging that.

Steps/Code to reproduce bug
In the Spark plugin, with my still-WIP patch, I just had to run:

spark.range(100).selectExpr("CAST(id AS String) as s").selectExpr("CAST(S AS BINARY) as b").selectExpr("struct(b) as st").write.mode("overwrite").parquet("./target/TEST")

And it would crash. I ran compute sanitizer on it and got back 129 errors, with the first one being...

========= Invalid __global__ read of size 4 bytes
=========     at 0x15f0 in void cudf::io::parquet::gpu::gpuInitPageFragments<(int)512>(cudf::detail::base_2dspan<cudf::io::parquet::gpu::PageFragment, cudf::device_span>, cudf::device_span<const cudf::io::parquet::gpu::parquet_column_device_view, (unsigned long)18446744073709551615>, cudf::device_span<const cudf::io::partition_info, (unsigned long)18446744073709551615>, cudf::device_span<const int, (unsigned long)18446744073709551615>, unsigned int)
=========     by thread (96,0,0) in block (0,0,0)
=========     Address 0x180 is out of bounds
=========     and is 139862455877248 bytes before the nearest allocation at 0x7f3444000000 of size 21474836480 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x23adbc]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x3d4702b]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame: [0x3d84778]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::parquet::gpu::InitPageFragments(cudf::detail::base_2dspan<cudf::io::parquet::gpu::PageFragment, cudf::device_span>, cudf::device_span<cudf::io::parquet::gpu::parquet_column_device_view const, 18446744073709551615ul>, cudf::device_span<cudf::io::partition_info const, 18446744073709551615ul>, cudf::device_span<int const, 18446744073709551615ul>, unsigned int, rmm::cuda_stream_view) [0x1c66217]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::detail::parquet::writer::impl::init_page_fragments(cudf::detail::hostdevice_2dvector<cudf::io::parquet::gpu::PageFragment>&, cudf::device_span<cudf::io::parquet::gpu::parquet_column_device_view const, 18446744073709551615ul>, cudf::host_span<cudf::io::partition_info const, 18446744073709551615ul>, cudf::device_span<int const, 18446744073709551615ul>, unsigned int) [0x1c93db9]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::detail::parquet::writer::impl::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) [0x1c9c31f]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::detail::parquet::writer::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) [0x1c9d008]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:cudf::io::parquet_chunked_writer::write(cudf::table_view const&, std::vector<cudf::io::partition_info, std::allocator<cudf::io::partition_info> > const&) [0x1be74cc]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame:Java_ai_rapids_cudf_Table_writeParquetChunk [0x147faa7]
=========                in /tmp/cudf131060217416882481.so
=========     Host Frame: [0x1524e7c867]
=========                in

fails.zip holds a version of the file that I am trying to write, but it was written on the CPU.


This fixes the crash described in the bug related to writing nested data in parquet with the binary flag set to write binary data as byte_arrays. We were incorrectly selecting the top-most node instead of the list<int8>, which resulted in a crash down in the kernels when the data pointer was null for those upper list columns.

closes #11506

Authors:
- Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #11526

revans2 added the bug (Something isn't working), Needs Triage (Need team to review and classify), and Spark (Functionality that helps Spark RAPIDS) labels Aug 10, 2022

hyperbolic2346 self-assigned this Aug 10, 2022

hyperbolic2346 mentioned this issue Aug 12, 2022: Adding optional parquet reader schema #11524 (Merged, 3 tasks)

sameerz changed the title ~~[BUG] writeing binary data in a struct causes out of index reads~~ [BUG] writing binary data in a struct causes out of index reads Aug 12, 2022

hyperbolic2346 mentioned this issue Aug 13, 2022: Fixing crash when writing binary nested data in parquet #11526 (Merged, 3 tasks)

rapids-bot bot closed this as completed in #11526 Aug 15, 2022

bdice removed the Needs Triage (Need team to review and classify) label Mar 4, 2024
