[BUG] Illegal memory access when writing an ORC file #13742
Comments
Looking into this issue, it could actually be that the file has something "invalid" about it that leads to something incorrect in the output from the ORC reader. If I read and write the attached file with cuDF it crashes.
If I read and write the attached file with pandas it crashes. (!!)
If I roundtrip the data through pandas, it succeeds, and can be written and read correctly.
Something about this file makes it read as an invalid table for both cuDF-python and pyarrow.
Closes #13742
Fixes an OOB access in `rowgroup_char_counts_kernel` when the input column has no rows. This can happen with string columns with a parent list column.
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
- Karthikeyan (https://github.com/karthikeyann)
- Mike Wilson (https://github.com/hyperbolic2346)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: #13745
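To illustrate the kind of guard the fix describes, here is a simplified host-side model in Python. It is not the actual CUDA kernel: the function name `rowgroup_char_counts`, the Arrow-style offsets layout, and the assumption that a column with no rows has an empty offsets buffer are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def rowgroup_char_counts(
    offsets: Sequence[int],
    rowgroups: Sequence[Tuple[int, int]],
) -> List[int]:
    """Count the characters in each row group of a string column.

    offsets:   Arrow-style string offsets (length == num_rows + 1), or an
               empty sequence when the column has no rows (assumption).
    rowgroups: half-open [begin, end) row ranges, one per row group.
    """
    counts = []
    for begin, end in rowgroups:
        if end - begin == 0:
            # Guard: an empty row range contributes zero characters, and we
            # never index into `offsets`, which may itself be empty.
            counts.append(0)
        else:
            counts.append(offsets[end] - offsets[begin])
    return counts


# Three strings: "abc", "", "hello" split into two row groups.
print(rowgroup_char_counts([0, 3, 3, 8], [(0, 2), (2, 3)]))

# A string column with no rows (e.g. the child of a list column whose lists
# are all empty). Without the guard, offsets[0] would raise IndexError here,
# which is the Python analogue of the out-of-bounds read.
print(rowgroup_char_counts([], [(0, 0)]))
```

In this model, the unguarded version fails only when the column has no rows at all, which matches the condition named in the fix description.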
Describe the bug
I was able to reproduce this recently on the latest 23.08 code. I am going to work on getting a repro case in C++, but for now this is what I have.
In Spark, when I read in the following file and write it out again as ORC, I get an illegal memory access.
data.zip
When I do this with compute_sanitizer I see errors like the following...
I think it has something to do with all of the empty strings being written out, and `rowgroup_char_counts_kernel` is not happy with it.