Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to remove int8 column support in parquet binary code path. #11536

Closed
hyperbolic2346 opened this issue Aug 15, 2022 · 1 comment · Fixed by #11539
Closed

Need to remove int8 column support in parquet binary code path. #11536

hyperbolic2346 opened this issue Aug 15, 2022 · 1 comment · Fixed by #11539
Assignees
Labels
bug Something isn't working cuIO cuIO issue

Comments

@hyperbolic2346
Copy link
Contributor

hyperbolic2346 commented Aug 15, 2022

"Does Spark differentiate binary types stored as int8 and uint8? Are both of these equivalent from Spark’s perspective? Can Spark normalize inputs to use only one or the other? It feels very suspicious to me that we’ve decided to accept either one across the binary code paths."

Originally posted by @bdice in #11526 (comment)

It was pointed out that the code path in parquet support both INT8 columns and UINT8 columns. We should simply be opinionated here and after discussion it was decided to use UINT8.

@hyperbolic2346 hyperbolic2346 added bug Something isn't working cuIO cuIO issue labels Aug 15, 2022
@hyperbolic2346 hyperbolic2346 self-assigned this Aug 15, 2022
@hyperbolic2346 hyperbolic2346 changed the title Need to remove unsigned int8 references in parquet binary code path. Need to remove unsigned uint8 references in parquet binary code path. Aug 15, 2022
@hyperbolic2346 hyperbolic2346 changed the title Need to remove unsigned uint8 references in parquet binary code path. Need to remove unsigned int8 references in parquet binary code path. Aug 16, 2022
@hyperbolic2346 hyperbolic2346 changed the title Need to remove unsigned int8 references in parquet binary code path. Need to remove int8 column support in parquet binary code path. Aug 24, 2022
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Oct 18, 2022
As suggested in #11526 and captured in issue #11536 the usage of both INT8 and UINT8 as supported types for byte_arrays is unnecessary and adds complexity to the code. This change removes INT8 as an option and only allows UINT8 columns to be written out as byte_arrays. ~~This matches with cudf string columns which contain an INT8 column for data.~~

closes #11536

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - Tobias Ribizel (https://github.com/upsj)
  - Nghia Truong (https://github.com/ttnghia)
  - David Wendt (https://github.com/davidwendt)
  - MithunR (https://github.com/mythrocks)
  - Bradley Dice (https://github.com/bdice)

URL: #11539
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant