[BUG] cudf.read_orc reads incorrect data for one row #5440
Comments
trstovall added the Needs Triage and bug labels on Jun 10, 2020
kkraus14 added the cuIO label and removed the Needs Triage label on Jun 10, 2020
Tried to read the file written by cudf with pyarrow and it works:
import cudf
import pyarrow.orc as orc
# Load the original file and set the affected row to the expected value.
df = cudf.read_orc("to_orc_bug.orc")
df.upc_nbr[(df.visit_nbr == 14600028) & (df.store_nbr == 47)] = 681131184420
# Write with cudf, then read back with pyarrow instead of cudf.
df.to_orc("to_orc_bug2.orc", compression="snappy")
pdf = orc.ORCFile("to_orc_bug2.orc").read().to_pandas()
print(pdf[(pdf.visit_nbr == 14600028) & (pdf.store_nbr == 47)])
Seems to be a reader issue.
kkraus14 changed the title from "[BUG] cudf.to_orc writes incorrect data for one row" to "[BUG] cudf.read_orc reads incorrect data for one row" on Jun 11, 2020
Relabeled issue as such.
This was an easy fix, but I'm still trying to figure out how to properly add tests for this, or in general, for anything in cuIO.
devavret added a commit to devavret/cudf that referenced this issue on Jun 15, 2020
Fixes the narrowing conversion in bytestream reading in patched RLE
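For anyone wondering why the symptom points at a narrowing conversion: the wrong value reported in this issue is exactly the low 32 bits of the expected one. A minimal sketch of that arithmetic follows. The helper `narrow_to_uint32` is hypothetical and only illustrates a 64-bit to 32-bit truncation; it is not the cuIO reader code, and the width of the type actually involved in the fix is an assumption here.

```python
# Values quoted in this issue.
EXPECTED = 681131184420  # value that should be returned
OBSERVED = 2526351652    # value cudf.read_orc returned


def narrow_to_uint32(value: int) -> int:
    """Hypothetical helper: keep only the low 32 bits of value,
    as storing it in an unsigned 32-bit integer would."""
    return value & 0xFFFFFFFF


# The expected value does not fit in 32 bits, so truncation changes it.
print(EXPECTED.bit_length())

# True when the observed value equals the truncated expected value.
print(narrow_to_uint32(EXPECTED) == OBSERVED)
```

This only shows that the bad value is consistent with a 32-bit truncation somewhere in the read path. It does not identify which variable in the patched RLE decoder was too narrow.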
Describe the bug
A clear and concise description of what the bug is.
Steps/Code to reproduce bug
Expected behavior
Returned value should be 681131184420, not 2526351652.
Environment overview (please complete the following information)
to_orc_bug.orc.zip