Fix bug in dask_cudf.read_parquet for index=False #9453
```python
tmpdir = str(tmpdir)
path = os.path.join(tmpdir, "test.parquet")

df2 = ddf.reset_index(drop=True).compute()
df2.to_parquet(path, engine="pyarrow")

ddf3 = dask_cudf.read_parquet(path, index=False)
dd.assert_eq(df2, ddf3)
```
Suggested change:

```diff
- tmpdir = str(tmpdir)
- path = os.path.join(tmpdir, "test.parquet")
- df2 = ddf.reset_index(drop=True).compute()
- df2.to_parquet(path, engine="pyarrow")
- ddf3 = dask_cudf.read_parquet(path, index=False)
- dd.assert_eq(df2, ddf3)
+ bytes_buf = BytesIO()
+ df2 = ddf.reset_index(drop=True).compute()
+ df2.to_parquet(bytes_buf, engine="pyarrow")
+ ddf3 = dask_cudf.read_parquet(bytes_buf, index=False)
+ dd.assert_eq(df2, ddf3)
```
Can we use `BytesIO` instead of interacting with the filesystem?
Ooo, I like your thinking here, but we cannot pass a `BytesIO` object to `dask_cudf.read_parquet` without making much larger changes :)
Ah, I see. I thought `dask_cudf.read_parquet` accepted a bytes-like object, similar to `pyarrow`/`cudf`/`pandas`. Then let's not do it now.
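The question in this thread comes down to whether a reader accepts an in-memory, bytes-like source as well as a filesystem path. As a minimal sketch using only the standard library (not dask_cudf's actual code path), a reader that supports both usually normalizes its input into a readable binary stream first. `open_source` is a hypothetical helper name, and the payload is dummy bytes, not real Parquet.

```python
import io
import os
import tempfile
from contextlib import contextmanager
from typing import BinaryIO, Iterator, Union

# Hypothetical: the kinds of input a path-or-buffer reader might accept.
Source = Union[str, os.PathLike, bytes, bytearray, memoryview, BinaryIO]


@contextmanager
def open_source(src: Source) -> Iterator[BinaryIO]:
    """Yield a readable binary stream for a path, raw bytes, or a file-like object.

    Hypothetical helper for illustration only.
    """
    if isinstance(src, (str, os.PathLike)):
        # Filesystem path: open the file and close it when the caller is done.
        with open(src, "rb") as f:
            yield f
    elif isinstance(src, (bytes, bytearray, memoryview)):
        # Raw bytes-like object: wrap it in an in-memory stream.
        yield io.BytesIO(bytes(src))
    elif hasattr(src, "read"):
        # Already file-like (e.g. BytesIO): rewind if possible and hand it back
        # without closing it, because the caller owns it.
        seekable = getattr(src, "seekable", None)
        if seekable is not None and seekable():
            src.seek(0)
        yield src
    else:
        raise TypeError(f"unsupported source type: {type(src).__name__}")


# Dummy payload: the same bytes read back from a temp file and from a buffer.
payload = b"not really parquet, just bytes"
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.parquet")
    with open(path, "wb") as fh:
        fh.write(payload)
    with open_source(path) as f:
        from_path = f.read()

with open_source(io.BytesIO(payload)) as f:
    from_buffer = f.read()

print(from_path == from_buffer == payload)
```

A distributed reader that plans partitions from file paths needs more than this wrapper to accept a buffer, which is consistent with the "much larger changes" mentioned above; the sketch only shows the input-normalization step.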
Codecov Report

```diff
@@              Coverage Diff               @@
##           branch-21.12    #9453      +/-   ##
================================================
+ Coverage         10.79%   10.82%   +0.03%
================================================
  Files               116      117       +1
  Lines             18869    19454     +585
================================================
+ Hits               2036     2106      +70
- Misses            16833    17348     +515
```
@gpucibot merge
Simple fix for the case where `index=False` is passed to `dask_cudf.read_parquet` (the default/general case in NVTabular).