Expose pack/unpack API to Python #8153
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8153 +/- ##
==============================================
Coverage 10.62% 10.62%
==============================================
Files 109 109
Lines 18246 18627 +381
==============================================
+ Hits 1938 1980 +42
- Misses 16308 16647 +339
|
Yeah, this PR doesn't change how DataFrames are currently serialized. I imagined that adding pack/unpack as an option for serialization would bring up a larger conversation about how we want to make cuDF configurable, which I thought would be better suited to a follow-up PR. |
Yep that makes a lot of sense. Thanks for working on this btw 🙂 |
rerun tests |
Not sure why this is failing on mypy in the style checks; it seems to pass locally. |
The style checks should resolve with #8595. |
rerun tests |
rerun tests |
rerun tests |
rerun tests |
@gpucibot merge |
It appears we are seeing a testing failure related to this change:
20:43:20 ___________________ ERROR collecting cudf/tests/test_pack.py ___________________
20:43:20 ImportError while importing test module '/workspace/python/cudf/cudf/tests/test_pack.py'.
20:43:20 Hint: make sure your test modules/packages have valid Python names.
20:43:20 Traceback:
20:43:20 /opt/conda/envs/rapids/lib/python3.7/importlib/__init__.py:127: in import_module
20:43:20 return _bootstrap._gcd_import(name[level:], package, level)
20:43:20 cudf/tests/test_pack.py:23: in <module>
20:43:20 from cudf.tests.utils import assert_eq
20:43:20 E ModuleNotFoundError: No module named 'cudf.tests' |
Yup it looks like |
@gpucibot merge |
Contributes to #17317
Also, I found that `PackedColumns` was not being used anywhere. It appears it was added back in #8153 for dask_cudf, but I cannot see it being used there anymore.
Authors:
- Matthew Roeschke (https://github.com/mroeschke)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
URL: #17548
Closes #7601
Adds a Python API for `pack`/`unpack`, so that we might be able to pack/unpack DataFrames in serialization:
- `PackedColumns` is a Python representation of the `cudf::packed_columns` struct, containing the struct itself along with some Python metadata for the DataFrame being packed; supports Dask/pickle serialization
- `pack()` takes in a `Table` and returns a `PackedColumns`
- `unpack()` takes in a `PackedColumns` and returns a `Table`
cc @brandon-b-miller
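The pack/unpack round trip described above can be illustrated with a small, CPU-only sketch. This is not the cuDF API or its implementation: `PackedColumnsSketch`, `pack_sketch`, and `unpack_sketch` are hypothetical names, the "table" is assumed to be a plain dict of float columns, and the single contiguous buffer is an ordinary `bytes` object rather than device memory. It only shows the general idea of copying all columns into one buffer, keeping enough metadata to rebuild them, and passing the result through pickle.

```python
import pickle
import struct
from typing import Dict, List, NamedTuple, Tuple


class PackedColumnsSketch(NamedTuple):
    """Hypothetical stand-in for PackedColumns: one buffer plus metadata."""
    data: bytes                    # all column values in one contiguous buffer
    layout: List[Tuple[str, int]]  # (column name, row count) per column, in order


def pack_sketch(table: Dict[str, List[float]]) -> PackedColumnsSketch:
    """Copy every column into a single contiguous buffer (assumes float64 columns)."""
    chunks = []
    layout = []
    for name, values in table.items():
        # Little-endian float64, 8 bytes per value.
        chunks.append(struct.pack(f"<{len(values)}d", *values))
        layout.append((name, len(values)))
    return PackedColumnsSketch(data=b"".join(chunks), layout=layout)


def unpack_sketch(packed: PackedColumnsSketch) -> Dict[str, List[float]]:
    """Rebuild the columns by walking the buffer using the stored layout."""
    table = {}
    offset = 0
    for name, n_rows in packed.layout:
        values = struct.unpack_from(f"<{n_rows}d", packed.data, offset)
        table[name] = list(values)
        offset += 8 * n_rows
    return table


table = {"a": [1.0, 2.0, 3.0], "b": [0.5, 1.5, 2.5]}
packed = pack_sketch(table)

# Serialize the buffer and metadata as a plain tuple, then rebuild the sketch object.
payload = pickle.dumps(tuple(packed))
restored = unpack_sketch(PackedColumnsSketch(*pickle.loads(payload)))
```

In this sketch the serialized payload contains exactly one data buffer regardless of how many columns the table has, which is the property that makes a packed representation attractive for serialization.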