Adding functionality for spilling to packed device memory #601
Comments
From a "standard" spilling perspective, today things work as follows: Of course, all of the above is meant as an initial implementation for evaluating the functional aspects. We would probably want to evaluate whether combining some of these layers would lead to improved performance and a simpler codebase too. EDIT: To answer your specific question, I feel that a simple
That idea of a new layer makes sense - I seem to recall from the last stand-up that there was potential to spill to packed memory first, so something like My biggest concern here is the return type of
What about adding this directly to
I'm not sure I understand - do you mean implement host<->device spilling for DataFrames on cuDF's end, or device<->packed?
Am referring to these methods. Ideally packing/unpacking would be done there and Dask wouldn't need to know anything about it.
Ah thanks for the clarification - that makes sense. Would that mean that we wouldn't have an unpacked device layer, and that spilling DataFrames to/from device would always pack/unpack? If so, I'm wondering if we would want to keep the ability to serialize without packing by making separate functions for this (i.e.
Yeah, this came up in issue (rapidsai/cudf#7601). Basically we might want a config for cuDF (rapidsai/cudf#5311), which could enable/disable packing.
I agree with @jakirkham's idea too. Having this in cuDF itself is a much better alternative: it lets Dask-CUDA abstract that change, and lets pure Dask users enjoy that compression mechanism as well.
Same here - one question: is the serialization/deserialization of DataFrames used outside of Dask contexts? I think the config module for cuDF is a good idea, but I'm trying to gauge whether this pack/unpack option could go into Dask's config module along with the RMM/UCX stuff. If it is used outside of Dask, then it should definitely go in a cuDF-specific configuration.
cuDF uses these functions when pickling objects as well, which is not Dask specific. So it probably makes more sense to have the config somewhere cuDF users can control.
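To make the enable/disable idea above a bit more concrete, here is a minimal sketch of a serialization path gated by a config flag. Everything in it is hypothetical: `_options`, `set_option`, the `pack_on_serialize` key, and the `serialize`/`deserialize` helpers are stand-ins that work on plain Python `bytes`, not the cuDF or Dask API. It only assumes the general "header plus list of frames" shape for serialized output.

```python
from typing import Dict, List, Tuple

# Hypothetical option store; a real cuDF config module may look quite different.
_options = {"pack_on_serialize": False}


def set_option(name: str, value: bool) -> None:
    """Set a known option, rejecting unknown names."""
    if name not in _options:
        raise KeyError(name)
    _options[name] = value


def serialize(columns: Dict[str, bytes]) -> Tuple[dict, List[bytes]]:
    """Return (header, frames); one frame total when packing is enabled."""
    names = list(columns)
    if _options["pack_on_serialize"]:
        # Packed: all column buffers concatenated into a single frame,
        # with per-column sizes recorded so the frame can be split again.
        sizes = [len(columns[n]) for n in names]
        frames = [b"".join(columns[n] for n in names)]
        header = {"packed": True, "names": names, "sizes": sizes}
    else:
        # Unpacked: one frame per column.
        frames = [columns[n] for n in names]
        header = {"packed": False, "names": names}
    return header, frames


def deserialize(header: dict, frames: List[bytes]) -> Dict[str, bytes]:
    """Invert serialize() for either layout, based on the header."""
    if not header["packed"]:
        return dict(zip(header["names"], frames))
    out, offset = {}, 0
    for name, size in zip(header["names"], header["sizes"]):
        out[name] = frames[0][offset:offset + size]
        offset += size
    return out
```

Because the header records which layout was used, a consumer such as pickling code would not need to know whether packing was enabled when the object was serialized.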
Closing this as the solution forward here is making changes to cuDF 🙂
I am currently working on a Python API for cuDF pack/unpack (rapidsai/cudf#8153), with the goal being to add functionality here to spill to packed device memory.
Currently, this API allows for something like:
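As a rough, library-independent sketch of the kind of pack/unpack round trip being discussed (this is not the cuDF API; `PackedColumns`, `pack`, and `unpack` below are hypothetical stand-ins that operate on plain Python `bytes` rather than device memory):

```python
import struct
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PackedColumns:
    """Hypothetical container: one contiguous buffer plus metadata to split it."""
    metadata: List[Tuple[str, int]]  # (column name, byte length), in column order
    data: bytes                      # all column buffers laid out back to back


def pack(columns: Dict[str, bytes]) -> PackedColumns:
    """Concatenate per-column buffers into a single contiguous buffer."""
    metadata = [(name, len(buf)) for name, buf in columns.items()]
    data = b"".join(columns.values())
    return PackedColumns(metadata, data)


def unpack(packed: PackedColumns) -> Dict[str, bytes]:
    """Split the contiguous buffer back into per-column buffers."""
    view = memoryview(packed.data)
    out: Dict[str, bytes] = {}
    offset = 0
    for name, nbytes in packed.metadata:
        out[name] = bytes(view[offset:offset + nbytes])
        offset += nbytes
    return out


# Two toy "columns": three int64 values and three float64 values.
table = {
    "a": struct.pack("<3q", 1, 2, 3),
    "b": struct.pack("<3d", 0.5, 1.5, 2.5),
}
packed = pack(table)
roundtrip = unpack(packed)
```

The point of the sketch is only the shape of the result: a single buffer plus a small metadata record, which is what a spilling layer would move around instead of many separate column buffers.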
Is there anything else we would want in this API to open it up for use here?
cc @madsbk @pentschev