[FEA] Make SpillableColumnarBatch inform Spill code of actual usage of the batch #6561

revans2 · 2022-09-14T20:07:57Z

Is your feature request related to a problem? Please describe.
Currently SpillableColumnarBatch does some things that are far from ideal for the spill code. When you get a batch it will lock the underlying spill id, create the ColumnarBatch, and then release the spill id. After that the regular reference counting is used to keep the buffers that make up the ColumnarBatch around until they are no longer needed.

The problem with this is that the RapidsBufferCatalog thinks that all of the buffers are spillable, even when reference counts prevent them from actually being freed. Ideally, as long as someone still holds a reference to the underlying buffer, we would not release the spill id. I think we can do this, but we would need to add a layer of indirection at the DeviceMemoryBuffer layer. We could create a new SpillableMemoryBuffer that would hold both a DeviceMemoryBuffer and the buffer/spill id. It would have a set of reference counts separate from the DeviceMemoryBuffer. When the SpillableMemoryBuffer reaches a reference count of 0, it would release the spill id. Only then would the spill code be allowed to actually free the underlying DeviceMemoryBuffer when spilling.
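
A rough sketch of the wrapper idea, to make the reference-count flow concrete. Everything here is hypothetical: `SketchCatalog` stands in for the catalog's record of which spill ids are held, `SpillableMemoryBufferSketch` stands in for the proposed SpillableMemoryBuffer, and a plain `Object` stands in for the DeviceMemoryBuffer. None of these names or signatures come from the plugin or cuDF.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the catalog: tracks which spill ids are currently held.
final class SketchCatalog {
    private final Set<Integer> held = new HashSet<>();

    synchronized void acquire(int spillId) { held.add(spillId); }
    synchronized void release(int spillId) { held.remove(spillId); }

    // A spill id is only a spill candidate while nobody holds it.
    synchronized boolean isSpillable(int spillId) { return !held.contains(spillId); }
}

// Hypothetical wrapper holding both the device buffer and its spill id.
// Its reference count is separate from the device buffer's own count.
final class SpillableMemoryBufferSketch implements AutoCloseable {
    private final SketchCatalog catalog;
    private final Object deviceBuffer; // stand-in for a DeviceMemoryBuffer
    private final int spillId;
    private int refCount = 1;

    SpillableMemoryBufferSketch(SketchCatalog catalog, Object deviceBuffer, int spillId) {
        this.catalog = catalog;
        this.deviceBuffer = deviceBuffer;
        this.spillId = spillId;
        catalog.acquire(spillId); // hold the spill id while any reference is live
    }

    synchronized void incRefCount() {
        if (refCount <= 0) {
            throw new IllegalStateException("buffer already closed");
        }
        refCount++;
    }

    @Override
    public synchronized void close() {
        if (refCount <= 0) {
            throw new IllegalStateException("close called too many times");
        }
        refCount--;
        if (refCount == 0) {
            // Last reference gone: the spill code may now free the device buffer.
            catalog.release(spillId);
        }
    }
}
```

With this shape, the spill id stays held for as long as any consumer of the batch keeps a reference, so the catalog's view of what is spillable matches what can actually be freed.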

abellina · 2022-10-17T15:51:35Z

Discussed this with @revans2 and @jlowe today. @revans2 proposed adding callbacks, likely in DeviceMemoryBuffer, that the spill framework could use to register a function that actually marks the buffer as spillable (e.g. releases the ref count in the spill framework).
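
A minimal sketch of how such a callback could look, under some assumptions of mine: the buffer notifies a registered handler with the remaining reference count on every close, and the spill framework keeps one reference of its own, so "only the framework's reference is left" means spillable. `CallbackBufferSketch` and `SpillTrackerSketch` are made-up names, not cuDF or plugin classes.

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in for a device buffer that reports closes to a handler.
final class CallbackBufferSketch implements AutoCloseable {
    private int refCount = 1;
    private IntConsumer onClose; // receives the ref count remaining after a close

    synchronized void setOnClose(IntConsumer handler) { this.onClose = handler; }

    synchronized void incRefCount() {
        if (refCount <= 0) {
            throw new IllegalStateException("buffer already closed");
        }
        refCount++;
    }

    @Override
    public synchronized void close() {
        if (refCount <= 0) {
            throw new IllegalStateException("close called too many times");
        }
        refCount--;
        if (onClose != null) {
            onClose.accept(refCount);
        }
    }
}

// Hypothetical spill-framework side: keeps its own reference and flips the
// buffer to spillable once that reference is the only one left.
final class SpillTrackerSketch {
    private volatile boolean spillable = false;

    void track(CallbackBufferSketch buffer) {
        buffer.incRefCount(); // the framework's own reference
        buffer.setOnClose(remaining -> {
            if (remaining == 1) {
                spillable = true; // only the framework still holds the buffer
            }
        });
    }

    boolean isSpillable() { return spillable; }
}
```

The sketch does not cover flipping a buffer back to unspillable when it is handed out again; that would need a matching hook on the acquire path.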

Another topic mentioned is the potential for collisions, where the same buffer (the same contig split buffer) is registered twice with the spill framework. Say an upstream exec makes a buffer spillable, and then the same buffer is returned as part of next(), only to be added to the spill framework again. Since these buffers have IDs, perhaps the catalog could detect and de-duplicate redundant registrations.
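
One way the de-duplication could work, sketched under the assumption that each buffer carries a stable numeric ID the catalog can key on. `DedupCatalogSketch` and its methods are hypothetical, not the RapidsBufferCatalog API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical catalog that counts registrations per buffer ID instead of
// creating a second entry when the same buffer is added again.
final class DedupCatalogSketch {
    private final Map<Long, Integer> registrations = new HashMap<>();

    // Returns true only for the first registration of this buffer ID.
    synchronized boolean register(long bufferId) {
        return registrations.merge(bufferId, 1, Integer::sum) == 1;
    }

    // Returns true when the last registration for this buffer ID is removed.
    synchronized boolean unregister(long bufferId) {
        Integer count = registrations.get(bufferId);
        if (count == null) {
            throw new IllegalStateException("buffer " + bufferId + " is not registered");
        }
        if (count == 1) {
            registrations.remove(bufferId);
            return true;
        }
        registrations.put(bufferId, count - 1);
        return false;
    }

    // Number of distinct buffers tracked, regardless of how often each was added.
    synchronized int trackedBuffers() { return registrations.size(); }
}
```

In this sketch, a second registration of the same contig split buffer only bumps a counter, so the catalog accounts for the underlying memory once.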

abellina · 2022-11-02T21:52:49Z

I am interested in picking this up after my current tasks, as it is related to the "maximum live memory" question we are trying to answer with changes to cuDF and the plugin.

revans2 added the feature request, ? - Needs Triage, performance, and reliability labels Sep 14, 2022

sameerz removed the ? - Needs Triage label Sep 20, 2022

This was referenced Oct 17, 2022

[TASK] Run without fatal OOMs #6746

Closed

Cuda.deviceSynchronize as a last resort if we cannot spill enough #6849

Merged

[BUG] Spilling logic can spill data that cannot be freed #6864

Closed

jlowe mentioned this issue Oct 28, 2022

Track stacktrace of maximum unspillable memory #6946

Open

abellina self-assigned this Nov 4, 2022

abellina mentioned this issue Jan 24, 2023

Enables spillable/unspillable state for RapidsBuffer and allow buffer sharing #7572

Merged

5 tasks

sameerz removed the feature request label Jan 29, 2023

abellina closed this as completed Feb 9, 2023