
[Prototype] Rapids spilling manager #10746

Closed
wants to merge 79 commits into from


madsbk
Member

@madsbk madsbk commented Apr 27, 2022

This is part of the effort to implement seamless spilling in cuDF and is just for testing for now.

The idea is to have a new column accessor, `SpillableColumnAccessor`, that can serialize and deserialize its columns in-place, and a new manager that orders column serializations when triggered by `rmm.mr.FailureCallbackResourceAdaptor`.
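To make the idea concrete, here is a rough pure-Python sketch that simulates device memory with a byte budget instead of real GPU allocations. `SpillableColumn`, `SpillManager`, and `on_alloc_failure` are hypothetical names, not the classes in this PR, and the least-recently-used spill order is our assumption. In the real design, a callback like `on_alloc_failure` would be registered with `rmm.mr.FailureCallbackResourceAdaptor`, which, as we understand it, calls it with the failed allocation size and retries the allocation when it returns `True`.

```python
from collections import OrderedDict


class SpillableColumn:
    """Hypothetical column whose data lives either on 'device' or 'host'."""

    def __init__(self, name: str, nbytes: int):
        self.name = name
        self.nbytes = nbytes
        self.spilled = False  # False: on device, True: spilled to host

    def spill(self) -> None:
        # Stand-in for serializing the device data to host memory in-place.
        self.spilled = True

    def unspill(self) -> None:
        # Stand-in for deserializing the host copy back to device memory.
        self.spilled = False


class SpillManager:
    """Hypothetical manager: spills the least recently used column on OOM."""

    def __init__(self, device_limit: int):
        self.device_limit = device_limit
        self.device_used = 0
        self._columns = OrderedDict()  # iteration order == LRU order

    def on_alloc_failure(self, nbytes: int) -> bool:
        # Assumed callback shape: spill one column, return True to retry.
        for col in self._columns.values():
            if not col.spilled:
                col.spill()
                self.device_used -= col.nbytes
                return True
        return False  # nothing left to spill, so the allocation fails

    def _reserve(self, nbytes: int) -> None:
        # Simulated device allocation: retry while the callback frees memory.
        while self.device_used + nbytes > self.device_limit:
            if not self.on_alloc_failure(nbytes):
                raise MemoryError(f"cannot allocate {nbytes} bytes")
        self.device_used += nbytes

    def allocate(self, name: str, nbytes: int) -> SpillableColumn:
        self._reserve(nbytes)
        col = SpillableColumn(name, nbytes)
        self._columns[name] = col
        return col

    def access(self, name: str) -> SpillableColumn:
        col = self._columns[name]
        self._columns.move_to_end(name)  # now the most recently used
        if col.spilled:
            self._reserve(col.nbytes)  # may spill other columns first
            col.unspill()
        return col
```

For example, with `SpillManager(device_limit=3)` and five 1-byte allocations, the two oldest columns end up spilled; accessing one of them again unspills it and, in turn, spills the least recently used column still on the device.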

As a demonstration, I have included `python/cudf/cudf/spilling-demo.py`, which keeps allocating random dataframes until device memory runs out, at which point spilling is triggered.

Output of running `spilling-demo.py` on my workstation:
```
$ python python/cudf/cudf/spilling-demo.py
Initial state - device:  1.918 GB, host:  1.154 GB
[ 0] dataframes:  2.235 GB, device:  6.437 GB, host:  1.168 GB
[ 1] dataframes:  4.470 GB, device:  8.675 GB, host:  1.168 GB
[ 2] dataframes:  6.706 GB, device: 10.911 GB, host:  1.168 GB
[ 3] dataframes:  8.941 GB, device: 13.148 GB, host:  1.168 GB
[ 4] dataframes: 11.176 GB, device: 15.395 GB, host:  1.168 GB
[ 5] dataframes: 13.411 GB, device: 17.633 GB, host:  1.168 GB
[ 6] dataframes: 15.646 GB, device: 19.871 GB, host:  1.168 GB
[ 7] dataframes: 17.881 GB, device: 22.109 GB, host:  1.168 GB
[ 8] dataframes: 20.117 GB, device: 24.348 GB, host:  1.168 GB
[ 9] dataframes: 22.352 GB, device: 26.586 GB, host:  1.168 GB
[10] dataframes: 24.587 GB, device: 28.824 GB, host:  1.169 GB
[11] dataframes: 26.822 GB, device: 31.062 GB, host:  1.169 GB
[12] dataframes: 29.057 GB, device: 29.562 GB, host:  4.883 GB
[13] dataframes: 31.292 GB, device: 29.562 GB, host:  7.118 GB
[14] dataframes: 33.528 GB, device: 29.562 GB, host:  9.353 GB
Spill all device memory
Spilling column: 0.745 GB, device: 28.815 GB, host: 10.098 GB
Spilling column: 0.745 GB, device: 28.069 GB, host: 10.843 GB
Spilling column: 0.745 GB, device: 27.332 GB, host: 11.588 GB
Spilling column: 0.745 GB, device: 26.582 GB, host: 12.333 GB
Spilling column: 0.745 GB, device: 25.836 GB, host: 13.078 GB
Spilling column: 0.745 GB, device: 25.090 GB, host: 13.823 GB
Spilling column: 0.745 GB, device: 24.344 GB, host: 14.568 GB
Spilling column: 0.745 GB, device: 23.605 GB, host: 15.313 GB
Spilling column: 0.745 GB, device: 22.859 GB, host: 16.058 GB
Spilling column: 0.745 GB, device: 22.109 GB, host: 16.804 GB
Spilling column: 0.745 GB, device: 21.363 GB, host: 17.549 GB
Spilling column: 0.745 GB, device: 20.621 GB, host: 18.294 GB
Spilling column: 0.745 GB, device: 19.875 GB, host: 19.039 GB
Spilling column: 0.745 GB, device: 19.125 GB, host: 19.784 GB
Spilling column: 0.745 GB, device: 18.379 GB, host: 20.529 GB
Spilling column: 0.745 GB, device: 17.633 GB, host: 21.274 GB
Spilling column: 0.745 GB, device: 16.891 GB, host: 22.019 GB
Spilling column: 0.745 GB, device: 16.145 GB, host: 22.764 GB
Spilling column: 0.745 GB, device: 15.398 GB, host: 23.509 GB
Spilling column: 0.745 GB, device: 14.652 GB, host: 24.254 GB
Spilling column: 0.745 GB, device: 13.906 GB, host: 24.999 GB
Spilling column: 0.745 GB, device: 13.156 GB, host: 25.744 GB
Spilling column: 0.745 GB, device: 12.414 GB, host: 26.489 GB
Spilling column: 0.745 GB, device: 11.668 GB, host: 27.235 GB
Spilling column: 0.745 GB, device: 10.922 GB, host: 27.979 GB
Spilling column: 0.745 GB, device: 10.176 GB, host: 28.724 GB
Spilling column: 0.745 GB, device:  9.426 GB, host: 29.470 GB
Spilling column: 0.745 GB, device:  8.680 GB, host: 30.215 GB
Spilling column: 0.745 GB, device:  7.934 GB, host: 30.960 GB
Spilling column: 0.745 GB, device:  7.188 GB, host: 31.705 GB
Spilling column: 0.745 GB, device:  6.441 GB, host: 32.450 GB
Spilling column: 0.745 GB, device:  5.699 GB, host: 33.195 GB
Spilling column: 0.745 GB, device:  4.949 GB, host: 33.940 GB
Spilling column: 0.745 GB, device:  4.203 GB, host: 34.685 GB
Spilling column: 0.745 GB, device:  3.461 GB, host: 35.430 GB
Spilling column: 0.745 GB, device:  2.715 GB, host: 36.175 GB
Spilling column: 0.745 GB, device:  1.969 GB, host: 36.920 GB
Finished spilling - device:  1.969 GB, host: 36.920 GB
Access spilled dataframes
[ 0] dataframe access, device:  8.680 GB, host: 32.450 GB
[ 1] dataframe access, device: 10.918 GB, host: 30.215 GB
[ 2] dataframe access, device: 13.156 GB, host: 27.980 GB
[ 3] dataframe access, device: 15.399 GB, host: 25.744 GB
[ 4] dataframe access, device: 17.634 GB, host: 23.509 GB
[ 5] dataframe access, device: 19.872 GB, host: 21.274 GB
[ 6] dataframe access, device: 22.096 GB, host: 19.039 GB
[ 7] dataframe access, device: 24.318 GB, host: 16.804 GB
[ 8] dataframe access, device: 26.563 GB, host: 14.569 GB
[ 9] dataframe access, device: 28.805 GB, host: 12.333 GB
[10] dataframe access, device: 29.552 GB, host: 11.588 GB
[11] dataframe access, device: 29.552 GB, host: 11.588 GB
[12] dataframe access, device: 29.547 GB, host: 11.588 GB
[13] dataframe access, device: 29.543 GB, host: 11.588 GB
[14] dataframe access, device: 29.543 GB, host: 11.588 GB
Deleting dataframes - device: 1.937 GB, host: 1.157 GB
Initial/end state delta - device: 0.019531 GB, host: 0.003410 GB
```

cc. @shwina @quasiben

@madsbk madsbk added 2 - In Progress Currently a work in progress 5 - DO NOT MERGE Hold off on merging; see PR for details improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 27, 2022
@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 27, 2022
```diff
@@ -0,0 +1,99 @@
+# Copyright (c) 2022, NVIDIA CORPORATION.
```
Member Author


This file is only a demonstration and should be moved or removed before release.

@madsbk madsbk force-pushed the rapids_spilling_manager branch from 77c9aa4 to eeee952 Compare April 27, 2022 12:05
@madsbk madsbk removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label Apr 27, 2022
@codecov

codecov bot commented Apr 27, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@bad00d7).
The diff coverage is n/a.

❗ Current head f7d42ec differs from pull request most recent head d1317a6. Consider uploading reports for the commit d1317a6 to get more accurate results

```diff
@@               Coverage Diff               @@
##             branch-22.08   #10746   +/-   ##
===============================================
  Coverage                ?   85.90%
===============================================
  Files                   ?      147
  Lines                   ?    23123
  Branches                ?        0
===============================================
  Hits                    ?    19864
  Misses                  ?     3259
  Partials                ?        0
```

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bad00d7...d1317a6.

@jrhemstad
Contributor

Is this dependent on the hacks @shwina did in #10592 or did we find some way around needing to do that?

@shwina
Contributor

shwina commented Apr 27, 2022

Synced offline with Jake, but for completeness: it does depend on them. We're going to have to incorporate those changes here eventually. This is not ready for review.

@madsbk madsbk force-pushed the rapids_spilling_manager branch 3 times, most recently from 95418c5 to 36e40c4 Compare May 4, 2022 14:00
@madsbk madsbk force-pushed the rapids_spilling_manager branch 3 times, most recently from c2146eb to ad31306 Compare May 10, 2022 07:52
@madsbk madsbk force-pushed the rapids_spilling_manager branch from a57f1da to 2c0d15a Compare May 19, 2022 10:06
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 19, 2022
```diff
  */
  column_view_base(data_type type,
                   size_type size,
                   void const* data,
                   bitmask_type const* null_mask = nullptr,
                   size_type null_count = UNKNOWN_NULL_COUNT,
-                  size_type offset = 0);
+                  size_type offset = 0,
+                  std::shared_ptr<void> owner = nullptr);
```
Member Author


@jrhemstad, do you think an `owner` argument could be accepted in libcudf?

Contributor


More likely, we're going to do something like subclassing `column_view` and using that from the Python side (or using composition instead of inheritance). There are other alternatives, but for now, let's keep this and evaluate once we're past the POC stage.

@madsbk madsbk force-pushed the rapids_spilling_manager branch 2 times, most recently from 4c2abcc to 82e9529 Compare May 20, 2022 07:34
@madsbk madsbk force-pushed the rapids_spilling_manager branch from df39a76 to 7c15ab6 Compare May 25, 2022 07:46
rapids-bot bot pushed a commit that referenced this pull request May 25, 2022
This PR makes `Buffer.ptr` read-only and introduces `Buffer.from_buffer`:
```python
@classmethod
def from_buffer(cls, buffer: Buffer, size: int = None, offset: int = 0):
    """
    Create a buffer from another buffer

    Parameters
    ----------
    buffer : Buffer
        The base buffer, which will also be set as the owner of
        the memory allocation.
    size : int, optional
        Size of the memory allocation (default: `buffer.size`).
    offset : int, optional
        Start offset relative to `buffer.ptr`.
    """
```
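A minimal pure-Python sketch of the semantics described by the docstring above, assuming a plain integer address for `ptr`; this hypothetical `Buffer` is not cuDF's implementation. It shows a read-only `ptr` property and a `from_buffer` that offsets the pointer and records the base buffer as the owner. Following the docstring, `size` defaults to `buffer.size`, so in this sketch a non-zero `offset` needs an explicit `size` to stay in bounds.

```python
from typing import Optional


class Buffer:
    """Hypothetical minimal buffer: a read-only pointer, a size and an owner."""

    def __init__(self, ptr: int, size: int, owner: object = None):
        self._ptr = ptr
        self._size = size
        self._owner = owner

    @property
    def ptr(self) -> int:
        # Read-only: no setter is defined, so `buf.ptr = ...` raises.
        return self._ptr

    @property
    def size(self) -> int:
        return self._size

    @property
    def owner(self) -> object:
        return self._owner

    @classmethod
    def from_buffer(
        cls, buffer: "Buffer", size: Optional[int] = None, offset: int = 0
    ) -> "Buffer":
        """Create a buffer that views part of `buffer` and is owned by it."""
        if size is None:
            size = buffer.size  # default taken from the docstring above
        if offset < 0 or size < 0 or offset + size > buffer.size:
            raise ValueError("slice is out of bounds of the base buffer")
        # The base buffer becomes the owner of the memory allocation.
        return cls(ptr=buffer.ptr + offset, size=size, owner=buffer)
```

Keeping the base buffer as `owner` means every derived buffer can be traced back to the allocation it points into, which is the relationship the spilling work needs to reason about.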

This is mainly motivated by my work on [spilling](#10746), since it makes it a bit easier to reason about the relationship between buffers.

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ashwin Srinath (https://github.com/shwina)

URL: #10872
@github-actions

github-actions bot commented Aug 7, 2022

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this pull request Aug 11, 2022
This PR introduces factory functions to create `Buffer` instances, which makes it possible to change the returned buffer type based on a configuration option in a follow-up PR.

Besides simplifying the code base a bit, this is motivated by the spilling work in #10746. We would like to introduce a new spillable Buffer class that requires minimal changes to the existing code and is only used when explicitly enabled. This way, we can introduce spilling in cuDF as an experimental feature with minimal risk to the existing code.

@shwina and I discussed the possibility of letting `Buffer.__new__` return instances of different classes instead of using factory functions, but we concluded that having `Buffer()` return anything other than an instance of `Buffer` is simply too surprising :)
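A small sketch of the factory-function approach, with all names hypothetical: `make_buffer`, `SpillableBuffer`, and the `EXAMPLE_SPILL` environment variable are illustrations, not cuDF's actual API. The factory picks the class from a configuration option, while calling a class directly keeps returning an instance of exactly that class.

```python
import os


class Buffer:
    """Hypothetical regular buffer wrapping some bytes-like data."""

    def __init__(self, data: bytes):
        self.data = data


class SpillableBuffer(Buffer):
    """Hypothetical spillable variant, only used when explicitly enabled."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.spilled = False


def spilling_enabled() -> bool:
    # Hypothetical configuration option read from an environment variable.
    return os.environ.get("EXAMPLE_SPILL", "off").lower() in ("1", "on", "true")


def make_buffer(data: bytes) -> Buffer:
    """Hypothetical factory: the returned type depends on configuration."""
    cls = SpillableBuffer if spilling_enabled() else Buffer
    return cls(data)
```

Overriding `Buffer.__new__` to return a `SpillableBuffer` would also work mechanically, but then `type(Buffer(...))` would silently depend on configuration; with a factory, only code that opts into `make_buffer` sees the configurable type.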

**Notice:** this is breaking because it removes unused methods such as `Buffer.copy()` and `Buffer.nbytes`.
~~However, we still support creating a buffer directly by calling `Buffer(obj)`. AFAIK, this is the only way `Buffer` is created outside of cuDF, which [a github search seems to confirm](https://github.com/search?l=&q=cudf.core.buffer+-repo%3Arapidsai%2Fcudf&type=code).~~
This PR doesn't change the signature of `Buffer.__init__()` anymore.

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)
  - https://github.com/brandon-b-miller

URL: #11447
@madsbk madsbk added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Aug 17, 2022
madsbk added a commit to madsbk/cudf that referenced this pull request Aug 22, 2022
@madsbk
Member Author

madsbk commented Oct 3, 2022

Closed in favor of #11553

@madsbk madsbk closed this Oct 3, 2022
@madsbk madsbk deleted the rapids_spilling_manager branch October 26, 2022 08:10
Labels
2 - In Progress Currently a work in progress 5 - DO NOT MERGE Hold off on merging; see PR for details improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.

3 participants