[Superseded] Spilling to host memory seamlessly #11553

Closed
wants to merge 124 commits into from
124 commits
9da6b5b
column_view: add owner
madsbk Aug 11, 2022
89a9c30
Port of https://github.com/rapidsai/cudf/pull/10746
madsbk Aug 16, 2022
90bd9f1
TODO: DelayedPointerTuple not supported by PyTorch
madsbk Aug 16, 2022
f774fc8
tracking other buffers
madsbk Aug 17, 2022
8902df5
Impl. ptr_restricted() and ExposeToken
madsbk Aug 17, 2022
7349fe1
doc
madsbk Aug 17, 2022
6ae8602
build.sh: fixing test paths
madsbk Aug 17, 2022
04a69ce
SpillableBuffer: add exposed buffers to the manager
madsbk Aug 18, 2022
a1a7017
spill_device_memory(): fix possible deadlock
madsbk Aug 18, 2022
12753f9
Intorduce SpillLock
madsbk Aug 22, 2022
f28d147
Revert "column_view: add owner"
madsbk Aug 22, 2022
f426365
ptr_raw(): fix offset when spill_lock is None
madsbk Aug 22, 2022
dc9f7a5
Impl. CUDF_SPILL_STAT_EXPOSE
madsbk Aug 23, 2022
16eaadb
Print expose statistics when OOM
madsbk Aug 23, 2022
f5c75ea
table_view_from_table(): added the spill_lock argument
madsbk Aug 23, 2022
477f953
Grabbing spill locks
madsbk Aug 23, 2022
d3364e1
test_expose_statistics(): fix ordering of the stats
madsbk Aug 23, 2022
8f0dc1d
ptr_restricted(): fixed type hint
madsbk Aug 24, 2022
e966ec6
Buffer: added a readonly argument
madsbk Aug 24, 2022
f37ef91
Support of dask and cuda serialize
madsbk Aug 24, 2022
b06de72
Revert "Buffer: added a readonly argument"
madsbk Aug 24, 2022
c70a4b1
rename move_inplace() => __spill__()
madsbk Aug 31, 2022
40a8d1d
Merge branch 'branch-22.10' of github.com:rapidsai/cudf into cudf_spi…
madsbk Aug 31, 2022
31bec82
spill-on-demand: don't call gc.collect()
madsbk Aug 31, 2022
cb7758c
Add stub doc file for spill manager
shwina Sep 6, 2022
b103423
Merge branch 'branch-22.10' of github.com:rapidsai/cudf into cudf_spi…
madsbk Sep 7, 2022
4ffa8d5
Adding more spill locks
madsbk Sep 7, 2022
ee66a14
oom: collect garbage once
madsbk Sep 7, 2022
6f3aa3d
Typo
madsbk Sep 13, 2022
e733aa2
Merge branch 'branch-22.10' of github.com:rapidsai/cudf into cudf_spi…
madsbk Sep 13, 2022
0d4dae7
Typo
madsbk Sep 13, 2022
a8ca2e5
DelayedPointerTuple: added TODO
madsbk Sep 13, 2022
8ec9c67
doc
madsbk Sep 13, 2022
35bf6f8
merged ptr_raw() and ptr_restricted() into get_ptr()
madsbk Sep 13, 2022
de7c12f
Merge branch 'cudf_spilling' of https://github.com/madsbk/cudf into c…
shwina Sep 13, 2022
5ab2d32
doc
madsbk Sep 13, 2022
56014f4
removed the shared_ptr/expose implementation
madsbk Sep 14, 2022
25f361f
Moved to Pure Python
madsbk Sep 14, 2022
e00f344
Impl. SpillableBufferView
madsbk Sep 14, 2022
ca498ea
SpillableBuffer inherit from Buffer
madsbk Sep 14, 2022
ec7e416
clean up
madsbk Sep 14, 2022
bd16d6d
Removed expose_counter
madsbk Sep 14, 2022
a18c963
buffer: implements getitem()
madsbk Sep 15, 2022
69cee88
from_column_view(): fix access to ._ptr
madsbk Sep 15, 2022
b9a86b6
Renamed getitem() => _getitem()
madsbk Sep 20, 2022
e1b5b6f
typo
madsbk Sep 20, 2022
6ea6898
moved spill_to_device_limit() calls to the manager
madsbk Sep 20, 2022
e6ac1f3
spill_to_device_limit(): count others
madsbk Sep 20, 2022
0c28913
from_column_view(): expose data_owner permanently when returning a Bu…
madsbk Sep 20, 2022
7afdc16
Only use lookup_address_range() as a sanity check
madsbk Sep 21, 2022
3aa71b4
Avoid registrering multiple OOM handles
madsbk Sep 21, 2022
a017540
removed register_spill_on_demand()
madsbk Sep 21, 2022
e572f80
Merge branch 'branch-22.10' of github.com:rapidsai/cudf into cudf_spi…
madsbk Sep 21, 2022
75c593e
Merge branch 'cudf_spilling' of https://github.com/madsbk/cudf into c…
shwina Sep 21, 2022
080bbf6
Merge branch 'branch-22.10' of github.com:rapidsai/cudf into cudf_spi…
madsbk Sep 22, 2022
abb6663
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
shwina Sep 22, 2022
da9cbd9
Merge branch 'cudf_spilling' of https://github.com/madsbk/cudf into c…
shwina Sep 22, 2022
9064066
Try adding some print
shwina Sep 22, 2022
90684df
Style
shwina Sep 22, 2022
fa0d286
conda list
shwina Sep 23, 2022
dbf0531
conda list -e
shwina Sep 23, 2022
7000824
Use as_device_buffer_like
shwina Sep 23, 2022
232669c
No xdist
shwina Sep 23, 2022
972b072
Add more prints and make sure CI outputs.
vyasr Sep 24, 2022
7b4c4d4
Don't release the GIL?
shwina Sep 26, 2022
f2d6f27
Try syncing the stream
shwina Sep 26, 2022
d729405
sort?
shwina Sep 26, 2022
5394420
Sync after, not before?
shwina Sep 26, 2022
7c0d3a5
Don't run libcudf tests
shwina Sep 26, 2022
1c5b352
Comment out most of to_string_view_array
shwina Sep 26, 2022
399b0ed
Add back call to cpp_to_string_view_array
shwina Sep 26, 2022
0332bd0
Just print the size of the column_view
shwina Sep 26, 2022
f71ab7f
Revert "Just print the size of the column_view"
shwina Sep 27, 2022
f33e4f5
Revert "Add back call to cpp_to_string_view_array"
shwina Sep 27, 2022
6f1ee1a
Revert "Comment out most of to_string_view_array"
shwina Sep 27, 2022
1d27b8a
Revert "Don't run libcudf tests"
shwina Sep 27, 2022
e529dc5
Revert "Sync after, not before?"
shwina Sep 27, 2022
a4ace29
Revert "sort?"
shwina Sep 27, 2022
5146546
Revert "Try syncing the stream"
shwina Sep 27, 2022
3bbc258
Revert "Don't release the GIL?"
shwina Sep 27, 2022
ca38355
Revert "Add more prints and make sure CI outputs."
shwina Sep 27, 2022
701419f
Revert "No xdist"
shwina Sep 27, 2022
a9a5d07
Revert "Use as_device_buffer_like"
shwina Sep 27, 2022
ca6c2ae
Revert "conda list -e"
shwina Sep 27, 2022
9377f62
Revert "conda list"
shwina Sep 27, 2022
3bf8561
Revert "Style"
shwina Sep 27, 2022
6c6f716
Revert "Try adding some print"
shwina Sep 27, 2022
bdee374
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
shwina Sep 27, 2022
60cd589
Merge branch 'branch-22.10' of github.com:rapidsai/cudf into cudf_spi…
madsbk Sep 28, 2022
52e7a31
reorg
madsbk Sep 28, 2022
04a7a79
more reorg
madsbk Sep 28, 2022
6603165
Use as_device_buffer_like() always
madsbk Sep 29, 2022
b902834
fixed import typo
madsbk Sep 29, 2022
3b79e24
Buffer(): raise FutureWarning when given an integer
madsbk Oct 3, 2022
b58c6d0
removed "others"
madsbk Oct 3, 2022
38a5a3a
Moved the global manager to the module level
madsbk Oct 3, 2022
5e3c49e
Rename SpillableBufferView => SpillableBufferSlice
madsbk Oct 3, 2022
b769f15
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into cudf_spi…
madsbk Oct 3, 2022
9b5e5bc
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into cudf_spi…
madsbk Oct 4, 2022
d9803ec
CI: use --cov-append --cov=cudf
madsbk Oct 5, 2022
d7a0439
Impl. and use ensure_buffer_like()
madsbk Oct 6, 2022
2a5bd58
Impl. SpillableBufferSlice.deserialize()
madsbk Oct 6, 2022
7130088
doc
madsbk Oct 6, 2022
df750bc
impl. Statistics
madsbk Oct 10, 2022
89f9498
Statistics: impl. expose stats
madsbk Oct 10, 2022
97c2cd6
SpillableBuffer(): avoid copy of __array_interface__ input
madsbk Oct 11, 2022
3ffbe91
SpillableBufferSlice(): use view size when returning a memoryview
madsbk Oct 11, 2022
2e8f588
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into cudf_spi…
madsbk Oct 12, 2022
426fa7d
Impl. with_spill_lock()
madsbk Oct 14, 2022
c2b9b5c
with_spill_lock: impl. the read_only_columns argument
madsbk Oct 14, 2022
02fe12f
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into cudf_spi…
madsbk Oct 24, 2022
ec63775
remove integer argument warnings
madsbk Oct 24, 2022
c9ae32d
revert Copyright update
madsbk Oct 24, 2022
0eab0d3
with_spill_lock: quick return, if spilling is disabled.
madsbk Oct 24, 2022
a172627
Typo
madsbk Oct 25, 2022
34db7bc
Merge branch 'cudf_spilling' of github.com:madsbk/cudf into cudf_spil…
madsbk Oct 26, 2022
5882d4f
library-design: added "Spilling to host memory"
madsbk Oct 26, 2022
769f37f
added test_get_spill_lock_no_manager()
madsbk Oct 26, 2022
9b6b87f
clean up
madsbk Oct 26, 2022
64ac420
host memory cannot be exposed
madsbk Oct 26, 2022
fe3095f
spill_to_device_limit(): doc
madsbk Oct 26, 2022
508309c
renamed to `[get|set]_global_manager()`
madsbk Oct 26, 2022
3a66e1d
typing
madsbk Oct 26, 2022
e31cc6a
clean up
madsbk Oct 26, 2022
7 changes: 5 additions & 2 deletions ci/gpu/build.sh
@@ -197,7 +197,7 @@ else
# copied by CI from the upstream 11.5 jobs into $CONDA_ARTIFACT_PATH
gpuci_logger "Installing cudf, dask-cudf, cudf_kafka, and custreamz"
gpuci_mamba_retry install cudf dask-cudf cudf_kafka custreamz -c "${CONDA_BLD_DIR}" -c "${CONDA_ARTIFACT_PATH}"

gpuci_logger "Check current conda environment"
conda list --show-channel-urls

@@ -282,6 +282,10 @@ conda list
gpuci_logger "Python py.test for cuDF"
py.test -n 8 --cache-clear --basetemp="$WORKSPACE/cudf-cuda-tmp" --ignore="$WORKSPACE/python/cudf/cudf/benchmarks" --junitxml="$WORKSPACE/junit-cudf.xml" -v --cov-config="$WORKSPACE/python/cudf/.coveragerc" --cov=cudf --cov-report=xml:"$WORKSPACE/python/cudf/cudf-coverage.xml" --cov-report term --dist=loadscope tests

+gpuci_logger "Python py.tests for cuDF with spilling (CUDF_SPILL_DEVICE_LIMIT=1)"
+# Due to time concerns, we only run a limited set of tests
+CUDF_SPILL=on CUDF_SPILL_DEVICE_LIMIT=1 py.test -n 8 --cache-clear --basetemp="$WORKSPACE/cudf-cuda-tmp" --ignore="$WORKSPACE/python/cudf/cudf/benchmarks" -v --cov-config="$WORKSPACE/python/cudf/.coveragerc" --cov-append --cov=cudf --cov-report=xml:"$WORKSPACE/python/cudf/cudf-coverage.xml" --cov-report term --dist=loadscope tests/test_binops.py tests/test_dataframe.py tests/test_buffer.py tests/test_onehot.py tests/test_reshape.py
+
cd "$WORKSPACE/python/dask_cudf"
gpuci_logger "Python py.test for dask-cudf"
py.test -n 8 --cache-clear --basetemp="$WORKSPACE/dask-cudf-cuda-tmp" --junitxml="$WORKSPACE/junit-dask-cudf.xml" -v --cov-config=.coveragerc --cov=dask_cudf --cov-report=xml:"$WORKSPACE/python/dask_cudf/dask-cudf-coverage.xml" --cov-report term dask_cudf
@@ -290,7 +294,6 @@ cd "$WORKSPACE/python/custreamz"
gpuci_logger "Python py.test for cuStreamz"
py.test -n 8 --cache-clear --basetemp="$WORKSPACE/custreamz-cuda-tmp" --junitxml="$WORKSPACE/junit-custreamz.xml" -v --cov-config=.coveragerc --cov=custreamz --cov-report=xml:"$WORKSPACE/python/custreamz/custreamz-coverage.xml" --cov-report term custreamz


# only install strings_udf after cuDF is finished testing without its presence
gpuci_logger "Installing strings_udf"
gpuci_mamba_retry install strings_udf -c "${CONDA_BLD_DIR}" -c "${CONDA_ARTIFACT_PATH}"
22 changes: 14 additions & 8 deletions docs/cudf/source/developer_guide/library_design.md
@@ -27,24 +27,24 @@ Finally we tie these pieces together to provide a more holistic view of the proj
% class RangeIndex
% class DataFrame
% class Series
%
%
% Frame <|-- IndexedFrame
%
%
% Frame <|-- SingleColumnFrame
%
%
% SingleColumnFrame <|-- Series
% IndexedFrame <|-- Series
%
%
% IndexedFrame <|-- DataFrame
%
%
% BaseIndex <|-- RangeIndex
%
%
% BaseIndex <|-- MultiIndex
% Frame <|-- MultiIndex
%
%
% BaseIndex <|-- GenericIndex
% SingleColumnFrame <|-- GenericIndex
%
%
% @enduml


@@ -212,6 +212,12 @@ Conversely, when constructed from a host object,
The data is then copied from the host object into the newly allocated device memory.
You can read more about [device memory allocation with RMM here](https://github.com/rapidsai/rmm).

+### Spilling to host memory
+
+Setting the environment variable `CUDF_SPILL=on` enables automatic spilling (and "unspilling") of buffers from
+device to host to enable out-of-memory computation, i.e., computing on objects that occupy more memory than is
+available on the GPU.
+
## The Cython layer

The lowest level of cuDF is its interaction with `libcudf` via Cython.
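
As a hedged illustration of the `CUDF_SPILL=on` switch documented in the hunk above, the sketch below builds an environment for a child process with spilling enabled. `CUDF_SPILL` and `CUDF_SPILL_DEVICE_LIMIT` are the variable names used in this PR's CI script. The helper `spill_env` is hypothetical and not part of cuDF. The sketch also assumes, without confirmation from this diff, that the variables have to be in place before cuDF is imported, which is why it prepares them for a new process rather than setting them in the running one.

```python
import os


def spill_env(device_limit=None, base=None):
    """Return a copy of an environment mapping with cuDF spilling switched on.

    Hypothetical helper for illustration only; it is not part of cuDF.
    """
    # Copy so that neither os.environ nor the caller's mapping is mutated.
    env = dict(os.environ if base is None else base)
    env["CUDF_SPILL"] = "on"
    if device_limit is not None:
        # Same variable that the CI script in this PR sets to 1.
        env["CUDF_SPILL_DEVICE_LIMIT"] = str(device_limit)
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` to launch a script with spilling enabled, mirroring the `CUDF_SPILL=on CUDF_SPILL_DEVICE_LIMIT=1 py.test ...` invocation in `ci/gpu/build.sh`.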
19 changes: 12 additions & 7 deletions python/cudf/cudf/_lib/binaryop.pyx
@@ -25,9 +25,11 @@ from cudf._lib.cpp.types cimport data_type, type_id
from cudf._lib.types cimport dtype_to_data_type, underlying_type_t_type_id

from cudf.api.types import is_scalar, is_string_dtype
+from cudf.core.buffer.spillable_buffer import SpillLock

cimport cudf._lib.cpp.binaryop as cpp_binaryop
from cudf._lib.cpp.binaryop cimport binary_operator

import cudf


@@ -102,9 +104,9 @@ class BinaryOperation(IntEnum):

cdef binaryop_v_v(Column lhs, Column rhs,
binary_operator c_op, data_type c_dtype):
-cdef column_view c_lhs = lhs.view()
-cdef column_view c_rhs = rhs.view()
-
+slock = SpillLock()
+cdef column_view c_lhs = lhs.view(slock)
+cdef column_view c_rhs = rhs.view(slock)
cdef unique_ptr[column] c_result

with nogil:
@@ -122,7 +124,8 @@ cdef binaryop_v_v(Column lhs, Column rhs,

cdef binaryop_v_s(Column lhs, DeviceScalar rhs,
binary_operator c_op, data_type c_dtype):
-cdef column_view c_lhs = lhs.view()
+slock = SpillLock()
+cdef column_view c_lhs = lhs.view(slock)
cdef const scalar* c_rhs = rhs.get_raw_ptr()

cdef unique_ptr[column] c_result
@@ -142,7 +145,8 @@ cdef binaryop_v_s(Column lhs, DeviceScalar rhs,
cdef binaryop_s_v(DeviceScalar lhs, Column rhs,
binary_operator c_op, data_type c_dtype):
cdef const scalar* c_lhs = lhs.get_raw_ptr()
-cdef column_view c_rhs = rhs.view()
+slock = SpillLock()
+cdef column_view c_rhs = rhs.view(slock)

cdef unique_ptr[column] c_result

@@ -213,8 +217,9 @@ def binaryop_udf(Column lhs, Column rhs, udf_ptx, dtype):
has to be specified in `dtype`, a numpy data type.
Currently ONLY int32, int64, float32 and float64 are supported.
"""
-cdef column_view c_lhs = lhs.view()
-cdef column_view c_rhs = rhs.view()
+slock = SpillLock()
+cdef column_view c_lhs = lhs.view(slock)
+cdef column_view c_rhs = rhs.view(slock)

cdef type_id tid = (
<type_id> (
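
The `binaryop.pyx` hunks above all follow one pattern: create a `SpillLock`, then pass it to every `Column.view()` call before handing the views to libcudf. The diff shows only the call sites, so the following pure-Python toy is a sketch under a stated assumption: that a buffer handed a spill lock stays on the device for as long as that lock object is alive. `ToyBuffer`, `spillable`, and `spill` are hypothetical names, and nothing here is cuDF's actual implementation.

```python
import weakref


class SpillLock:
    """Opaque token; in this toy, buffers given the token are pinned while it lives."""


class ToyBuffer:
    """Hypothetical stand-in for a spillable buffer; `on_device` models its location."""

    def __init__(self, nbytes):
        self.nbytes = nbytes
        self.on_device = True
        # Weak references: when the lock is garbage collected, the pin disappears.
        self._locks = weakref.WeakSet()

    def view(self, spill_lock):
        """Bring the data back to the device and pin it for the lock's lifetime."""
        self.on_device = True  # "unspill" if it had been moved to host
        self._locks.add(spill_lock)
        return self

    @property
    def spillable(self):
        return len(self._locks) == 0

    def spill(self):
        """Move to host unless a live SpillLock pins the buffer; return True if on host."""
        if self.spillable:
            self.on_device = False
        return not self.on_device


def binaryop(lhs, rhs):
    """Mirror the call pattern in the diff: one lock shared by both operands."""
    slock = SpillLock()
    lhs.view(slock)
    rhs.view(slock)
    # While `slock` is in scope, spill attempts on either operand are refused.
    return (not lhs.spill()) and (not rhs.spill())
```

In this model, once `binaryop` returns and `slock` goes out of scope, both operands become spillable again. The prompt release relies on CPython's reference counting; on other interpreters the pin may persist until the next garbage collection.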
12 changes: 8 additions & 4 deletions python/cudf/cudf/_lib/column.pxd
@@ -1,4 +1,4 @@
-# Copyright (c) 2020, NVIDIA CORPORATION.
+# Copyright (c) 2020-2022, NVIDIA CORPORATION.

from libcpp cimport bool
from libcpp.memory cimport unique_ptr
@@ -23,12 +23,16 @@ cdef class Column:
cdef object _mask
cdef object _null_count

-cdef column_view _view(self, size_type null_count) except *
-cdef column_view view(self) except *
+cdef column_view _view(
+self, size_type null_count, spill_lock
+) except *
+cdef column_view view(self, spill_lock=*) except *
cdef mutable_column_view mutable_view(self) except *

@staticmethod
-cdef Column from_unique_ptr(unique_ptr[column] c_col)
+cdef Column from_unique_ptr(
+unique_ptr[column] c_col, bint data_ptr_exposed=*
+)

@staticmethod
cdef Column from_column_view(column_view, object)