
[QST] Move cudf.Buffer to rmm #227

Open · shwina opened this issue Jan 7, 2020 · 28 comments
Labels: inactive-30d, inactive-90d, question (Further information is requested)

Comments

@shwina (Contributor)

shwina commented Jan 7, 2020

Question regarding moving cudf.Buffer to rmm:

  • rmm.DeviceBuffer is a Cython wrapper around the C++ class rmm::device_buffer.
  • cudf.Buffer more generally represents an untyped device memory allocation:
buf = Buffer(data=ptr, size=size, owner=python_obj)

# buf represents a device memory allocation
# with address `ptr`, of size `size` bytes,
# and keeps a reference to `python_obj`,
# which is the owner of that memory.
# For RMM-allocated memory, the owner
# is a `rmm.DeviceBuffer`.

A Buffer can be constructed from any object exposing __array_interface__ or __cuda_array_interface__, e.g., CuPy arrays, NumPy arrays, etc.
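A minimal sketch of that construction path (hedged; the import path and constructor reflect the cudf version under discussion and may differ in later releases):

import cupy as cp
from cudf.core.buffer import Buffer  # internal cudf class

arr = cp.arange(10, dtype="int32")  # any __cuda_array_interface__ exporter

# cudf reads the device pointer and size from
# arr.__cuda_array_interface__ and keeps `arr` alive as the owner.
buf = Buffer(arr)

# Equivalent explicit form, matching the snippet above:
ptr = arr.__cuda_array_interface__["data"][0]
buf = Buffer(data=ptr, size=arr.nbytes, owner=arr)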

Does it make sense for Buffer to be moved to RMM?

@shwina shwina added the doc Documentation label Jan 7, 2020
@shwina shwina changed the title [FEA] Move cudf.Buffer to rmm [QST] Move cudf.Buffer to rmm Jan 7, 2020
@shwina shwina added question Further information is requested and removed doc Documentation labels Jan 7, 2020
@jakirkham (Member)

The current cuDF implementation is here, for context.

@jrhemstad (Contributor)

Granted, it's at the Python level, where I'm not too concerned, but this kinda sounds like scope creep to me.

My hope would be to keep RMM as narrowly focused as we can.

@jakirkham (Member)

Maybe a little, though I am less worried about that personally. Where else would you imagine it living (if not RMM)?

@jrhemstad (Contributor)

jrhemstad commented Jan 7, 2020

I'm a big fan of "Do One Thing". Scope creep is how libraries become complicated behemoths that are difficult to maintain and change.

This issue is similar to why I pushed back on #220. RMM shouldn't be concerned about device memory allocated by anything other than RMM. Anything beyond that is the concern of another layer or library.

I don't have sufficient Python expertise to know where Buffer should live, but I'm not a big fan of just tacking it onto RMM simply because it is convenient.

@kkraus14 (Contributor)

kkraus14 commented Jan 9, 2020

From my view, this isn't tacking something onto rmm because it's convenient; if anything, it's a little inconvenient. I see it as managing memory that RAPIDS libraries could potentially use; it just so happens that the memory wasn't allocated by the RMM allocator.

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@harrism (Member)

harrism commented Feb 16, 2021

Can you just replace cudf.Buffer with rmm.DeviceBuffer?

@shwina (Contributor, Author)

shwina commented Feb 17, 2021

Unfortunately no, because a cudf.Buffer could contain arbitrary device-memory-backed objects underneath it, such as CuPy/Numba arrays, whereas an rmm.DeviceBuffer is owning.
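To illustrate the distinction (a sketch; APIs as of the versions under discussion):

import cupy as cp
import rmm
from cudf.core.buffer import Buffer

# Owning: DeviceBuffer allocates its bytes via RMM and frees them
# when the object is garbage collected.
dbuf = rmm.DeviceBuffer(size=64)

# Non-owning: Buffer records a pointer/size pair and holds a
# reference to whatever object owns the memory (here, a CuPy array
# whose allocation RMM knows nothing about).
arr = cp.zeros(16, dtype="float32")
buf = Buffer(data=arr.__cuda_array_interface__["data"][0],
             size=arr.nbytes,
             owner=arr)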

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@harrism (Member)

harrism commented Nov 18, 2021

Unfortunately no, because a cudf.Buffer could contain arbitrary device-memory-backed objects underneath it, such as CuPy/Numba arrays, whereas an rmm.DeviceBuffer is owning.

This sounds like what in C++ we call span.

@jakirkham (Member)

Yeah that seems similar.

Basically cudf.Buffer is a non-owning view onto the memory. It holds a reference to the object that owns the memory.

@vyasr (Contributor)

vyasr commented Nov 18, 2021

I hadn't seen this issue before, but I recently started thinking about this after a conversation with @shwina and @jrhemstad about GPU memory management in Python. It seems like what we're looking for is a generic gpumemoryview with similar semantics to Python's memoryview, i.e. a standard representation for a non-owning view into memory. This object would be agnostic to the source of the underlying allocation. Most libraries could operate on the view alone (similar to libcudf C++ algorithms) and Python's reference counting logic would handle issues of scope. If a library needed to allocate memory internally (e.g. upon creation of a cudf.DataFrame) it could create the required rmm buffers then immediately generate the view and store it under the dataframe's columns exactly as cudf.Buffer is used right now. This approach would allow us to think bigger than RAPIDS alone so that we could also naturally consume data owned by CuPy, PyTorch, TensorFlow, and others.

If we went this route there would be a number of open questions to address here. Such an object could work directly with __cuda_array_interface__, but that would differ from how Python currently works: a memoryview is designed to work with the Python buffer protocol at the C level, while numpy (and numpy-like array libraries) are what consume __array_interface__. On the other hand, developing a GPU analog to the buffer protocol is probably unnecessary given the degree to which the ecosystem has centralized around CAI, so that semantic inconsistency is probably preferable to trying to overengineer a C API. There's also the original question on this thread of where such an object might live. I would probably advocate for a standalone package at least to start with, but I could see a case for rolling it into CUDA Python or something related to it.
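A minimal sketch of what such a gpumemoryview might look like (hypothetical; every name here is illustrative, and it assumes nothing beyond __cuda_array_interface__):

class gpumemoryview:
    """Hypothetical non-owning view of device memory."""

    def __init__(self, obj):
        # Read the device pointer from the CUDA Array Interface.
        cai = obj.__cuda_array_interface__
        self.ptr, _ = cai["data"]
        # Keep a reference to the exporting object; Python reference
        # counting then keeps the underlying allocation alive.
        self.obj = obj

    @property
    def __cuda_array_interface__(self):
        # Re-export the owner's interface so CAI-aware libraries can
        # consume the view directly.
        return self.obj.__cuda_array_interface__

The owner could be an rmm.DeviceBuffer, a CuPy array, a PyTorch tensor, or anything else; the view neither knows nor cares.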

CC @gmarkall @leofang

@jakirkham (Member)

Yeah we already did this in ucx-py with Array. We could refactor this out into a separate thing we depend on.

cc @pentschev @madsbk @quasiben

@leofang (Member)

leofang commented Nov 19, 2021

I have been devising an interface for C/C++ libraries to share a memory pool, and I would need a way to expose this interface all the way to Python so that users can set it from Python. This sounds like a good idea worth exploring.

@jrhemstad (Contributor)

i.e. a standard representation for a non-owning view into memory.

I thought that was the purpose of CAI? What's the difference between gpumemoryview and CAI?

@jakirkham (Member)

It can help to have something in Cython, as it can expose both Python and C APIs.

@vyasr (Contributor)

vyasr commented Nov 19, 2021

That's true, but in my view the distinction is a little more fundamental. CAI is purely descriptive, so objects implementing CAI must always be converted into some internal representation, like cudf.Buffer, that can be operated on programmatically. A gpumemoryview would replace that step so that various libraries don't need to reinvent the wheel.

To @jrhemstad's (much earlier) point about scope creep, I agree that the idea of a generic view into memory is out of scope for RMM, because the underlying allocation need not come from RMM. CAI was developed so that different libraries don't need to implement a standard API in order to be interoperable. Having a standalone object representation of the allocation described by CAI would improve that interoperability and provide a standard interchange object rather than a format alone. As long as the object itself supported CAI, we could pass it to libraries that support CAI even if they were unaware of gpumemoryview, and it would just work; but libraries could optionally build around gpumemoryview (instead of, e.g., cudf.Buffer) and avoid needing to perform any preprocessing when provided one.
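Concretely (a sketch reusing the hypothetical gpumemoryview above): because the view re-exports CAI, CAI-aware libraries consume it unchanged:

import cupy as cp

arr = cp.arange(4, dtype="int32")
view = gpumemoryview(arr)  # hypothetical class sketched earlier

# A library that only knows CAI still works: cupy.asarray accepts
# any __cuda_array_interface__ exporter (zero-copy here).
roundtrip = cp.asarray(view)

# A view-aware library could instead accept `view` directly and use
# its pointer without any preprocessing.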

@jakirkham (Member)

Yep, I understand. The Array object in UCX-Py does both those things. We can rip it out into a standalone library.

@vyasr (Contributor)

vyasr commented Nov 19, 2021

@jakirkham Sorry, didn't mean that as a correction, more as an extended explanation to answer @jrhemstad's question.

@madsbk (Member)

madsbk commented Nov 22, 2021

Yep, I understand. The Array object in UCX-Py does both those things. We can rip it out into a standalone library.

I am very much in favor of a standalone library implementing something like Array and all its nice-to-have utility functions.

@harrism (Member)

harrism commented Nov 22, 2021

Happy my comment helped kickstart this discussion again! Sounds like there is agreement on a direction forward.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@vyasr vyasr self-assigned this Jan 5, 2022
@vyasr (Contributor)

vyasr commented Jan 5, 2022

I'm going to work on documenting next steps for this sometime soon.

@github-actions

github-actions bot commented Feb 4, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

github-actions bot commented May 5, 2022

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@bdice (Contributor)

bdice commented Jan 20, 2025

There is some recent work in cuda.core on StridedMemoryView that seems relevant to this issue.

NVIDIA/cuda-python#87
NVIDIA/cuda-python#180
https://github.com/NVIDIA/cuda-python/issues?q=stridedmemoryview
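A hedged usage sketch of StridedMemoryView (the class lives under cuda.core's experimental namespace as of this writing, so the import path and signature may change):

import cupy as cp
from cuda.core.experimental.utils import StridedMemoryView

arr = cp.arange(8, dtype="float64")

# Build a non-owning, strided view from a __cuda_array_interface__
# (or DLPack) exporter; stream_ptr=-1 requests no stream ordering.
view = StridedMemoryView(arr, stream_ptr=-1)
print(view.ptr, view.shape, view.dtype)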
