
[FEA] Better CUDF/Nvstrings Spill over to Disk/Memory #99

Closed
VibhuJawa opened this issue Jul 30, 2019 · 17 comments

@VibhuJawa
Member

VibhuJawa commented Jul 30, 2019

[FEA] Better CUDF/Nvstrings Spill over to Disk/Memory

We still have workflows that are limited by spill-over support in cudf, as it currently only works for a limited set of workflows.

An example where spill-over fails: #65 (comment)

According to #65 (comment), we need more changes on the cudf side to support this.

We now expose the device memory used with nvstrings: rapidsai/custrings#395

Can you please list the changes we still require, so that we can track them and get them completed ASAP to unblock these workflows and enable better spill-over?

CC: @pentschev
CC: @randerzander

@pentschev
Member

We now expose the device memory used with nvstrings

On the Dask side, knowing how much memory is used isn't sufficient; Dask still can't spill that memory to host, since it has no control over it.

Can you please list the changes we still require, so that we can track them and get them completed ASAP to unblock these workflows and enable better spill-over?

I see two options here:

  1. Libraries should take care of spilling the memory themselves (may be less robust, since libraries won't know about memory used by other libraries); or
  2. Expose internal C++ memory to Python and allow Python libraries (such as Dask) to take control of that memory. This is certainly a major change to RAPIDS libraries in general.

I've just opened rapidsai/rmm#112 with a proposal on option 2. Note that this would be a major change and will probably take some time (assuming the proposal is acceptable).

@mrocklin
Contributor

mrocklin commented Aug 2, 2019

Can't we do the same as we currently do for other cudf datatypes and cupy arrays?

We don't need to manage the GPU memory. We just need to be able to copy it to host. (I think?)
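
As an illustration of the kind of copy being referred to here, a minimal sketch using a plain Numba device array as a stand-in for data Dask can already see (illustrative only; not the nvstrings internals under discussion):

import numpy as np
from numba import cuda

# Device-side data that Dask knows about, e.g. the buffer backing a numeric column.
host_data = np.arange(1_000_000, dtype=np.float64)
device_array = cuda.to_device(host_data)   # lives in GPU memory

# "Spilling" is then just a device-to-host copy that Dask can trigger...
spilled = device_array.copy_to_host()      # a plain NumPy array on the host

# ...and un-spilling is the reverse copy when the data is needed again.
restored = cuda.to_device(spilled)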


@mrocklin
Contributor

mrocklin commented Aug 2, 2019

Can you please list the changes we still require, so that we can track them and get them completed ASAP to unblock these workflows and enable better spill-over?

I think that you just need to implement serialize/deserialize methods on these objects. These methods already exist on current cudf objects like Series, Column, Buffer, and so on, so there might be good models for you there. I think that @quasiben has also been trying to clean this code up a bit.
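
As a rough sketch of that serialize/deserialize convention (a small metadata header plus a list of frames), here is a hypothetical StringColumnLike stand-in; the real cuDF/nvstrings implementation would put device buffers in the frames rather than host bytes:

class StringColumnLike:
    """Hypothetical stand-in for an nvstrings-backed string column."""

    def __init__(self, payload, count):
        self.payload = payload   # imagine this being device memory
        self.count = count       # number of strings

    def serialize(self):
        # Convention: a small metadata header plus a list of frames;
        # for real cuDF objects the frames would be device buffers.
        header = {"type": "StringColumnLike", "count": self.count}
        frames = [self.payload]
        return header, frames

    @classmethod
    def deserialize(cls, header, frames):
        return cls(frames[0], header["count"])


col = StringColumnLike(b"spill\x00over", count=2)
header, frames = col.serialize()
restored = StringColumnLike.deserialize(header, frames)
assert restored.count == col.count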

@mrocklin
Contributor

mrocklin commented Aug 2, 2019

so that we can track them and get them completed ASAP

My apologies for the delay here, by the way.

@pentschev
Member

No, the problem here regards the memory on the C++ side, which is not at all exposed to Python; Dask can't serialize what it doesn't know about.

@mrocklin
Contributor

mrocklin commented Aug 2, 2019

To be clear, Dask isn't serializing things here, cuDF is.

Is it possible for cuDF or nvstrings to return something like a Numba device array with the relevant bytes, along with perhaps a small Python dictionary with metadata if necessary?

@pentschev
Member

Is it possible for cuDF or nvstrings to return something like a Numba device array with the relevant bytes, along with perhaps a small Python dictionary with metadata if necessary?

What do you mean by "return"? Do you mean returning something at the end of each cuDF call, or giving Python access to the memory buffers? If you mean the former, there will still be memory used for temporary computations that won't ever be known by Dask, and thus won't be spilled (which is what we have today). If you mean the latter, that's what the generic solution I'm proposing in rapidsai/rmm#112 is: permitting exposure of internal C++ memory to Python.

IMO, the way we expose it (via a Numba array or otherwise) is irrelevant at this time.

@mrocklin
Contributor

mrocklin commented Aug 2, 2019

I mean having a .serialize method on nvstrings objects that returns a header and device-bytes objects, similar to how the other cudf .serialize methods do this. I'm inclined to let the cudf devs think about the right way to do this. I get the sense that they understand the development constraints here better than I do.

@mrocklin
Contributor

mrocklin commented Aug 2, 2019

cc @kkraus14

@pentschev
Member

I mean having a .serialize method on nvstrings objects that returns a header and device-bytes objects, similar to how the other cudf .serialize methods do this.

Yes, and what I'm saying is that if it runs out of memory before that is returned, there's nothing Dask can do, and this is the case today.

I'm inclined to let the cudf devs think about the right way to do this. I get the sense that they understand the development constraints here better than I do.

That's also fine, I just thought it would be useful to have a starting point for the discussion. I don't know if my RMM proposal is the best solution, but it is probably one solution.

@mrocklin
Contributor

mrocklin commented Aug 2, 2019

Yes, and what I'm saying is that if it runs out of memory before that is returned, there's nothing Dask can do, and this is the case today.

That's fine with me. That's the case today anyway. I don't think that this has much to do with the concrete request here of supporting nvstrings in the same way that we support other data types.

@pentschev
Member

That's fine with me. That's the case today anyway.

Indeed, that's the case today, and it limits the usability for @VibhuJawa.

I don't think that this has much to do with the concrete request here of supporting nvstrings in the same way that we support other data types.

Admittedly, I don't know much about nvstrings, but it seems to take a device pointer as input (a chunk of a timeseries) and return some other device pointer. I think this is already generally supported, since a cuDF series/chunk is a Numba device array. The problem still seems to me that the memory footprint of nvstrings is too big, and a potential solution for that would be allowing the spilling of memory internally used by the C++ implementation.

@mrocklin
Contributor

mrocklin commented Aug 2, 2019

Perhaps. I get the sense that @VibhuJawa's question is "what should I be asking for in order to get the same spill-to-disk behavior that I have with cudf, but with string columns?"

I think that the answer to this question is: you should ask for cudf's serialize/deserialize methods to work with string columns. I think that cudf devs will understand this request and will handle it from there.

@VibhuJawa
Member Author

VibhuJawa commented Aug 3, 2019

Thanks for the responses to this thread, @mrocklin, @pentschev.

Sincerest apologies for not being clear enough regarding my question.

Perhaps. I get the sense that @VibhuJawa's question is "what should I be asking for in order to get the same spill-to-disk behavior that I have with cudf, but with string columns?"

@mrocklin, I believe the serialize/deserialize methods already work with string columns, see link.
I should have been clearer in my request, apologies.

On the Dask side, knowing how much memory is used isn't sufficient; Dask still can't spill that memory to host, since it has no control over it.

@pentschev, thanks for clarifying this; this was essentially the root of my confusion. I wrongly assumed that Dask could spill memory to host when it knew how much memory was being used.

The problem still seems to me that the memory footprint of nvstrings is too big, and a potential solution for that would be allowing the spilling of memory internally used by the C++ implementation.

I think this is a major problem too, as we have non-trivial intermittent memory spikes (2x in some cases), which makes spilling much more difficult with strings.

Current issues:

In general, to make spill-over work in these workflows, we have to:

  • Reduce the device memory limit by quite a bit.

    In both examples, we have to set 10,000 MiB on a 16,000 MiB card (about 62 percent); see the cluster sketch after this list.

    We can also look at changing the default spill fraction for CUDA workers, but this might have performance implications for non-cuDF users (CuPy, etc.).

  • Use much smaller chunk sizes.

Both of these have considerable performance implications and hurt usability, since one has to fiddle with both of them to tune for performance, or even to make a workflow run at all in some cases.
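
For reference, a minimal sketch of where that device memory limit would be set, assuming a dask-cuda LocalCUDACluster; the 10,000 MiB value mirrors the examples above, everything else is illustrative:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Lowering device_memory_limit well below the card's 16 GiB makes the worker
# start spilling device objects to host earlier, which is the workaround above.
cluster = LocalCUDACluster(device_memory_limit="10000MiB")
client = Client(cluster)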

I've just opened rapidsai/rmm#112 with a proposal on option 2. Note that this would be a major change and will probably take some time (assuming the proposal is acceptable).

FWIW , I feel this is a step in the right direction to start a discussion around this topic.

CC: @randerzander

@jrhemstad

jrhemstad commented Aug 6, 2019

Could you try configuring RMM to use managed memory and see how that works?

See: https://github.com/rapidsai/rmm/blob/dfa8740883735e57bc9ebb95ed56a1321141a8b0/README.md#handling-rmm-options-in-python-code

You would use

from librmm_cffi import librmm_config as rmm_cfg
rmm_cfg.use_managed_memory = True

before importing cudf.

For a managed memory pool, you can do:

from librmm_cffi import librmm_config as rmm_cfg
rmm_cfg.use_managed_memory = True
rmm_cfg.use_pool_allocator = True
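
Putting it together, a minimal end-to-end sketch of that ordering (configure RMM first, then import cudf); the small DataFrame at the end is only illustrative, to show cudf coming up with string data after the managed-memory configuration:

# RMM must be configured before cudf is imported so cudf picks up the settings.
from librmm_cffi import librmm_config as rmm_cfg

rmm_cfg.use_managed_memory = True   # allocate via CUDA managed (unified) memory
rmm_cfg.use_pool_allocator = True   # optional: use a pool of managed memory

import cudf  # intentionally imported after the RMM configuration

gdf = cudf.DataFrame({"s": ["spill", "over", "test"]})
print(gdf)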

@pentschev
Member

This ended up being mostly a duplicate of #57, therefore I'm closing it in favor of that one. Please feel free to reopen if there's still anything here that is not covered by #57.

@jakirkham
Member

Admittedly this was done a few weeks back, but it is worth noting that we improved serialization of cuDF objects with PR rapidsai/cudf#4101 and of nvstrings objects specifically with PR rapidsai/cudf#4111. This avoids acquiring things like CUDA contexts, which slowed this down previously.
