[FEA] Currently the IO writers aren't using pinned memory #4020

Closed
razajafri opened this issue Jan 31, 2020 · 4 comments
Labels: cuIO (cuIO issue), feature request (New feature or request), Spark (Functionality that helps Spark RAPIDS)

Comments

@razajafri (Contributor)

Is your feature request related to a problem? Please describe.
We would like the writers to use pinned memory when possible, falling back to the default (pageable) host memory otherwise.

Describe the solution you'd like
@jlowe: "provide an abstracted writer object API that is passed and used during writing to allocate output memory, or maybe output isn't directly placed contiguously in host memory but instead a vector of <addrspace, addr, size> tuples/structs are returned so the caller can fetch the data themselves. addrspace in this context is host vs. device memory."
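One way to read the first option is as a small allocator interface that the caller implements and the writer calls whenever it needs output space. The following is a minimal C++ sketch under that assumption. `output_allocator`, `pageable_allocator`, `write_bytes`, and `written_output` are hypothetical names, not part of the libcudf API, and the toy writer only copies a string instead of encoding a table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: the caller supplies an allocator object, and the
// writer asks it for output memory instead of allocating internally.
class output_allocator {
 public:
  virtual ~output_allocator() = default;
  // Returns a writable buffer of at least `bytes` bytes that stays valid
  // until the allocator is destroyed.
  virtual std::byte* allocate(std::size_t bytes) = 0;
};

// Default behaviour: ordinary (pageable) heap memory owned by the allocator.
class pageable_allocator : public output_allocator {
 public:
  std::byte* allocate(std::size_t bytes) override {
    blocks_.push_back(std::make_unique<std::byte[]>(bytes));
    return blocks_.back().get();
  }

 private:
  std::vector<std::unique_ptr<std::byte[]>> blocks_;
};

// Where the writer placed its output.
struct written_output {
  std::byte* data;
  std::size_t size;
};

// Toy "writer": serializes a payload into memory obtained from the allocator.
// A real cuIO writer would encode a table; this only shows the call pattern.
written_output write_bytes(const std::string& payload, output_allocator& alloc) {
  std::byte* dst = alloc.allocate(payload.size());
  std::memcpy(dst, payload.data(), payload.size());
  return {dst, payload.size()};
}
```

With this shape, a Java or Cython caller could (through a JNI or Cython shim, not shown) provide a subclass whose `allocate` hands out pinned buffers from its own pool, without the writer knowing which kind of memory it received.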

@razajafri added the feature request and Needs Triage labels Jan 31, 2020
@sameerz added the Spark label Jan 31, 2020
@OlivierNV (Contributor)

This seems related to #4019 (at least, an abstract interface seems like it would solve both issues).
Can the C++ layer call back into an object whose functions are implemented on the Cython or Java side?

@jlowe (Member) commented Jan 31, 2020

Agree this is related to 4019. We just need more control over how the memory is allocated. I don't think it's as simple as a boolean "use pinned or not" concept. Pinned memory is a critical resource that often takes a long time to allocate, so we tend to pool it to amortize the cost. The caller doesn't know how big the output will be, so it can't always make a good decision up-front whether to use pinned memory or not. Sometimes the answer will be: "use pinned memory if it fits" or maybe even "use pinned memory for the parts that can fit".
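The two policies mentioned above, "use pinned memory if it fits" and "use pinned memory for the parts that can fit", can be sketched against a pre-allocated pool. This is a hypothetical C++ illustration: `pinned_pool_sim` stands in for a pinned pool using ordinary heap memory so the sketch runs without a GPU (real pinned memory would come from `cudaMallocHost` or a pooled pinned allocator). It ignores alignment and never returns memory to the pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

enum class memory_kind { pinned, pageable };

struct chunk {
  memory_kind kind;
  std::byte* ptr;
  std::size_t size;
};

// Stand-in for a pinned pool: one block allocated up front to amortize the
// cost, handed out with a bump pointer. Here it is plain heap memory.
class pinned_pool_sim {
 public:
  explicit pinned_pool_sim(std::size_t capacity)
      : storage_(std::make_unique<std::byte[]>(capacity)), capacity_(capacity) {}

  std::size_t available() const { return capacity_ - used_; }

  // Caller must ensure bytes <= available().
  std::byte* take(std::size_t bytes) {
    std::byte* p = storage_.get() + used_;
    used_ += bytes;
    return p;
  }

 private:
  std::unique_ptr<std::byte[]> storage_;
  std::size_t capacity_;
  std::size_t used_ = 0;
};

class hybrid_allocator {
 public:
  explicit hybrid_allocator(pinned_pool_sim& pool) : pool_(pool) {}

  // Policy 1: use pinned memory if the whole request fits, else fall back.
  chunk allocate_whole(std::size_t bytes) {
    if (bytes <= pool_.available()) {
      return {memory_kind::pinned, pool_.take(bytes), bytes};
    }
    return {memory_kind::pageable, pageable(bytes), bytes};
  }

  // Policy 2: use pinned memory for the part that fits, pageable for the rest.
  std::vector<chunk> allocate_split(std::size_t bytes) {
    std::vector<chunk> out;
    std::size_t pinned_bytes = std::min(bytes, pool_.available());
    if (pinned_bytes > 0) {
      out.push_back({memory_kind::pinned, pool_.take(pinned_bytes), pinned_bytes});
    }
    if (bytes > pinned_bytes) {
      std::size_t rest = bytes - pinned_bytes;
      out.push_back({memory_kind::pageable, pageable(rest), rest});
    }
    return out;
  }

 private:
  std::byte* pageable(std::size_t bytes) {
    fallback_.push_back(std::make_unique<std::byte[]>(bytes));
    return fallback_.back().get();
  }

  pinned_pool_sim& pool_;
  std::vector<std::unique_ptr<std::byte[]>> fallback_;
};
```

The split policy produces output that is no longer one contiguous buffer, so the caller has to track a list of chunks rather than a single pointer.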

Can the C++ layer call back into an object whose functions are implemented on the Cython or Java side?

I think that's a perfectly acceptable solution. Another alternative is to not copy the output data anywhere, but simply tell the caller where the output data was generated. For example, the writer could return a vector of (address, size) pairs that, taken in order, compose the desired output. The caller can then decide how to extract them, whether by copying to pinned memory, transferring to another device, etc.
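The descriptor alternative might look like the following hypothetical C++ sketch: the writer returns a list of `block_descriptor` entries tagged with an address space, and the caller concatenates them in order into a buffer of its choosing. All names are illustrative assumptions. Because the sketch has no GPU, "device" blocks are ordinary host memory copied with `std::memcpy`; real code would copy them with `cudaMemcpy` or `cudaMemcpyAsync`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical descriptor for one piece of writer output.
enum class addr_space { host, device };

struct block_descriptor {
  addr_space space;
  const void* addr;
  std::size_t size;
};

// Copies one block into caller-owned memory.
void copy_block(const block_descriptor& b, std::byte* dst) {
  if (b.space == addr_space::device) {
    // Real code: cudaMemcpy(dst, b.addr, b.size, cudaMemcpyDeviceToHost).
    // In this sketch "device" data actually lives in host memory.
    std::memcpy(dst, b.addr, b.size);
  } else {
    std::memcpy(dst, b.addr, b.size);
  }
}

// The caller decides where the output goes: here it concatenates the blocks,
// in order, into one contiguous buffer it owns.
std::vector<std::byte> gather(const std::vector<block_descriptor>& blocks) {
  std::size_t total = 0;
  for (const auto& b : blocks) total += b.size;
  std::vector<std::byte> out(total);
  std::size_t offset = 0;
  for (const auto& b : blocks) {
    if (b.size == 0) continue;  // avoid memcpy on empty blocks
    copy_block(b, out.data() + offset);
    offset += b.size;
  }
  return out;
}
```

Because the caller owns the destination, the same descriptor list could instead be copied into pinned memory or sent to another device.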

Both the callable-object approach and the memory-block-descriptor approach address not only this request but also the ownership request in #4019. Therefore I propose we close this issue and fold the pinned-memory discussion into #4019.

@harrism (Member) commented Feb 10, 2020

Also, @jrhemstad is working on a `host_memory_resource` for RMM that would allow allocating pinned host memory buffers via RMM. I think a fix for this issue should use that functionality (see rapidsai/rmm#260).

@harrism added the cuIO and RMM labels and removed the Needs Triage label Feb 10, 2020
@GregoryKimball (Contributor)

Closed by #4231, as discussed in #4019.

6 participants