Support `copy` and `device` keywords in `from_dlpack` #741
Conversation
Thanks Leo! Should I take over the first two commits in gh-602?
Let's just review/merge #602 first; this branch was based on the one there.
Is there a timeline issue here as well? `from_dlpack` should do nothing with the new `copy` and `device` keywords beyond forwarding them to the producer. So if the producer doesn't support that functionality yet, what happens?
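The forwarding behavior being asked about could be sketched roughly as follows. This is a minimal illustration, not any library's actual implementation; the helper `_wrap_capsule` and the keyword name `dl_device` are stand-ins.

```python
def _wrap_capsule(capsule):
    # Stand-in for the consumer's real capsule-to-array conversion.
    return capsule

def from_dlpack(x, /, *, device=None, copy=None):
    """Sketch of consumer-side forwarding of the new keywords.

    A producer whose __dlpack__ predates the copy/device keywords raises
    TypeError when called with them; we only retry without the keywords
    when the caller asked for the defaults anyway.
    """
    kwargs = {}
    if device is not None:
        kwargs["dl_device"] = device  # illustrative keyword name
    if copy is not None:
        kwargs["copy"] = copy
    try:
        capsule = x.__dlpack__(**kwargs)
    except TypeError:
        if kwargs:
            # Producer cannot honor the request; don't silently ignore it.
            raise BufferError(
                "producer does not support the copy/device keywords"
            ) from None
        capsule = x.__dlpack__()
    return _wrap_capsule(capsule)
```

The key point of the sketch is that the consumer holds no copy logic of its own: it either forwards the request or surfaces the producer's lack of support.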
Just leaving two notes below.
First, in the call today we discussed one issue in this PR. When two libraries exchange data via copying, potentially across devices, there might not be a common stream object that can be recognized and used by both libraries. The discussed solutions involve different ways of requesting synchronous copies (see HackMD).
See the 2nd note below.
In that discussion I was perhaps missing a more precise problem statement. Also note that the HackMD link isn't public, so here is what it says for potential solutions:

- @seberg suggested: The important thing is that
- @leofang suggested: piggyback on

I had a look at what PyTorch does, and it's basically the same as @seberg's suggestion, I believe. From https://pytorch.org/docs/master/notes/cuda.html#asynchronous-execution:

> In general, the effect of asynchronous computation is invisible to the caller, because (1) each device executes operations in the order they are queued, and (2) PyTorch automatically performs necessary synchronization when copying data between CPU and GPU or between two GPUs. Hence, computation will proceed as if every operation was executed synchronously.

It goes on to explain a
Since we only want to enable copying to host (where there is no stream concept) at the moment, do we really need to go into this for the purposes of getting this PR in? It looks to me like this can be deferred.
Well, there are two things we need, I guess:
Streams really should always have been fully type safe, but OK (which would resolve this; we could just say: one of them, figure it out based on type). Maybe I should change my proposal to this: streams belong to the target device unless the producer knows it cannot possibly belong to it (the stream is typed to be a
I'd say that:
Since only
That sounds good to add. Needs to avoid the string
Well, the more I think about it the more I think it makes sense to say: If

That covers what we need, and in principle allows a

Now there might be a point for more complex synchronization schemes in the future. But maybe that just needs a future extension. (The annoyance is that it would be nice to not need a try/except for such a future extension, but beggars can't be choosers.)
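One way to picture the synchronization point being debated is a producer that performs the device-to-host copy itself and synchronizes before returning, so no shared stream object is ever needed. A toy sketch under invented names (string device labels, a dict standing in for the capsule — nothing here is a real library's code):

```python
class Array:
    """Toy producer showing one way __dlpack__ could handle copy/device.

    Device names and the dict "capsule" are stand-ins; a real library
    returns a PyCapsule wrapping a DLManagedTensor(Versioned).
    """

    def __init__(self, values, device="cuda"):
        self.values = list(values)
        self.device = device

    def __dlpack__(self, *, stream=None, dl_device=None, copy=None):
        target = dl_device or self.device
        if copy is False and target != self.device:
            raise BufferError("cross-device export requires a copy")
        if copy is True or target != self.device:
            # A real library would enqueue the device-to-host transfer and
            # synchronize before returning, so the consumer never needs a
            # stream object it shares with the producer.
            data = Array(self.values, device=target)
        else:
            data = self
        # The copied-or-not signal is what the accompanying DLPack flag
        # (dmlc/dlpack#136) is meant to carry.
        return {"data": data, "is_copied": data is not self}
```

This matches the "synchronous copy" framing: the consumer can treat the returned data as ready, regardless of which stream the producer used internally.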
I see that you are continuing the discussion we had during the meeting on the subject
I think Sebastian is right to conclude that
The

Given no major concern with this PR based on the discussions in the original issue, in the call, and above, let me proceed to make the necessary changes to refine this PR.
Interesting, I didn't know that. I thought we just need to log in to HackMD to view it?
The new introspection API should allow that, but I agree that it's probably not worth it to go there.
No, HackMD has permissions. It's also transient; for any issue/PR with a relevant discussion we should be posting a summary on GitHub.
Sounds right. One thing that stood out to me related to this in this PR is that the "only CPU/host is supported" is part of the
Overall this is starting to look close to ready - a few more comments.
In the later revision, it was clarified that a
Indeed, yes. I was thinking we want to limit user-visible effects. If array library maintainers think they can get creative in
That's the part I am not sure about... The contents of both the dlpack PR and this PR are cross-referenced and intertwined. I'm fine going either way once @tqchen has time to take a look and offer us feedback. I can also break the dlpack PR into two (one to only add the new flag, another to update the C/Python docs) if it helps, but I wouldn't do so unless Tianqi agrees that it would help.
Applied in commit 338742a.
(Requested Aaron's review from the array-api-compat perspective.)
Thanks for looping me in. Personally, I find there are two goals in here which are slightly intermingled:

My main thought is that we should keep API support for G0 simple and ensure

Because G1 might indeed open up significant implementation overhead, as well as different possible ways of synchronization, I think it would merit a different API (

My main question is how to phase in the API, so most libraries can start with G0. But happy to hear others' thoughts as well.
Nobody is required to implement the cross-device API; we could bump the minor API version (so consumers can know whether or not the flag should be set). Flagging

For the array API standard, this is a bit different: because I think there the intention is to additionally require
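The "bump the minor version" idea boils down to a capability probe on the consumer side. A hedged sketch of what that negotiation could look like, using the `max_version` keyword that DLPack 1.0's `__dlpack__` accepts (the exact calling convention here is illustrative):

```python
def request_versioned(producer):
    """Sketch of consumer-side capability negotiation via max_version.

    Producers that understand DLPack >= 1.0 accept max_version and can
    return the versioned struct (which carries the flags bit mask);
    older producers raise TypeError on the unknown keyword, and we fall
    back to the legacy, flag-less path.
    """
    try:
        capsule = producer.__dlpack__(max_version=(1, 0))
        return capsule, True   # versioned: flags (e.g. IS_COPIED) available
    except TypeError:
        return producer.__dlpack__(), False  # legacy: no flags to inspect
```

Under this scheme a library can start with G0 only and simply never set the copy flag, while consumers still have a deterministic way to tell which case they are in.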
That is the right framing I think indeed.
I think a separate API would be more problematic to do, because:
I think one should always start there if that's an option. Does it really matter though if this is using
Thanks @rgommers for the comments. I think as long as packages can phase their implementation status this would be fine.
Thanks @tqchen @rgommers @seberg for the comments. Indeed, G0 (zero copy) has been the goal and we've achieved that since the first version of the standard. What's new with this PR is
Q: Does anyone know why the doc build has a warning?
I thought
Those warnings are a real pain. I pushed a fix. For why it was needed here and not in
All looks good to me now. It looks like all comments were addressed; I plan to merge this within the next day or two unless there are more comments.
Reminder to all: remember to review dmlc/dlpack#136 too, which goes hand-in-hand with this PR.
The accompanying DLPack PR was merged, and DLPack 1.0rc was released. Let's merge this. Thanks everyone for the discussion!
Blocked by #602. Closes #626. As discussed there, we expand the DLPack data exchange protocol to formalize copies. In particular, we allow two common needs:

1. Requesting a copy (via the `copy` argument). The default is still `None` ("may copy") to preserve backward compatibility and align with other constructors (e.g. `asarray`).
2. Copying to host (via the `device` argument, plus setting `copy=True`).

In principle, arbitrary cross-device copies could be allowed too, but the consensus in #626 was that limiting to device-to-host copies is enough for now. This also avoids the need for exhaustive library support and for clarifying unforeseen semantic issues.

This PR requires an accompanying PR to the DLPack repo (dmlc/dlpack#136), in order for the producer to signal whether a copy has been made (through a new bit mask), so please review both together.
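For reference, consuming that bit mask is a simple flags check on the versioned tensor struct. The constant values below mirror what `dlpack.h` defines for `DLManagedTensorVersioned.flags` (the header is the authoritative source; treat these values as an assumption here):

```python
# Bit-mask values for DLManagedTensorVersioned.flags; dlpack.h is the
# authoritative source for these constants.
DLPACK_FLAG_BITMASK_READ_ONLY = 1 << 0
DLPACK_FLAG_BITMASK_IS_COPIED = 1 << 1

def decode_flags(flags: int) -> dict:
    """Decode the flags a producer set on the versioned tensor struct."""
    return {
        "read_only": bool(flags & DLPACK_FLAG_BITMASK_READ_ONLY),
        "is_copied": bool(flags & DLPACK_FLAG_BITMASK_IS_COPIED),
    }
```

A consumer that sees `is_copied` set knows it owns a fresh buffer and can, for example, skip a defensive copy of its own.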