Add cudaStreamSynchronize when a new device buffer is added to the spill framework #5485
…n requirements Signed-off-by: Alessandro Bellina <[email protected]>
The fix 5ad4d43 helps with some queries where I saw an obvious regression (NDS q20 specifically). With it, contig split is followed by a sync in the write path for the RapidsShuffleManager. I am still assessing the performance impact of this and should have numbers Monday.
All tests I have executed with this PR show results in the noise, so I do not see blockers.
build
OK, I have related test failures, due to some mock object matchers I had used. I'll fix that soon.
build
…ill framework (NVIDIA#5485)
* Adds a stream synchronize in addBuffer to ensure we safely spill
* Small cleanup in copyBuffer, add note about createBuffer synchronization requirements
Signed-off-by: Alessandro Bellina <[email protected]>
* Remove extra nvtx range
* When adding contiguous_split buffers in RapidsShuffleManager, synchronize once
* Fix RapidsShuffleTestHelper
* Fix RapidsShuffleClientSuite
Closes #4818.
This fixes a stream-ordering violation that can currently occur in the spill framework: a buffer is allocated and acted on in stream A, then added to the store without any synchronization. Stream B then runs out of memory and spills that buffer without having waited on stream A.
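To make the hazard concrete, here is a toy model in plain Python. It is only a sketch: `FakeStream` and `spill` are hypothetical stand-ins, not spark-rapids or CUDA APIs, and the "device buffer" is just a `bytearray`. The point is that work queued on stream A has not necessarily run when another stream decides to spill.

```python
from collections import deque


class FakeStream:
    """Hypothetical stand-in for a CUDA stream: queued work runs only on synchronize()."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, op):
        # Record the operation; it does not execute yet (models asynchronous launch).
        self._pending.append(op)

    def synchronize(self):
        # Drain the queue in order, like waiting for all prior work on the stream.
        while self._pending:
            self._pending.popleft()()


def spill(device_buffer):
    """Hypothetical spill: snapshot whatever the buffer holds right now."""
    return bytes(device_buffer)


stream_a = FakeStream()
buf = bytearray(4)  # pretend device buffer, initially zeroed


def fill():
    buf[0:4] = b"\x01\x02\x03\x04"


# Stream A is asked to fill the buffer, but the work is still pending.
stream_a.enqueue(fill)

# Unsafe: "stream B" spills without waiting on stream A and copies stale bytes.
stale = spill(buf)

# Safe: synchronize stream A first, then spill.
stream_a.synchronize()
fresh = spill(buf)

print(stale, fresh)
```

In this model the first spill captures the zeroed buffer, while the spill after the synchronize captures the data stream A wrote.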
The idea here is that when a device buffer is added to the store, we force, in most cases, a call to cudaStreamSynchronize, which should cover the cases where we return from cuDF without having synchronized the stream there. There is a lot of stream synchronization in cuDF (usually via Thrust), so this seems like an unlikely issue. That said, adding this synchronize is more robust, because the spill framework now knows for a fact that the buffer can be freely copied (spilled).
Note that I added a flag to skip the synchronize in some cases. This is used for UCX, since we synchronize on writes after compressing buffers and on reads after copying from the bounce buffer, so in those two cases we can reliably say we have already synchronized. I don't know of other cases where the synchronize in the spill framework can be skipped.
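A rough sketch of the shape of this change, again as a toy Python model rather than the actual Scala code. The names `DeviceStore`, `add_buffer`, and the `needs_sync` parameter are assumptions made for illustration; the real flag and method signatures in the PR may differ.

```python
class FakeStream:
    """Hypothetical stand-in for a CUDA stream that only counts synchronize calls."""

    def __init__(self):
        self.sync_count = 0

    def synchronize(self):
        self.sync_count += 1


class DeviceStore:
    """Hypothetical store: a buffer becomes spillable once it has been added."""

    def __init__(self, stream):
        self._stream = stream
        self._buffers = {}

    def add_buffer(self, buffer_id, data, needs_sync=True):
        # Synchronize by default so any later spill sees fully written data.
        # Callers that have already synchronized pass needs_sync=False.
        if needs_sync:
            self._stream.synchronize()
        self._buffers[buffer_id] = data

    def spillable_ids(self):
        return sorted(self._buffers)


stream = FakeStream()
store = DeviceStore(stream)

# Buffer coming back from cuDF: the store cannot assume it was synchronized.
store.add_buffer("from_cudf", b"abc")

# Buffer from a UCX read: the caller already synchronized after the bounce-buffer copy.
store.add_buffer("from_ucx_read", b"def", needs_sync=False)

print(stream.sync_count, store.spillable_ids())
```

Defaulting the flag to synchronize keeps the safe behavior for any caller that does not opt out, which matches the intent of only skipping the sync where it is known to be redundant.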
I have been measuring the impact of this in our benchmarks. I do not see any big differences so far in UCX or non-UCX runs. I am going to continue testing it today. Adding as WIP for comments.