This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

reduce cudaDeviceSynchronize calls #1255

Merged
merged 1 commit into from
Aug 17, 2020

Conversation

rongou
Collaborator

@rongou rongou commented Aug 13, 2020

While profiling RAPIDS cuDF applications, I saw many cudaDeviceSynchronize calls. These can hurt multi-threaded clients that use the per-thread default stream, because a device-wide synchronization waits on work submitted by every thread rather than only on the caller's stream. Making these calls appears to be a bug. This PR moves the logic in execute_on_stream_base to the base entry point so that all callers benefit. With this change, I no longer see cudaDeviceSynchronize calls in the application I'm profiling.

All tests passed locally.
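
As a rough illustration of the change described above (not the actual diff, which this page does not show), here is a CUDA-free toy model in plain C++. Every name in it (`fake_device_synchronize`, `fake_stream_synchronize`, `par_t`, `on_stream_t`, `synchronize_before`, `synchronize_after`, `per_thread_stream`) is hypothetical and only stands in for the corresponding Thrust or CUDA runtime piece. The "before" behavior is an assumption inferred from the description: only the stream-carrying policy synchronized its stream, and every other policy fell back to a device-wide synchronization.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the CUDA runtime: they only record what was requested.
static std::vector<std::string> g_calls;
using stream_t = int;                      // stand-in for cudaStream_t
constexpr stream_t per_thread_stream = 2;  // stand-in for the per-thread default stream

inline void fake_device_synchronize() { g_calls.push_back("device"); }
inline void fake_stream_synchronize(stream_t s) {
  g_calls.push_back("stream:" + std::to_string(s));
}

// CRTP base, loosely modeled on an execution policy hierarchy.
template <class Derived>
struct execution_policy {};

// A plain policy with no explicit stream.
struct par_t : execution_policy<par_t> {};

// A policy that carries a user-supplied stream.
struct on_stream_t : execution_policy<on_stream_t> {
  stream_t s;
  explicit on_stream_t(stream_t s_) : s(s_) {}
};

// Stream lookup: generic policies use the default stream,
// the stream-carrying policy returns its own.
template <class Derived>
stream_t get_stream(const execution_policy<Derived>&) { return per_thread_stream; }
inline stream_t get_stream(const on_stream_t& p) { return p.s; }

// Assumed "before": the base entry point synchronizes the whole device,
// and only the stream-carrying policy overrides it with a stream sync.
template <class Derived>
void synchronize_before(const execution_policy<Derived>&) { fake_device_synchronize(); }
inline void synchronize_before(const on_stream_t& p) { fake_stream_synchronize(p.s); }

// Assumed "after": the base entry point synchronizes the policy's stream,
// so no policy triggers a device-wide synchronization.
template <class Derived>
void synchronize_after(const execution_policy<Derived>& p) {
  fake_stream_synchronize(get_stream(static_cast<const Derived&>(p)));
}
```

In this model, `synchronize_before(par_t{})` records a device-wide synchronization, while `synchronize_after(par_t{})` records a synchronization of the default stream only. The stream-carrying policy synchronizes its own stream in both versions.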

@brycelelbach @allisonvacanti

@alliepiper
Collaborator

Launched CI on 28935982.

@alliepiper added the labels "testing: gpuCI passed" (Passed gpuCI testing.) and "testing: internal ci in progress" (Currently testing on internal NVIDIA CI (DVS).) on Aug 15, 2020
@alliepiper added the label "testing: internal ci passed" (Passed internal NVIDIA CI (DVS).) and removed the label "testing: internal ci in progress" on Aug 17, 2020
@alliepiper alliepiper merged commit 6727f2a into NVIDIA:main Aug 17, 2020