Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stream usage in segmented_gather() #9679

Merged

Conversation

mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Nov 13, 2021

detail::segmented_gather() inadvertently uses cuda_default_stream in some parts of its implementation, while using the user-specified stream in others.

This applies to the calls to copy_range_in_place(), allocate_like(), and make_lists_column(). This might produce race conditions, which might explain NVIDIA/spark-rapids/issues/4060. It's a rare failure that's quite hard to reproduce. This might lead to over-synchronization, though bad output is unlikely.

The commit here should sort this out, by switching to the detail APIs corresponding to the calls above.

@mythrocks mythrocks requested a review from a team as a code owner November 13, 2021 00:20
@mythrocks mythrocks self-assigned this Nov 13, 2021
@mythrocks mythrocks requested a review from jrhemstad November 13, 2021 00:20
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Nov 13, 2021
@mythrocks mythrocks added bug Something isn't working non-breaking Non-breaking change labels Nov 13, 2021
@mythrocks mythrocks changed the title Fix thread-sync in segmented_gather() Fix stream usage in segmented_gather() Nov 13, 2021
@mythrocks
Copy link
Contributor Author

mythrocks commented Nov 13, 2021

Apologies. I realize it's late to include this in 21.12. It took me a little while to find.

I can move this to 22.02 if it's preferable.

@codecov
Copy link

codecov bot commented Nov 13, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.02@157a8c3). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff               @@
##             branch-22.02   #9679   +/-   ##
==============================================
  Coverage                ?   8.71%           
==============================================
  Files                   ?     119           
  Lines                   ?   24691           
  Branches                ?       0           
==============================================
  Hits                    ?    2151           
  Misses                  ?   22540           
  Partials                ?       0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 157a8c3...b242e94. Read the comment docs.

@hyperbolic2346
Copy link
Contributor

Thank you for doing this work and finding this issue.

@harrism
Copy link
Member

harrism commented Nov 15, 2021

Using the default stream should result in oversynchronization with non-default streams, not undersynchronization.

@mythrocks
Copy link
Contributor Author

I'll have to keep hunting for the list extraction problem.
In the meantime, maybe this PR should be redirected to the 22.02 release.

@mythrocks mythrocks changed the base branch from branch-21.12 to branch-22.02 November 23, 2021 21:57
@mythrocks
Copy link
Contributor Author

Rebased for branch-22.02.

@mythrocks
Copy link
Contributor Author

Rerun tests

1 similar comment
@mythrocks
Copy link
Contributor Author

Rerun tests

@mythrocks
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 50acf07 into rapidsai:branch-22.02 Dec 2, 2021
@mythrocks
Copy link
Contributor Author

This change has now been merged. The description was modified to indicate that corrupted output is unlikely.
Thank you for the reviews, all.

@karthikeyann karthikeyann mentioned this pull request Dec 7, 2021
rapids-bot bot pushed a commit that referenced this pull request Dec 9, 2021
fix missing `stream` argument in default argument of functions.
And also in some cases, `mr` on returned objects creation.

This cleanup is done as a follow up after PR #9679

Almost all of libcudf functions usages of stream arg are cleaned up.
Missing `mr` still might need another clean up.

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - https://github.com/nvdbaranec

URL: #9767
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants