Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add block synchronisation after reset of CG-level counters #337

Merged
merged 2 commits into from
Jul 31, 2023

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Jul 26, 2023

This is needed because if we go round loops again, we might read before things have been reset.

This is needed because if we go round loops again, we might reader
before things have been reset.
@rapids-bot
Copy link

rapids-bot bot commented Jul 26, 2023

Pull requests from external contributors require approval from a NVIDIA organization member with write permissions or greater before CI can begin.

Copy link
Contributor Author

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure all of these syncs are necessary, but have commented on my rationale for them.

@PointKernel PointKernel added type: bug Something isn't working helps: rapids Helps or needed by RAPIDS topic: static_multimap Issue related to the static_multimap labels Jul 26, 2023
@PointKernel
Copy link
Member

@wence- Thanks for putting up the PR. Is this ready for review?

@wence-
Copy link
Contributor Author

wence- commented Jul 26, 2023

@wence- Thanks for putting up the PR. Is this ready for review?

I think so. I think I don't understand fully the semantics of some of the early returns in some of the device functions, but I think these bits are right

@PointKernel PointKernel marked this pull request as ready for review July 26, 2023 21:36
@PointKernel PointKernel added the Needs Review Awaiting reviews before merging label Jul 26, 2023
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take a closer look at the proposed changes and find they might not be necessary since CG vote functions imply synchronizations. @wence- Does the race check pass after adding those syncs?

@wence-
Copy link
Contributor Author

wence- commented Jul 28, 2023

I take a closer look at the proposed changes and find they might not be necessary since CG vote functions imply synchronizations. @wence- Does the race check pass after adding those syncs?

The modifications in kernels.cuh (to sync after the loop) are the ones necessary to pass race check. The modifications in device_view_impl.inl are not necessary for that test, but I think still necessary for correctness (I've responded to your comments to explain why I think they are necessary: basically, the warp-level vote functions don't provide memory barriers).

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wence- You are right. I was missing the point that warp vote functions don't communicate through memory and they don't guarantee any memory ordering.

To uncover more similar issues like this, I build your branch locally and run compute-sanitizer on multimap unit tests:

 compute-sanitizer --tool racecheck ./STATIC_MULTIMAP_TEST

The race check shows that there are 13 warnings with the current code. We can fix them all in this PR if you have the bandwidth. Otherwise, I will merge the PR as it is and put up follow-up PRs to solve the issue. I'm happy with either option.

@wence-
Copy link
Contributor Author

wence- commented Jul 28, 2023

@wence- You are right. I was missing the point that warp vote functions don't communicate through memory and they don't guarantee any memory ordering.

To uncover more similar issues like this, I build your branch locally and run compute-sanitizer on multimap unit tests:

 compute-sanitizer --tool racecheck ./STATIC_MULTIMAP_TEST

The race check shows that there are 13 warnings with the current code. We can fix them all in this PR if you have the bandwidth. Otherwise, I will merge the PR as it is and put up follow-up PRs to solve the issue. I'm happy with either option.

Let me have a go next week...

@wence-
Copy link
Contributor Author

wence- commented Jul 31, 2023

I have pushed fixes. STATIC_MULTIMAP_TEST is now racecheck clean with cuda 11.8 and 12.2, it is not clean with 12.0 which I think means either there are compute-sanitizer false positives or else bugs in 12.0.

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@PointKernel PointKernel merged commit fd7263c into NVIDIA:dev Jul 31, 2023
@wence- wence- deleted the wence/fix/issue-336 branch July 31, 2023 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
helps: rapids Helps or needed by RAPIDS Needs Review Awaiting reviews before merging topic: static_multimap Issue related to the static_multimap type: bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[BUG]: data race in static_multimap pair_retrieve
2 participants