Add block synchronisation after reset of CG-level counters #337
This is needed because if we go round the loops again, we might read before things have been reset.
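A minimal sketch of the hazard described above, not the code from this PR: the kernel `count_rounds`, the shared `counter`, and the host helper `mismatches` are all hypothetical names. Every thread bumps a block-level shared counter, thread 0 flushes and resets it, and the trailing `block.sync()` keeps fast threads from starting the next round before the reset has landed.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical kernel: each block runs `rounds` iterations of
// "accumulate into a shared counter, flush it, reset it".
__global__ void count_rounds(int* out, int rounds)
{
  __shared__ int counter;
  auto block = cg::this_thread_block();

  if (block.thread_rank() == 0) { counter = 0; }
  block.sync();  // everyone sees the initial value

  for (int r = 0; r < rounds; ++r) {
    atomicAdd(&counter, 1);  // every thread contributes once per round
    block.sync();            // all contributions are visible to thread 0

    if (block.thread_rank() == 0) {
      out[blockIdx.x * rounds + r] = counter;  // flush
      counter = 0;                             // reset
    }
    // Without this sync, a thread could enter the next round and touch
    // `counter` before thread 0 has read and reset it.
    block.sync();
  }
}

// Hypothetical host helper: returns how many flushed values differ from the
// block size, or -1 on a CUDA error.
int mismatches(int blocks, int threads, int rounds)
{
  int const n = blocks * rounds;
  int* d_out  = nullptr;
  if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) { return -1; }

  count_rounds<<<blocks, threads>>>(d_out, rounds);

  std::vector<int> h_out(n);
  cudaError_t const err =
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  if (err != cudaSuccess) { return -1; }

  int bad = 0;
  for (int v : h_out) {
    if (v != threads) { ++bad; }
  }
  return bad;
}
```

With both syncs in place, each flushed value should equal the block size, so calling `mismatches(2, 128, 4)` from a `main` is expected to return 0. Removing the final `block.sync()` gives the kind of shared-memory hazard that racecheck is designed to report.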
I am not sure all of these syncs are necessary, but have commented on my rationale for them.
@wence- Thanks for putting up the PR. Is this ready for review?
I think so. I don't fully understand the semantics of some of the early returns in the device functions, but I think these changes are right.
I took a closer look at the proposed changes and found that they might not be necessary, since CG vote functions imply synchronization. @wence- Does the race check pass after adding those syncs?
The modifications in kernels.cuh (to sync after the loop) are the ones necessary to pass the race check. The modifications in device_view_impl.inl are not necessary for that test, but I think they are still necessary for correctness. I've responded to your comments to explain why: basically, the warp-level vote functions don't provide memory barriers.
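To make the distinction concrete, here is a hedged sketch (the names `broadcast_via_shared`, `slot`, and `wrong_values` are hypothetical; this is not code from device_view_impl.inl). Lane 0 of each 32-thread tile publishes a value through shared memory. The `tile.any` vote only exchanges the predicate, so, following the reasoning above, the sketch relies on an explicit `tile.sync()` to order the write before the other lanes read it.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

constexpr int kTileSize      = 32;
constexpr int kTilesPerBlock = 2;  // assumption: blocks of 64 threads

// Hypothetical kernel: lane 0 of each tile writes to shared memory and the
// remaining lanes read the value back.
__global__ void broadcast_via_shared(int* out)
{
  __shared__ int slot[kTilesPerBlock];
  auto block        = cg::this_thread_block();
  auto tile         = cg::tiled_partition<kTileSize>(block);
  int const tile_id = block.thread_rank() / kTileSize;

  if (tile.thread_rank() == 0) { slot[tile_id] = 100 + tile_id; }

  // The vote tells every lane that some lane is the leader, but it is not
  // relied on here to make the leader's shared-memory write visible.
  bool const has_leader = tile.any(tile.thread_rank() == 0);

  // Explicit tile-level barrier: orders the write above before the reads below.
  tile.sync();

  out[blockIdx.x * blockDim.x + block.thread_rank()] = has_leader ? slot[tile_id] : -1;
}

// Hypothetical host helper: returns the number of unexpected outputs,
// or -1 on a CUDA error.
int wrong_values(int blocks)
{
  int const threads = kTileSize * kTilesPerBlock;
  int const n       = blocks * threads;
  int* d_out        = nullptr;
  if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) { return -1; }

  broadcast_via_shared<<<blocks, threads>>>(d_out);

  std::vector<int> h_out(n);
  cudaError_t const err =
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  if (err != cudaSuccess) { return -1; }

  int bad = 0;
  for (int i = 0; i < n; ++i) {
    int const expected = 100 + (i % threads) / kTileSize;
    if (h_out[i] != expected) { ++bad; }
  }
  return bad;
}
```

Dropping the `tile.sync()` may still appear to work on a given GPU, which is why a missing barrier of this kind can go unnoticed by a functional test.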
@wence- You are right. I was missing the point that warp vote functions don't communicate through memory and they don't guarantee any memory ordering.
To uncover more issues like this, I built your branch locally and ran compute-sanitizer on the multimap unit tests:
compute-sanitizer --tool racecheck ./STATIC_MULTIMAP_TEST
The race check shows that there are 13 warnings with the current code. We can fix them all in this PR if you have the bandwidth. Otherwise, I will merge the PR as it is and put up follow-up PRs to solve the issue. I'm happy with either option.
Let me have a go next week...
I have pushed fixes.
LGTM. Thanks!