-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix syncing mechanism in raft-ann-bench
C++ search
#1961
Fix syncing mechanism in raft-ann-bench
C++ search
#1961
Conversation
@@ -340,6 +349,11 @@ void bench_search(::benchmark::State& state, | |||
double actual_recall = static_cast<double>(match_count) / static_cast<double>(total_count); | |||
state.counters.insert({{"Recall", actual_recall}}); | |||
} | |||
std::cout << "Last thread about to acquire lock" << std::endl; | |||
// std::unique_lock lk(init_mutex); | |||
processed.store(false, std::memory_order_acq_rel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @divyegala, this goes in the good direction! This line is only executed as part of the conditional block if (state.thread_index() == state.threads() - 1)
, so the last thread
resets the processed
variable and therefore signals thread 0
that it can load the next index.
But do we have any guarantee that the last thread
is executed last? I think we cannot assume that. If (for example) thread 4
is stalled and last thread
finishes before that, then we have again run into a problem: thread 0
reloading index while thread 4
still using it.
I think we need counter to keep track how many threads have finished computation, and only reset the processed
var once counter reaches state.threads()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gbench guarantees that all threads enter the function at the same time, right? So thread 0
will not reload until all threads have finished computation. Is the all-thread synchronization only a one time thing at the start of the benchmark loop, or every single time bench_search
is invoked by gbench threads? I assumed it was the latter. If not, I can add the counter logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @cjnolet who might have a quick answer here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tfeher I reworked the logic such that either case will work now IMO, whether thread sync has been achieved at the start of the function bench_search
or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guarantees are outlined here. I guess it's not fully guaranteed that the threads will enter bench_search()
at the same time. I wonder, though, could we just move the 2 lines of code that read the pointers written by thread0 into the benchmarks loop itself? That would then guarantee all the threads enter the benchmarks loop together. I can't imagine just reading these two variable would have an effect on perf. What do you guys think?
@divyegala if we do this right and do an atomic counter for the number of threads that have entered the sync portion, we will have effectively created our own barrier, which I think we could add to the code here and reuse. The major benefit to this is that once we do update to c++20, it'll be a single line code change to go from raft::barrier() to std::barrier().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjnolet I agree, please see my latest commit. Hopefully that seems like an iron-clad solution here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@divyegala you are right, the current implementation is sufficient. The next benchmark cannot start until all threads finished the current benchmark.
Note that this is not a documented feature of gbech, it is only visible if we look into look into the actual implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking that @tfeher. I changed the implementation anyway that will now work even if gbench didn't synchronize the threads before calling the function.
/merge |
Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1961
No description provided.