Fix syncing mechanism in raft-ann-bench C++ search #1961

Merged (6 commits) into rapidsai:branch-23.12 on Nov 5, 2023

Conversation

divyegala (Member)

No description provided.

@divyegala divyegala added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Nov 4, 2023
@divyegala divyegala self-assigned this Nov 4, 2023
@github-actions github-actions bot added the cpp label Nov 4, 2023
@@ -340,6 +349,11 @@ void bench_search(::benchmark::State& state,
double actual_recall = static_cast<double>(match_count) / static_cast<double>(total_count);
state.counters.insert({{"Recall", actual_recall}});
}
std::cout << "Last thread about to acquire lock" << std::endl;
// std::unique_lock lk(init_mutex);
processed.store(false, std::memory_order_acq_rel);
tfeher (Contributor) commented Nov 5, 2023

Thanks @divyegala, this is going in the right direction! This line is only executed inside the conditional block if (state.thread_index() == state.threads() - 1), so the last thread resets the processed variable and thereby signals thread 0 that it can load the next index.

But do we have any guarantee that the last thread (the one with the highest index) actually finishes last? I think we cannot assume that. If (for example) thread 4 is stalled and the last thread finishes before it, then we run into the same problem again: thread 0 reloads the index while thread 4 is still using it.

I think we need a counter to keep track of how many threads have finished their computation, and only reset the processed variable once the counter reaches state.threads().
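
For illustration only, a counter-based reset along those lines could look like the sketch below. The names finished_threads and on_thread_done are assumptions made for this sketch, not the actual PR code, and the memory orderings shown are just one reasonable choice.

#include <atomic>

// Sketch only: names and structure are assumptions, not the raft-ann-bench code.
std::atomic<int>  finished_threads{0};  // number of threads that completed the current search
std::atomic<bool> processed{true};      // flag thread 0 checks; false means "load the next index"

void on_thread_done(int n_threads)
{
  // fetch_add returns the previous value, so the thread that observes
  // n_threads - 1 is the one that finished last, regardless of its index.
  if (finished_threads.fetch_add(1, std::memory_order_acq_rel) == n_threads - 1) {
    finished_threads.store(0, std::memory_order_relaxed);  // reset for the next round
    processed.store(false, std::memory_order_release);     // signal thread 0
  }
}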

divyegala (Member, Author) commented Nov 5, 2023

gbench guarantees that all threads enter the function at the same time, right? So thread 0 will not reload until all threads have finished their computation. Is the all-thread synchronization a one-time thing at the start of the benchmark loop, or does it happen every single time bench_search is invoked by the gbench threads? I assumed it was the latter. If not, I can add the counter logic.
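
For reference, this is roughly how a multi-threaded gbench benchmark is laid out (an illustrative sketch only; the function name, thread count, and body are assumptions, not the raft-ann-bench code). The code before the range-for over state is per-thread setup, which, as discussed in this thread, is not necessarily entered by all threads at the same time; the loop itself is the timed region.

#include <benchmark/benchmark.h>

// Illustrative sketch of a multi-threaded gbench benchmark, not the raft-ann-bench code.
static void bench_search_sketch(benchmark::State& state)
{
  // Per-thread setup: runs once per thread before the timed loop.
  if (state.thread_index() == 0) {
    // thread 0 would load / prepare the shared index here
  }

  for (auto _ : state) {
    // Timed region: every thread iterates here; the per-query search work goes in this loop.
    benchmark::DoNotOptimize(state.thread_index());
  }

  // Per-thread teardown after the iterations finish.
}
BENCHMARK(bench_search_sketch)->Threads(8);
BENCHMARK_MAIN();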

divyegala (Member, Author) commented

cc @cjnolet who might have a quick answer here

divyegala (Member, Author) commented

@tfeher I reworked the logic so that, IMO, either case will now work, whether or not thread synchronization has been achieved at the start of bench_search.

cjnolet (Member) commented

The guarantees are outlined here. I guess it's not fully guaranteed that the threads will enter bench_search() at the same time. I wonder, though, could we just move the two lines of code that read the pointers written by thread 0 into the benchmark loop itself? That would then guarantee that all the threads enter the benchmark loop together. I can't imagine just reading these two variables would have an effect on perf. What do you guys think?

@divyegala if we do this right and use an atomic counter for the number of threads that have entered the sync portion, we will have effectively created our own barrier, which I think we could add to the code here and reuse. The major benefit is that once we update to C++20, it will be a single-line code change to go from raft::barrier() to std::barrier().
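
For illustration, such a reusable barrier could look roughly like the sketch below. The raft::barrier name and this exact interface are assumptions made for the sketch, not existing RAFT code; it simply mirrors the arrive_and_wait() shape of C++20's std::barrier so that a later swap would only be a type change at the call site.

#include <condition_variable>
#include <cstddef>
#include <mutex>

namespace raft {

// Sketch only: a reusable barrier with the same arrive_and_wait() shape as
// C++20's std::barrier, built from a mutex and a condition variable.
class barrier {
 public:
  explicit barrier(std::size_t n_threads) : expected_(n_threads) {}

  void arrive_and_wait()
  {
    std::unique_lock<std::mutex> lk(mtx_);
    std::size_t gen = generation_;
    if (++arrived_ == expected_) {
      // The last thread to arrive opens the barrier and starts a new generation.
      arrived_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lk, [&] { return generation_ != gen; });
    }
  }

 private:
  std::mutex mtx_;
  std::condition_variable cv_;
  std::size_t expected_;
  std::size_t arrived_{0};
  std::size_t generation_{0};
};

}  // namespace raft

Each thread would then call arrive_and_wait() before reading the pointers written by thread 0.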

divyegala (Member, Author) commented

@cjnolet I agree, please see my latest commit. Hopefully that seems like an iron-clad solution here.

tfeher (Contributor) commented

@divyegala you are right, the current implementation is sufficient. The next benchmark cannot start until all threads have finished the current benchmark.

Note that this is not a documented feature of gbench; it is only visible if we look into the actual implementation.

divyegala (Member, Author) commented

Thanks for checking that, @tfeher. I changed the implementation anyway, so it will now work even if gbench doesn't synchronize the threads before calling the function.

@divyegala divyegala marked this pull request as ready for review November 5, 2023 22:21
@divyegala divyegala requested a review from a team as a code owner November 5, 2023 22:21
cjnolet (Member) commented Nov 5, 2023

/merge

@rapids-bot rapids-bot bot merged commit bafd2a8 into rapidsai:branch-23.12 Nov 5, 2023
57 checks passed
benfred pushed a commit to benfred/raft that referenced this pull request Nov 8, 2023
Labels: bug (Something isn't working), cpp, non-breaking (Non-breaking change)
3 participants