Using intrinsics to optimize counting HyperLogLog trailing bits #846
Signed-off-by: mwish <[email protected]>
61e0bc4 to 2e3178b
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@             Coverage Diff              @@
##           unstable     #846      +/-   ##
============================================
- Coverage     70.59%   70.44%   -0.16%
============================================
  Files           112      114       +2
  Lines         61512    61725     +213
============================================
+ Hits          43427    43481      +54
- Misses        18085    18244     +159
Signed-off-by: mwish <[email protected]>
311a9cd to 4ea2bd3
LGTM. Can you try running some benchmarks? I suspect the improvement may not be measurable, or may be negligible.
Minor adjustments. Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
fix format and trigger the extra ci Signed-off-by: Binbin <[email protected]>
I don't have an end-to-end benchmark, but here is a micro-benchmark using Google Benchmark:
#include <benchmark/benchmark.h>

#include <bit>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

constexpr int n = 1000;

// Baseline: count the "000...1" pattern with a shift loop (count starts at 1).
static void LoopBased(benchmark::State& state) {
  std::mt19937 rng(0);
  // Values in [1, 2^50]: never zero, so the loop always terminates.
  std::uniform_int_distribution<int64_t> uniform_dist(1, 1125899906842624);
  std::vector<int64_t> value;
  for (int i = 0; i < n; ++i) {
    value.push_back(uniform_dist(rng));
  }
  std::size_t cnt = 0;
  for (auto _ : state) {
    int count = 1;
    int64_t hash = value[cnt % n];
    int64_t bit = 1;  // 64-bit mask: an int would overflow after 31 shifts
    while ((hash & bit) == 0) {
      count++;
      bit <<= 1;
    }
    ::benchmark::DoNotOptimize(count);
    ++cnt;
  }
}
// Register the function as a benchmark
BENCHMARK(LoopBased);

// Same computation with C++20 std::countr_zero (counts trailing zeros).
static void CtzBased(benchmark::State& state) {
  std::mt19937 rng(0);
  std::uniform_int_distribution<int64_t> uniform_dist(1, 1125899906842624);
  std::vector<int64_t> value;
  for (int i = 0; i < n; ++i) {
    value.push_back(uniform_dist(rng));
  }
  std::size_t cnt = 0;
  for (auto _ : state) {
    int count = 1;
    int64_t hash = value[cnt % n];
    count += std::countr_zero(static_cast<uint64_t>(hash));
    ::benchmark::DoNotOptimize(count);
    ++cnt;
  }
}
BENCHMARK(CtzBased);

BENCHMARK_MAIN();
Quick-bench reported an error, so I ran it on my x86 3800X CPU with GCC 12, Release build (-O2).
Here is also a Quick-bench run on x86: https://quick-bench.com/q/7dlzmaBJDz9Xnfzhsk-1Iw7TP0I
LGTM. @zuiderkwast @PingXie, any other thoughts before we merge?
It might make sense to have a #define for __builtin_ctzll in case it's not supported by the compiler.
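I did it this way because the server code already uses it directly: Line 1624 in 7424620, Line 836 in 7424620.
I also found some code that checks for it: valkey/deps/hdr_histogram/hdr_histogram.c Line 164 in 7424620.
So I'm not sure what the idiom is here :-( @madolson, do you think something like this is OK? (borrowed from https://github.com/apache/arrow/blob/b33f040640c7ccb3e6a8406e4d3158608c597025/cpp/src/arrow/util/bit_util.h#L198)
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Counts leading zero bits; returns 64 for value == 0.
static inline int CountLeadingZeros(uint64_t value) {
#if defined(__clang__) || defined(__GNUC__)
  if (value == 0) return 64;  // __builtin_clzll(0) is undefined
  return static_cast<int>(__builtin_clzll(value));
#elif defined(_MSC_VER)
  unsigned long index;
  if (_BitScanReverse64(&index, value)) {  // index of the highest set bit
    return 63 - static_cast<int>(index);
  }
  return 64;
#else
  int bitpos = 0;
  while (value != 0) {
    value >>= 1;
    ++bitpos;
  }
  return 64 - bitpos;
#endif
}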
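The Arrow-style wrapper counts leading zeros, while the HyperLogLog change needs trailing zeros. A sketch of the same compiler-dispatch pattern for trailing zeros, assuming a 64-bit target (`_BitScanForward64` is only available on 64-bit MSVC targets). `CountTrailingZeros64` and `SelfCheckCountTrailingZeros64` are hypothetical names, not necessarily what ended up in `intrinsics.h`:

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Counts trailing zero bits. Returns 64 for value == 0, so callers never
// reach the undefined __builtin_ctzll(0) case.
static inline int CountTrailingZeros64(uint64_t value) {
  if (value == 0) return 64;
#if defined(__clang__) || defined(__GNUC__)
  return static_cast<int>(__builtin_ctzll(value));
#elif defined(_MSC_VER)
  unsigned long index;
  _BitScanForward64(&index, value);  // value != 0, so a set bit is found
  return static_cast<int>(index);    // index of the lowest set bit
#else
  // Portable fallback: terminates because value != 0.
  int count = 0;
  while ((value & 1) == 0) {
    value >>= 1;
    ++count;
  }
  return count;
#endif
}

// Quick self-check against known values.
inline void SelfCheckCountTrailingZeros64() {
  assert(CountTrailingZeros64(0) == 64);
  assert(CountTrailingZeros64(1) == 0);
  assert(CountTrailingZeros64(8) == 3);
  assert(CountTrailingZeros64(uint64_t{1} << 63) == 63);
}
```

Handling zero inside the wrapper keeps all three branches consistent; a hot path that already guarantees a non-zero input could call the builtin directly.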
Yeah, using
We need an indirection in
Signed-off-by: mwish <[email protected]>
7bb0823 to fd6f283
Signed-off-by: mwish <[email protected]>
@PingXie I've added an ad-hoc
LGTM
Signed-off-by: mwish <[email protected]>
You should be able to move this function to I would highly recommend moving any intrinsic function wrappers or their "emulation" to
No, I don't mind.
So, from the discussion here, I need to add a
My remaining questions:
2500f10 to 42849b0
Signed-off-by: mwish <[email protected]>
Signed-off-by: mwish <[email protected]>
3755406 to 7f13f22
@PingXie I've added intrinsics.h now.
Signed-off-by: mwish <[email protected]>
Signed-off-by: mwish <[email protected]>
Thanks @mapleFU!
6507ddc to df3ebca
Signed-off-by: mwish <[email protected]>
df3ebca to 82a8038
@PingXie Comments resolved. Would you mind taking another look?
This change looks good to me. I do have a question about one of your comments but I don't think it would affect the correctness. Thanks for your patience @mapleFU!
@enjoy-binbin @madolson could we move forward and check this in? Or is there anything more I need to do?
I think @PingXie will merge it, although there are a number of test failures I'm looking into. (They seem to be old issues; let me re-trigger them.)
Signed-off-by: Madelyn Olson <[email protected]>
There are still some test failures, but I don't think they are related.
Thanks all!
Godbolt link: https://godbolt.org/z/3YPvxsr5s
__builtin_ctz generates shorter code than the hand-written loop.
---------
Signed-off-by: mwish <[email protected]>
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Binbin <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
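For context, a sketch of where the trailing-zero count sits in a HyperLogLog insert, assuming Redis-style dense parameters (P = 14 index bits, Q = 50 remaining bits) and hypothetical names (`HllPattern`, `PatLenLoop`, `PatLenCtz`). The low P bits select the register; a sentinel bit at position Q keeps the shifted hash non-zero, which both bounds the loop and makes `__builtin_ctzll` (GCC/Clang only) well-defined:

```cpp
#include <cassert>
#include <cstdint>

// Assumed parameters: 2^14 registers; the other 50 hash bits are scanned.
constexpr int kHllP = 14;
constexpr int kHllQ = 64 - kHllP;  // 50
constexpr uint64_t kHllPMask = (uint64_t{1} << kHllP) - 1;

struct HllPattern {
  uint64_t index;  // register index (low P bits of the hash)
  int count;       // length of the "000...1" pattern, in [1, Q + 1]
};

// Loop version: scans bit by bit until the first set bit.
HllPattern PatLenLoop(uint64_t hash) {
  uint64_t index = hash & kHllPMask;
  hash >>= kHllP;                // drop the register-index bits
  hash |= uint64_t{1} << kHllQ;  // sentinel: loop ends, count <= Q + 1
  int count = 1;
  uint64_t bit = 1;
  while ((hash & bit) == 0) {
    count++;
    bit <<= 1;
  }
  return {index, count};
}

// Intrinsic version: the sentinel guarantees hash != 0 here.
HllPattern PatLenCtz(uint64_t hash) {
  uint64_t index = hash & kHllPMask;
  hash >>= kHllP;
  hash |= uint64_t{1} << kHllQ;
  assert(hash != 0);
  return {index, 1 + __builtin_ctzll(hash)};
}
```

For a hash whose bits above the index are all zero, both versions return the maximum count, Q + 1 = 51.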