benchmark: CKMS slow + excessive memory #32
Here's the benchmark source code.
Here are the raw results from the benchmark on my Macbook. How do the algorithms scale with number of values?
How did we settle on "error" parameter for GK/CKMS?
How did we settle on "batch" and "max-size" parameters for TDigest?
I wrote a benchmark to test the performance of CKMS and GK.
Findings: CKMS with error=0.0001 delivers better and faster results than the error=0.001 suggested in its doc-comment. However, CKMS doesn't have any "sweet spot" - over the entire range where CKMS is feasible, it's slower and more space-intensive than GK, and even than just blindly storing every single value. This is at odds with what I expected from the paper, and also with the claimed memory bounds, so I wonder if there's an implementation bug? (Also, if we can live with just P99, then "store the top 1% of values in a priority queue" is competitive up to 10M values!!)
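To make the "store the top 1% of values in a priority queue" idea concrete, here is a minimal sketch using only the standard library. It is not the code from the benchmark: the names TopP99, Total and p99_with_heap are made up for illustration, it assumes the total number of values is known up front, and it assumes a nearest-rank definition of P99 (the value at rank ceil(0.99 * n) in sorted order).

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical wrapper giving f64 a total order so it can live in a BinaryHeap.
#[derive(Debug, Clone, Copy)]
struct Total(f64);

impl Ord for Total {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}
impl PartialOrd for Total {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Total {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Total {}

/// Keeps only the largest ~1% of values in a min-heap.
/// Assumption: `expected` (the total number of inserts) is known in advance.
struct TopP99 {
    expected: usize,
    seen: usize,
    keep: usize,
    heap: BinaryHeap<Reverse<Total>>,
}

impl TopP99 {
    fn new(expected: usize) -> Self {
        assert!(expected > 0, "need at least one value");
        // Nearest-rank P99: rank = ceil(0.99 * n), computed in integers.
        let rank = (99 * expected + 99) / 100;
        // Number of values at or above that rank.
        let keep = expected - rank + 1;
        TopP99 { expected, seen: 0, keep, heap: BinaryHeap::with_capacity(keep) }
    }

    fn insert(&mut self, value: f64) {
        self.seen += 1;
        let candidate = Total(value);
        if self.heap.len() < self.keep {
            self.heap.push(Reverse(candidate));
        } else if let Some(mut smallest) = self.heap.peek_mut() {
            // Replace the smallest retained value if the new one is larger.
            if candidate > smallest.0 {
                *smallest = Reverse(candidate);
            }
        }
    }

    /// The smallest retained value is P99, but only once all expected values are in.
    fn p99(&self) -> Option<f64> {
        if self.seen != self.expected {
            return None;
        }
        self.heap.peek().map(|r| (r.0).0)
    }
}

/// Convenience helper for a slice whose length is the known total.
fn p99_with_heap(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut top = TopP99::new(values.len());
    for &v in values {
        top.insert(v);
    }
    top.p99()
}
```

Memory here is proportional to about 1% of the value count, and each insert costs at most one heap sift, which is why such an approach can stay competitive for a single high quantile. If the total count is not known in advance, this sketch does not apply as written.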
Method: The benchmark calls ckms.insert(value) or gk.insert(value) a number of times, then obtains quantiles. I measured wall-time using std::time::Instant::now() / .elapsed(), and I measured heap memory with stats_alloc::Region::new(&GLOBAL) / .change(), taking bytes_allocated - bytes_deallocated + bytes_reallocated. I ran it with cargo run --release on my Macbook. I tried with a normal distribution in the range -0.5 to 1.5, and a pareto distribution in the range 5.0 to 20.0. As a baseline, I added another algorithm "ALL" which keeps every single value in memory - this tells me "perfect" expected values of min/P50/P99/max to judge how accurate GK/CKMS are, and there's no justification in taking more memory than this!
VARYING "ERROR" PARAMETER... (1M values)
VARYING NUMBER OF VALUES... (error_gk=0.001, error_ckms=0.0001)
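For anyone who wants to reproduce the kind of measurement described under "Method", here is a rough, dependency-free sketch of the "ALL" baseline together with timing and heap accounting. It is not the benchmark source: instead of stats_alloc it swaps in a hand-rolled counting allocator that tracks net live bytes, which is only an approximation of the bytes_allocated - bytes_deallocated + bytes_reallocated figure used above. All names (CountingAlloc, LIVE_BYTES, Measurement, measure_all_baseline, exact_quantile) are hypothetical, and the quantile uses a simple round-to-nearest index.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicIsize, Ordering};
use std::time::{Duration, Instant};

/// Net bytes currently allocated through the global allocator.
static LIVE_BYTES: AtomicIsize = AtomicIsize::new(0);

/// Hypothetical stand-in for stats_alloc: forwards to System and counts bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size() as isize, Ordering::Relaxed);
        }
        ptr
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size() as isize, Ordering::Relaxed);
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let new_ptr = unsafe { System.realloc(ptr, layout, new_size) };
        if !new_ptr.is_null() {
            let delta = new_size as isize - layout.size() as isize;
            LIVE_BYTES.fetch_add(delta, Ordering::Relaxed);
        }
        new_ptr
    }
}

#[global_allocator]
static COUNTING: CountingAlloc = CountingAlloc;

/// Exact quantile of an ascending-sorted slice (round-to-nearest index).
fn exact_quantile(sorted: &[f64], q: f64) -> Option<f64> {
    if sorted.is_empty() {
        return None;
    }
    let last = sorted.len() - 1;
    let idx = ((last as f64) * q).round() as usize;
    sorted.get(idx.min(last)).copied()
}

#[derive(Debug)]
struct Measurement {
    elapsed: Duration,
    net_bytes: isize,
    min: f64,
    p50: f64,
    p99: f64,
    max: f64,
}

/// "ALL" baseline: keep every value, sort once, read exact quantiles.
/// Panics if `values` is empty.
fn measure_all_baseline(values: impl Iterator<Item = f64>) -> Measurement {
    let bytes_before = LIVE_BYTES.load(Ordering::Relaxed);
    let start = Instant::now();

    let mut all = Vec::new();
    for v in values {
        all.push(v);
    }
    all.sort_by(|a, b| a.total_cmp(b));
    let min = exact_quantile(&all, 0.0).expect("no values");
    let p50 = exact_quantile(&all, 0.5).expect("no values");
    let p99 = exact_quantile(&all, 0.99).expect("no values");
    let max = exact_quantile(&all, 1.0).expect("no values");

    let elapsed = start.elapsed();
    // Read the counter while `all` is still alive, so its buffer is included.
    let net_bytes = LIVE_BYTES.load(Ordering::Relaxed) - bytes_before;
    Measurement { elapsed, net_bytes, min, p50, p99, max }
}
```

The same wrapper shape (snapshot the counter, start the clock, insert, query, read both again) could be put around a CKMS or GK structure to compare against this baseline. The counter is process-wide, so allocations on other threads would show up in net_bytes too.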