
Add support for high frequency live observations #2

Open · wants to merge 1 commit into base: master
Conversation

@kvc0 kvc0 commented Jan 9, 2023

  • add benchmark for tdigest merge
  • update ordered-float dependency
  • add OnlineTdigest

Background

I want to use t-digests with gRPC services. In a service RPC handler,
you typically encounter each measured event once per invocation.
Using the TDigest that is in lib for this is very costly unless
updates to the digest are amortized. Doing that amortization manually
for each digest is onerous and error-prone, and in my experience it is
a fairly common need.

The idea

Goals:

  • very simple to use
  • minimize memory usage
  • minimize extra allocations (this will be used with object-pool, so fewer allocations maximize the pooling benefit)
  • minimize the latency of recording one observation of a value
  • allow reporting at any time
  • Send and Sync, with internal mutability
  • less than 1/10 the latency of updating the digest on each call

TDigest itself seems fine - its mutation functions are somewhat "functional"
in style and return a copy. That is elegant, but can be expensive when you are
using it precisely as a tool to control memory cost in a server!
So let's add a wrapper around it: support retrieving a TDigest at any time,
while making it convenient to record a number with minimal overhead.

Storing the amortized records in a fixed-size array lowers the allocation
burden versus a vector, and lets us skip clear() when reusing the buffer. It
costs the bookkeeping of one index, but promises the most benefit possible from
an object_pool::Pool, and it slightly lowers the cost of the periodic flush to
the backing t-digest.
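As a sketch of the buffering idea above (hypothetical names throughout, with a running sum standing in for the real backing t-digest; this is not the PR's actual OnlineTdigest code):

```rust
// Hypothetical sketch of the amortization idea; a running sum stands in
// for the backing t-digest, and all names here are illustrative.
const BUFFER_SIZE: usize = 32;

struct AmortizedDigest {
    // Fixed-size array: no Vec reallocation, and reuse needs no clear().
    buffer: [f64; BUFFER_SIZE],
    // The single index of bookkeeping mentioned above.
    next: usize,
    // Stand-ins for the backing digest's state.
    total: f64,
    count: u64,
}

impl AmortizedDigest {
    fn new() -> Self {
        Self { buffer: [0.0; BUFFER_SIZE], next: 0, total: 0.0, count: 0 }
    }

    // Record one observation; the expensive merge only happens once the
    // buffer fills, amortizing its cost across BUFFER_SIZE calls.
    fn observe(&mut self, value: f64) {
        self.buffer[self.next] = value;
        self.next += 1;
        if self.next == BUFFER_SIZE {
            self.flush();
        }
    }

    // Merge buffered values into the backing state and reset the index.
    fn flush(&mut self) {
        for &v in &self.buffer[..self.next] {
            self.total += v;
            self.count += 1;
        }
        self.next = 0;
    }
}
```

The real wrapper additionally provides interior mutability behind synchronization so it can be Send and Sync; the `&mut` path here corresponds to the faster benchmark variant below.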

Testing

Updating a TDigest for each invocation of a service API costs around
840 ns according to some simple Criterion benchmarking:

```
observe 1 via merge     time:   [832.06 ns 836.12 ns 840.71 ns]
                        change: [-0.3243% +1.6089% +3.2155%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
```
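The numbers above come from Criterion, which handles warmup, sampling, and outlier detection. For a rough, far less rigorous sense of the same kind of comparison, a minimal hand-rolled stand-in (hypothetical, not the PR's benchmark) could look like:

```rust
use std::time::Instant;

// Minimal stand-in for a benchmark harness (hypothetical; the real
// numbers in this PR come from Criterion). Returns the mean wall-clock
// nanoseconds per call of `f` over `n` calls.
fn mean_ns_per_call<F: FnMut()>(n: u32, mut f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..n {
        f();
    }
    start.elapsed().as_nanos() as f64 / f64::from(n)
}

fn main() {
    let mut sink = 0.0_f64;
    // Time a trivial closure; swap in the digest update to compare.
    let ns = mean_ns_per_call(1_000_000, || sink += 1.0);
    println!("~{ns:.1} ns per call (sink = {sink})");
}
```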

Results

For plain usage like:

```rust
let digest = tdigest::online::OnlineTdigest::default();

// [...]

for i in 1..1000 {
  digest.observe(i);
}
```

you get performance that beats the goal, at nearly 1/20 the latency of
the naive unary TDigest update:

```
observe 1 via online wrapper
                        time:   [42.548 ns 42.801 ns 43.074 ns]
                        change: [-4.4928% -1.7819% +0.4381%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
```

If you have mut access you can improve further, to better than 1/25 the
latency of the naive unary TDigest update:

```
observe 1 via online wrapper &mut
                        time:   [33.226 ns 33.430 ns 33.650 ns]
                        change: [-1.6854% -0.0649% +1.4163%] (p = 0.94 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
```
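The "Send and Sync, internal mutability" goal is about exactly this handler pattern: many threads each recording one observation per invocation, with reporting possible at any time. A sketch of the intended usage shape, with a plain `Arc<Mutex<Vec<f64>>>` standing in for the wrapper (illustrative only, not OnlineTdigest's actual API):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Illustrative only: an Arc<Mutex<Vec<f64>>> stands in for the shared,
// Send + Sync wrapper; each thread plays the role of an RPC handler
// recording one observation per invocation.
fn main() {
    let digest = Arc::new(Mutex::new(Vec::<f64>::new()));

    let handles: Vec<_> = (0..4)
        .map(|t| {
            let digest = Arc::clone(&digest);
            thread::spawn(move || {
                for i in 0..250 {
                    digest.lock().unwrap().push(f64::from(t * 250 + i));
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // "Allow reporting at any time": snapshot the current state.
    let observations = digest.lock().unwrap().len();
    println!("recorded {observations} observations");
}
```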

@kvc0 (Author) commented Jan 9, 2023

I tried porting over the u64-for-counts change from https://github.com/timescale/timescaledb-toolkit/blob/main/crates/t-digest/src/lib.rs. It actually slowed down all 3 benchmarks by 9-15%. This library is pretty trim - doing u64 -> f64 conversions for derived data adds significant runtime cost at this level of optimization! It's interesting to see what gets better, what gets worse, and what doesn't matter at all with these benchmarks.
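For context on where those conversions bite, a sketch (illustrative field and type names only, not the actual internals of either crate):

```rust
// Illustrative only: with a u64 count, every derived statistic pays a
// u64 -> f64 conversion, which the comment above found measurably
// slower in this crate's hot paths than keeping counts as f64.
struct Centroid {
    sum: f64,
    count: u64, // f64 in this crate; u64 in the ported change
}

impl Centroid {
    fn mean(&self) -> f64 {
        // The conversion happens on every call.
        self.sum / self.count as f64
    }
}
```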
