opentelemetry: remove per-update allocation & global lock #1

hawkw · 2022-07-06T20:44:16Z

This commit makes some changes that should significantly reduce the
performance overhead of the OpenTelemetry metrics layer. In particular:

The current code will allocate a String with the metric name every
time a metric value is recorded, even if this value already exists.
This is in order to use the HashMap::entry API. However, the
performance cost of allocating a String and copying the metric
name's bytes into that string is almost certainly worse than
performing the hashmap lookup a second time, and that overhead occurs
every time a metric is recorded.

This commit changes the code for recording metrics to perform hashmap
lookups by reference. This way, in the common case (where the metric
already exists), we won't allocate. The allocation only occurs when a
new metric is added to the map, which is infrequent.
The current code uses a RwLock to protect the map of metrics.
However, because the current code uses the HashMap::entry API,
every metric update must acquire a write lock, since it may insert a
new metric. This essentially reduces the RwLock to a Mutex ---
since every time a value is recorded, we must acquire a write lock, we
are forcing global synchronization on every update, the way a Mutex
would. However, an OpenTelemetry metric can have its value updated
through an &self reference (presumably metrics are represented as
atomic values?). This means that the write lock is not necessary when
a metric has already been recorded once, and multiple metric updates
can occur without causing all threads to synchronize.

This commit changes the code for updating metrics so that the read
lock is acquired when attempting to look up a metric. If that metric
exists, it is updated through the read lock, allowing multiple metrics
to be updated without blocking every thread. In the less common case
where a new metric is added, the write lock is acquired to update the
hashmap.
Currently, a single RwLock guards the entire Instruments
structure. This is unfortunate. Any given metric event will only touch
one of the hashmaps for different metric types, so two distinct types
of metric should be able to be updated at the same time. The big
lock prevents this, as a global write lock is acquired that prevents
any type of metric from being updated.

This commit changes this code to use more granular locks, with one
around each different metric type's HashMap. This way, updating (for
example) an f64 counter does not prevent a u64 value recorder from
being updated at the same time, even when both metrics are inserted
into the map for the first time.

This commit makes some changes that should significantly reduce the performance overhead of the OpenTelemetry metrics layer. In particular: * The current code will allocate a `String` with the metric name _every_ time a metric value is recorded, even if this value already exists. This is in order to use the `HashMap::entry` API. However, the performance cost of allocating a `String` and copying the metric name's bytes into that string is almost certainly worse than performing the hashmap lookup a second time, and that overhead occurs *every* time a metric is recorded. This commit changes the code for recording metrics to perform hashmap lookups by reference. This way, in the common case (where the metric already exists), we won't allocate. The allocation only occurs when a new metric is added to the map, which is infrequent. * The current code uses a `RwLock` to protect the map of metrics. However, because the current code uses the `HashMap::entry` API, *every* metric update must acquire a write lock, since it may insert a new metric. This essentially reduces the `RwLock` to a `Mutex` --- since every time a value is recorded, we must acquire a write lock, we are forcing global synchronization on every update, the way a `Mutex` would. However, an OpenTelemetry metric can have its value updated through an `&self` reference (presumably metrics are represented as atomic values?). This means that the write lock is not necessary when a metric has already been recorded once, and multiple metric updates can occur without causing all threads to synchronize. This commit changes the code for updating metrics so that the read lock is acquired when attempting to look up a metric. If that metric exists, it is updated through the read lock, allowing multiple metrics to be updated without blocking every thread. In the less common case where a new metric is added, the write lock is acquired to update the hashmap. * Currently, a *single* `RwLock` guards the *entire* `Instruments` structure. This is unfortunate. Any given metric event will only touch one of the hashmaps for different metric types, so two distinct types of metric *should* be able to be updated at the same time. The big lock prevents this, as a global write lock is acquired that prevents *any* type of metric from being updated. This commit changes this code to use more granular locks, with one around each different metric type's `HashMap`. This way, updating (for example) an `f64` counter does not prevent a `u64` value recorder from being updated at the same time, even when both metrics are inserted into the map for the first time.

bryangarza · 2022-07-07T00:07:23Z

Thanks a lot for the feedback and the PR :) Especially the explanation about the RwLock

hawkw and others added 2 commits July 6, 2022 13:32

Merge branch 'opentelemetry-metrics' into eliza/opentelemetry-metrics

6743acb

bryangarza merged commit 73558c8 into bryangarza:opentelemetry-metrics Jul 7, 2022

bryangarza deleted the eliza/opentelemetry-metrics branch July 7, 2022 00:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opentelemetry: remove per-update allocation & global lock #1

opentelemetry: remove per-update allocation & global lock #1

hawkw commented Jul 6, 2022

bryangarza commented Jul 7, 2022

opentelemetry: remove per-update allocation & global lock #1

opentelemetry: remove per-update allocation & global lock #1

Conversation

hawkw commented Jul 6, 2022

bryangarza commented Jul 7, 2022