Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opentelemetry: remove per-update allocation & global lock #1

Conversation

hawkw
Copy link

@hawkw hawkw commented Jul 6, 2022

This commit makes some changes that should significantly reduce the
performance overhead of the OpenTelemetry metrics layer. In particular:

  • The current code will allocate a String with the metric name every
    time a metric value is recorded, even if this value already exists.
    This is in order to use the HashMap::entry API. However, the
    performance cost of allocating a String and copying the metric
    name's bytes into that string is almost certainly worse than
    performing the hashmap lookup a second time, and that overhead occurs
    every time a metric is recorded.

    This commit changes the code for recording metrics to perform hashmap
    lookups by reference. This way, in the common case (where the metric
    already exists), we won't allocate. The allocation only occurs when a
    new metric is added to the map, which is infrequent.

  • The current code uses a RwLock to protect the map of metrics.
    However, because the current code uses the HashMap::entry API,
    every metric update must acquire a write lock, since it may insert a
    new metric. This essentially reduces the RwLock to a Mutex ---
    since every time a value is recorded, we must acquire a write lock, we
    are forcing global synchronization on every update, the way a Mutex
    would. However, an OpenTelemetry metric can have its value updated
    through an &self reference (presumably metrics are represented as
    atomic values?). This means that the write lock is not necessary when
    a metric has already been recorded once, and multiple metric updates
    can occur without causing all threads to synchronize.

    This commit changes the code for updating metrics so that the read
    lock is acquired when attempting to look up a metric. If that metric
    exists, it is updated through the read lock, allowing multiple metrics
    to be updated without blocking every thread. In the less common case
    where a new metric is added, the write lock is acquired to update the
    hashmap.

  • Currently, a single RwLock guards the entire Instruments
    structure. This is unfortunate. Any given metric event will only touch
    one of the hashmaps for different metric types, so two distinct types
    of metric should be able to be updated at the same time. The big
    lock prevents this, as a global write lock is acquired that prevents
    any type of metric from being updated.

    This commit changes this code to use more granular locks, with one
    around each different metric type's HashMap. This way, updating (for
    example) an f64 counter does not prevent a u64 value recorder from
    being updated at the same time, even when both metrics are inserted
    into the map for the first time.

hawkw and others added 2 commits July 6, 2022 13:32
This commit makes some changes that should significantly reduce the
performance overhead of the OpenTelemetry metrics layer. In particular:

* The current code will allocate a `String` with the metric name _every_
  time a metric value is recorded, even if this value already exists.
  This is in order to use the `HashMap::entry` API. However, the
  performance cost of allocating a `String` and copying the metric
  name's bytes into that string is almost certainly worse than
  performing the hashmap lookup a second time, and that overhead occurs
  *every* time a metric is recorded.

  This commit changes the code for recording metrics to perform hashmap
  lookups by reference. This way, in the common case (where the metric
  already exists), we won't allocate. The allocation only occurs when a
  new metric is added to the map, which is infrequent.

* The current code uses a `RwLock` to protect the map of metrics.
  However, because the current code uses the `HashMap::entry` API,
  *every* metric update must acquire a write lock, since it may insert a
  new metric. This essentially reduces the `RwLock` to a `Mutex` ---
  since every time a value is recorded, we must acquire a write lock, we
  are forcing global synchronization on every update, the way a `Mutex`
  would. However, an OpenTelemetry metric can have its value updated
  through an `&self` reference (presumably metrics are represented as
  atomic values?). This means that the write lock is not necessary when
  a metric has already been recorded once, and multiple metric updates
  can occur without causing all threads to synchronize.

  This commit changes the code for updating metrics so that the read
  lock is acquired when attempting to look up a metric. If that metric
  exists, it is updated through the read lock, allowing multiple metrics
  to be updated without blocking every thread. In the less common case
  where a new metric is added, the write lock is acquired to update the
  hashmap.

* Currently, a *single* `RwLock` guards the *entire* `Instruments`
  structure. This is unfortunate. Any given metric event will only touch
  one of the hashmaps for different metric types, so two distinct types
  of metric *should* be able to be updated at the same time. The big
  lock prevents this, as a global write lock is acquired that prevents
  *any* type of metric from being updated.

  This commit changes this code to use more granular locks, with one
  around each different metric type's `HashMap`. This way, updating (for
  example) an `f64` counter does not prevent a `u64` value recorder from
  being updated at the same time, even when both metrics are inserted
  into the map for the first time.
@bryangarza bryangarza merged commit 73558c8 into bryangarza:opentelemetry-metrics Jul 7, 2022
@bryangarza bryangarza deleted the eliza/opentelemetry-metrics branch July 7, 2022 00:06
@bryangarza
Copy link
Owner

Thanks a lot for the feedback and the PR :) Especially the explanation about the RwLock

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants