
feat(metrics): Track memory footprint of metrics buckets [INGEST-1132] #1284

Merged
merged 8 commits into master from feat/track-metrics-footprint
Jun 3, 2022

Conversation

jjbayer
Member

@jjbayer jjbayer commented Jun 2, 2022

To be able to limit the memory footprint of metrics buckets in the aggregator, we need to keep track of the number of elements we store. Instead of measuring the actual memory consumption, we apply a simple model, roughly measuring the bytes needed to encode a bucket:

  • counter buckets: 8 bytes (f64)
  • set buckets: number of unique elements * 4 (u32)
  • distribution buckets: number of unique elements * 12 (f64 + u32)
  • gauge: 40 bytes (5 * f64)

To avoid iterating over all the buckets every time we want to query the memory footprint, we keep a map of counters per project key (plus one total count) that is incremented with the footprint delta on every insert.

Not in this PR: enforcement of limits.
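As a rough sketch of this model in code (the types below are simplified stand-ins, not the actual relay-metrics definitions):

use std::collections::BTreeSet;

// Illustrative stand-in for relay's BucketValue.
enum BucketValue {
    Counter(f64),
    Distribution(Vec<(f64, u32)>), // unique value plus its count
    Set(BTreeSet<u32>),            // unique 4-byte elements
    Gauge([f64; 5]),               // last, min, max, sum, count
}

impl BucketValue {
    // Approximate bytes needed to encode this bucket, per the model above.
    fn cost(&self) -> usize {
        match self {
            Self::Counter(_) => 8,                 // one f64
            Self::Set(s) => s.len() * 4,           // one u32 per unique element
            Self::Distribution(d) => d.len() * 12, // f64 + u32 per unique value
            Self::Gauge(_) => 40,                  // 5 * f64
        }
    }
}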

/// This is very similar to [`relative_size`], which can possibly be removed.
pub fn cost(&self) -> usize {
    match self {
        Self::Counter(c) => std::mem::size_of_val(c),
Member

I know we talked about tracking memory allocation, but thinking about it more, I would actually prefer to hardcode numbers here. If we accidentally increase the struct size, I don't think this should immediately and implicitly be reflected in abuse limits. If we change abuse limits and how they are calculated, I think it's better to do so explicitly.

Member Author

Agreed.

/// because data structures might have a memory overhead.
///
/// This is very similar to [`relative_size`], which can possibly be removed.
pub fn cost(&self) -> usize {
Member Author

Not sure if `cost` is the best name for what we're modeling here. Open to suggestions.

struct CostTracker {
    total_cost: usize,
    // Choosing a BTreeMap instead of a HashMap here, under the assumption that a BTreeMap
    // is still more efficient for the number of project keys we store.
Member Author

This reasoning is up for discussion.
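For illustration, a self-contained sketch of how such a tracker could accumulate cost deltas; the field and method names are assumptions, not necessarily the merged code:

use std::collections::BTreeMap;

// Illustrative stand-in; relay has its own ProjectKey type.
type ProjectKey = String;

#[derive(Default)]
struct CostTracker {
    total_cost: usize,
    // One counter per project key, so querying a project's footprint
    // does not require iterating over all of its buckets.
    cost_per_project_key: BTreeMap<ProjectKey, usize>,
}

impl CostTracker {
    // Charge the cost delta produced by inserting or merging a bucket.
    fn add_cost(&mut self, project_key: ProjectKey, cost: usize) {
        self.total_cost += cost;
        *self.cost_per_project_key.entry(project_key).or_insert(0) += cost;
    }
}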

let cost_before = bucket_value.cost();
value.merge_into(bucket_value)?;
let cost_after = bucket_value.cost();
added_cost = cost_after.saturating_sub(cost_before);
Member Author

We could probably optimize this by making `merge_into` return the actual cost delta, but I decided against it for the sake of simplicity.

Member

Wouldn't it be simpler to just add the cost of the single value here and have `merge_into` return whether or not something was added?

Member Author

Yeah, that could work.
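For comparison, a sketch of that alternative for set buckets, where merge_into reports whether a new element was actually inserted (the names here are illustrative, not relay's API):

use std::collections::BTreeSet;

// Stand-in for a set bucket; each unique element is modeled as 4 bytes.
struct SetBucket(BTreeSet<u32>);

impl SetBucket {
    // Returns true if the value was newly inserted, so the caller can
    // charge a fixed per-element cost instead of recomputing the whole
    // bucket's cost before and after the merge.
    fn merge_into(&mut self, value: u32) -> bool {
        self.0.insert(value)
    }
}

fn main() {
    let mut bucket = SetBucket(BTreeSet::new());
    let mut added_cost = 0;
    for value in [1u32, 2, 2, 3] {
        if bucket.merge_into(value) {
            added_cost += 4; // cost of one unique set element
        }
    }
    assert_eq!(added_cost, 12); // three unique elements were added
}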

@jjbayer jjbayer marked this pull request as ready for review June 2, 2022 13:25
@jjbayer jjbayer requested a review from a team June 2, 2022 13:25
@jjbayer jjbayer merged commit acb94fa into master Jun 3, 2022
@jjbayer jjbayer deleted the feat/track-metrics-footprint branch June 3, 2022 07:49
/// This is very similar to [`BucketValue::relative_size`], which can possibly be removed.
pub fn cost(&self) -> usize {
    match self {
        Self::Counter(_) => 8,
Member

Instead of hard-coding, use std::mem::size_of with typedefs for the values.

Also, note that this is wrong. The size of a bucket value is always size_of::<BucketValue> + the allocations that happen within. So a more correct implementation would be:

const DIST_SIZE: usize = mem::size_of::<f64>() + mem::size_of::<Count>();

let allocations = match self {
    Self::Counter(_) => 0,
    Self::Set(s) => s.len() * mem::size_of::<u32>(), // better to typedef this to `SetValue` now
    Self::Gauge(_) => 0,
    Self::Distribution(m) => m.internal_size() * DIST_SIZE,
};

mem::size_of::<Self>() + allocations
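A self-contained rendering of that suggestion (SetValue, Count, and the distribution representation are stand-ins based on the comment, not verified relay definitions):

use std::collections::{BTreeMap, BTreeSet};
use std::mem;

type SetValue = u32; // typedef instead of a bare u32
type Count = u32;    // per-value counter inside a distribution

enum BucketValue {
    Counter(f64),
    Set(BTreeSet<SetValue>),
    Gauge([f64; 5]),
    Distribution(BTreeMap<u64, Count>), // unique value (as bit pattern) -> count
}

impl BucketValue {
    fn cost(&self) -> usize {
        const DIST_SIZE: usize = mem::size_of::<f64>() + mem::size_of::<Count>();
        let allocations = match self {
            Self::Counter(_) => 0,
            Self::Set(s) => s.len() * mem::size_of::<SetValue>(),
            Self::Gauge(_) => 0,
            Self::Distribution(m) => m.len() * DIST_SIZE,
        };
        mem::size_of::<Self>() + allocations
    }
}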

Member Author

I had an implementation that used `size_of` (at least partially), but @untitaker argued that having an explicit model that has to be changed manually is better: #1284 (comment)

Member

This is still hard-coded and explicit; it just doesn't use magic numbers, which makes it more self-explanatory.

jjbayer added a commit that referenced this pull request Jun 7, 2022
#1284 introduced a cost model for measuring the memory footprint of metrics buckets stored in the aggregator. It has two flaws:

- It did not take into account the fixed size overhead of a BucketValue (it only looked at the values inside).
- It did not take into account the size overhead of storing the BucketKey.

This PR attempts to fix both issues.
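Sketched out, the corrected accounting looks roughly like this (the types and field names are assumptions based on the commit message, not the exact diff):

use std::mem;

// Hypothetical stand-ins to show the shape of the fix.
struct BucketKey {
    metric_name: String,
}

enum BucketValue {
    Counter(f64),
}

impl BucketValue {
    // Fixed enum size plus per-variant heap allocations (first flaw).
    fn cost(&self) -> usize {
        let allocations = match self {
            Self::Counter(_) => 0, // counters allocate nothing extra
        };
        mem::size_of::<Self>() + allocations
    }
}

// Footprint of one aggregator entry: the key's fixed size and its heap
// allocations are now counted as well (second flaw).
fn entry_cost(key: &BucketKey, value: &BucketValue) -> usize {
    mem::size_of::<BucketKey>() + key.metric_name.len() + value.cost()
}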
jan-auer added a commit that referenced this pull request Jun 9, 2022
* master:
  ref(metrics): Stop logging relative bucket size (#1302)
  fix(metrics): Rename misnamed aggregator option (#1298)
  fix(server): Avoid a panic in the Sentry middleware (#1301)
  build: Update dependencies with known vulnerabilities (#1294)
  fix(metrics): Stop logging statsd metric per project key (#1295)
  feat(metrics): Limits on bucketing cost in aggregator [INGEST-1132] (#1287)
  fix(metrics): Track memory footprint more accurately (#1288)
  build(deps): Bump dependencies (#1293)
  feat(aws): Add relay-aws-extension crate which implements AWS extension as an actor (#1277)
  fix(meta): Update codeowners for the release actions (#1286)
  feat(metrics): Track memory footprint of metrics buckets (#1284)