This repository has been archived by the owner on Jun 19, 2024. It is now read-only.

[prometheus] too many metrics #195

Closed
joel-u410 opened this issue Apr 10, 2024 · 1 comment

Comments

@joel-u410
Contributor

I've noticed a very large number of metrics published to Prometheus. Digging into it, I see many near-identical lines that differ only in the save_block label, such as:

db_save_block_count{save_block="21874",status="Ok"} 1
db_save_block_count{save_block="10294",status="Ok"} 1
db_save_block_count{save_block="17514",status="Ok"} 1
db_save_block_count{save_block="25058",status="Ok"} 1
db_save_block_count{save_block="10293",status="Ok"} 1
db_save_block_count{save_block="26436",status="Ok"} 1
...

The longer the indexer service runs, the more such lines accumulate. Essentially, a new time series is created for every block processed, and each one persists for the lifetime of the service process, so the total grows without bound.

This is caused by code such as the following:

        let labels = [
            ("save_block", block.header.height.value().to_string()),
            ("status", status),
        ];

        histogram!(DB_SAVE_BLOCK_DURATION, dur.as_secs_f64() * 1000.0, &labels);

        if res.is_ok() {
            // update our counter for processed blocks since service started.
            increment_counter!(DB_SAVE_BLOCK_COUNTER, &labels);

Because the save_block label is set to block.header.height.value(), every block produces a label set that has never been seen before. As a result, DB_SAVE_BLOCK_DURATION and DB_SAVE_BLOCK_COUNTER never update an existing series; each processed block creates a new one instead.

Is this the desired behavior? Or can we remove the save_block label here so that e.g. DB_SAVE_BLOCK_COUNTER would actually function as a single counter, counting up with each block rather than creating a new counter with value 1 for every block?
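To make the difference concrete, here is a minimal sketch using only the Rust standard library. ToyRegistry and simulate are hypothetical names invented for this illustration; this is not the metrics crate or the indexer's actual code, only a toy model that assumes a series is identified by its metric name plus its full label set.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a metrics registry (hypothetical, for illustration only).
/// It keeps one counter per unique (metric name, label set) pair.
#[derive(Default)]
struct ToyRegistry {
    counters: BTreeMap<(String, Vec<(String, String)>), u64>,
}

impl ToyRegistry {
    /// Increments the counter identified by `name` and `labels`,
    /// creating a new series if this exact label set is new.
    fn increment(&mut self, name: &str, labels: &[(&str, String)]) {
        let mut key_labels: Vec<(String, String)> = labels
            .iter()
            .map(|(k, v)| (k.to_string(), v.clone()))
            .collect();
        // Sort so that label order does not affect series identity.
        key_labels.sort();
        *self
            .counters
            .entry((name.to_string(), key_labels))
            .or_insert(0) += 1;
    }

    /// Number of distinct series currently held.
    fn series_count(&self) -> usize {
        self.counters.len()
    }
}

/// Simulates saving `blocks` blocks, optionally labelling each by its height.
fn simulate(blocks: u64, label_by_height: bool) -> ToyRegistry {
    let mut registry = ToyRegistry::default();
    for height in 1..=blocks {
        let mut labels = vec![("status", "Ok".to_string())];
        if label_by_height {
            // Assumption: mirrors the save_block label from the snippet above.
            labels.push(("save_block", height.to_string()));
        }
        registry.increment("db_save_block_count", &labels);
    }
    registry
}

fn main() {
    let with_height = simulate(1000, true);
    let without_height = simulate(1000, false);
    // One series per block, each stuck at the value 1.
    println!("with save_block label: {} series", with_height.series_count());
    // A single series whose value counts every block.
    println!("without save_block label: {} series", without_height.series_count());
}
```

In this model, 1000 blocks labelled by height yield 1000 series, while dropping the label yields a single series whose value is 1000.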

If you agree with this change, I'm happy to submit a PR. Does anyone rely on that save_block label?

@rllola
Contributor

rllola commented Apr 11, 2024

Indeed, I don't think this is the intended behavior. I have looked at your PR and it looks OK to me. Thanks for submitting it.

I will take a bit more time to re-familiarize myself with the metrics. There are things we can improve there.

rllola pushed a commit that referenced this issue Apr 11, 2024
rllola pushed a commit that referenced this issue Apr 14, 2024
…as a separate histogram (#199)

This helps reduce some more of the redundant metrics as described in #195 and #198.

COMPATIBILITY NOTE: this also renames the `db_save_` metrics to use
common prefixes `db_save_duration_`, `db_save_batch_size_`, etc. instead
of putting the metric type at the end of the name.
rllola closed this as completed Apr 14, 2024