Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create metrics endpoint for prom-scraper consumption #2648

Closed
sethboyles opened this issue Jan 27, 2022 · 4 comments
Closed

Create metrics endpoint for prom-scraper consumption #2648

sethboyles opened this issue Jan 27, 2022 · 4 comments

Comments

@sethboyles
Copy link
Member

sethboyles commented Jan 27, 2022

Currently CCNG uses statsd-injector to gather Statsd style metrics and forward them to loggregator.

We should build a /metrics endpoint to supply those same metrics, so we can use prom-scraper to forward the metrics instead.

Doing this will allow us to remove the EventMachine/Thin code we have:

EM.add_periodic_timer(600) { catch_error { record_user_count } }
EM.add_periodic_timer(30) { catch_error { update_job_queue_length } }
EM.add_periodic_timer(30) { catch_error { update_thread_info } }

and consequently address #2254 and move CCNG from Thin to Puma.

@sethboyles
Copy link
Member Author

Currently we emit metrics named with periods, e.g. cc.job_queue_length.total. However the Prometheus data model prohibits periods and most other non-alphanumeric characters in metric names, so to migrate to prom-scraper would mean having to rename almost all of our metrics. So cc.job_queue_length.total would become cc_job_queue_length_total or similar.

@stephanme @philippthun do you have any thoughts on how we would handle an upgrade here?

A few options:

  1. Do a hard cut where we switch from period-named metrics to underscore metrics in one release, causing a breaking change where monitoring tools would have to be updated
  2. Introduce the new non-period metric names in a release to be emitted alongside the period metric names, giving operators a deprecation window to update their monitoring tools
  3. Make a configurable option that, when enabled, stops emitting the period metric names, and emits the new metric names

Would appreciate your input here.

@philippthun
Copy link
Member

Option 1 would work for us; we would need to adapt our monitoring tools beforehand to support both formats and then change them a second time to clean this up (i.e. support only the new format).

But I'm not sure if this works for others as well, thus I would prefer option 2. This also has the advantage that we only have to adapt our monitoring tools once without any disruption.

There might be even an option 4, that combines 2 and 3; i.e. a config parameter that specifies what to emit: either the old, or the new format, or both.

But given the fact that we are not talking about hundreds of metrics, that might be over-engineered; so simply having both formats for one release seems to be the best/easiest solution.

@sethboyles
Copy link
Member Author

Sounds good. Having one release that contains both I think makes the most sense. We will move forward with that idea.

@sethboyles
Copy link
Member Author

Closing as this has been supported for a while: #2781

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants