Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cc.requests.outstanding.gauge when using puma web server #3841

Conversation

Samze
Copy link
Contributor

@Samze Samze commented Jun 12, 2024

#1312 introduced cc.requests.outstanding.gauge which holds the counter in memory. With the introduction of puma there may be multiple processes, so each would emit its own value for this metric. This would cause the gauge to flop between values. This metric is listed as an important kpi for capi scaling https://docs.cloudfoundry.org/running/managing-cf/scaling-cloud-controller.html#cloud_controller_ng.

This fix for puma will instead uses Redis for the gauge.

  • Refactor requests_metric to use statsd_updater
  • Statsd_updater now contains statsd requests logic
  • Add Redis/Inmemory store to statsd_updager
  • Add a missing mutex to counter for thread safety for the in memory implementation.
  • Chose to prefix the entry in redis with metrics: but open to feedback here.

Inspired by 4539e59

An alternative considered, was to read the prometheus metric and re-emit that to StatsD, however we observed performance degradation. Presumably because of the number of reads from disk for the DirectFileStorage to aggregate the metric across processes and so Redis seemed the best approach.

cc @sethboyles / @pivotalgeorge


  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@Samze Samze force-pushed the puma_cc.requests.outstanding.gauge_redis_refactor branch 2 times, most recently from cb4dbaf to 8a280d7 Compare June 12, 2024 21:44
cloudfoundry#1312 introduced `cc.requests.outstanding.gauge` which holds the counter in memory. With the introduction of puma there may be multiple processes, so each would emit its own value for this metric. This would cause the gauge to flop between values. This metric is listed as an important kpi for capi scaling https://docs.cloudfoundry.org/running/managing-cf/scaling-cloud-controller.html#cloud_controller_ng.

This fix for puma will instead uses Redis for the gauge.

* Move requests_metric to use statsd_updater
* Statsd_updater now contains statsd request logic
* Add Redis/Inmemory store to statsd_updager
* Add a missing mutux to counter for thread safety for the in memory implementation.
* Chose to prefix the entry in redis with `metrics:` but open to feedback here.

Inspired by cloudfoundry@4539e59

An alternative considered, was to read the prometheus metric and re-emit that to StatsD, however we observed performance degradation. Presumably because of the number of reads from disk for the [DirectFileStorage](https://github.com/prometheus/client_ruby?tab=readme-ov-file#directfilestore-caveats-and-things-to-keep-in-mind) to aggregate the metric across processes.
@Samze Samze force-pushed the puma_cc.requests.outstanding.gauge_redis_refactor branch from 8a280d7 to 9b0ed8f Compare June 12, 2024 21:46
@tcdowney tcdowney requested a review from a team June 13, 2024 22:30
@sethboyles
Copy link
Member

Code looks good, I'll do acceptance tomorrow

Copy link
Contributor

@joaopapereira joaopapereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, my only concern here would be that this store can only save one metric, but looking at the statsd_updater it looks like we only have 1 type of metric so it should be fine.

@sethboyles sethboyles merged commit 82db58e into cloudfoundry:main Jun 14, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants