Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus on Thin #3445

Merged
merged 26 commits into from
Nov 20, 2023
Merged

Conversation

svkrieger
Copy link
Contributor

@svkrieger svkrieger commented Sep 21, 2023

A short explanation of the proposed change:

Adjust Prometheus endpoint:

  • Remove deprecated metrics and metrics which have been found not useful according to discussions in the community
  • Make use of the DependencyLocator for retrieving a singleton of the PrometheusUpdater and PeriodicUpdater
  • Change vitals_uptime to vitals_started_at
  • Emit cc_staging_requests_total metric
  • Apply prometheus best practices like naming, base units, using labels, initialising metrics for discoverability
  • Use counter metrics for metrics which do not decrease
  • Remove metrics, which are emitted on the scheduler VM. Those metrics currently cannot be collected and will be still emitted via statsd

This implementation is meant for experimental usage. It is difficult to decide on the best metric type or bucket size initially, without having it used in productive environments. The metrics emitted via Prometheus can be collected and displayed in dashboards while still using the statsd metrics for alert/real monitoring. We should communicate that breaking changes are likely, as long as this is treated as a experimental feature.

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

- Use base formats like seconds instead of milliseconds
- Change requests_completed to counter metric
- Use one metric for queue lengths and use labels for different queues
- Register metrics and initialize them for discoverability
- Change histogram buckets
The metrics `report_diego_cell_sync_duration`, `report_deployment_duration`, `update_synced_invalid_lrps` are being emitted on the scheduler VM.
There is no web server and therefore also no endpoint, which could serve those metrics.
For now we decided to remove those prometheus metrics and just keep the statsd metrics.
If those metrics should be also available through prometheus in the future, we probably have to deploy additional jobs on the scheduler VM, which
take care of publishing the metrics, so they can be collected by the prom_scraper job.
@svkrieger svkrieger marked this pull request as ready for review October 13, 2023 11:49
@philippthun philippthun self-assigned this Nov 6, 2023
lib/cloud_controller/metrics/periodic_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/runner.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/prometheus_updater.rb Outdated Show resolved Hide resolved
lib/cloud_controller/runner.rb Outdated Show resolved Hide resolved
lib/cloud_controller/metrics/periodic_updater.rb Outdated Show resolved Hide resolved
@philippthun philippthun merged commit aadd26d into cloudfoundry:main Nov 20, 2023
rroberts2222 pushed a commit to loggregator/cloud_controller_ng that referenced this pull request Dec 19, 2023
- Remove deprecated metrics and metrics which have been found not useful according to discussions in the community
- Make use of the DependencyLocator for retrieving a singleton of the PrometheusUpdater and PeriodicUpdater
- Change vitals_uptime to vitals_started_at
- Emit cc_staging_requests_total metric
- Apply prometheus best practices like naming, base units, using labels, initialising metrics for discoverability
- Use counter metrics for metrics which do not decrease
- Remove metrics, which are emitted on the scheduler VM. Those metrics currently cannot be collected and will be still emitted via statsd
Co-authored-by: Andrew Crump <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants