Prometheus on Thin #3445

svkrieger · 2023-09-21T15:10:19Z

A short explanation of the proposed change:

Adjust Prometheus endpoint:

- Remove deprecated metrics and metrics that community discussions found not useful
- Use the DependencyLocator to retrieve a singleton of the PrometheusUpdater and PeriodicUpdater
- Change vitals_uptime to vitals_started_at
- Emit the cc_staging_requests_total metric
- Apply Prometheus best practices such as naming, base units, labels, and initialising metrics for discoverability
- Use counter metrics for metrics which do not decrease
- Remove metrics that are emitted on the scheduler VM. Those metrics currently cannot be collected and will still be emitted via statsd
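The counter, label, and initialisation points above can be illustrated with a minimal plain-Ruby sketch. The `ToyCounter` and `ToyGauge` classes are stand-ins written for this example and are not the API of any Prometheus client library. Only `cc_staging_requests_total` comes from this pull request; the gauge name `cc_job_queue_length` and the queue names are hypothetical.

```ruby
# Toy stand-ins for Prometheus metric types; NOT a real client library API.
class ToyCounter
  attr_reader :name

  def initialize(name)
    @name = name
    @values = Hash.new(0.0)
  end

  # Counters only ever go up, so negative increments are rejected.
  def increment(labels = {}, by: 1)
    raise ArgumentError, 'counters cannot decrease' if by.negative?

    @values[labels] += by
  end

  def get(labels = {})
    @values[labels]
  end
end

class ToyGauge
  attr_reader :name

  def initialize(name, preset_labels: [])
    @name = name
    @values = {}
    # Initialise every known label set to 0 so each series is visible
    # before its first real update (discoverability).
    preset_labels.each { |labels| @values[labels] = 0.0 }
  end

  def set(value, labels = {})
    @values[labels] = value.to_f
  end

  def get(labels = {})
    @values.fetch(labels)
  end

  # Render the series in a simplified text exposition style.
  def to_text
    @values.map do |labels, value|
      label_str = labels.map { |k, v| %(#{k}="#{v}") }.join(',')
      "#{name}{#{label_str}} #{value}"
    end
  end
end

# cc_staging_requests_total is named in this PR; the gauge name and the
# queue names below are hypothetical.
STAGING_REQUESTS = ToyCounter.new('cc_staging_requests_total')
QUEUE_LENGTH = ToyGauge.new(
  'cc_job_queue_length',
  preset_labels: [{ queue: 'cc_api' }, { queue: 'cc_generic' }]
)

STAGING_REQUESTS.increment
QUEUE_LENGTH.set(3, { queue: 'cc_api' })
puts QUEUE_LENGTH.to_text
```

One gauge with a `queue` label replaces one metric per queue, and presetting the label sets means a dashboard can find every series even while a queue is still empty.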

This implementation is meant for experimental usage. It is difficult to decide on the best metric type or bucket size up front, before the metrics have been used in production environments. The metrics emitted via Prometheus can be collected and displayed in dashboards while the statsd metrics continue to be used for alerting and production monitoring. We should communicate that breaking changes are likely as long as this is treated as an experimental feature.

I have reviewed the contributing guide
I have viewed, signed, and submitted the Contributor License Agreement
I have made this pull request to the main branch
I have run all the unit tests using bundle exec rake
I have run CF Acceptance Tests

- Use base formats like seconds instead of milliseconds
- Change requests_completed to counter metric
- Use one metric for queue lengths and use labels for different queues
- Register metrics and initialize them for discoverability
- Change histogram buckets
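The base-unit and bucket points can be sketched in plain Ruby as follows. `ToyHistogram` is a stand-in written for this example, not a real client library class, and the metric name and bucket bounds are hypothetical; they are not the values chosen in this pull request.

```ruby
# Toy histogram with cumulative buckets; NOT a real client library API.
class ToyHistogram
  attr_reader :name, :count, :sum

  def initialize(name, buckets:)
    @name = name
    @buckets = buckets.sort
    @bucket_counts = Hash.new(0)
    @count = 0
    @sum = 0.0
  end

  # Record one observation, given in seconds (the base unit for time).
  def observe(seconds)
    @count += 1
    @sum += seconds
    # Buckets are cumulative: an observation counts towards every bucket
    # whose upper bound it does not exceed.
    @buckets.each { |upper| @bucket_counts[upper] += 1 if seconds <= upper }
  end

  def bucket(upper)
    @bucket_counts[upper]
  end
end

# Convert a duration measured in milliseconds into seconds before recording it.
def ms_to_seconds(milliseconds)
  milliseconds / 1000.0
end

# Hypothetical metric name and bucket bounds, chosen only for illustration.
HIST = ToyHistogram.new('cc_example_duration_seconds', buckets: [0.1, 0.5, 1.0, 5.0])
[40, 250, 1200].each { |ms| HIST.observe(ms_to_seconds(ms)) }
```

Because bucket bounds are fixed when the metric is registered, changing them later changes the exported series, which is one reason breaking changes are likely while the feature is experimental.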

The metrics `report_diego_cell_sync_duration`, `report_deployment_duration`, and `update_synced_invalid_lrps` are emitted on the scheduler VM, which has no web server and therefore no endpoint that could serve them. For now we decided to remove these Prometheus metrics and keep only the statsd metrics. If they should also be available through Prometheus in the future, we will probably have to deploy additional jobs on the scheduler VM that publish the metrics so they can be collected by the prom_scraper job.

app/controllers/internal/metrics_controller.rb

app/jobs/diego/sync.rb

lib/cloud_controller/deployment_updater/scheduler.rb

lib/cloud_controller/metrics/periodic_updater.rb

lib/cloud_controller/metrics/prometheus_updater.rb

spec/unit/lib/cloud_controller/metrics/prometheus_updater_spec.rb

lib/cloud_controller/dependency_locator.rb

lib/cloud_controller/runner.rb

- Remove deprecated metrics and metrics which have been found not useful according to discussions in the community
- Make use of the DependencyLocator for retrieving a singleton of the PrometheusUpdater and PeriodicUpdater
- Change vitals_uptime to vitals_started_at
- Emit cc_staging_requests_total metric
- Apply prometheus best practices like naming, base units, using labels, initialising metrics for discoverability
- Use counter metrics for metrics which do not decrease
- Remove metrics, which are emitted on the scheduler VM. Those metrics currently cannot be collected and will be still emitted via statsd

Co-authored-by: Andrew Crump <[email protected]>

cf-gitbot added the unscheduled label Sep 21, 2023

svkrieger added 9 commits October 5, 2023 15:15

Remove metrics from prom endpoint

32bd336

- Remove http_status
- Remove log_count
- Remove outstanding_requests

Use DependencyLocator to get PrometheusUpdater singleton

3b4c994

Remove vitals.cpu metric from Prom endpoint

0dd670d

Emit cc_staging_requested metric with prometheus

b6d56ef

Collect fresh metrics when metrics endpoint is called

6abf297

- Adjusted tests to be more specific

Remove unnecessary monkey patch in test

0409a53

Adjust periodic updater spec

9396960

- Removed one allow for a method, which is no longer called on PromUpdater
- Removed unnecessary allows for updaters where updaters are not being tested

Remove log_count metrics from prom updater

35d0221

Use decrement and increment for gauge metric

716a3e2

svkrieger force-pushed the prometheus-on-thin branch from 5eb2588 to 716a3e2 on October 5, 2023 13:26

svkrieger added 3 commits October 5, 2023 15:44

Remove log_counts and vitals.cpu tests in integration test

38dff13

Clean up metrics spec

e7c2316

Fix formatting issues

fd3f37f

svkrieger force-pushed the prometheus-on-thin branch from 7a4f401 to 174ed46 on October 13, 2023 07:24

svkrieger added 4 commits October 13, 2023 09:54

Adjust unit tests

ffac886

Rename some prometheus metrics and add/adjust unit tests

b6ed2b7

svkrieger force-pushed the prometheus-on-thin branch from 174ed46 to b6ed2b7 on October 13, 2023 09:58

Change prometheus metric vitals_uptime to started_at

2f6b645

svkrieger commented Oct 13, 2023

svkrieger marked this pull request as ready for review October 13, 2023 11:49

svkrieger added needs_review and removed unscheduled labels Oct 19, 2023

johha reviewed Nov 2, 2023

lib/cloud_controller/dependency_locator.rb (outdated, resolved)

philippthun self-assigned this Nov 6, 2023

philippthun reviewed Nov 6, 2023

philippthun reviewed Nov 7, 2023

lib/cloud_controller/metrics/periodic_updater.rb (resolved)

philippthun reviewed Nov 7, 2023

lib/cloud_controller/metrics/prometheus_updater.rb (resolved)

svkrieger added 4 commits November 7, 2023 15:46

Rename some metrics

b7d4458

Move vitals filter into periodic updater

c691b71

Fix comment in test and ordering

b059990

Refactor prom metric registration

630b624

svkrieger force-pushed the prometheus-on-thin branch from f37fd5f to 630b624 on November 9, 2023 12:40

philippthun reviewed Nov 13, 2023

svkrieger added 5 commits November 14, 2023 15:38

Use constants instead of methods

702b153

Change byte calculation

6078602

Fix typo

d8d32b1

Instantiate periodic_updater in DependencyLocator

427278c

Rename cc_staging_requested_total to cc_staging_requests_total

bdffc42

philippthun approved these changes Nov 20, 2023

philippthun merged commit aadd26d into cloudfoundry:main Nov 20, 2023

svkrieger commented Sep 21, 2023 • edited by philippthun