Add a monitor for the OpenMetrics endpoint #584

Open
wants to merge 1 commit into
base: main

willmostly
Contributor

Add a monitor for the OpenMetrics endpoint. This populates the running and queued query metrics for active load balancing, and allows defining health using minimum and maximum values for arbitrary metrics.

Description

Add a monitor for the /metrics endpoint. These metrics are equivalent to those exposed under v1/jmx, but some organizations choose to standardize on OpenMetrics instead.
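
As a rough illustration of what reading the /metrics response involves, the sketch below turns a text exposition body into a map of metric name to value. It is not the parser used in this PR: the class name `MetricsTextParser` is hypothetical, and the handling is deliberately simplified (comment lines are skipped, labels and timestamps are ignored, exemplars are not supported, and samples that share a name overwrite each other).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the PR's implementation.
final class MetricsTextParser
{
    private MetricsTextParser() {}

    // Parses sample lines of the form: name[{labels}] value [timestamp]
    static Map<String, Double> parse(String body)
    {
        Map<String, Double> metrics = new LinkedHashMap<>();
        for (String rawLine : body.split("\n")) {
            String line = rawLine.trim();
            // Skip blank lines and "# HELP", "# TYPE", "# EOF" lines
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String name;
            String rest;
            int labelStart = line.indexOf('{');
            int labelEnd = line.lastIndexOf('}');
            if (labelStart >= 0 && labelEnd > labelStart) {
                // Labeled sample: the name ends where the label set begins
                name = line.substring(0, labelStart);
                rest = line.substring(labelEnd + 1).trim();
            }
            else {
                int space = line.indexOf(' ');
                if (space < 0) {
                    continue; // no value on this line, treat as malformed
                }
                name = line.substring(0, space);
                rest = line.substring(space + 1).trim();
            }
            if (rest.isEmpty()) {
                continue;
            }
            // The first token after the name (and labels) is the value
            metrics.put(name, parseValue(rest.split("\\s+")[0]));
        }
        return metrics;
    }

    private static double parseValue(String token)
    {
        switch (token) {
            case "+Inf":
            case "Inf":
                return Double.POSITIVE_INFINITY;
            case "-Inf":
                return Double.NEGATIVE_INFINITY;
            default:
                // Throws NumberFormatException on an unparseable value
                return Double.parseDouble(token);
        }
    }
}
```

The resulting map could then be used to look up metrics such as trino_execution_name_QueryManager_RunningQueries by name.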

A custom definition of backend health can be configured through the metricMinimumValues and metricMaximumValues settings in monitorConfiguration. Each setting takes pairs of metric name and float value. If a returned metric is below its configured minimum or above its configured maximum, the backend is considered unhealthy.
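
The min/max rule described above can be sketched as follows. This is an illustration only, not the code in this PR: the class `MetricBoundsCheck` and its method are hypothetical names, and treating a configured-but-missing metric as unhealthy is an assumption made for the example.

```java
import java.util.Map;

// Hypothetical helper illustrating the min/max health rule.
final class MetricBoundsCheck
{
    private MetricBoundsCheck() {}

    static boolean isHealthy(
            Map<String, Double> observed,
            Map<String, Double> minimumValues,
            Map<String, Double> maximumValues)
    {
        for (Map.Entry<String, Double> bound : minimumValues.entrySet()) {
            Double value = observed.get(bound.getKey());
            // Assumption: a metric with a configured bound that is absent
            // from the response makes the backend unhealthy
            if (value == null || value < bound.getValue()) {
                return false;
            }
        }
        for (Map.Entry<String, Double> bound : maximumValues.entrySet()) {
            Double value = observed.get(bound.getKey());
            if (value == null || value > bound.getValue()) {
                return false;
            }
        }
        // Values equal to a bound are within range
        return true;
    }
}
```

For example, a minimum of 1 on trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount would mark a backend reporting zero active nodes as unhealthy.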

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

* Add OpenMetrics health monitor with customizable health definition

@cla-bot cla-bot bot added the cla-signed label Jan 6, 2025
@willmostly willmostly changed the title Add a monitor for the OpenMetrics endpoint. This populates the runnin… Add a monitor for the OpenMetrics endpoint Jan 6, 2025
@willmostly willmostly requested a review from vishalya January 7, 2025 14:37
Member

@oneonestar oneonestar left a comment

Just a quick skim.

}
}
else {
log.error(e, "Health check failed with non-retryable response. %s", e.getMessage());
Member

I think e.toString() is better than e.getMessage().

Contributor Author

Unfortunately, UnexpectedResponseException.toString() does not print the message. I'll open a PR in airlift to add it, and temporarily log both until that is merged and we upgrade. The message contains details about any missing metrics, which could be a common problem given the long metric names. The error logged with e.getMessage() will look something like this:

2025-01-08T18:19:41.384+0000 SEVERE Health check failed with non-retryable response. Request is missing required keys: 
trino_execution_name_QueryManager_RunningQueries
trino_execution_name_QueryManager_QueuedQueries
trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount
in response: 'Basic authentication or X-Trino-Original-User or X-Trino-User must be sent'
UnexpectedResponseException{request=GET http://localhost:32835/metrics?name%5B%5D=trino_execution_name_QueryManager_RunningQueries&name%5B%5D=trino_execution_name_QueryManager_QueuedQueries&name%5B%5D=trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount, statusCode=401, headers={Date=[Wed, 08 Jan 2025 18:19:41 GMT], Vary=[Accept-Encoding], Content-Type=[text/plain;charset=UTF-8], WWW-Authenticate=[Basic realm="Trino"], Content-Length=[74]}}

(I removed the user header to trigger an error).
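
For reference, the request URI in the log above carries each requested metric as a URL-encoded `name[]` query parameter. A minimal sketch of building such a URI is shown below. The class `MetricsUriBuilder` and its method are hypothetical and are not taken from this PR; only the URI shape is copied from the logged request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that reproduces the URI shape seen in the log output.
final class MetricsUriBuilder
{
    private MetricsUriBuilder() {}

    static URI metricsUri(String baseUrl, List<String> metricNames)
    {
        // "name[]" is encoded as "name%5B%5D"
        String key = URLEncoder.encode("name[]", StandardCharsets.UTF_8);
        String query = metricNames.stream()
                .map(name -> key + "=" + URLEncoder.encode(name, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        // With no metric names, request the endpoint without a query string
        return URI.create(baseUrl + "/metrics" + (query.isEmpty() ? "" : "?" + query));
    }
}
```

Calling `metricsUri("http://localhost:32835", ...)` with the three metric names from the error message yields the same request URI that appears in the UnexpectedResponseException output.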

@willmostly willmostly force-pushed the will/metrics-monitor branch 2 times, most recently from 1a4519d to 43465b3 Compare January 9, 2025 02:05
…g and queued query metrics for active load balancing, and allows defining health using minimum and maximum values for arbitrary metrics
@willmostly willmostly force-pushed the will/metrics-monitor branch from 43465b3 to d9e5068 Compare January 9, 2025 02:06
@willmostly willmostly requested a review from oneonestar January 9, 2025 02:07