-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
balance: Instrument metrics in pool balancer #2558
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This change updates the PoolQueue and P2CPool with metrics that track endpoint update and queue runtime metrics, including: * The count of queue gate state changes (as driven by failfast). * The timestamp of the last gate state change (i.e. so it's possible to determine how long a balancer has been in failfast). * The number of requests that enter the queue. * The distribution of in-queue latencies. * The current length of the queue. * The number of endpoint updates by update type. * The current number of endpoints.
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #2558 +/- ##
==========================================
- Coverage 67.73% 67.24% -0.50%
==========================================
Files 331 333 +2
Lines 14838 14985 +147
==========================================
+ Hits 10050 10076 +26
- Misses 4788 4909 +121
... and 4 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Dec 13, 2023
This change culminates recent work to restructure the balancer to use a PoolQueue so that balancer changes may occur independently of request processing. This replaces independent discovery buffering so that the balancer task is responsible for polling discovery streams without independent buffering. Requests are buffered and processed as soon as the pool has available backends. Fail-fast circuit breaking is enforced on the balancer's queue so that requests can't get stuck in a queue indefinitely. In general, the new balancer is instrumented directly with metrics, and the relevant metric name prefix and labelset is provided by the stack. In addition to detailed queue metrics including request (in-queue) latency histograms, but also failfast states, discovery updates counts, and balancer endpoint pool sizes. --- * outbound: Move queues into the concrete stack (linkerd/linkerd2-proxy#2539) * metrics: Remove unused features (linkerd/linkerd2-proxy#2542) * Add the PoolQueue middleware (linkerd/linkerd2-proxy#2540) * ci: Fixup codecov config (linkerd/linkerd2-proxy#2545) * ci: Cancel prior runs (linkerd/linkerd2-proxy#2546) * ci: Skip ARM builds during non-release CI (linkerd/linkerd2-proxy#2547) * deps: Update tokio, tonic, and prost (linkerd/linkerd2-proxy#2544) * build(deps): bump tj-actions/changed-files from 40.2.0 to 40.2.1 (linkerd/linkerd2-proxy#2549) * metrics: Use prometheus-client for proxy_build_info (linkerd/linkerd2-proxy#2551) * balance: Add a p2c Pool implementation (linkerd/linkerd2-proxy#2541) * metrics: Export process metrics using prometheus-client (linkerd/linkerd2-proxy#2552) * linkerd_identity: split `linkerd_identity::Id` into DNS and URI variants (linkerd/linkerd2-proxy#2538) * outbound: Move HTTP balancer into its own module (linkerd/linkerd2-proxy#2554) * app: Setup prom registry for use in balancers (linkerd/linkerd2-proxy#2555) * vscode: Move workspace settings to devcontainer (linkerd/linkerd2-proxy#2557) * build(deps): bump tj-actions/changed-files from 40.2.1 to 40.2.2 (linkerd/linkerd2-proxy#2556) * balance: Instrument metrics in pool balancer (linkerd/linkerd2-proxy#2558) * Enable PoolQueue balancer (linkerd/linkerd2-proxy#2559) Signed-off-by: Oliver Gould <[email protected]>
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Dec 14, 2023
This change culminates recent work to restructure the balancer to use a PoolQueue so that balancer changes may occur independently of request processing. This replaces independent discovery buffering so that the balancer task is responsible for polling discovery streams without independent buffering. Requests are buffered and processed as soon as the pool has available backends. Fail-fast circuit breaking is enforced on the balancer's queue so that requests can't get stuck in a queue indefinitely. In general, the new balancer is instrumented directly with metrics, and the relevant metric name prefix and labelset is provided by the stack. In addition to detailed queue metrics including request (in-queue) latency histograms, but also failfast states, discovery updates counts, and balancer endpoint pool sizes. --- * outbound: Move queues into the concrete stack (linkerd/linkerd2-proxy#2539) * metrics: Remove unused features (linkerd/linkerd2-proxy#2542) * Add the PoolQueue middleware (linkerd/linkerd2-proxy#2540) * ci: Fixup codecov config (linkerd/linkerd2-proxy#2545) * ci: Cancel prior runs (linkerd/linkerd2-proxy#2546) * ci: Skip ARM builds during non-release CI (linkerd/linkerd2-proxy#2547) * deps: Update tokio, tonic, and prost (linkerd/linkerd2-proxy#2544) * build(deps): bump tj-actions/changed-files from 40.2.0 to 40.2.1 (linkerd/linkerd2-proxy#2549) * metrics: Use prometheus-client for proxy_build_info (linkerd/linkerd2-proxy#2551) * balance: Add a p2c Pool implementation (linkerd/linkerd2-proxy#2541) * metrics: Export process metrics using prometheus-client (linkerd/linkerd2-proxy#2552) * linkerd_identity: split `linkerd_identity::Id` into DNS and URI variants (linkerd/linkerd2-proxy#2538) * outbound: Move HTTP balancer into its own module (linkerd/linkerd2-proxy#2554) * app: Setup prom registry for use in balancers (linkerd/linkerd2-proxy#2555) * vscode: Move workspace settings to devcontainer (linkerd/linkerd2-proxy#2557) * build(deps): bump tj-actions/changed-files from 40.2.1 to 40.2.2 (linkerd/linkerd2-proxy#2556) * balance: Instrument metrics in pool balancer (linkerd/linkerd2-proxy#2558) * Enable PoolQueue balancer (linkerd/linkerd2-proxy#2559) Signed-off-by: Oliver Gould <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This change updates the PoolQueue and P2CPool with metrics that track endpoint update and queue runtime metrics, including: