control: Enforce timeouts on response stream #2587

Merged: olix0r merged 3 commits into main from ver/profile-timeout on Dec 27, 2023

@olix0r olix0r commented Dec 26, 2023

When connecting to a control plane API, the API server can return an
HTTP response long before it returns the first stream response. To bound
this time, we now enforce timeouts so that failures may result in attempting
to use an alternate controller instance.

All controller response streams now use a generic gRPC middleware with
initial, idle, and lifetime timeouts. When an initial timeout is
encountered, a DeadlineExceeded gRPC status is synthesized. When the
other timeouts are encountered, the stream terminates gracefully.
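
The timeout behavior described above can be modeled as a small decision function. This is a minimal sketch and not the middleware added by this PR: the names `StreamTimeouts`, `Outcome`, and `check` are hypothetical, and the ordering of checks (for example, that the idle timeout only applies once a first message has arrived) is an assumption made for illustration.

```rust
use std::time::Duration;

/// Hypothetical timeout configuration. Every bound is optional, so an
/// unconfigured stream is never timed out.
#[derive(Clone, Copy, Debug, Default)]
pub struct StreamTimeouts {
    pub initial: Option<Duration>,
    pub idle: Option<Duration>,
    pub lifetime: Option<Duration>,
}

/// What the wrapper does with the stream at a given instant.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Keep waiting for the next message.
    Continue,
    /// Fail the stream with a synthesized DeadlineExceeded status.
    DeadlineExceeded,
    /// End the stream gracefully, as if the server had closed it.
    EndOfStream,
}

impl StreamTimeouts {
    /// `since_start` is the time since the response stream was established.
    /// `since_last_item` is the time since the most recent message, or
    /// `None` if no message has been received yet.
    pub fn check(&self, since_start: Duration, since_last_item: Option<Duration>) -> Outcome {
        // Only the initial timeout produces an error.
        if since_last_item.is_none() {
            if let Some(limit) = self.initial {
                if since_start >= limit {
                    return Outcome::DeadlineExceeded;
                }
            }
        }
        // Assumption: the lifetime bound applies whether or not a message
        // has arrived, and ends the stream without an error.
        if let Some(limit) = self.lifetime {
            if since_start >= limit {
                return Outcome::EndOfStream;
            }
        }
        // Assumption: idleness is measured from the last received message.
        if let (Some(limit), Some(idle_for)) = (self.idle, since_last_item) {
            if idle_for >= limit {
                return Outcome::EndOfStream;
            }
        }
        Outcome::Continue
    }
}
```

Separating the error case from the graceful cases matters to callers: an error signals that the controller never produced a usable response, while a graceful end looks like an ordinary stream completion and simply prompts a new stream.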

These timeouts are configurable by the proxy injector. Timeouts are not
enabled without configuration:

  • LINKERD2_PROXY_CONTROL_STREAM_INITIAL_TIMEOUT
  • LINKERD2_PROXY_CONTROL_STREAM_IDLE_TIMEOUT
  • LINKERD2_PROXY_CONTROL_STREAM_LIFETIME

Each of these parameters is optional.
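
As an illustration of how three optional settings might be read, the sketch below maps each environment variable to an `Option<Duration>`. The variable names come from the list above; everything else is an assumption: the type `ControlStreamTimeouts`, the helpers `parse_duration`, `from_lookup`, and `from_env`, the accepted duration syntax (an integer followed by `ms`, `s`, `m`, or `h`), and the choice to treat an unparseable value as unset. The proxy's actual parsing in `linkerd/app/src/env.rs` may differ on all of these points.

```rust
use std::time::Duration;

/// Hypothetical holder for the three optional settings.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ControlStreamTimeouts {
    pub initial: Option<Duration>,
    pub idle: Option<Duration>,
    pub lifetime: Option<Duration>,
}

/// Parses values such as "250ms", "3s", "5m", or "1h".
/// The accepted syntax is an assumption made for this sketch.
pub fn parse_duration(s: &str) -> Option<Duration> {
    let s = s.trim();
    // Split the leading digits from the unit suffix; a bare number has no
    // unit and is rejected.
    let split = s.find(|c: char| !c.is_ascii_digit())?;
    let (num, unit) = s.split_at(split);
    let n: u64 = num.parse().ok()?;
    match unit {
        "ms" => Some(Duration::from_millis(n)),
        "s" => Some(Duration::from_secs(n)),
        "m" => Some(Duration::from_secs(n.checked_mul(60)?)),
        "h" => Some(Duration::from_secs(n.checked_mul(3600)?)),
        _ => None,
    }
}

/// Builds the configuration from any name-to-value lookup, which keeps the
/// logic testable without touching the process environment.
pub fn from_lookup<F>(lookup: F) -> ControlStreamTimeouts
where
    F: Fn(&str) -> Option<String>,
{
    // A missing or unparseable variable leaves that timeout disabled.
    let get = |name: &str| lookup(name).and_then(|v| parse_duration(&v));
    ControlStreamTimeouts {
        initial: get("LINKERD2_PROXY_CONTROL_STREAM_INITIAL_TIMEOUT"),
        idle: get("LINKERD2_PROXY_CONTROL_STREAM_IDLE_TIMEOUT"),
        lifetime: get("LINKERD2_PROXY_CONTROL_STREAM_LIFETIME"),
    }
}

/// Reads the settings from the real process environment.
pub fn from_env() -> ControlStreamTimeouts {
    from_lookup(|name| std::env::var(name).ok())
}
```

With no variables set, `from_env()` returns a value whose fields are all `None`, which matches the statement that timeouts are not enabled without configuration.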

@olix0r olix0r requested a review from a team as a code owner December 26, 2023 20:05

codecov bot commented Dec 26, 2023

Codecov Report

Merging #2587 (bcfd140) into main (e2bb210) will decrease coverage by 0.02%.
Report is 12 commits behind head on main.
The diff coverage is 75.48%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2587      +/-   ##
==========================================
- Coverage   68.02%   68.00%   -0.02%     
==========================================
  Files         329      330       +1     
  Lines       14904    14989      +85     
==========================================
+ Hits        10139    10194      +55     
- Misses       4765     4795      +30     
| Files | Coverage Δ |
|---|---|
| linkerd/app/inbound/src/policy/config.rs | 30.43% <100.00%> (ø) |
| linkerd/app/outbound/src/lib.rs | 64.55% <100.00%> (ø) |
| linkerd/app/src/dst.rs | 96.29% <100.00%> (+0.46%) ⬆️ |
| linkerd/app/src/lib.rs | 88.67% <100.00%> (+0.10%) ⬆️ |
| linkerd/app/src/policy.rs | 100.00% <100.00%> (ø) |
| linkerd/service-profiles/src/client.rs | 100.00% <100.00%> (ø) |
| linkerd/app/inbound/src/server.rs | 90.00% <66.66%> (-2.86%) ⬇️ |
| linkerd/app/inbound/src/policy/api.rs | 81.81% <66.66%> (+0.56%) ⬆️ |
| linkerd/app/outbound/src/policy/api.rs | 78.37% <66.66%> (+0.60%) ⬆️ |
| linkerd/app/src/env.rs | 59.54% <77.77%> (+1.08%) ⬆️ |

... and 2 more

... and 2 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@olix0r olix0r force-pushed the ver/profile-timeout branch from bffcb6f to 37bc6a1 Compare December 27, 2023 06:44
@olix0r olix0r force-pushed the ver/profile-timeout branch from 37bc6a1 to 2a3c234 Compare December 27, 2023 06:48
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 27, 2023
linkerd/linkerd2-proxy#2587 adds configuration parameters that bound the
lifetime and idle times of control plane streams. This change helps to
mitigate imbalanced control plane replica usage and to generally prevent
scenarios where a stream becomes "stuck," as has been observed when a
control plane replica is unhealthy.

This change adds helm values to control this behavior. Default values
are provided.
@olix0r olix0r merged commit abd7e86 into main Dec 27, 2023
95 checks passed
@olix0r olix0r deleted the ver/profile-timeout branch December 27, 2023 17:53
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 27, 2023
When connecting to a control plane API, the API server can return an
HTTP response long before it returns the first stream response. To bound
this time, we now enforce timeouts so that failures may result in attempting
to use an alternate controller instance.

All controller response streams now use a generic gRPC middleware with
initial, idle, and lifetime timeouts. When an initial timeout is
encountered, a DeadlineExceeded gRPC status is synthesized. When the
other timeouts are encountered, the stream terminates gracefully.

These timeouts are configurable by the proxy injector. Timeouts are not
enabled without configuration:

* LINKERD2_PROXY_CONTROL_STREAM_INITIAL_TIMEOUT
* LINKERD2_PROXY_CONTROL_STREAM_IDLE_TIMEOUT
* LINKERD2_PROXY_CONTROL_STREAM_LIFETIME

Each of these parameters is optional.

---

* build(deps): bump semver from 1.0.17 to 1.0.20 (linkerd/linkerd2-proxy#2576)
* build(deps): bump memchr from 2.5.0 to 2.6.4 (linkerd/linkerd2-proxy#2577)
* build(deps): bump arbitrary from 1.2.3 to 1.3.2 (linkerd/linkerd2-proxy#2578)
* build(deps): bump data-encoding from 2.3.3 to 2.5.0 (linkerd/linkerd2-proxy#2579)
* build(deps): bump tj-actions/changed-files from 40.2.3 to 41.0.1 (linkerd/linkerd2-proxy#2586)
* build(deps): bump ahash from 0.8.5 to 0.8.6 (linkerd/linkerd2-proxy#2582)
* build(deps): bump jemallocator from 0.5.0 to 0.5.4 (linkerd/linkerd2-proxy#2581)
* build(deps): bump anyhow from 1.0.69 to 1.0.76 (linkerd/linkerd2-proxy#2583)
* build(deps): bump symbolic-common from 12.6.0 to 12.8.0 (linkerd/linkerd2-proxy#2584)
* build(deps): bump gimli from 0.28.0 to 0.28.1 (linkerd/linkerd2-proxy#2588)
* build(deps): bump foreign-types-macros from 0.2.2 to 0.2.3 (linkerd/linkerd2-proxy#2590)
* build(deps): bump symbolic-demangle from 12.6.0 to 12.8.0 (linkerd/linkerd2-proxy#2591)
* control: Enforce timeouts on response stream (linkerd/linkerd2-proxy#2587)

Signed-off-by: Oliver Gould <[email protected]>
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 28, 2023