Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set more lenient timeouts for protection services #1027

Merged
merged 1 commit into from
Nov 18, 2024

Conversation

adam-cattermole
Copy link
Member

@adam-cattermole adam-cattermole commented Nov 15, 2024

The default connection timeout used by the wasm-shim when the value is omitted is 20ms. This could be too short when set to use external redis for rate limiting, or with complex auth policies. By default I have increased this to:

  • 200ms for auth (this matches the value in envoy ext auth)
  • 100ms for rate limiting

This is also not exposed by the API in anyway, and any changes to the WasmPlugin will be overridden by the operator, so it's not possible to set these values. For now until we decide how to expose this in the API I've added two env vars:

  • AUTH_SERVICE_TIMEOUT
  • RATELIMIT_SERVICE_TIMEOUT

These let us override the defaults if a user requires.

Copy link

codecov bot commented Nov 15, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 84.20%. Comparing base (cc1b41f) to head (f1e4643).
Report is 38 commits behind head on main.

Files with missing lines Patch % Lines
pkg/wasm/types.go 60.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1027      +/-   ##
==========================================
+ Coverage   76.15%   84.20%   +8.05%     
==========================================
  Files         111       81      -30     
  Lines        8986     6679    -2307     
==========================================
- Hits         6843     5624    -1219     
+ Misses       1852      842    -1010     
+ Partials      291      213      -78     
Flag Coverage Δ
bare-k8s-integration 15.75% <0.00%> (+4.87%) ⬆️
controllers-integration 76.72% <75.00%> (+17.86%) ⬆️
envoygateway-integration 40.82% <75.00%> (+8.32%) ⬆️
gatewayapi-integration 16.08% <0.00%> (+2.64%) ⬆️
istio-integration 43.68% <75.00%> (+9.35%) ⬆️
unit 17.26% <37.50%> (-8.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
api/v1beta1 (u) 90.00% <100.00%> (-2.19%) ⬇️
api/v1beta2 (u) ∅ <ø> (∅)
pkg/common (u) ∅ <ø> (∅)
pkg/istio (u) 62.06% <ø> (+15.03%) ⬆️
pkg/log (u) 93.18% <ø> (ø)
pkg/reconcilers (u) 24.67% <ø> (∅)
pkg/rlptools (u) ∅ <ø> (∅)
controllers (i) 86.95% <90.47%> (+2.53%) ⬆️
Files with missing lines Coverage Δ
pkg/wasm/utils.go 92.24% <100.00%> (+9.70%) ⬆️
pkg/wasm/types.go 72.44% <60.00%> (+3.21%) ⬆️

... and 32 files with indirect coverage changes

Comment on lines 29 to 36
func AuthServiceTimeout() string {
return env.GetString("AUTH_SERVICE_TIMEOUT", "5s")
}

func RatelimitServiceTimeout() string {
return env.GetString("RATELIMIT_SERVICE_TIMEOUT", "100ms")
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was torn here between var that executes once on startup vs function that calls out to env every time. My other consideration was making these return *string with ptr.To immediately to avoid at every call site

@adam-cattermole adam-cattermole self-assigned this Nov 15, 2024
@adam-cattermole adam-cattermole marked this pull request as ready for review November 15, 2024 15:43
Copy link
Contributor

@eguzki eguzki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I have listed 7 env vars so far

  • LOG_LEVEL
  • LOG_MODE
  • OPERATOR_NAMESPACE
  • RELATED_IMAGE_WASMSHIM
  • RELATED_IMAGE_CONSOLEPLUGIN
  • AUTH_SERVICE_TIMEOUT
  • RATELIMIT_SERVICE_TIMEOUT

We should document those env vars for devs and operator release managers. I was thinking on documenting about how to tune those parameters when releasing using OLM (using the subscription object). As well as when using helm charts. I know currently it is not possible to tune those env vars when deploying using helm charts, but this is also something to be done. Leaving here for the record, not asking for this PR.

@adam-cattermole adam-cattermole merged commit 6890020 into main Nov 18, 2024
34 checks passed
@adam-cattermole adam-cattermole deleted the service-timeouts branch November 18, 2024 12:11
@alexsnaps
Copy link
Member

alexsnaps commented Nov 18, 2024

My stance on environment variables is that they are "backdoors" into the system. That's certainly what I had envisioned for these two new ones. So, if we really document them, I think this should be done in such a way that it's clear that this is either "experimental" or whatever, but clearly not supported and/or guaranteed to work moving forward.

Discussing this issue with @adam-cattermole on Friday, the intent is to go with somewhat sensible defaults and see how make this a properly supported (and documented) feature moving forward... There is more to this than it may seem and this deserves some real thinking. A default per service is definitely not sensible from a Kuadrant perspective.

@eguzki
Copy link
Contributor

eguzki commented Nov 18, 2024

My stance on environment variables is that they are "backdoors" into the system. That's certainly what I had envisioned for these two new ones. So, if we really document them, I think this should be done in such a way that it's clear that this is either "experimental" or whatever, but clearly not supported and/or guaranteed to work moving forward.

Discussing this issue with @adam-cattermole on Friday, the intent is to go with somewhat sensible defaults and see how make this a properly supported (and documented) feature moving forward... There is more to this than it may seem and this deserves some real thinking. A default per service is definitely not sensible from a Kuadrant perspective.

Fair enough.

Let's not document those env vars yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants