feat: mutator tracing #1050
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1050      +/-   ##
==========================================
- Coverage   78.09%   77.74%   -0.36%
==========================================
  Files          83       83
  Lines        3977     3841     -136
==========================================
- Hits         3106     2986     -120
+ Misses        593      578      -15
+ Partials      278      277       -1
Just one minor thing for the tests, otherwise looks good!
pipeline/mutate/mutator_hydrator.go
if len(cfg.Api.Retry.MaxDelay) > 0 {
	maxRetryDelay := time.Second
IMO this needs to be moved one scope up, like it was before, so that the default is also set when the config is not set.
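For illustration, a sketch of the suggested structure. `RetryConfig`, `resolveRetryTimeouts`, and the default values are assumptions for this sketch, not the actual Oathkeeper types; the point is only that the defaults are assigned before any config checks:

```go
// Sketch only: the defaults are set one scope up, so they also apply
// when the retry config is missing entirely.
package hydrator

import "time"

type RetryConfig struct {
	GiveUpAfter string
	MaxDelay    string
}

const (
	defaultGiveUpAfter   = time.Second            // illustrative default
	defaultMaxRetryDelay = 100 * time.Millisecond // illustrative default
)

func resolveRetryTimeouts(retry *RetryConfig) (giveUpAfter, maxRetryDelay time.Duration, err error) {
	giveUpAfter = defaultGiveUpAfter     // applies even when retry == nil
	maxRetryDelay = defaultMaxRetryDelay // applies even when retry == nil
	if retry == nil {
		return giveUpAfter, maxRetryDelay, nil
	}
	if len(retry.GiveUpAfter) > 0 {
		if giveUpAfter, err = time.ParseDuration(retry.GiveUpAfter); err != nil {
			return 0, 0, err
		}
	}
	if len(retry.MaxDelay) > 0 {
		if maxRetryDelay, err = time.ParseDuration(retry.MaxDelay); err != nil {
			return 0, 0, err
		}
	}
	return giveUpAfter, maxRetryDelay, nil
}
```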
And we need a test for this, e.g. only setting `cfg.Api.Retry.MaxDelay` and expecting the `giveUpAfter` default to be set.
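Something along these lines, using the `resolveRetryTimeouts` helper sketched in the previous comment; the test name and assertion values are hypothetical, with `assert`/`require` from testify:

```go
package hydrator

import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// Hypothetical test shape: only MaxDelay is configured, and the
// give-up-after default is still expected to apply.
func TestGiveUpAfterDefaultWhenOnlyMaxDelayIsSet(t *testing.T) {
	retry := &RetryConfig{MaxDelay: "500ms"}

	giveUpAfter, maxRetryDelay, err := resolveRetryTimeouts(retry)
	require.NoError(t, err)
	assert.Equal(t, 500*time.Millisecond, maxRetryDelay)
	assert.Equal(t, defaultGiveUpAfter, giveUpAfter) // default still applies
}
```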
After looking at this in more detail, the retry config was completely broken in 2a97e05.
I'll revisit this...
Alrighty... the retry configuration is broken all over Oathkeeper. Here's my understanding of what is currently happening:
- The `give_up_after` property specifies the per-retry timeout. That's surprising to me, since it sounds like it means the overall timeout, including retries.
- The `max_delay` property specifies the maximum backoff wait time in between retries.
- The maximum number of retries is not configurable, and is always 4 (the default in `hashicorp/retryablehttp`); see the sketch after this list.
- This config migration is incorrect. It thinks `give_up_after` is the overall timeout (including retries).
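For reference, a sketch of how the two settings map onto `hashicorp/go-retryablehttp` under the semantics above. This assumes the mutator wires the client roughly like this; it is not the actual Oathkeeper code:

```go
package hydrator

import (
	"time"

	"github.com/hashicorp/go-retryablehttp"
)

// buildRetryClient illustrates the current semantics: give_up_after bounds
// each attempt, max_delay caps the backoff, and the retry count is fixed.
func buildRetryClient(giveUpAfter, maxRetryDelay time.Duration) *retryablehttp.Client {
	client := retryablehttp.NewClient()
	client.RetryMax = 4                     // not configurable today; the library default
	client.RetryWaitMax = maxRetryDelay     // max_delay: ceiling on the backoff between attempts
	client.HTTPClient.Timeout = giveUpAfter // give_up_after: per-attempt timeout, not an overall deadline
	return client
}
```

With a wiring like this, the worst-case wall-clock time is roughly (retries + 1) × `give_up_after` plus the backoff waits, which is why reading `give_up_after` as an overall timeout is misleading.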
Not sure now is the time to fix this fully, since that could also break users.
With the goal of moving this PR over the line, I propose to keep as much backward compatibility as possible. This particular mutator previously had no overall timeout and no retries if the retry config was not specified at all. If the retry config was specified (even if it was just `{}`), it performed up to 4 retries. If the retry config was empty (`{}`), the timeouts also changed.
This patch keeps that behavior, while fixing bugs:
- the fallback timeouts (retry config present but empty) were wrong: 50ms timeout and 1s delay, even though the config schema promises a more reasonable 1s timeout and 100ms delay. Those defaults now at least work as advertised.
- faulty error messages
We should consider revamping the timeout handling all over Oathkeeper in a follow-up ticket.
Thoughts, @zepatrik @aeneasr @daviddelucca?
This PR is about activating tracing in the mutator. I think we could merge it and open another ticket/branch to work on revamping the timeouts. Does that make sense?
Nice, I agree to merge this as is and fix the other issues you found in a separate PR 👍
This needs more work to synchronize the tracing config schemata with ory/x.
I've synced the tracing config schema with the one from ory/x. Initially, I tried to reference it by URL, but that (logically) requires Oathkeeper to load the remote schema on startup, which is often not possible due to network segmentation. @daviddelucca can you try this branch, please?
#1001
This slightly refactors #1049 and keeps more of the timeout behavior unchanged.