-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: ResiliencePipeline CircuitBreaker doesn't handle Half Open State Correctly when probing call throws unhandeled exception - half open state can never be left again #1979
Comments
Thanks @DominicUllmann for the report, the repro steps help a lot. I believe I found the culprit:
This line tells CB that the outcome is not handled and therefore it should be considered as success only if the outcome does not contain the exception. And here lies the bug, because the This is pretty serious bug and can lead to circuit breaker being open forever :/ My thinking is that we should change the code to: if (await _handler(args).ConfigureAwait(context.ContinueOnCapturedContext))
{
await _controller.OnActionFailureAsync(outcome, context).ConfigureAwait(context.ContinueOnCapturedContext);
}
else
{
await _controller.OnActionSuccessAsync(outcome, context).ConfigureAwait(context.ContinueOnCapturedContext);
} Meaning that if the outcome is handled, the it's treated as a failure and if it's not handled it's treated as a success. Thies applies even if the outcome contains an exception. Internally we should also change the names of internal methods to:
Thoughts? This one should be a hotfix though. |
@martintmk It seems good to me. Regarding to the renaming. I agree with the |
@DominicUllmann Do you want to tackle this and create a PR? Otherwise, I can take a look in my free time over this weekend or next one. |
Thanks a lot for the fast feedback @martintmk. I wonder if the suggested fix matches with the documentation here: But generally, it would work well, if the circuit breaker is closed in case of success as well as when an unhandled exception occurs. In case of e.g. validation errors, it is as good as a normal success answer and therefore it doesn't need to be handled differently. I can probably create a PR this evening or tomorrow evening. |
That page is covering the legacy V7 API. That works slightly differently than the V8 API. One of differences is covered here: https://www.pollydocs.org/strategies/circuit-breaker.html#reducing-thrown-exceptions (The shown anti-pattern works smoothly in case of V7) |
Thanks @peter-csala. Good to know. Then I will try the fix suggested above. |
@martincostello What's your plan to release the hotfix? |
I think there was another PR I was waiting on before thinking about preparing a release, but I've forgotten what it was now. I'll look at doing a 8.3.1 tomorrow. |
Describe the bug
We use a resilience pipeline with a circuit breaker. We distinguish between handled and unhandled exceptions. When the probe call is completed with an unhandled exception, the circuit breaker stays in half open state and no additional probe call is ever allowed.
The issue is probably in https://github.com/App-vNext/Polly/blame/10730b9f723f80559b0c115c2102db974382fdfd/src/Polly.Core/CircuitBreaker/Controller/CircuitStateController.cs#L131. The state controller only handles the case where the state before was open. But it doesn't correctly handle the case where the state was already half-open.
Based on https://github.com/App-vNext/Polly/wiki/Circuit-Breaker#half-open, a non handled exception results in staying in the half open state. A next probe call should then be allowed to either transition to closed or open, but it is not.
Expected behavior
The circuit breaker should not stay forever in half-open state after a non handled exception occured in the probe call used in the half open state. Instead a next probe call must be allowed and should then lead to either a next non handled exception, a handled exception or a success result.
Actual behavior
The circuit breaker stays forever in half open state after the probe call resulted in a non handled exception.
Steps to reproduce
Please have a look at:
https://github.com/DominicUllmann/PollyHalfOpenIssue/tree/main
The issue occurs in the unit test TestResiliencePipelineDirectly2, There we throw an unhandled exception from the probe call.
In this test, we first ensure to open the circuit breaker, then we verify that it is really open. Then we wait for more than the reset time and make a call throwing an unhandled exception (MyUnhandledException). Afterwards we wait again (which should not be necessary in half-open state) and try to send a successful call. But in this state, we can never leave half open state again, neither to open or closed.
The the unit test TestResiliencePipelineDirectly shows what happens when we use a handled exception instead. In this case everything works as expected (i.e. transition from half-open to open and then after the reset time again to closed).
Exception(s) (if any)
No response
Polly version
8.3.0
.NET Version
8.0.102
Anything else?
No response
The text was updated successfully, but these errors were encountered: