Alternative implementation of AtomicState leveraging WaitAsync #6109

Merged
merged 1 commit into from
Mar 28, 2023

Conversation

Member

@ismaelhamed ismaelhamed commented Sep 23, 2022

Might be related to #6106

The circuit breaker, in its current implementation, could give false positives (signal tasks as done when in reality they failed).

I ported the CircuitBreakerStressSpec and slightly modified it so that all tasks take longer than the CB's CallTimeout and should therefore all fail. Instead, running the test shows that they are all "marked" as succeeded (DoneCount).

Upon further inspection:

  • The tasks did indeed all fail with a TimeoutException, but the exception is swallowed by the CallFail method. This makes it impossible to capture the TimeoutException outside the CB, as demonstrated by the StressActor.

  • Because we await the task first, and only check whether it took longer than the CallTimeout after it finishes, we could potentially wait indefinitely for the task to complete. This PR leverages the new Task.WaitAsync in .NET 6 instead.
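
The difference between the two approaches in the second bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: `TimeoutSketch`, `AwaitThenCheck`, and `WaitWithTimeout` are hypothetical names, and the only real API relied on is `Task.WaitAsync(TimeSpan)`, available from .NET 6.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of Akka.NET.
public static class TimeoutSketch
{
    // Await-then-check: the timeout is only noticed after the body finishes,
    // so a body that never completes blocks the caller forever.
    public static async Task<bool> AwaitThenCheck(Func<Task> body, TimeSpan callTimeout)
    {
        var stopwatch = Stopwatch.StartNew();
        await body();
        return stopwatch.Elapsed <= callTimeout;
    }

    // WaitAsync: the wait itself is bounded. Task.WaitAsync throws
    // TimeoutException once callTimeout elapses, even if the body is still running.
    public static async Task<bool> WaitWithTimeout(Func<Task> body, TimeSpan callTimeout)
    {
        try
        {
            await body().WaitAsync(callTimeout);
            return true;
        }
        catch (TimeoutException)
        {
            return false;
        }
    }
}
```

Note that `WaitAsync` only abandons the wait; the underlying task keeps running unless it is cancelled separately.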

When compared with the results of the test in the previous PR, we now get the correct behavior:

BEFORE

FailCount:0, DoneCount:1000, CircCount:0, TimeoutCount:0
FailCount:0, DoneCount:1000, CircCount:0, TimeoutCount:0
FailCount:0, DoneCount:1000, CircCount:0, TimeoutCount:0

AFTER

FailCount:0, DoneCount:0, CircCount:106753, TimeoutCount:1001
FailCount:0, DoneCount:0, CircCount:110008, TimeoutCount:1001
FailCount:0, DoneCount:0, CircCount:110216, TimeoutCount:1001

@ismaelhamed
Member Author

NOTE: for whatever reason, Task.WaitAsync in .NET6 also fails sometimes.

@ismaelhamed
Member Author

BTW, some of the CB's tests are failing now because they seem tweaked to work with the current CB implementation. If this PR goes ahead, I'll make sure to fix them all.

@Aaronontheweb
Member

> BTW, some of the CB's tests are failing now because they seem tweaked to work with the current CB implementation. If this PR goes ahead, I'll make sure to fix them all.

got it - thanks for letting us know!

@Aaronontheweb
Member

Does #6108 need to be merged first or does this?

Member

@Aaronontheweb Aaronontheweb left a comment

LGTM - please proceed

src/core/Akka.Tests/Pattern/CircuitBreakerStressSpec.cs (outdated, resolved)
src/core/Akka/Util/Internal/AtomicState.cs (resolved)
@ismaelhamed
Member Author

> Does #6108 need to be merged first or does this?

No, I've included the CircuitBreakerStressSpec in this one too.

@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch 2 times, most recently from 9dcef85 to c22486f September 26, 2022 07:38
@ismaelhamed
Member Author

For WithSyncCircuitBreaker #L259, I wonder if we should just do:

public void WithSyncCircuitBreaker(Action body) => 
    WithCircuitBreaker(body, b => Task.Run(b)).GetAwaiter().GetResult();

@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch 3 times, most recently from bf42423 to bda4de1 September 28, 2022 06:07
@Aaronontheweb
Member

LMK when this is ready for review

@Aaronontheweb
Member

> For WithSyncCircuitBreaker #L259, I wonder if we should just do:
>
> public void WithSyncCircuitBreaker(Action body) =>
>     WithCircuitBreaker(body, b => Task.Run(b)).GetAwaiter().GetResult();

I think that's fine IMHO

@ismaelhamed ismaelhamed marked this pull request as ready for review September 28, 2022 09:54
@ismaelhamed
Member Author

@Aaronontheweb this is ready for review. I have an improved version of WaitAsync with some optimizations and support for cancellation tokens (taken from the .NET 6 implementation), but I'd prefer an initial review first.
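
For readers unfamiliar with how a `WaitAsync`-style helper with timeout and cancellation support can be built on targets older than .NET 6, here is a minimal sketch. It is an assumption-laden illustration, not the implementation in this PR or in .NET itself: `WaitAsyncSketch` and `WaitWithTimeoutAsync` are hypothetical names, and the approach (racing the task against `Task.Delay` with `Task.WhenAny`) is a common substitute technique.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical polyfill sketch; not the implementation from this PR or from .NET.
public static class WaitAsyncSketch
{
    public static async Task<T> WaitWithTimeoutAsync<T>(
        this Task<T> task, TimeSpan timeout, CancellationToken cancellationToken = default)
    {
        // A linked source lets us stop the timer once the task wins the race.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        var delay = Task.Delay(timeout, cts.Token);

        var winner = await Task.WhenAny(task, delay).ConfigureAwait(false);
        if (winner == task)
        {
            cts.Cancel();                            // release the timer
            return await task.ConfigureAwait(false); // propagate result or exception
        }

        // The delay finished first: either the caller cancelled or the timeout elapsed.
        cancellationToken.ThrowIfCancellationRequested();
        throw new TimeoutException($"The operation did not complete within {timeout}.");
    }
}
```

As with the built-in method, this only stops waiting; it does not cancel the task being observed.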

@Aaronontheweb
Member

Looks like you have a test suite issue here:

Akka.Tests.Pattern.ASynchronousCircuitBreakerThatIsClosed.A synchronous circuit breaker that is closed must increment failure count on callTimeout before call finishes
System.AggregateException : One or more errors occurred. (Timeout 00:00:00.1000000 expired while waiting for condition.
Expected: True
Actual:   False)
---- Timeout 00:00:00.1000000 expired while waiting for condition.
Expected: True
Actual:   False

@ismaelhamed ismaelhamed marked this pull request as draft October 11, 2022 07:54
@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch 5 times, most recently from 8b69365 to 29c012a October 13, 2022 10:47
@Aaronontheweb
Member

Going to re-review this this week

@ismaelhamed
Member Author

I haven't had the time to work on this some more, and I'm still not sure why that test keeps failing. I even went ahead and reimplemented Within and AwaitCond (the latter in particular doesn't work like the JVM version, so it might be skewing tests), but no luck.

Member

@Aaronontheweb Aaronontheweb left a comment

I think I have your failing test figured out

@@ -293,10 +293,59 @@ protected static async Task<bool> InternalAwaitConditionAsync(Func<Task<bool>> c
return true;
}

private static void ConditionalLog(ILoggingAdapter logger, string format, params object[] args)
protected void AwaitCond(Func<bool> p, TimeSpan? max = null, TimeSpan? interval = null, string message = "")
Member

Don't we already have an AwaitCondition method or does this do something different?

Member Author

The behavior of AwaitCondition is different from the JVM version, which is why I tried a straight port instead. No luck, it keeps failing.

Member Author

AwaitCondition arbitrarily calculates the interval (when not specified) as a tenth of max. It should be a fixed 100 ms; otherwise tests ported from the JVM might not behave as expected.
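
To make the difference concrete, the two defaulting rules can be sketched side by side. This is an illustration only: `PollIntervalSketch` and its methods are hypothetical names and do not reproduce the actual TestKit code; they just encode the two rules described in this comment.

```csharp
using System;

// Hypothetical illustration of the two defaulting rules; not TestKit code.
public static class PollIntervalSketch
{
    // Rule described above for AwaitCondition: a tenth of max when no interval is given.
    public static TimeSpan TenthOfMax(TimeSpan max, TimeSpan? interval = null) =>
        interval ?? TimeSpan.FromTicks(max.Ticks / 10);

    // Rule proposed above: a fixed 100 ms when no interval is given.
    public static TimeSpan Fixed100Ms(TimeSpan max, TimeSpan? interval = null) =>
        interval ?? TimeSpan.FromMilliseconds(100);
}
```

With a max of 3 seconds, the first rule polls every 300 ms and the second every 100 ms, so a condition that holds only briefly is more likely to be missed under the first rule.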

{
[Fact(DisplayName = "A synchronous circuit breaker that is half open should pass call and transition to close on success")]
public void Should_Pass_Call_And_Transition_To_Close_On_Success( )
[Fact(DisplayName = "A synchronous circuit breaker that is closed must increment failure count on callTimeout before call finishes")]
Member

So this spec is currently racy

var breaker = ShortCallTimeoutCb();
Task.Run(() => breaker.Instance.WithSyncCircuitBreaker(() => Thread.Sleep(Dilated(TimeSpan.FromSeconds(1)))));
Within(TimeSpan.FromMilliseconds(900),
() => AwaitCond(() => breaker.Instance.CurrentFailureCount == 1, Dilated(TimeSpan.FromMilliseconds(100))));
Member

I think your problem here is your parameters on AwaitCond - you meant to set Dilated(TimeSpan.FromMilliseconds(100)) as the interval but it's being used as the max value here. @ismaelhamed

Member Author

@Aaronontheweb , see my comment above. In the meantime, I've fixed it by passing both max and interval.
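
The argument mix-up discussed in this thread can be reproduced with a small stand-in. The sketch below is an assumption-based illustration: `AwaitCondSketch` is a hypothetical class, its parameter order is copied from the `AwaitCond` signature shown in the diff, and the 3 second default for max is assumed rather than taken from the PR.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in with the same parameter order as AwaitCond in the diff.
public static class AwaitCondSketch
{
    public static void AwaitCond(Func<bool> p, TimeSpan? max = null, TimeSpan? interval = null, string message = "")
    {
        var limit = max ?? TimeSpan.FromSeconds(3);            // assumed default
        var step = interval ?? TimeSpan.FromMilliseconds(100); // fixed 100 ms default
        var stopwatch = Stopwatch.StartNew();
        while (!p())
        {
            if (stopwatch.Elapsed >= limit)
                throw new TimeoutException($"Timeout {limit} expired while waiting for condition. {message}");
            Thread.Sleep(step);
        }
    }

    public static void Demo(Func<bool> condition)
    {
        // Positional: the single TimeSpan binds to max, so the wait gives up after 100 ms.
        // AwaitCond(condition, TimeSpan.FromMilliseconds(100));

        // Named: both values are explicit and cannot be confused.
        AwaitCond(condition, max: TimeSpan.FromMilliseconds(900), interval: TimeSpan.FromMilliseconds(100));
    }
}
```

Passing both values by name keeps the call correct even if the defaults change later.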

src/core/Akka/Util/Extensions/TaskExtensions.cs (outdated, resolved)
@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch 4 times, most recently from 052c442 to ffe4c43 December 28, 2022 08:39
@Aaronontheweb
Member

@ismaelhamed is this ready for review?

@ismaelhamed
Member Author

> @ismaelhamed is this ready for review?

The implementation yes, but I couldn't figure out where the problem with the specs was. I'll give it another shot soon.

@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch 2 times, most recently from 867be27 to 30a69af March 22, 2023 06:47
@ismaelhamed
Member Author

Unrelated test failing now.

@ismaelhamed ismaelhamed marked this pull request as ready for review March 22, 2023 09:24
@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch from 30a69af to 93ac7e8 March 23, 2023 07:07
@Aaronontheweb
Member

@ismaelhamed is this good for me to review again?

@Aaronontheweb
Member

ah yes, it is - you just requested one from me! I'll get right on it.

@Aaronontheweb Aaronontheweb added this to the 1.5.2 milestone Mar 23, 2023
@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch from 93ac7e8 to dc45d73 March 24, 2023 06:40
Member

@Aaronontheweb Aaronontheweb left a comment

LGTM

@Aaronontheweb Aaronontheweb enabled auto-merge (squash) March 24, 2023 15:23
@Aaronontheweb
Member

Queued for auto-merge - nice work @ismaelhamed

@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch from fd41320 to 8b793ed March 25, 2023 06:40
@ismaelhamed ismaelhamed force-pushed the circuit-breaker-wait branch from 8b793ed to 3f1490c March 28, 2023 05:37
@Aaronontheweb Aaronontheweb disabled auto-merge March 28, 2023 18:03
@Aaronontheweb Aaronontheweb merged commit d156ff4 into akkadotnet:dev Mar 28, 2023