reduce sensitivity of fncache cancellation test #14585
defer cancel()

loadFnWasRun := atomic.NewBool(false)
I'm not 100% sure, but I think this will panic on a 32-bit arch. We've already had issues like that #11822
Can we just use stdlib atomic int64 for now? Next version of Go will have a built-in atomic bool.
@jakule @zmb3 I think I can answer both questions at the same time:
We started using go.uber.org/atomic a couple years back precisely because of issues w/ alignment (and the generally poor standard library API). go.uber.org/atomic automatically handles alignment and protects from non-atomic access. #11822 wouldn't have happened if we'd been better about using go.uber.org/atomic everywhere. IMO we should keep preferring go.uber.org/atomic over the standard library's API until we get to go1.19 (when the standard library will start exposing an equivalent API of its own).
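For reference, a minimal sketch of the typed standard library API that arrives in go1.19 (`atomic.Bool`, `atomic.Int64`). The `counters` struct and `runWorkers` helper are hypothetical names made up for illustration and are not part of this PR. The typed values are documented as 64-bit aligned even on 32-bit systems, which is the property the comment above relies on go.uber.org/atomic for.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counters is a hypothetical struct used only for illustration.
// With the typed API (Go 1.19+), atomic.Int64 is aligned to 64 bits
// inside structs even on 32-bit platforms, regardless of field order.
type counters struct {
	started atomic.Bool
	hits    atomic.Int64
}

// record marks the counters as started and bumps the hit count.
func (c *counters) record() {
	c.started.Store(true)
	c.hits.Add(1)
}

// runWorkers calls record from n goroutines and waits for all of them.
func runWorkers(n int) *counters {
	c := &counters{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.record()
		}()
	}
	wg.Wait()
	return c
}

func main() {
	c := runWorkers(8)
	fmt.Println(c.started.Load(), c.hits.Load()) // prints "true 8"
}
```

Because the fields are opaque types with methods, there is no way to read or write them non-atomically by accident, which is the other benefit attributed to go.uber.org/atomic above.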
go.uber.org/atomic automatically handles alignment and protects from non-atomic access.
I don't think this is entirely true, ex: uber-go/atomic#105
But I checked, and this use seems to work on a 32-bit arch. I didn't realize that this is Uber atomic, not stdlib.
I agree with @zmb3 that we should prefer stuff that is in the stdlib, but atomics is one of those things that so far causes us a lot of problems, and Go 1.19 is not available for us now.
@fspmarshall Do you know why we decided to stop using Uber atomics?
I'd consider the linked issue to be an abuse of the API rather than a failure of the library. They are manually dereferencing the wrapper returned by NewDuration(), which breaks alignment. IMO if you're dereferencing opaque types returned by concurrency libraries, you should expect some problems 😬
Do you know why we decided to stop using Uber atomics?
If we stopped, I certainly wasn't aware of it. I always encourage people to use it when I'm reviewing, and I use it in all the code I write 🤷 The team was a lot smaller back then. Probably just something that got lost along the way.
We do deliberately avoid it in the api package because we are stricter about introducing external deps in there.
I think that we/I just lost the context. When I was looking at our code, I wasn't aware that we were using Uber atomics, as most of the usage that I saw used the std library.
@fspmarshall is there a way we can write a deterministic test to cover this cancellation logic rather than something that depends on time and just tweaking timeouts?
@zmb3 Took another pass. I wasn't able to remove the timeouts, and I'm not sure that's practical for this kind of test. I was able to move all timeouts out of the happy path, though, which means I can use much larger timeouts without them affecting the runtime of the test when everything is working correctly.
Looks like this test is currently the top offender. I think it makes sense to merge this change and see what happens. If the timeouts are still a problem, we can revisit this topic.
The FnCacheCancellation test was using a 10ms timeout for an operation that required waking a background goroutine and completing multiple channel operations in order to succeed. Given the resource constraints of our CI env, I think this was probably optimistic, especially for a parallel test. Also fixes an invocation of t.Fatal that was outside of the main test goroutine.
Closes #14556