reduce sensitivity of fncache cancellation test #14585

Merged
fspmarshall merged 4 commits into master from fspmarshall/improve-fncache-cancellation-test on Jul 27, 2022

Conversation

fspmarshall
Contributor

The FnCacheCancellation test was using a 10ms timeout for an operation that required waking a background goroutine and completing multiple channel operations in order to succeed. Given the resource constraints of our CI env, I think this was probably optimistic, especially for a parallel test. Also fixes an invocation of t.Fatal that was outside of the main test goroutine.

Closes #14556
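
For context on the `t.Fatal` fix mentioned above: the `testing` package requires `FailNow` (and therefore `t.Fatal`) to be called from the goroutine running the test. A minimal sketch of one common way to respect that, assuming a hypothetical helper `loadInBackground` and sentinel `errLoad` (neither is taken from this PR), is to pass the background result over a channel and fail on the test goroutine:

```go
package main

import (
	"errors"
	"fmt"
)

// errLoad is a hypothetical sentinel error used only for this illustration.
var errLoad = errors.New("load failed")

// loadInBackground runs load in a background goroutine and hands the result
// back over a channel instead of failing the test from inside that goroutine.
func loadInBackground(load func() error) <-chan error {
	errC := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() {
		errC <- load()
	}()
	return errC
}

func main() {
	errC := loadInBackground(func() error { return errLoad })
	// In a real test this receive happens on the test goroutine, so calling
	// t.Fatal(err) at this point would be safe.
	if err := <-errC; err != nil {
		fmt.Println("would call t.Fatal:", err)
	}
}
```

The buffered channel means the background goroutine can exit even if the receiver has already given up.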

@github-actions github-actions bot requested review from jakule and Tener July 18, 2022 16:49
@fspmarshall fspmarshall force-pushed the fspmarshall/improve-fncache-cancellation-test branch from 2a4bca3 to 63a3d1c Compare July 18, 2022 16:49
lib/utils/fncache_test.go (outdated)
defer cancel()

loadFnWasRun := atomic.NewBool(false)
Contributor

I'm not 100% sure, but I think this will panic on a 32-bit arch. We've already had issues like that: #11822
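
As background for the 32-bit concern: with the standard library's function-style API (`atomic.AddInt64` and friends), the caller is responsible for 64-bit alignment on 32-bit platforms, and a misaligned 64-bit atomic operation panics there. The sketch below is illustrative only. The struct names are hypothetical, and it assumes Go 1.19 or later for the typed `atomic.Int64`, which is always 8-byte aligned. It prints the field offsets so the difference can be observed when built for a 32-bit target:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// rawCounter holds a plain int64 intended for use with atomic.AddInt64.
// On 32-bit platforms the field is only guaranteed 4-byte alignment, so the
// bool in front of it can leave it misaligned for 64-bit atomic operations.
type rawCounter struct {
	ready bool
	n     int64
}

// typedCounter uses the Go 1.19+ atomic.Int64 type, which the compiler
// aligns to 8 bytes on every platform.
type typedCounter struct {
	ready bool
	n     atomic.Int64
}

// rawOffset reports the byte offset of the plain int64 field.
func rawOffset() uintptr {
	var r rawCounter
	return unsafe.Offsetof(r.n)
}

// typedOffset reports the byte offset of the atomic.Int64 field.
func typedOffset() uintptr {
	var t typedCounter
	return unsafe.Offsetof(t.n)
}

func main() {
	var t typedCounter
	t.n.Add(1)
	// The raw offset depends on the target architecture; the typed offset
	// is a multiple of 8 everywhere.
	fmt.Println("raw offset:", rawOffset(), "typed offset:", typedOffset(), "value:", t.n.Load())
}
```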

Collaborator

Can we just use stdlib atomic int64 for now? Next version of Go will have a built-in atomic bool.

Contributor Author

@fspmarshall fspmarshall Jul 18, 2022

@jakule @zmb3 I think I can answer both questions at the same time:

We started using go.uber.org/atomic a couple years back precisely because of issues w/ alignment (and the generally poor standard library API). go.uber.org/atomic automatically handles alignment and protects from non-atomic access. #11822 wouldn't have happened if we'd been better about using go.uber.org/atomic everywhere. IMO we should keep preferring go.uber.org/atomic over the standard library's API until we get to go1.19 (when the standard library will start exposing an equivalent API of its own).
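
For reference, a rough sketch of what the Go 1.19 standard library equivalent looks like, using the typed `atomic.Bool` and `atomic.Int64` from `sync/atomic`. The helper `raceToSet` is hypothetical and the variable name `loadFnWasRun` is only borrowed from the snippet under review; this is not the code in this PR:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// raceToSet starts n goroutines that all try to flip the same flag and
// reports how many of them won the race, plus the final flag value.
func raceToSet(n int) (winners int64, set bool) {
	var loadFnWasRun atomic.Bool // zero value is false and ready to use
	var first atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// CompareAndSwap succeeds for exactly one goroutine.
			if loadFnWasRun.CompareAndSwap(false, true) {
				first.Add(1)
			}
		}()
	}
	wg.Wait()
	return first.Load(), loadFnWasRun.Load()
}

func main() {
	winners, set := raceToSet(8)
	fmt.Println(winners, set) // prints "1 true"
}
```

Because the typed values only expose methods, there is no way to read or write them non-atomically by accident, which is the same property attributed to go.uber.org/atomic above.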

Contributor

> go.uber.org/atomic automatically handles alignment and protects from non-atomic access.

I don't think this is entirely true, e.g. uber-go/atomic#105.
But I checked, and this use seems to work on a 32-bit arch. I didn't realize that this is Uber atomic, not stdlib.
I agree with @zmb3 that we should prefer stuff that is in the stdlib, but atomics are one of those things that have caused us a lot of problems so far, and Go 1.19 is not available to us yet.

@fspmarshall Do you know why we decided to stop using Uber atomics?

Contributor Author

I'd consider the linked issue to be an abuse of the API rather than a failure of the library. They are manually dereferencing the wrapper returned by NewDuration(), which breaks alignment. IMO if you're dereferencing opaque types returned by concurrency libraries, you should expect some problems 😬

> Do you know why we decided to stop using Uber atomics?

If we stopped, I certainly wasn't aware of it. I always encourage people to use it when I'm reviewing, and I use it in all the code I write 🤷 The team was a lot smaller back then. Probably just something that got lost along the way.

We do deliberately avoid it in the api package because we are more cautious about introducing external deps in there.

Contributor

I think that we/I just lost the context. When I was looking at our code, I wasn't aware that we were using Uber atomics as most of the usage that I saw used the std library.

@fspmarshall fspmarshall force-pushed the fspmarshall/improve-fncache-cancellation-test branch from 63a3d1c to 33fc5c9 Compare July 20, 2022 17:07
@fspmarshall fspmarshall requested review from zmb3 and jakule July 20, 2022 17:15
@fspmarshall fspmarshall enabled auto-merge (rebase) July 20, 2022 17:16
Collaborator

zmb3 commented Jul 20, 2022

@fspmarshall is there a way we can write a deterministic test to cover this cancellation logic rather than something that depends on time and just tweaking timeouts?

@fspmarshall fspmarshall force-pushed the fspmarshall/improve-fncache-cancellation-test branch from 33fc5c9 to ab02676 Compare July 21, 2022 17:12
@fspmarshall
Contributor Author

@zmb3 Took another pass. Wasn't able to remove timeouts, not sure if that's practical for this kind of test, but I was able to move all timeouts out of the happy path, which means I can use much larger timeouts without them affecting the runtime of the test when everything is working correctly.
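
To illustrate the general idea of keeping timeouts out of the happy path (this is a simplified sketch, not the actual test from this PR; `awaitSignal`, `demo`, and `errTimeout` are hypothetical names): the test waits on a channel that is closed as soon as the expected event happens, and the timeout branch is only reached when something is broken. That lets the timeout be very generous without slowing down a passing run.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical sentinel for the failure path.
var errTimeout = errors.New("timed out waiting for cancellation")

// awaitSignal returns as soon as done is closed. The timeout is consulted
// only if the signal never arrives, so a large value costs nothing when
// everything works.
func awaitSignal(done <-chan struct{}, failAfter time.Duration) error {
	select {
	case <-done:
		return nil
	case <-time.After(failAfter):
		return errTimeout
	}
}

// demo simulates a load that blocks until its context is canceled.
func demo(cancelIt bool) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done := make(chan struct{})
	go func() {
		<-ctx.Done() // stands in for a load that observes cancellation
		close(done)
	}()

	failAfter := 30 * time.Second // generous: never hit on the happy path
	if cancelIt {
		cancel()
	} else {
		failAfter = 50 * time.Millisecond // shortened only to keep the failing demo quick
	}
	return awaitSignal(done, failAfter)
}

func main() {
	fmt.Println("canceled:", demo(true))
	fmt.Println("never canceled:", demo(false))
}
```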

Contributor

@jakule jakule left a comment

Looks like this test is currently the top offender. I think it makes sense to merge this change and see what happens. If the timeouts are still a problem, we can revisit this topic.

@fspmarshall fspmarshall merged commit 885c893 into master Jul 27, 2022
@zmb3 zmb3 deleted the fspmarshall/improve-fncache-cancellation-test branch September 9, 2022 18:53
Successfully merging this pull request may close these issues.

TestFnCacheCancellation flakiness
5 participants