Flakes: Address Common Unit Test Races #12546
Signed-off-by: Matt Lord <[email protected]>
Branch updated from a662305 to 89e142b.
Signed-off-by: Matt Lord <[email protected]>
Branch updated from 89e142b to 8a63f38.
go/vt/log/log.go
```
@@ -78,5 +81,32 @@ var (
// calls this function, or call this function directly before parsing
// command-line arguments.
func RegisterFlags(fs *pflag.FlagSet) {
	fs.Uint64Var(&glog.MaxSize, "log_rotate_max_size", glog.MaxSize, "size in bytes at which logs are rotated (glog.MaxSize)")
	flagVal := logRotateMaxSize{
		val: "1887436800", // glog.MaxSize value (which is not concurrency safe)
```
Instead of copying the value here, can't we use `glog.MaxSize` itself? Maybe with an atomic load then, I guess?
oh, lol, come on GitHub PR UI #12546 (comment)
so, uhh, yeah I agree with @dbussink here 😂
That was the first thing I tried, and it helped, but not as much (it failed about once every 20-30 runs locally, IIRC). I could go back to that, though.
Giving it a go in the CI now (will do 20 test runs again): 0c5be06
IIRC, the race moved to `pflag.Parse`/`TryParse` around `flag.TrickGlog`, but let's see.
The good news is that we don't seem to lose much, if any, reliability in the CI with that change: https://github.com/vitessio/vitess/actions/runs/4336855297/attempts/19
So I think we all feel better about it now. 🙂 Thanks!
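The reviewer's suggestion of reading the size through an atomic load, rather than copying a literal, can be illustrated with a small concurrency-safe flag value. This is a minimal sketch under stated assumptions, not the code that was merged: `atomicSizeFlag` is a hypothetical type that keeps its own `atomic.Uint64` (Go 1.19+) instead of touching `glog.MaxSize`, which is a plain `uint64` in glog, so atomics on it only help if every reader and writer uses them. Its `String`/`Set`/`Type` methods match the `pflag.Value` interface, so a value like this could be registered with `fs.Var`.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
	"sync/atomic"
)

// atomicSizeFlag is a hypothetical flag value holding a size in bytes.
// Its String/Set/Type methods match the pflag.Value interface, and every
// access goes through atomic.Uint64, so concurrent reads (e.g. from
// loggers) and writes (e.g. from flag parsing) do not race.
type atomicSizeFlag struct {
	v atomic.Uint64
}

// newAtomicSizeFlag returns a flag value initialized to def.
func newAtomicSizeFlag(def uint64) *atomicSizeFlag {
	f := &atomicSizeFlag{}
	f.v.Store(def)
	return f
}

// String returns the current value in base 10.
func (f *atomicSizeFlag) String() string {
	return strconv.FormatUint(f.v.Load(), 10)
}

// Set parses s as an unsigned base-10 integer and stores it atomically.
// On a parse error the previous value is left unchanged.
func (f *atomicSizeFlag) Set(s string) error {
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return fmt.Errorf("invalid size %q: %w", s, err)
	}
	f.v.Store(n)
	return nil
}

// Type reports the value type shown in flag help output.
func (f *atomicSizeFlag) Type() string { return "uint64" }

// Get is what readers would call instead of reading a plain global.
func (f *atomicSizeFlag) Get() uint64 { return f.v.Load() }

func main() {
	// Default matches the glog.MaxSize literal shown in the log.go diff.
	maxSize := newAtomicSizeFlag(1887436800)

	var wg sync.WaitGroup
	// One writer and several readers run concurrently; `go run -race` stays quiet.
	wg.Add(1)
	go func() {
		defer wg.Done()
		_ = maxSize.Set("1048576")
	}()
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = maxSize.Get()
		}()
	}
	wg.Wait()
	fmt.Println(maxSize.String()) // prints "1048576"
}
```

Keeping the value behind a method (`Get` here) rather than an exported global is what lets every access go through the atomic; the trade-off is that any code reading the global directly would have to change.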
Signed-off-by: Matt Lord <[email protected]>
LGTM! One small (potential) improvement, take it or leave it. And thanks!
go/vt/vtctl/vdiff_env_test.go
```
@@ -94,6 +80,17 @@ func newTestVDiffEnv(sourceShards, targetShards []string, query string, position
	}
	env.wr = wrangler.NewTestWrangler(env.cmdlog, env.topoServ, env.tmc)

	dialerName := fmt.Sprintf("VDiffTest-%d", rand.Intn(1000000000))
```
If we update this to take a `*testing.T` (or better, `testing.TB`), you can parameterize this to `fmt.Sprintf("VDiffTest-%s", t.Name())` and avoid any chance of collision entirely.
Good idea! I did that here (and kept the rand int part too): 2d7e509
Thanks!
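Following the suggestion in this thread, the dialer name can be derived from the running test. A minimal sketch, assuming a hypothetical `dialerName` helper and a hypothetical `registry` that stands in for a global, name-keyed dialer registration (the actual registration code is not shown in this PR): the helper accepts anything with a `Name() string` method, which `testing.TB` provides, and keeps the random suffix as the final commit did.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// namer is satisfied by *testing.T and *testing.B (testing.TB has Name()).
type namer interface {
	Name() string
}

// dialerName is a hypothetical helper: distinct test names give distinct
// prefixes, and the random suffix makes a clash within one test unlikely.
func dialerName(tb namer) string {
	return fmt.Sprintf("VDiffTest-%s-%d", tb.Name(), rand.Intn(1000000000))
}

// registry is a hypothetical stand-in for a global, name-keyed registration map.
type registry struct {
	mu    sync.Mutex
	names map[string]bool
}

// register records name and returns an error if it is already taken.
func (r *registry) register(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.names[name] {
		return fmt.Errorf("dialer %q already registered", name)
	}
	r.names[name] = true
	return nil
}

// fakeTest lets the sketch run outside `go test`; the test names are made up.
type fakeTest string

func (f fakeTest) Name() string { return string(f) }

func main() {
	r := &registry{names: map[string]bool{}}
	for _, t := range []fakeTest{"TestVDiffUnsharded", "TestVDiffSharded"} {
		name := dialerName(t)
		if err := r.register(name); err != nil {
			fmt.Println("collision:", err)
			continue
		}
		fmt.Println("registered", name)
	}
}
```

Inside a real test the call would simply be `dialerName(t)`; accepting the small `namer` interface rather than a concrete `*testing.T` keeps the helper usable from benchmarks as well.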
Signed-off-by: Matt Lord <[email protected]>
* Deflake unit race tests
  Signed-off-by: Matt Lord <[email protected]>
* Try to address glog.MaxSize race
  Signed-off-by: Matt Lord <[email protected]>
* Test w/o using literal copy of glog.MaxSize value in CI
  Signed-off-by: Matt Lord <[email protected]>
* Make the dialer names truly unique
  Signed-off-by: Matt Lord <[email protected]>

---------

Signed-off-by: Matt Lord <[email protected]>
* Flakes: Address Common Unit Test Races (#12546)
* Deflake unit race tests
  Signed-off-by: Matt Lord <[email protected]>
* Try to address glog.MaxSize race
  Signed-off-by: Matt Lord <[email protected]>
* Test w/o using literal copy of glog.MaxSize value in CI
  Signed-off-by: Matt Lord <[email protected]>
* Make the dialer names truly unique
  Signed-off-by: Matt Lord <[email protected]>

---------

Signed-off-by: Matt Lord <[email protected]>
* Use atomic.Bool for fakesqldb behavior flags (#12603)
  Signed-off-by: Matt Lord <[email protected]>
  Signed-off-by: Max Englander <[email protected]>
  Co-authored-by: Matt Lord <[email protected]>

---------

Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Max Englander <[email protected]>
Co-authored-by: Matt Lord <[email protected]>
Co-authored-by: Max Englander <[email protected]>
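The backport commit also carries #12603, which switches fakesqldb behavior flags to `atomic.Bool`. The pattern can be sketched as follows; `fakeDB`, `neverFail`, `SetNeverFail`, and `handleQuery` are hypothetical names for illustration, not the fakesqldb API. A test goroutine flips the flag while connection goroutines read it, and `atomic.Bool` from `sync/atomic` (Go 1.19+) makes both sides race-free without a mutex.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// fakeDB is a hypothetical test double whose behavior a test can toggle
// while connection goroutines are running.
type fakeDB struct {
	neverFail atomic.Bool // a plain bool here would be a data race
}

// SetNeverFail is called from the test goroutine.
func (db *fakeDB) SetNeverFail(v bool) { db.neverFail.Store(v) }

// handleQuery is called from connection goroutines.
func (db *fakeDB) handleQuery(q string) error {
	if db.neverFail.Load() {
		return nil
	}
	return errors.New("query failed: " + q)
}

func main() {
	db := &fakeDB{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = db.handleQuery(fmt.Sprintf("select %d", i))
		}(i)
	}
	db.SetNeverFail(true) // concurrent with the readers above, yet race-free
	wg.Wait()
	fmt.Println(db.handleQuery("select 1") == nil) // prints "true"
}
```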
Description
The unit race workflow was failing pretty regularly in the CI (~20% of the time), and the two failures that I would regularly see were:

* `testVDiffEnv` related races: most often in the wrangler unit tests, as they make the widest use of it
* `glog.MaxSize` related races: most often in the wrangler unit tests, as they create a LOT of loggers, primarily `ConsoleLogger`s and `MemoryLogger`s
  * The `glog.MaxSize` global exported variable in our `--log_rotate_max_size` flag handling was not concurrency-safe

I was able to quickly repeat these failures locally as well using the method below. After the changes in this PR, I was able to run it successfully 100 times in a row:

It was failing in the CI about 1 in 4-5 attempts. In this PR it passed 19 out of 20 times, 18 times in a row, which is at least a big improvement.