-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: port tpcc/mixed-headroom to the new framework #113706
Conversation
48e0bc2
to
292a48c
Compare
RFAL Passed on gceworker + TC job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go
line 359 at r1 (raw file):
// `bank` dataset, and runs one or multiple database upgrades while a // TPCC workload is running. The number of database upgrades is // controlled by the `versionsToUpgrade` parameter.
This needs updating (no more versionsToUpgrade). It would be good to include a high level idea of how it works (I assume mixedversion
chooses a random older but supported version to start with?)
pkg/cmd/roachtest/tests/tpcc.go
line 430 at r1 (raw file):
} uploadVersion(ctx, t, c, workloadNode, clusterupgrade.CurrentVersion())
At which point in this test sequence do we start the upgrade process? Is it at some point during the runTPCCWorkload
because that is the InMixedVersion
step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go
line 359 at r1 (raw file):
Previously, RaduBerinde wrote…
This needs updating (no more versionsToUpgrade). It would be good to include a high level idea of how it works (I assume
mixedversion
chooses a random older but supported version to start with?)
Good catch. Done.
pkg/cmd/roachtest/tests/tpcc.go
line 430 at r1 (raw file):
Previously, RaduBerinde wrote…
At which point in this test sequence do we start the upgrade process? Is it at some point during the
runTPCCWorkload
because that is theInMixedVersion
step?
The upgrade process is started after OnStartup
but before InMixedVersion
, where InMixedVersion
hooks will be called once per upgrade step. It's randomized when, but the general idea is if we are going from v23.1 to v23.2, then at any point before it upgrades an individual node, it could call runTPCCWorkload
, doing so once per cluster upgrade/downgrade step.
Taking a peek at the planning output in test.log probably gives a better idea of a potential plan than what I can explain. (I had to save it locally for the formatting to be nice btw)
pkg/cmd/roachtest/tests/tpcc.go
Outdated
runTPCCWorkload := func(ctx context.Context, l *logger.Logger, rng *rand.Rand, h *mixedversion.Helper) error { | ||
cmd := roachtestutil.NewCommand("./cockroach workload run tpcc"). | ||
Arg("{pgurl%s}", crdbNodes). | ||
Flag("duration", rampDuration(c.IsLocal())). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to have the duration be separate from rampDuration
.
Also, I think the following change would be welcome in order to test the migrations:
workloadDur := 10 * time.Minute
rampDur := rampDuration(c.IsLocal())
if h.Context().Finalizing && !c.IsLocal() {
rampDur = 1 * time.Minute
workloadDur = 100 * time.Minute
}
We want the workload to ramp up faster when migrations are running to expose them to more concurrent load. Also letting the workload run for longer in that case is good, and mimics what the previous version of this test was doing.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I'll give 100 minutes a test run but are we concerned at all that this will take too long? I'm not sure what the chances are but could it possibly call runTPCCWorkload
every time while migrations are happening and all of a sudden this test takes 10+ hours to finish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @RaduBerinde, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go
line 430 at r1 (raw file):
I had to save it locally for the formatting to be nice btw
If anyone uses firefox, the Repair Text Encoding
option should fix it:
Not sure if there is a way to tell TC an artifact is UTF-8 encoded so that this wouldn't be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go
line 359 at r1 (raw file):
Previously, DarrylWong wrote…
Good catch. Done.
I don't think you pushed the new version.
pkg/cmd/roachtest/tests/tpcc.go
line 430 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
I had to save it locally for the formatting to be nice btw
If anyone uses firefox, the
Repair Text Encoding
option should fix it:Not sure if there is a way to tell TC an artifact is UTF-8 encoded so that this wouldn't be necessary.
Very cool, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go
line 359 at r1 (raw file):
Previously, RaduBerinde wrote…
I don't think you pushed the new version.
Ah my bad, saw Renato's comment last second and got distracted, it's pushed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go
line 359 at r1 (raw file):
Previously, DarrylWong wrote…
Ah my bad, saw Renato's comment last second and got distracted, it's pushed now.
Thanks, looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go
line 412 at r1 (raw file):
every time while migrations are happening and all of a sudden this test takes 10+ hours to finish
We perform up to 3 upgrades by default, so worst case scenario would be 300 minutes or 5h, which is long but not the longest roachtest to exist. That said, I didn't realize the existing roachtest only uses 100 minutes if we are upgrading to current, so we can do that here too, i.e., only bump the duration if we are upgrading to the current version:
workloadDur := 10 * time.Minute
rampDur := rampDuration(c.IsLocal())
if h.Context().Finalizing && !c.IsLocal() {
rampDur = 1 * time.Minute
if h.Context().ToVersion.IsCurrent() {
workloadDur = 100 * time.Minute
}
}
Note that the new framework randomizes how many version upgrades occur. This removes the need for both a single upgrade and multiple upgrade test and the two were merged. Release note: None Fixes: cockroachdb#110537
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/tpcc.go
line 412 at r1 (raw file):
Previously, renatolabs (Renato Costa) wrote…
every time while migrations are happening and all of a sudden this test takes 10+ hours to finish
We perform up to 3 upgrades by default, so worst case scenario would be 300 minutes or 5h, which is long but not the longest roachtest to exist. That said, I didn't realize the existing roachtest only uses 100 minutes if we are upgrading to current, so we can do that here too, i.e., only bump the duration if we are upgrading to the current version:
workloadDur := 10 * time.Minute rampDur := rampDuration(c.IsLocal()) if h.Context().Finalizing && !c.IsLocal() { rampDur = 1 * time.Minute if h.Context().ToVersion.IsCurrent() { workloadDur = 100 * time.Minute } }
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @RaduBerinde, and @srosenberg)
TFTRs! bors r=renatolabs, RaduBerinde |
Build succeeded: |
blathers backport 23.1 |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 8cdcf77 to blathers/backport-release-23.1-113706: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Note that the new framework randomizes how many version upgrades occur. This removes the need for both a single upgrade and multiple upgrade test and the two were merged.
Release note: None
Fixes: #110537