Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: port tpcc/mixed-headroom to the new framework #113706

Merged
merged 1 commit into from
Nov 10, 2023

Conversation

DarrylWong
Copy link
Contributor

@DarrylWong DarrylWong commented Nov 2, 2023

Note that the new framework randomizes how many version upgrades occur. This removes the need for both a single upgrade and multiple upgrade test and the two were merged.

Release note: None
Fixes: #110537

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DarrylWong DarrylWong added the backport-23.2.x Flags PRs that need to be backported to 23.2. label Nov 2, 2023
@DarrylWong DarrylWong force-pushed the port-tpcc branch 3 times, most recently from 48e0bc2 to 292a48c Compare November 6, 2023 19:53
@DarrylWong
Copy link
Contributor Author

RFAL

Passed on gceworker + TC job

@DarrylWong DarrylWong marked this pull request as ready for review November 6, 2023 19:57
@DarrylWong DarrylWong requested a review from a team as a code owner November 6, 2023 19:57
@DarrylWong DarrylWong requested review from srosenberg and renatolabs and removed request for a team November 6, 2023 19:57
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):

// `bank` dataset, and runs one or multiple database upgrades while a
// TPCC workload is running. The number of database upgrades is
// controlled by the `versionsToUpgrade` parameter.

This needs updating (no more versionsToUpgrade). It would be good to include a high level idea of how it works (I assume mixedversion chooses a random older but supported version to start with?)


pkg/cmd/roachtest/tests/tpcc.go line 430 at r1 (raw file):

	}

	uploadVersion(ctx, t, c, workloadNode, clusterupgrade.CurrentVersion())

At which point in this test sequence do we start the upgrade process? Is it at some point during the runTPCCWorkload because that is the InMixedVersion step?

Copy link
Contributor Author

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):

Previously, RaduBerinde wrote…

This needs updating (no more versionsToUpgrade). It would be good to include a high level idea of how it works (I assume mixedversion chooses a random older but supported version to start with?)

Good catch. Done.


pkg/cmd/roachtest/tests/tpcc.go line 430 at r1 (raw file):

Previously, RaduBerinde wrote…

At which point in this test sequence do we start the upgrade process? Is it at some point during the runTPCCWorkload because that is the InMixedVersion step?

The upgrade process is started after OnStartup but before InMixedVersion, where InMixedVersion hooks will be called once per upgrade step. It's randomized when, but the general idea is if we are going from v23.1 to v23.2, then at any point before it upgrades an individual node, it could call runTPCCWorkload, doing so once per cluster upgrade/downgrade step.

Taking a peek at the planning output in test.log probably gives a better idea of a potential plan than what I can explain. (I had to save it locally for the formatting to be nice btw)

runTPCCWorkload := func(ctx context.Context, l *logger.Logger, rng *rand.Rand, h *mixedversion.Helper) error {
cmd := roachtestutil.NewCommand("./cockroach workload run tpcc").
Arg("{pgurl%s}", crdbNodes).
Flag("duration", rampDuration(c.IsLocal())).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to have the duration be separate from rampDuration.

Also, I think the following change would be welcome in order to test the migrations:

workloadDur := 10 * time.Minute
rampDur := rampDuration(c.IsLocal())
if h.Context().Finalizing && !c.IsLocal() {
    rampDur = 1 * time.Minute
    workloadDur = 100 * time.Minute
}

We want the workload to ramp up faster when migrations are running to expose them to more concurrent load. Also letting the workload run for longer in that case is good, and mimics what the previous version of this test was doing.

Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

I'll give 100 minutes a test run but are we concerned at all that this will take too long? I'm not sure what the chances are but could it possibly call runTPCCWorkload every time while migrations are happening and all of a sudden this test takes 10+ hours to finish?

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @RaduBerinde, and @srosenberg)


pkg/cmd/roachtest/tests/tpcc.go line 430 at r1 (raw file):

I had to save it locally for the formatting to be nice btw

If anyone uses firefox, the Repair Text Encoding option should fix it:

Screenshot 2023-11-07 at 3.32.38 PM.png

Not sure if there is a way to tell TC an artifact is UTF-8 encoded so that this wouldn't be necessary.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):

Previously, DarrylWong wrote…

Good catch. Done.

I don't think you pushed the new version.


pkg/cmd/roachtest/tests/tpcc.go line 430 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

I had to save it locally for the formatting to be nice btw

If anyone uses firefox, the Repair Text Encoding option should fix it:

Screenshot 2023-11-07 at 3.32.38 PM.png

Not sure if there is a way to tell TC an artifact is UTF-8 encoded so that this wouldn't be necessary.

Very cool, thanks!

Copy link
Contributor Author

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):

Previously, RaduBerinde wrote…

I don't think you pushed the new version.

Ah my bad, saw Renato's comment last second and got distracted, it's pushed now.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/tests/tpcc.go line 359 at r1 (raw file):

Previously, DarrylWong wrote…

Ah my bad, saw Renato's comment last second and got distracted, it's pushed now.

Thanks, looks great!

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong and @srosenberg)


pkg/cmd/roachtest/tests/tpcc.go line 412 at r1 (raw file):

every time while migrations are happening and all of a sudden this test takes 10+ hours to finish

We perform up to 3 upgrades by default, so worst case scenario would be 300 minutes or 5h, which is long but not the longest roachtest to exist. That said, I didn't realize the existing roachtest only uses 100 minutes if we are upgrading to current, so we can do that here too, i.e., only bump the duration if we are upgrading to the current version:

workloadDur := 10 * time.Minute
rampDur := rampDuration(c.IsLocal())
if h.Context().Finalizing && !c.IsLocal() {
    rampDur = 1 * time.Minute
    if h.Context().ToVersion.IsCurrent() {
        workloadDur = 100 * time.Minute
    }
}

Note that the new framework randomizes how many version
upgrades occur. This removes the need for both a single
upgrade and multiple upgrade test and the two were merged.

Release note: None
Fixes: cockroachdb#110537
Copy link
Contributor Author

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/tests/tpcc.go line 412 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

every time while migrations are happening and all of a sudden this test takes 10+ hours to finish

We perform up to 3 upgrades by default, so worst case scenario would be 300 minutes or 5h, which is long but not the longest roachtest to exist. That said, I didn't realize the existing roachtest only uses 100 minutes if we are upgrading to current, so we can do that here too, i.e., only bump the duration if we are upgrading to the current version:

workloadDur := 10 * time.Minute
rampDur := rampDuration(c.IsLocal())
if h.Context().Finalizing && !c.IsLocal() {
    rampDur = 1 * time.Minute
    if h.Context().ToVersion.IsCurrent() {
        workloadDur = 100 * time.Minute
    }
}

Done.

Test run here

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @RaduBerinde, and @srosenberg)

@DarrylWong
Copy link
Contributor Author

TFTRs!

bors r=renatolabs, RaduBerinde

@craig
Copy link
Contributor

craig bot commented Nov 10, 2023

Build succeeded:

@craig craig bot merged commit 194151c into cockroachdb:master Nov 10, 2023
3 checks passed
@renatolabs
Copy link
Contributor

blathers backport 23.1

Copy link

blathers-crl bot commented Nov 17, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 8cdcf77 to blathers/backport-release-23.1-113706: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: port tpcc/mixed-headroom/n5cpu16 to new mixed-version framework
4 participants