Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-20.1: cmd: wholesale replace roachprod, roachtest, workload from master #55892

Merged
merged 1 commit into from
Oct 31, 2020

Conversation

adityamaru
Copy link
Contributor

Informs: #51897

@adityamaru adityamaru added the do-not-merge bors won't merge a PR with this label. label Oct 23, 2020
@adityamaru adityamaru requested a review from jlinder October 23, 2020 13:51
@adityamaru adityamaru requested a review from a team as a code owner October 23, 2020 13:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru
Copy link
Contributor Author

adityamaru commented Oct 23, 2020

Two incompatible roachtests deleted:
backup/KMS - which I can sign off for
tpcdsvec - which Yahor will sign off for

schemachange workload was deleted too as this had several 20.2 dependancies.

if !errors.Is(err, server.ErrIncompatibleBinaryVersion) {
t.Fatalf("expected err: %s, got %v", server.ErrIncompatibleBinaryVersion, err)
}
// TODO(adityamaru): Check whether we want to backport this particular
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@irfansharif this test wasn't backported to 20.1. The only incompatibility seems to be ErrIncompatibleBinaryVersion not being an error mode in 20.1. Just wanted to run this by you, whether this is fine as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping this test in 20.1 makes sense. It only makes sense for v20.2. It looks like the min version would bar this from running already. Is that not what you saw happen?

	r.Add(testSpec{
		Name:       "join-init/mixed",
		Owner:      OwnerKV,
		MinVersion: "v20.2.0",

Copy link
Contributor Author

@adityamaru adityamaru Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh sorry I didnt catch that. I'm just going to delete it in that case if that's fine? Mainly cause the file doesn't compile as-is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, that's more than fine.

@adityamaru adityamaru force-pushed the release-20.1-roach branch 2 times, most recently from 383a205 to 1e6d4d4 Compare October 23, 2020 17:40
@adityamaru
Copy link
Contributor Author

@yuzefovich the last commit is RFAL. I ran it and it passes.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

Reviewed 1 of 1 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @jlinder)


pkg/cmd/roachtest/tpcdsvec.go, line 149 at r7 (raw file):

			// emit an error because of that difference.
			if err := cmpconn.CompareConns(
				ctx, timeout, conns, "", "SHOW vectorize;"); err == nil {

super nit: it's an unusual break of this condition into two lines - I'd either combine this and the previous line into one or break this line into two.

@adityamaru adityamaru force-pushed the release-20.1-roach branch 3 times, most recently from ecc3098 to a4ea71f Compare October 23, 2020 21:03
@@ -1,21 +0,0 @@
// Copyright 2020 The Cockroach Authors.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleting this file which was backported in #55905, because we already have a file in testutils/stress.go which does this. Switching all unit tests to use this method is not in the scope of this PR and has already been done for 20.2+

@adityamaru adityamaru force-pushed the release-20.1-roach branch 2 times, most recently from 27c44d1 to 90d2578 Compare October 23, 2020 23:16
@adityamaru
Copy link
Contributor Author

sorry for the force push noise, I'll ping the PR once CI behaves.

@adityamaru adityamaru force-pushed the release-20.1-roach branch 6 times, most recently from 1ab34e0 to 151d33d Compare October 26, 2020 13:42
@adityamaru
Copy link
Contributor Author

I think this is ready for review, I can't figure out why the unit test is failing remotely but passing locally, and why lint thinks my workspace isn't clean. I'll work on getting CI green but would love to get some 👀 on the commits.

@adityamaru
Copy link
Contributor Author

adityamaru commented Oct 26, 2020

@ajwerner adding you for the last commit, which backports the schemachange workload.

All commits have been squashed, the schemachange workload changes are to be reviewed!

@adityamaru adityamaru requested a review from ajwerner October 26, 2020 14:13
@irfansharif
Copy link
Contributor

irfansharif commented Oct 26, 2020

why lint thinks my workspace isn't clean.

diff --git a/pkg/workload/schemachange/optype_string.go b/pkg/workload/schemachange/optype_string.go
index e486a1b5ed..95858aa212 100644
--- a/pkg/workload/schemachange/optype_string.go
+++ b/pkg/workload/schemachange/optype_string.go
@@ -37,9 +37,9 @@ func _() {
 	_ = x[validate-26]
 }
 
-const _opType_name = "addColumnaddConstraintcreateIndexcreateSequencecreateTablecreateTableAscreateViewcreateEnumcreateSchemadropColumndropColumnDefaultdropColumnNotNulldropColumnStoreddropConstraintdropIndexdropSequencedropTabledropViewdropSchemarenameColumnrenameIndexrenameSequencerenameTablerenameViewsetColumnDefaultsetColumnNotNullsetColumnTypeinsertRowvalidate"
+const _opType_name = "addColumnaddConstraintcreateIndexcreateSequencecreateTablecreateTableAscreateViewdropColumndropColumnDefaultdropColumnNotNulldropColumnStoreddropConstraintdropIndexdropSequencedropTabledropViewdropSchemarenameColumnrenameIndexrenameSequencerenameTablerenameViewsetColumnDefaultsetColumnNotNullsetColumnTypeinsertRowvalidate"
 
-var _opType_index = [...]uint16{0, 9, 22, 33, 47, 58, 71, 81, 91, 103, 113, 130, 147, 163, 177, 186, 198, 207, 215, 225, 237, 248, 262, 273, 283, 299, 315, 328, 337, 345}
+var _opType_index = [...]uint16{0, 9, 22, 33, 47, 58, 71, 81, 91, 108, 125, 141, 155, 164, 176, 185, 193, 203, 215, 226, 240, 251, 261, 277, 293, 306, 315, 323}

Looks like it's finding the following diff. Did you run make generate at after having changed out the workload wholesale?

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking all of this on, this looks like it was a ton of work to get just right. One thing I'm having difficulty in reviewing is actually making sense of the diffs themselves. Do you mind outlining in text for me exactly all the things we did in this PR? I think I would understand it all best if:

  • it was one squashed commit (which we'll want anyway to make sure none of the intermediate commits are broken, for bisectability)
  • for each diff between release-20.1 code and this new wholesale imported code, we had links to point to to understand why it was needed. If a test was written incorrectly, and we had fixed it in master, I would point to the fix PR itself. If we refactored some test code (like adding better github project routing for e.g.), I'd point to that PR (if we're deciding to keep it, which looks like we didn't?). That sort of thing.

I also don't mean to make it any more tedious for you than it already is. I just remember from past experiences that small changes to our roachtest infrastructure can start lots of little fires throughout (Ref: cockroachdb/jepsen#24). This is a very big change, so we'd want to be very careful.

pkg/cmd/roachtest/backup.go Outdated Show resolved Hide resolved
@@ -327,6 +327,29 @@ func (s *bankState) startSplitMonkey(ctx context.Context, d time.Duration, c *cl
}()
}

// IsExpectedRelocateError maintains an allowlist of errors related to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did these diffs come from? Do you mind adding references to them in the commit message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#52647 moved isExpectedRelocateError to package kv. I didn't want to introduce any changes to package kv in this PR so I simply copy-pasted the method from there. Would you suggest otherwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me!

@@ -12,10 +12,10 @@ package workload_test

import (
"fmt"
"github.com/cockroachdb/cockroach/pkg/col/coltypes"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Imports will need to move down.

@adityamaru
Copy link
Contributor Author

adityamaru commented Oct 26, 2020

Thanks for taking all of this on, this looks like it was a ton of work to get just right. One thing I'm having difficulty in reviewing is actually making sense of the diffs themselves. Do you mind outlining in text for me exactly all the things we did in this PR? I think I would understand it all best if:

  • it was one squashed commit (which we'll want anyway to make sure none of the intermediate commits are broken, for bisectability)
  • for each diff between release-20.1 code and this new wholesale imported code, we had links to point to to understand why it was needed. If a test was written incorrectly, and we had fixed it in master, I would point to the fix PR itself. If we refactored some test code (like adding better github project routing for e.g.), I'd point to that PR (if we're deciding to keep it, which looks like we didn't?). That sort of thing.

I also don't mean to make it any more tedious for you than it already is. I just remember from past experiences that small changes to our roachtest infrastructure can start lots of little fires throughout (Ref: cockroachdb/jepsen#24). This is a very big change, so we'd want to be very careful.

That makes a lot of sense, I've squashed all the commits and fleshed out the commit message to give a high level overview of the major changes. Going to go in and link the diffs from master inline.

Edit: Gone through 75% of the files to inline comment PRs where major changes were made. Going to come back to this in a bit.

Edit: Did another round of PR tagging. Going to hold off till I get some review comments. I don't really know how to make this easier to review :/

pkg/cmd/roachtest/allocator.go Show resolved Hide resolved
pkg/cmd/roachtest/alterpk.go Show resolved Hide resolved
pkg/cmd/roachtest/alterpk.go Show resolved Hide resolved
pkg/cmd/roachtest/autoupgrade.go Show resolved Hide resolved
pkg/cmd/roachtest/backup.go Show resolved Hide resolved
pkg/cmd/roachtest/django.go Show resolved Hide resolved
pkg/cmd/roachtest/drop.go Show resolved Hide resolved
pkg/cmd/roachtest/follower_reads.go Show resolved Hide resolved
pkg/cmd/roachtest/import.go Show resolved Hide resolved
pkg/cmd/roachtest/inconsistency.go Show resolved Hide resolved
@adityamaru adityamaru force-pushed the release-20.1-roach branch 2 times, most recently from 5d22b6e to 501eb6e Compare October 28, 2020 14:28
pkg/cmd/roachtest/kv.go Show resolved Hide resolved
pkg/cmd/roachtest/many_splits.go Show resolved Hide resolved
pkg/cmd/roachtest/mixed_version_jobs.go Show resolved Hide resolved
pkg/cmd/roachtest/quit.go Show resolved Hide resolved
pkg/cmd/roachtest/replicagc.go Show resolved Hide resolved
pkg/cmd/roachtest/secondary_indexes.go Show resolved Hide resolved
pkg/cmd/roachtest/sqlalchemy.go Show resolved Hide resolved
@@ -0,0 +1,49 @@
// Code generated by "stringer -type=opType"; DO NOT EDIT.
Copy link
Contributor Author

@adityamaru adityamaru Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajwerner I had to drop enum related logic from the schemachange workload to make it 20.1 compatible. The squash hides the compatibility changes I had to make - would you prefer this to be an independent PR where I add masters version of schemachange, and then make the compatibility tweaks?

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes here LGTM, assuming it all checks out in CI (I'm unsure how to make sure we're running all roachtests in CI for release-20.1, I assume that's what you've been doing?). Major plause for taking all of this one, this must have been a slog. Linking in the referencing PRs really helped for the review. Hopefully it helped isolate the points where we needed more manual intervention, in case we need to revisit going forward.

I'm glad most of it were clean(-ish) backports/wholesale replacement. I think that'll make future backports to roachtest/roachprod easy enough to not want to run them off of master again. I'll stamp off here, and leave you to discuss with Andrew regarding what's to be done with the schamachange workload diffs.

@adityamaru adityamaru changed the title cmd: wholesale replace roachprod, roachtest, workload from master cmd: release-20.1 - wholesale replace roachprod, roachtest, workload from master Oct 29, 2020
@jlinder
Copy link
Collaborator

jlinder commented Oct 29, 2020

I think the changes here LGTM, assuming it all checks out in CI (I'm unsure how to make sure we're running all roachtests in CI for release-20.1, I assume that's what you've been doing?).

The changes that need to happen in CI:

  • Change the script in the Roachtest Nightly GCE build config to use the branch it was triggered for (and not master)
  • Change the script in the Roachtest Nightly AWS build config to use the branch it was triggered for (and not master)

These changes need to happen in concert with the 19.2 and the 20.2 PRs.

The roachtest build config already runs the roachtests from its branch and does not need to be changed.

@adityamaru adityamaru changed the title cmd: release-20.1 - wholesale replace roachprod, roachtest, workload from master release-20.1: cmd: wholesale replace roachprod, roachtest, workload from master Oct 29, 2020
@adityamaru adityamaru force-pushed the release-20.1-roach branch 2 times, most recently from ce066d5 to d7dee75 Compare October 30, 2020 15:32
@adityamaru
Copy link
Contributor Author

Added roachprod-stress and the two nightly build scripts to this PR too so that we can avoid having to reference master completely. Also rebased it on top of the lint util backport - #56123.

I'm not sure how to deal with the flaking TestAllRegisteredImportFixture/tpcc, its been an evasive flake which noone from bulk has been able to reproduce as can be seen in #47015, and it hasn't flaked in 2 months. I ran it on roachprod-stressrace for 35 mins with no failure.

…(ccl)

This change does the following:

- Deletes release-20.1 roachprod, roachtest, workload and workloadccl
  packages.

- Adds the above packages from master.

- Make all the above packages compatible with 20.1. I have highlighted
  some of the major changes in each package and will inline links to why
  they were changed on master. I took some decisions on what required a
  backport and what looked too invasive, but am open to suggestions
  here.

roachtest:
- Deleted roachtests with min version 20.2.0 namely:
	- backup/KMS
	- join-init/mixed
	- disk_full
	- multitenant acceptance
	- tpchvec/bench
	- tpchvec/perf_no_stats
- Backported `tpcdsvec` roachtest

workload:
- Backported `schemachange` workload
- deleted geospatial.go
- switched all `rowenc` pkg references back to `sqlbase`
- type representation in 20.2 differed from what was supported in 20.1.
  All workloads had to be switched back to using `colTypes`

workloadccl:
- More type representation related changes
- Set --deprecated-fk-indexes to be always true for tpcc workload

Release note: None
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema change workload :lgtm:

Reviewed 1 of 34 files at r11.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @jlinder, and @yuzefovich)

@adityamaru
Copy link
Contributor Author

adityamaru commented Oct 30, 2020

CI is green wooop

I'm going to merge this, and then make a switch to the build template to special case release-20.1 to run on itself. cc: @jlinder

Thank you for all your reviews!

@adityamaru adityamaru removed the do-not-merge bors won't merge a PR with this label. label Oct 30, 2020
@adityamaru
Copy link
Contributor Author

Ran a special-cased run on TC for this branch to check that it passes a few GCE nightlies. Going to merge and trigger a nightly GCE run on release-20.1.

@adityamaru adityamaru merged commit d905ffe into cockroachdb:release-20.1 Oct 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants