Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: update version map for 20.1.6 and create fixtures #54761

Closed

Conversation

arulajmani
Copy link
Collaborator

This commit updates the PredecessorVersion to 20.1.6 and updates
fixtures.

Release note: None

@arulajmani arulajmani requested a review from jlinder September 24, 2020 19:30
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@arulajmani
Copy link
Collaborator Author

I just realized #53668 never went in because a bors failure. I'll address the second commit from that PR in a separate patch, not sure why there was a test failure there.

Copy link
Collaborator

@jlinder jlinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arulajmani
Copy link
Collaborator Author

For some reason, updating the fixtures is causing the version-upgrade roachtest to fail. @asubiotto do you know what could be happening here? Tagging you because you were the last person to successfully upgrade the fixtures

@asubiotto
Copy link
Contributor

Hmm not sure. It seems like a cluster initialization failure:

	cluster.go:2138,versionupgrade.go:318,versionupgrade.go:190,versionupgrade.go:178,acceptance.go:58,acceptance.go:95,test_runner.go:755: /go/src/github.com/cockroachdb/cockroach/bin/roachprod start --binary=./cockroach-20.1.6 --encrypt=false --sequential=false local:1-4 returned: exit status 1
		(1) /go/src/github.com/cockroachdb/cockroach/bin/roachprod start --binary=./cockroach-20.1.6 --encrypt=false --sequential=false local:1-4 returned
		  | stderr:
		  | I200924 20:01:47.297667 23 cockroach.go:170  unable to initialize cluster: ~ cd ${HOME}/local/1 ; 
		  | 		if ! test -e /home/roach/local/1/data/cluster-bootstrapped ; then
		  | 			COCKROACH_CONNECT_TIMEOUT=0 /home/roach/local/1/cockroach-20.1.6 init --url 'postgres://root@localhost:26257?sslmode=disable' && touch /home/roach/local/1/data/cluster-bootstrapped
		  | 		fi
		  | warning: --url specifies user/password, but command "init" does not accept user/password details - details ignored
		  | *
		  | * ERROR: ERROR: rpc error: code = Unknown desc = cluster has already been initialized
		  | * HINT: Please ensure all your start commands are using --join.
		  | *
		  | E200924 20:01:47.293047 1 cli/error.go:373  ERROR: rpc error: code = Unknown desc = cluster has already been initialized
		  | HINT: Please ensure all your start commands are using --join.
		  | ERROR: rpc error: code = Unknown desc = cluster has already been initialized
		  | HINT: Please ensure all your start commands are using --join.
		  | Failed running "init": exit status 1
		  |
		  | stdout:
		  | local: starting nodes
		  | local: initializing cluster
		Wraps: (2) exit status 1

Not sure why this would have anything to do with the fixtures. What happens if you run the roachtest without the new fixtures? Like this we can isolate either the version or the fixtures. Maybe @tbg can offer some color

@tbg
Copy link
Member

tbg commented Sep 25, 2020

I think the problem is that roachtest changed since the fixtures were made? It's looking at the cluster-bootstrapped file and I think that is not in the fixture. @arulajmani when did you create these fixtures? Does the problem go away if you regenerate them now?

@arulajmani
Copy link
Collaborator Author

Re: running the roachtest without the fixture to isolate if the problem is the fixture or the map -- it's the fixture, it only fails once they're brought into the picture.

@tbg I created the fixtures yesterday right before I opened the PR, so I don't think that's the issue. I searched for this cluster-bootstrapped file, and seems like there's was #52040 from @irfansharif which fixes roachtest failures due to a double init on release-20.1, which sounds similar to what is happening here.

@tbg
Copy link
Member

tbg commented Sep 25, 2020

Ah - makes sense.

@irfansharif it looks like this will bite us every time we use fixtures for anything, no? It doesn't have anything to do with #51879. I think roachtest we need to either have roachtest detect whether store dir is empty initially or we need to put the cluster-bootstrapped file into the fixture. I guess the second one is easier!

@tbg
Copy link
Member

tbg commented Sep 25, 2020

@arulajmani could you try with this diff:

diff --git a/pkg/cmd/roachprod/install/cockroach.go b/pkg/cmd/roachprod/install/cockroach.go
index 04a1946990..a6ef929fc3 100644
--- a/pkg/cmd/roachprod/install/cockroach.go
+++ b/pkg/cmd/roachprod/install/cockroach.go
@@ -185,8 +185,8 @@ func (r Cockroach) Start(c *SyncedCluster, extraArgs []string) {
 			// addressing #51897.
 			//
 			// TODO(irfansharif): Remove this once #51897 is resolved.
-			markBootstrap := fmt.Sprintf("touch %s/%s", h.c.Impl.NodeDir(h.c, nodes[nodeIdx]), "cluster-bootstrapped")
-			cmdOut, err := h.run(nodeIdx, markBootstrap)
+			const bootstrapFile = "touch {store-dir}/cluster-bootstrapped"
+			cmdOut, err := h.run(nodeIdx, "touch "+bootstrapFile)
 			if err != nil {
 				log.Fatalf("unable to run cmd: %v", err)
 			}
@@ -571,13 +571,13 @@ func (h *crdbInstallHelper) generateInitCmd(nodeIdx int) string {
 		initCmd = `cd ${HOME}/local/1 ; `
 	}
 
-	path := fmt.Sprintf("%s/%s", h.c.Impl.NodeDir(h.c, nodes[nodeIdx]), "cluster-bootstrapped")
+	const bootstrapFile = "{store-dir}/cluster-bootstrapped"
 	url := h.r.NodeURL(h.c, "localhost", h.r.NodePort(h.c, nodes[nodeIdx]))
 	binary := cockroachNodeBinary(h.c, nodes[nodeIdx])
 	initCmd += fmt.Sprintf(`
 		if ! test -e %[1]s ; then
 			COCKROACH_CONNECT_TIMEOUT=0 %[2]s init --url %[3]s && touch %[1]s
-		fi`, path, binary, url)
+		fi`, bootstrapFile, binary, url)
 	return initCmd
 }

basically it moves this bootstrap file into the store dir, so it will be part of the fixture when you make it (one more time).

@irfansharif
Copy link
Contributor

irfansharif commented Sep 25, 2020

Putting cluster-bootstrapped into the fixture itself makes sense, and that patch LGTM too if it works.

@jlinder, given #51897 keeps biting us so often and that we're now close to finally releasing 20.2, can we prioritize #51897? I really think it will make a world of difference for roachtests and test infrastructure sanity. For the patch suggested above, it's a bit unfortunate we need it at all.

EDIT: I see it #51897 is not what's causing the issue here. Confused myself cause of Tobi's broken link.

@arulajmani arulajmani force-pushed the 20.1.6-update-version-map branch from 459ce14 to 2752a8b Compare September 25, 2020 23:07
This commit updates the PredecessorVersion to 20.1.6 and updates
fixtures.

Release note: None
@arulajmani arulajmani force-pushed the 20.1.6-update-version-map branch from 2752a8b to 8eca6de Compare September 26, 2020 00:40
@arulajmani
Copy link
Collaborator Author

arulajmani commented Sep 26, 2020

@tbg din't have much luck trying to generate fixtures after trying the diff you sent. Straight out of the box, I encountered the Unknown desc = cluster has already been initialized error when trying to generate fixtures. I tried to play around with the diff a bit, trying:

@@ -165,6 +165,9 @@ func (r Cockroach) Start(c *SyncedCluster, extraArgs []string) {
                shouldInit := !h.useStartSingleNode(vers) && vers.AtLeast(version.MustParse("v20.1.0"))
                if shouldInit {
                        fmt.Printf("%s: initializing cluster\n", h.c.Name)
+                       const bootstrapFile = "{store-dir}/cluster-bootstrapped"
+                       h.run(nodeIdx, "mkdir -p {store-dir}")
+                       _, err := h.run(nodeIdx, "touch "+bootstrapFile)
                        initOut, err := h.initializeCluster(nodeIdx)
                        if err != nil {
                                log.Fatalf("unable to initialize cluster: %v", err)

in addition to what you pasted above as we're creating fixtures for v20.1.6 -- I was able to create fixtures with this successfully (again), but they don't pass CI.

@irfansharif
Copy link
Contributor

I can take a look come Monday.

@irfansharif irfansharif self-assigned this Sep 26, 2020
@tbg
Copy link
Member

tbg commented Oct 2, 2020

@irfansharif would you mind taking this one?

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 6, 2020
And address fallout from `roachprod` changes around cluster
initialization (cockroachdb#51329). Touches cockroachdb#54761.

Release note: None
@irfansharif
Copy link
Contributor

#55253.

@irfansharif irfansharif closed this Oct 6, 2020
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 7, 2020
And address fallout from `roachprod` changes around cluster
initialization (cockroachdb#51329). Touches cockroachdb#54761.

Release note: None
craig bot pushed a commit that referenced this pull request Oct 7, 2020
55253: roachtest: bump fixtures for 20.1 (to 20.1.6) r=irfansharif a=irfansharif

And address fallout from `roachprod` changes around cluster
initialization (#51329). Touches #54761.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
jayshrivastava pushed a commit that referenced this pull request Oct 8, 2020
And address fallout from `roachprod` changes around cluster
initialization (#51329). Touches #54761.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants