Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roach{test,prod},acceptance: no longer rely on auto-init #51329

Merged
merged 1 commit into from
Jul 14, 2020

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Jul 11, 2020

cockroach start without --join auto-initializes the cluster. This
was deprecated in 19.2 and will be removed in a future commit. We update
test code that relies on the previous behavior. To (mostly) retain
existing roachprod behavior, we auto-initialize when roachprod start-ing
n1. We introduce a new --skip-init flag for when that is
not desired (for example in roachtests that restart n1, but don't intend
to re-initialize the cluster). Most of the roachtest changes are exactly
to this effect, where tests than invoke roachprod start a second time
on n1 are changed to now specify this new --skip-init flag.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/cmd/roachprod/install/cockroach.go, line 304 at r1 (raw file):

	})

	if StartOpts.SkipInit || !bootstrappable {

I'd prefer it if we could avoid adding a --skip-init flag as the introduction of the flag is a change to the roachprod UX. Safer to leave the UX unchanged, which I think is possible. Instead, can we leave a marker file on node 1 once initialization has been performed? We can possibly reuse the <dir>/settings-initialized file that already exists that is used to initialize cluster settings only once when the cluster is first bootstrapped.

@irfansharif irfansharif force-pushed the 200710.skip-init-roachprod branch from bee2ec5 to 6f7b36f Compare July 14, 2020 00:18
@irfansharif
Copy link
Contributor Author

I'd prefer it if we could avoid adding a --skip-init flag as the introduction of the flag is a change to the roachprod UX.

Done. I'd originally added a cluster-bootstrapped file doing exactly that, but then thought to expose the --skip-init control to it (but I see now that it's not really necessary, not yet anyway).

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No other comment than what Peter already said.

@irfansharif irfansharif force-pushed the 200710.skip-init-roachprod branch from 6f7b36f to b3d36dd Compare July 14, 2020 14:39
@irfansharif
Copy link
Contributor Author

TFTRs! Ended up having to add the new marker file to the version upgrade fixtures given we're not solving it with --skip-init anymore.

bors r+

@irfansharif
Copy link
Contributor Author

Flaked on #48423, re-running.

@irfansharif
Copy link
Contributor Author

bors r+

`cockroach start` without `--join` auto-initializes the cluster. This
is deprecated behavior as of 19.2 and will be removed soon. `roachprod
start` inherited similar semantics, auto-initializing a cluster on
`roachprod start`. It did so by cutting around the join flags and using
auto-init behaviour of crdb. We keep the same external API of
`roachprod start`, but we now explicitly `cockroach init` the managed
clusters (using a dedicated `cluster-bootstrapped` signal file to avoid
double initialization).

Release note: None
@irfansharif irfansharif force-pushed the 200710.skip-init-roachprod branch from b3d36dd to ad380a4 Compare July 14, 2020 17:55
@craig
Copy link
Contributor

craig bot commented Jul 14, 2020

Canceled

@irfansharif
Copy link
Contributor Author

Ugh, cla bot acting up. Retrying.

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 14, 2020

Build succeeded

@craig craig bot merged commit bc2f84c into cockroachdb:master Jul 14, 2020
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)


pkg/cmd/roachprod/install/cockroach.go, line 330 at r2 (raw file):

		binary := cockroachNodeBinary(c, 1)
		path := fmt.Sprintf("%s/%s", c.Impl.NodeDir(c, nodes[i]), "cluster-bootstrapped")

One bit of fragility here is that wiping node 1 will cause a reinitialization. I wonder if we should pushing the cluster-bootstrapped file to all nodes. Perhaps just add a TODO for now as it isn't clear to me if this is worth worrying about.


pkg/cmd/roachprod/install/cockroach.go, line 347 at r2 (raw file):

	if initOut != "" {
		fmt.Println(initOut)

This block isn't taken if the cluster is already bootstrapped, right?

Copy link
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @petermattis and @tbg)


pkg/cmd/roachprod/install/cockroach.go, line 330 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

One bit of fragility here is that wiping node 1 will cause a reinitialization. I wonder if we should pushing the cluster-bootstrapped file to all nodes. Perhaps just add a TODO for now as it isn't clear to me if this is worth worrying about.

yea, I'll add the TODO. I did audit through our usage of c.Wipe in roachtests to see if there were any problematic usages, and didn't find any. It helps that any sort of problematic usage will fail loudly with a clear "attempting to initialize a re-initialized cluster" message.


pkg/cmd/roachprod/install/cockroach.go, line 347 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This block isn't taken if the cluster is already bootstrapped, right?

Correct.

@irfansharif irfansharif deleted the 200710.skip-init-roachprod branch July 15, 2020 01:08
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jul 15, 2020
Fixes cockroachdb#44116. Informs cockroachdb#24118. Reviving cockroachdb#44112.

We remove the deprecated behavior of crdb where we previously
auto-initialized the cluster when `cockroach start` was invoked without
a corresponding `--join` flag. We update a few tests along the way that
made use of this behavior by changing them to either explicitly init, or
now lean on `roachprod` changes (from cockroachdb#51329) that transparently manage
initialization.

Release note (backward-incompatible change): A CockroachDB node
started with cockroach start without the --join flag no
longer automatically initializes the cluster. The `cockroach init`
command is now mandatory. The auto-initialization behavior had
been deprecated in version 19.2.

Release note (backward-incompatible change): `cockroach start` without
`--join` is no longer supported. It was deprecated in a previous
release.
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jul 17, 2020
Fixes cockroachdb#44116. Informs cockroachdb#24118. Reviving cockroachdb#44112.

We remove the deprecated behavior of crdb where we previously
auto-initialized the cluster when `cockroach start` was invoked without
a corresponding `--join` flag. We update a few tests along the way that
made use of this behavior by changing them to either explicitly init, or
now lean on `roachprod` changes (from cockroachdb#51329) that transparently manage
initialization.

Release note (backward-incompatible change): A CockroachDB node
started with cockroach start without the --join flag no
longer automatically initializes the cluster. The `cockroach init`
command is now mandatory. The auto-initialization behavior had
been deprecated in version 19.2.

Release note (backward-incompatible change): `cockroach start` without
`--join` is no longer supported. It was deprecated in a previous
release.
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jul 17, 2020
Fixes cockroachdb#51532.

This is fallout from cockroachdb#51329, after which we tried using --join flags for
single node clusters (that roachprod uses `start-single-node` for). This
tripped up roachtests like cockroachdb#51532, which are now fixed.

Release note: None.
craig bot pushed a commit that referenced this pull request Jul 17, 2020
51374: sql: add a virtual index on the pg_catalog.pg_type.OID r=rohany a=ekalinin

Fixes #49208

Release note (performance improvement): scans over virtual
table pg_type by OID column have improved performance in common cases.

51484: geomfn: implement validity operators r=rytaft a=otan

Unfortunately we cannot implement ST_IsValidDetail because it returns a
composite type, which we do not yet support.

Resolves #48960
Resolves #48961
Resolves #48963
Resolves #48964

Release note (sql change): Implements ST_IsValid, ST_IsValidReason and
ST_MakeValid operators for geometry types.

51557: roachprod: fix roachprod start behaviour for single node clusters r=rafiss a=irfansharif

Fixes #51532.

This is fallout from #51329, after which we tried using --join flags for
single node clusters (that roachprod uses `start-single-node` for). This
tripped up roachtests like #51532, which are now fixed.

Release note: None.

51558: roachtest: cleanup sqlalchemy roachtest r=rafiss a=irfansharif

Was just in the area investigating something else, cleaned it up a bit.
It's just all code movement.

Release note: None

Co-authored-by: kev <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jul 24, 2020
..and the setting of cluster settings for single node clusters.
`roachprod start --sequential` was broken in cockroachdb#51329, and the broken-ness
outlined in TODOs in cockroachdb#51790. This PR just addresses those TODOs.

Fixes cockroachdb#51497
Fixes cockroachdb#51721
Fixes cockroachdb#51738
Fixes cockroachdb#51768
Fixes cockroachdb#51769
Fixes cockroachdb#51776

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jul 25, 2020
..and the setting of cluster settings for single node clusters.
`roachprod start --sequential` was broken in cockroachdb#51329, and the broken-ness
outlined in TODOs in cockroachdb#51790. This PR just addresses those TODOs.

Fixes cockroachdb#51497
Fixes cockroachdb#51721
Fixes cockroachdb#51738
Fixes cockroachdb#51768
Fixes cockroachdb#51769
Fixes cockroachdb#51776

Release note: None
craig bot pushed a commit that referenced this pull request Jul 25, 2020
51893: roachprod: fixup `roachprod --sequential` r=irfansharif a=irfansharif

..and the setting of cluster settings for single node clusters.
`roachprod start --sequential` was broken in #51329, and the broken-ness
outlined in TODOs in #51790. This PR just addresses those TODOs.

Fixes #51497
Fixes #51721
Fixes #51738
Fixes #51768
Fixes #51769
Fixes #51776

Release note: None

Co-authored-by: irfan sharif <[email protected]>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jul 27, 2020
Fixes cockroachdb#51713. Previously this test used to decommission+wipe node 1,
which played a bit badly with cockroachdb#51329, where node 1 is tasked to
initialize the cluster (and no-ops attempts on finding a persisted
file). Given that previously this test wiped away the store dir in its
entirety, this test attempted to re-initialize the cluster and naturally
failed to do so. As a short workaround we just decommission+wipe nodes
towards the end of the cluster list (so node 4), instead of the
beginning. And while here, clean the test code up a bit.

Release note: None.
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jul 28, 2020
Fixes cockroachdb#51713. Previously this test used to decommission+wipe node 1,
which played a bit badly with cockroachdb#51329, where node 1 is tasked to
initialize the cluster (and no-ops attempts on finding a persisted
file). Given that previously this test wiped away the store dir in its
entirety, this test attempted to re-initialize the cluster and naturally
failed to do so. As a short workaround we just decommission+wipe nodes
towards the end of the cluster list (so node 4), instead of the
beginning. And while here, clean the test code up a bit.

Release note: None.
craig bot pushed a commit that referenced this pull request Jul 28, 2020
51870: kvserver: improve node liveness failure log, point to docs r=lunevalex a=nvanbenschoten

This commit improves the message logged on node liveness failures, which
is logged roughly every 4.5 seconds during liveness unavailability. The
improved message describes some of the common causes of liveness
unavailability (resource starvation and network connectivity problems)
and then links to our troubleshooting docs about the topic.

This was an action item coming out of a recent support incident postmortem.

51907: roachtest: deflake decommission/nodes=4 test r=irfansharif a=irfansharif

Fixes #51713. Previously this test used to decommission+wipe node 1,
which played a bit badly with #51329, where node 1 is tasked to
initialize the cluster (and no-ops attempts on finding a persisted
file). Given that previously this test wiped away the store dir in its
entirety, this test attempted to re-initialize the cluster and naturally
failed to do so. As a short workaround we just decommission+wipe nodes
towards the end of the cluster list (so node 4), instead of the
beginning. And while here, clean the test code up a bit.

Release note: None.

51945: roachprod: lose sudo in roachprod stage r=ajwerner a=ajwerner

We shouldn't need it.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Aug 20, 2020
..and `roachprod init`. I attempted to originally introduce this flag in
\cockroachdb#51329, and ultimately abandoned it. I still think it's a good idea to
have such a thing, especially given now we're writing integration tests
that want to control `init` behaviour. It's much easier to write them
with this --skip-init flag than it is to work around roachprod's magical
auto-init behavior.

To do what's skipped when using --skip-init, we introduce a `roachprod
init` sub command.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Aug 20, 2020
..and `roachprod init`. I attempted to originally introduce this flag in
\cockroachdb#51329, and ultimately abandoned it. I still think it's a good idea to
have such a thing, especially given now we're writing integration tests
that want to control `init` behaviour. It's much easier to write them
with this --skip-init flag than it is to work around roachprod's magical
auto-init behavior.

To do what's skipped when using --skip-init, we introduce a `roachprod
init` sub command.

Release note: None
craig bot pushed a commit that referenced this pull request Aug 20, 2020
52844: kvserver,rangefeed: ensure that iterators are only constructed under tasks r=andreimatei a=ajwerner


Prior to this change, it was possible for a rangefeed request to be issued
concurrently with shutting down which could lead to an iterator being
constructed after the engine has been closed.

Touches #51544

Release note: None

52996: partialidx: prove implication for comparisons with two variables r=RaduBerinde a=mgartner

This commit adds support for proving partial index predicates are
implied by query filters when they contain comparison expressions with
two variables and they are not identical expressions.

Below are some examples where the left expression implies (=>) the right
expression. The right is guaranteed to contain the left despite both
expressions having no constant values.

    a > b  =>  a >= b
    a = b  =>  a >= b
    b < a  =>  a >= b
    a > b  =>  a != b

Release note: None

53113: roachprod: introduce --skip-init to `roachprod start` r=irfansharif a=irfansharif

..and `roachprod init`. I attempted to originally introduce this flag in
\#51329, and ultimately abandoned it. I still think it's a good idea to
have such a thing, especially given now we're writing integration tests
that want to control `init` behaviour. It's much easier to write them
with this --skip-init flag than it is to work around roachprod's magical
auto-init behavior.

To do what's skipped when using --skip-init, we introduce a `roachprod
init` sub command.

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
RichardJCai pushed a commit to RichardJCai/cockroach that referenced this pull request Aug 20, 2020
..and `roachprod init`. I attempted to originally introduce this flag in
\cockroachdb#51329, and ultimately abandoned it. I still think it's a good idea to
have such a thing, especially given now we're writing integration tests
that want to control `init` behaviour. It's much easier to write them
with this --skip-init flag than it is to work around roachprod's magical
auto-init behavior.

To do what's skipped when using --skip-init, we introduce a `roachprod
init` sub command.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 6, 2020
And address fallout from `roachprod` changes around cluster
initialization (cockroachdb#51329). Touches cockroachdb#54761.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 7, 2020
And address fallout from `roachprod` changes around cluster
initialization (cockroachdb#51329). Touches cockroachdb#54761.

Release note: None
craig bot pushed a commit that referenced this pull request Oct 7, 2020
55253: roachtest: bump fixtures for 20.1 (to 20.1.6) r=irfansharif a=irfansharif

And address fallout from `roachprod` changes around cluster
initialization (#51329). Touches #54761.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
jayshrivastava pushed a commit that referenced this pull request Oct 8, 2020
And address fallout from `roachprod` changes around cluster
initialization (#51329). Touches #54761.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants