-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roach{test,prod},acceptance: no longer rely on auto-init #51329
roach{test,prod},acceptance: no longer rely on auto-init #51329
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)
pkg/cmd/roachprod/install/cockroach.go, line 304 at r1 (raw file):
}) if StartOpts.SkipInit || !bootstrappable {
I'd prefer it if we could avoid adding a --skip-init
flag as the introduction of the flag is a change to the roachprod UX. Safer to leave the UX unchanged, which I think is possible. Instead, can we leave a marker file on node 1 once initialization has been performed? We can possibly reuse the <dir>/settings-initialized
file that already exists that is used to initialize cluster settings only once when the cluster is first bootstrapped.
bee2ec5
to
6f7b36f
Compare
Done. I'd originally added a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No other comment than what Peter already said.
6f7b36f
to
b3d36dd
Compare
TFTRs! Ended up having to add the new marker file to the version upgrade fixtures given we're not solving it with --skip-init anymore. bors r+ |
Flaked on #48423, re-running. |
bors r+ |
`cockroach start` without `--join` auto-initializes the cluster. This is deprecated behavior as of 19.2 and will be removed soon. `roachprod start` inherited similar semantics, auto-initializing a cluster on `roachprod start`. It did so by cutting around the join flags and using auto-init behaviour of crdb. We keep the same external API of `roachprod start`, but we now explicitly `cockroach init` the managed clusters (using a dedicated `cluster-bootstrapped` signal file to avoid double initialization). Release note: None
b3d36dd
to
ad380a4
Compare
Canceled |
Ugh, cla bot acting up. Retrying. bors r+ |
Build succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)
pkg/cmd/roachprod/install/cockroach.go, line 330 at r2 (raw file):
binary := cockroachNodeBinary(c, 1) path := fmt.Sprintf("%s/%s", c.Impl.NodeDir(c, nodes[i]), "cluster-bootstrapped")
One bit of fragility here is that wiping node 1 will cause a reinitialization. I wonder if we should pushing the cluster-bootstrapped
file to all nodes. Perhaps just add a TODO for now as it isn't clear to me if this is worth worrying about.
pkg/cmd/roachprod/install/cockroach.go, line 347 at r2 (raw file):
if initOut != "" { fmt.Println(initOut)
This block isn't taken if the cluster is already bootstrapped, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @petermattis and @tbg)
pkg/cmd/roachprod/install/cockroach.go, line 330 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
One bit of fragility here is that wiping node 1 will cause a reinitialization. I wonder if we should pushing the
cluster-bootstrapped
file to all nodes. Perhaps just add a TODO for now as it isn't clear to me if this is worth worrying about.
yea, I'll add the TODO. I did audit through our usage of c.Wipe in roachtests to see if there were any problematic usages, and didn't find any. It helps that any sort of problematic usage will fail loudly with a clear "attempting to initialize a re-initialized cluster" message.
pkg/cmd/roachprod/install/cockroach.go, line 347 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This block isn't taken if the cluster is already bootstrapped, right?
Correct.
Fixes cockroachdb#44116. Informs cockroachdb#24118. Reviving cockroachdb#44112. We remove the deprecated behavior of crdb where we previously auto-initialized the cluster when `cockroach start` was invoked without a corresponding `--join` flag. We update a few tests along the way that made use of this behavior by changing them to either explicitly init, or now lean on `roachprod` changes (from cockroachdb#51329) that transparently manage initialization. Release note (backward-incompatible change): A CockroachDB node started with cockroach start without the --join flag no longer automatically initializes the cluster. The `cockroach init` command is now mandatory. The auto-initialization behavior had been deprecated in version 19.2. Release note (backward-incompatible change): `cockroach start` without `--join` is no longer supported. It was deprecated in a previous release.
Fixes cockroachdb#44116. Informs cockroachdb#24118. Reviving cockroachdb#44112. We remove the deprecated behavior of crdb where we previously auto-initialized the cluster when `cockroach start` was invoked without a corresponding `--join` flag. We update a few tests along the way that made use of this behavior by changing them to either explicitly init, or now lean on `roachprod` changes (from cockroachdb#51329) that transparently manage initialization. Release note (backward-incompatible change): A CockroachDB node started with cockroach start without the --join flag no longer automatically initializes the cluster. The `cockroach init` command is now mandatory. The auto-initialization behavior had been deprecated in version 19.2. Release note (backward-incompatible change): `cockroach start` without `--join` is no longer supported. It was deprecated in a previous release.
Fixes cockroachdb#51532. This is fallout from cockroachdb#51329, after which we tried using --join flags for single node clusters (that roachprod uses `start-single-node` for). This tripped up roachtests like cockroachdb#51532, which are now fixed. Release note: None.
51374: sql: add a virtual index on the pg_catalog.pg_type.OID r=rohany a=ekalinin Fixes #49208 Release note (performance improvement): scans over virtual table pg_type by OID column have improved performance in common cases. 51484: geomfn: implement validity operators r=rytaft a=otan Unfortunately we cannot implement ST_IsValidDetail because it returns a composite type, which we do not yet support. Resolves #48960 Resolves #48961 Resolves #48963 Resolves #48964 Release note (sql change): Implements ST_IsValid, ST_IsValidReason and ST_MakeValid operators for geometry types. 51557: roachprod: fix roachprod start behaviour for single node clusters r=rafiss a=irfansharif Fixes #51532. This is fallout from #51329, after which we tried using --join flags for single node clusters (that roachprod uses `start-single-node` for). This tripped up roachtests like #51532, which are now fixed. Release note: None. 51558: roachtest: cleanup sqlalchemy roachtest r=rafiss a=irfansharif Was just in the area investigating something else, cleaned it up a bit. It's just all code movement. Release note: None Co-authored-by: kev <[email protected]> Co-authored-by: Oliver Tan <[email protected]> Co-authored-by: irfan sharif <[email protected]>
..and the setting of cluster settings for single node clusters. `roachprod start --sequential` was broken in cockroachdb#51329, and the broken-ness outlined in TODOs in cockroachdb#51790. This PR just addresses those TODOs. Fixes cockroachdb#51497 Fixes cockroachdb#51721 Fixes cockroachdb#51738 Fixes cockroachdb#51768 Fixes cockroachdb#51769 Fixes cockroachdb#51776 Release note: None
..and the setting of cluster settings for single node clusters. `roachprod start --sequential` was broken in cockroachdb#51329, and the broken-ness outlined in TODOs in cockroachdb#51790. This PR just addresses those TODOs. Fixes cockroachdb#51497 Fixes cockroachdb#51721 Fixes cockroachdb#51738 Fixes cockroachdb#51768 Fixes cockroachdb#51769 Fixes cockroachdb#51776 Release note: None
51893: roachprod: fixup `roachprod --sequential` r=irfansharif a=irfansharif ..and the setting of cluster settings for single node clusters. `roachprod start --sequential` was broken in #51329, and the broken-ness outlined in TODOs in #51790. This PR just addresses those TODOs. Fixes #51497 Fixes #51721 Fixes #51738 Fixes #51768 Fixes #51769 Fixes #51776 Release note: None Co-authored-by: irfan sharif <[email protected]>
Fixes cockroachdb#51713. Previously this test used to decommission+wipe node 1, which played a bit badly with cockroachdb#51329, where node 1 is tasked to initialize the cluster (and no-ops attempts on finding a persisted file). Given that previously this test wiped away the store dir in its entirety, this test attempted to re-initialize the cluster and naturally failed to do so. As a short workaround we just decommission+wipe nodes towards the end of the cluster list (so node 4), instead of the beginning. And while here, clean the test code up a bit. Release note: None.
Fixes cockroachdb#51713. Previously this test used to decommission+wipe node 1, which played a bit badly with cockroachdb#51329, where node 1 is tasked to initialize the cluster (and no-ops attempts on finding a persisted file). Given that previously this test wiped away the store dir in its entirety, this test attempted to re-initialize the cluster and naturally failed to do so. As a short workaround we just decommission+wipe nodes towards the end of the cluster list (so node 4), instead of the beginning. And while here, clean the test code up a bit. Release note: None.
51870: kvserver: improve node liveness failure log, point to docs r=lunevalex a=nvanbenschoten This commit improves the message logged on node liveness failures, which is logged roughly every 4.5 seconds during liveness unavailability. The improved message describes some of the common causes of liveness unavailability (resource starvation and network connectivity problems) and then links to our troubleshooting docs about the topic. This was an action item coming out of a recent support incident postmortem. 51907: roachtest: deflake decommission/nodes=4 test r=irfansharif a=irfansharif Fixes #51713. Previously this test used to decommission+wipe node 1, which played a bit badly with #51329, where node 1 is tasked to initialize the cluster (and no-ops attempts on finding a persisted file). Given that previously this test wiped away the store dir in its entirety, this test attempted to re-initialize the cluster and naturally failed to do so. As a short workaround we just decommission+wipe nodes towards the end of the cluster list (so node 4), instead of the beginning. And while here, clean the test code up a bit. Release note: None. 51945: roachprod: lose sudo in roachprod stage r=ajwerner a=ajwerner We shouldn't need it. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
..and `roachprod init`. I attempted to originally introduce this flag in \cockroachdb#51329, and ultimately abandoned it. I still think it's a good idea to have such a thing, especially given now we're writing integration tests that want to control `init` behaviour. It's much easier to write them with this --skip-init flag than it is to work around roachprod's magical auto-init behavior. To do what's skipped when using --skip-init, we introduce a `roachprod init` sub command. Release note: None
..and `roachprod init`. I attempted to originally introduce this flag in \cockroachdb#51329, and ultimately abandoned it. I still think it's a good idea to have such a thing, especially given now we're writing integration tests that want to control `init` behaviour. It's much easier to write them with this --skip-init flag than it is to work around roachprod's magical auto-init behavior. To do what's skipped when using --skip-init, we introduce a `roachprod init` sub command. Release note: None
52844: kvserver,rangefeed: ensure that iterators are only constructed under tasks r=andreimatei a=ajwerner Prior to this change, it was possible for a rangefeed request to be issued concurrently with shutting down which could lead to an iterator being constructed after the engine has been closed. Touches #51544 Release note: None 52996: partialidx: prove implication for comparisons with two variables r=RaduBerinde a=mgartner This commit adds support for proving partial index predicates are implied by query filters when they contain comparison expressions with two variables and they are not identical expressions. Below are some examples where the left expression implies (=>) the right expression. The right is guaranteed to contain the left despite both expressions having no constant values. a > b => a >= b a = b => a >= b b < a => a >= b a > b => a != b Release note: None 53113: roachprod: introduce --skip-init to `roachprod start` r=irfansharif a=irfansharif ..and `roachprod init`. I attempted to originally introduce this flag in \#51329, and ultimately abandoned it. I still think it's a good idea to have such a thing, especially given now we're writing integration tests that want to control `init` behaviour. It's much easier to write them with this --skip-init flag than it is to work around roachprod's magical auto-init behavior. To do what's skipped when using --skip-init, we introduce a `roachprod init` sub command. Release note: None Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: irfan sharif <[email protected]>
..and `roachprod init`. I attempted to originally introduce this flag in \cockroachdb#51329, and ultimately abandoned it. I still think it's a good idea to have such a thing, especially given now we're writing integration tests that want to control `init` behaviour. It's much easier to write them with this --skip-init flag than it is to work around roachprod's magical auto-init behavior. To do what's skipped when using --skip-init, we introduce a `roachprod init` sub command. Release note: None
And address fallout from `roachprod` changes around cluster initialization (cockroachdb#51329). Touches cockroachdb#54761. Release note: None
And address fallout from `roachprod` changes around cluster initialization (cockroachdb#51329). Touches cockroachdb#54761. Release note: None
cockroach start
without--join
auto-initializes the cluster. Thiswas deprecated in 19.2 and will be removed in a future commit. We update
test code that relies on the previous behavior. To (mostly) retain
existing roachprod behavior, we auto-initialize when
roachprod start
-ingn1. We introduce a new
--skip-init
flag for when that isnot desired (for example in roachtests that restart n1, but don't intend
to re-initialize the cluster). Most of the roachtest changes are exactly
to this effect, where tests than invoke
roachprod start
a second timeon n1 are changed to now specify this new
--skip-init
flag.Release note: None