roachtest/cdc-mixed-versions: add multi-step upgrade test #107948

jayshrivastava · 2023-08-01T15:03:29Z

This change adds a new roachtest "cdc/mixed-versions-multiple-upgrades"
which starts a changefeed and ensures that it keeps running through
multiple cluster upgrades and rollbacks. This test is an extension
of a previous test "cdc/mixed-versions" which only tests upgrading
from one previous version to the present version. "cdc/mixed-versions"
is now renamed to "cdc/mixed-versions-single-upgrade".

"cdc/mixed-versions-multiple-upgrades" performs 4 upgrades starting
from 4 versions in the past. As seen in #107293,
changefeeds running on versions prior to 22.2 can be very brittle,
mainly due to them not retrying transient errors. To address this
problem, this new roachtest will ensure that 22.2 is the minimum
binary version. So if this test is running on 23.1, it will only test
the upgrade path from 22.2-23.1. Once 23.2 releases, it will test the
upgrade path from 22.2 -> 23.1 -> 23.2 and so on, keeping a running
window of size 4.

More notes:

Can we remove the restriction of starting at version 22.2?
In #107293, we added some code
to restart the changefeed in case it fails to reduce flakes when running 22.1. It does
defeat the purpose of the test to do so, so it would not make sense to do something similar
in this change. It also does not make sense to change 22.1 because it is EOL'd.

Why perform the upgrade through multiple versions?
As seen in #107451, there are cases where,
for example, a 22.1 changefeed survives an upgrade to 22.2 and a 22.2 changefeed survives
and upgrade to 23.1, but a 22.1 does NOT survive an upgrade from 22.1 -> 22.2 -> 23.1.
This non-transitive relationship is the reason that I added this new test. We (and our
users) expect that arbitrarily old jobs can be resumed in new versions without error (unless we
explicitly add code to fail these jobs in certain cases). Because of this expectation,
we should have a test which asserts a changefeed stays running through multiple version
upgrades.

Why limit the test at 4 versions?
We do not want the test to run for an unbounded number of upgrades as we have
many more releases in the future.

Closes: #107451
Epic: None
Release note: None

cockroach-teamcity · 2023-08-01T15:03:44Z

This change is

This change adds a new roachtest "cdc/mixed-versions-multiple-upgrades" which starts a changefeed and ensures that it keeps running through multiple cluster upgrades and rollbacks. This test is an extension of a previous test "cdc/mixed-versions" which only tests upgrading from one previous version to the present version. "cdc/mixed-versions" is now renamed to "cdc/mixed-versions-single-upgrade". "cdc/mixed-versions-multiple-upgrades" performs 4 upgrades starting from 4 versions in the past. As seen in cockroachdb#107293, changefeeds running on versions prior to 22.2 can be very brittle, mainly due to them not retrying transient errors. To address this problem, this new roachtest will ensure that 22.2 is the minimum binary version. So if this test is running on 23.1, it will only test the upgrade path from 22.2-23.1. Once 23.2 releases, it will test the upgrade path from 22.2 -> 23.1 -> 23.2 and so on, keeping a running window of size 4. More notes: Can we remove the restriction of starting at version 22.2? In [cockroachdb#107293](cockroachdb#107293), we added some code to restart the changefeed in case it fails to reduce flakes when running 22.1. It does defeat the purpose of the test to do so, so it would not make sense to do something similar in this change. It also does not make sense to change 22.1 because it is EOL'd. Why perform the upgrade through multiple versions? As seen in cockroachdb#107451, there are cases where, for example, a 22.1 changefeed survives an upgrade to 22.2 and a 22.2 changefeed survives and upgrade to 23.1, but a 22.1 does NOT survive an upgrade from 22.1 -> 22.2 -> 23.1. This non-transitive relationship is the reason that I added this new test. We (and our users) expect that arbitrarily old jobs can be resumed in new versions without error (unless we explicitly add code to fail these jobs in certain cases). Because of this expectation, we should have a test which asserts a changefeed stays running through multiple version upgrades. Why limit the test at 4 versions? We do not want the test to run for an unbounded number of upgrades as we have many more releases in the future. Closes: cockroachdb#107451 Epic: None Release note: None

srosenberg · 2023-08-03T02:52:36Z

Nice addition! It's somewhat orthogonal, but I was reminded of a recent regression [1], which could have been caught via pausing on 22.2 and resuming on 23.1, instead of running it continuously during upgrade steps. Is that being tackled elsewhere?

[1] #107722

jayshrivastava

@srosenberg I think that is a great thing to test. Is it alright if we address that in a separate issue and PR? I think this test can be expended to do that (ex. have a 50% chance that we will pause the changefeed between upgrades).

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @HonoreDB, @renatolabs, and @srosenberg)

jayshrivastava

In addition to this, we should randomly select some changefeed options as well (ex. expressions) which have been known to change between versions.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @HonoreDB, @renatolabs, and @srosenberg)

srosenberg · 2023-08-03T19:37:05Z

In addition to this, we should randomly select some changefeed options as well (ex. expressions) which have been known to change between versions.

Indeed! This is actually a common theme across many (or most) existing roachtests. For the most part, we're testing either against default settings or a fixed custom variant of it rather than random.

renatolabs · 2023-08-03T19:47:49Z

@srosenberg I think that is a great thing to test. Is it alright if we address that in a separate issue and PR? I think this test can be expended to do that (ex. have a 50% chance that we will pause the changefeed between upgrades).

If I may suggest, I think we would benefit from migrating this uprgade test to use the mixedversion framework. It's the direction we are trying to move toward and I think the goals seem to be pretty aligned (multiple upgrades, random options and other operations, etc).

If we do create an issue for follow-up work, I think our time would be better invested in that effort instead of adapting the existing tests. I'm more than happy to support in the transition process as well.

Thank you for this PR!

miretskiy · 2023-08-03T21:25:07Z

@jayshrivastava are you going to attempt to use the framework mentioned by @renatolabs ?
Or should we review this version?

jayshrivastava · 2023-08-03T21:34:04Z

@miretskiy Yes I'll migrate it. Please hold off on reviews :)

jayshrivastava · 2023-08-15T20:21:35Z

Closing in favor of #108802

jayshrivastava force-pushed the cdc-mixed-versions-long branch 3 times, most recently from 98485c5 to e74e793 Compare August 2, 2023 17:53

jayshrivastava changed the title ~~Cdc mixed versions long~~ roachtest/cdc-mixed-versions: add multi-step upgrade test Aug 2, 2023

jayshrivastava requested review from a team and HonoreDB and removed request for a team August 2, 2023 17:54

jayshrivastava marked this pull request as ready for review August 2, 2023 17:54

jayshrivastava requested a review from a team as a code owner August 2, 2023 17:54

jayshrivastava requested review from herkolategan and srosenberg and removed request for a team August 2, 2023 17:54

jayshrivastava force-pushed the cdc-mixed-versions-long branch 2 times, most recently from fcf3da3 to b5f9cd3 Compare August 2, 2023 18:19

jayshrivastava force-pushed the cdc-mixed-versions-long branch from b5f9cd3 to 65a10e5 Compare August 2, 2023 18:22

srosenberg requested a review from renatolabs August 3, 2023 02:44

jayshrivastava commented Aug 3, 2023

View reviewed changes

jayshrivastava closed this Aug 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

roachtest/cdc-mixed-versions: add multi-step upgrade test #107948

roachtest/cdc-mixed-versions: add multi-step upgrade test #107948

jayshrivastava commented Aug 1, 2023 •

edited

Loading

cockroach-teamcity commented Aug 1, 2023

srosenberg commented Aug 3, 2023

jayshrivastava left a comment

jayshrivastava left a comment

srosenberg commented Aug 3, 2023

renatolabs commented Aug 3, 2023

miretskiy commented Aug 3, 2023

jayshrivastava commented Aug 3, 2023

jayshrivastava commented Aug 15, 2023

roachtest/cdc-mixed-versions: add multi-step upgrade test #107948

roachtest/cdc-mixed-versions: add multi-step upgrade test #107948

Conversation

jayshrivastava commented Aug 1, 2023 • edited Loading

cockroach-teamcity commented Aug 1, 2023

srosenberg commented Aug 3, 2023

jayshrivastava left a comment

Choose a reason for hiding this comment

jayshrivastava left a comment

Choose a reason for hiding this comment

srosenberg commented Aug 3, 2023

renatolabs commented Aug 3, 2023

miretskiy commented Aug 3, 2023

jayshrivastava commented Aug 3, 2023

jayshrivastava commented Aug 15, 2023

jayshrivastava commented Aug 1, 2023 •

edited

Loading