Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.1: roachtest: update multitenant/distsql to use new roachprod service APIs #116526

Merged

Conversation

herkolategan
Copy link
Collaborator

Backport 6/6 commits from #115599.

/cc @cockroachdb/release


Previously the multitenant distsql roachtest relied on an internal util in roachtest to start virtual clusters. This PR updates the test to use the new official roachtest and roachprod APIs for starting virtual clusters.

Some additional changes were required to support upgrading the test. The cluster interface only exposed a method to start storage nodes, but that is insufficient to start virtual clusters that have a separate method on the roachprod API (for starting).

This change adds a new method StartServiceForVirtualCluster to the cluster interface to enable roachtests to start virtual clusters. Some refactoring was required to enable different sets of cluster settings, depending on what service
type is going to be started.

There are now two sets of cluster settings that can be utilised in test_runner. For virtual clusters virtualClusterSettings will be used, and for storage clusters clusterSettings will be utilised.

Fixes: #116019

Release Note: None
Epic: CRDB-31933

Release justification: Test only change.

Previously the cluster interface only exposed a method to start storage nodes,
but that is insufficient to start virtual clusters that have a separate method
on the `roachprod` API (for starting).

This change adds a new method `StartServiceForVirtualCluster` to the cluster
interface to enable roachtests to start virtual clusters. Some refactoring was
required to enable different sets of cluster settings, depending on what service
type is going to be started.

There are now two sets of cluster settings that can be utilised in
`test_runner`. For virtual clusters `virtualClusterSettings` will be used, and
for storage clusters `clusterSettings` will be utilised.

Epic: None
Release Note: None
Previously the multitenant distsql roachtest relied on an internal util in
`roachtest` to start virtual clusters. This change updates the test to use the
new official `roachtest` and `roachprod` APIs for starting virtual clusters.

Fixes: cockroachdb#116019

Epic: None
Release Note: None
Copy link

blathers-crl bot commented Dec 15, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious
    issues
    or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined
    here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning areas TL and one additional
    TL. For more information as to how that review should be conducted, please consult the backport
    policy
    .
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack
    channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this
backport.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Dec 15, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Add a convenience function to return `StartOpts` for starting an external
process virtual cluster.

Epic: None
Release Note: None
@herkolategan herkolategan marked this pull request as ready for review December 15, 2023 17:46
@herkolategan herkolategan requested a review from a team as a code owner December 15, 2023 17:46
@herkolategan herkolategan requested review from srosenberg and DarrylWong and removed request for a team December 15, 2023 17:46
@herkolategan herkolategan marked this pull request as draft December 15, 2023 17:49
@herkolategan herkolategan marked this pull request as ready for review January 8, 2024 11:45
Copy link
Contributor

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to backport #117505 once this merges or else the multitenant/distsql tests will fail. Or you can include them here but you'd have to make sure to set both SQLPort and adminUIPort to 0 like done in #117626.

@yuzefovich
Copy link
Member

What's the benefit of backporting such changes to older branches? On a quick glance this shouldn't increase the test coverage, but it might introduce more flakiness - like having to include a backport of #117505.

@herkolategan
Copy link
Collaborator Author

herkolategan commented Jan 12, 2024

What's the benefit of backporting such changes to older branches? On a quick glance this shouldn't increase the test coverage

Mostly to make other future backports easier. But for this case I'm also wondering if it's more trouble than it's worth. I'll close it if we feel it has no appropriate value for future backports @renatolabs, @DarrylWong?

@renatolabs
Copy link
Contributor

I'm not sure. If we backport both PRs, is there a risk of introducing flakiness? I believe the failure we were seeing was quite deterministic, so if we are able to run those tests successfully on this branch, that should be enough evidence that things are working fine, right?

While this PR on its own might not directly increase coverage, it might make other more impactful backports possible or at least a lot less painful. To me, this is quite valuable given that 23.1 will continue to be supported for a long time.

@DarrylWong
Copy link
Contributor

I don't think backporting my PR will be hard at all. My comment was mostly just a note to self to do it promptly to avoid more failures.

@herkolategan herkolategan merged commit cac67a7 into cockroachdb:release-23.1 Jan 12, 2024
2 checks passed
@yuzefovich
Copy link
Member

While this PR on its own might not directly increase coverage, it might make other more impactful backports possible or at least a lot less painful. To me, this is quite valuable given that 23.1 will continue to be supported for a long time.

Are there concrete improvements that we plan to backport to 23.1 that would need this PR as a prerequisite?

Generally speaking, I think we should default to "no backport" decision for any PR, so there must be a good reason for something to be backported, even if it's test-only. Clearly, bug fixes or fixes to test failures / test flakes should be backported. The category of increasing test coverage is questionable IMO, but I could see an argument for backporting such changes too. Simply "this backport might make backporting some possible improvements in the future easier" doesn't seem justified to me. If there are concrete things (issues, PRs, epics) that we will want to backport, and those in turn need such an "API improvement" PR, they could all be backported together.

My hesitation from backporting changes like this is the fact that it has non-zero chance of creating extra work / noise (I used "flakes" when I meant "noise" / "failures"). More concretely, this PR has already been merged, and if #117737 doesn't get merged before the nightly run, then SQL Queries will get 4 new issues like #117601. I think it's straight up incorrect that this PR was merged as is, without including commits from #117737, but I also currently don't see the reason for why we backported this changes at all (especially to 23.1 branch which is quite stable at this point).

@renatolabs
Copy link
Contributor

Are there concrete improvements that we plan to backport to 23.1 that would need this PR as a prerequisite?

We're actively working on better multi-tenant support in roachtest/roachprod, so I think there's good chance of conflicts if we didn't backport this.

I think we should default to "no backport" decision [...] even if it's test-only

I disagree with this. Speaking specifically for Test Eng and roachtest work: we used to have this approach, and it made our life much harder when the time came that we had to backport something. Nothing would merge cleanly and resolving conflicts / figuring out what got merged between two points in time was very painful. It's also very helpful to have a single(-ish) mental model of the test infrastructure instead of having to remember how X works in branch Y when debugging.

it has non-zero chance of creating extra work / noise

That said, I understand this pain point for teams (including our own). I'd like to think that this is the exception, not the norm. We have been backporting infrastructure improvements to 23.1 quite actively and they don't introduce regressions that often (I'm curious if your impression is different).

Maybe my main point is: while flakes are a pain, I don't think the correct response is to stop backporting anything that is not a bug fix (speaking from test-infrastructure standpoint; for DB code, the story is very different naturally). If we introduce a chaos event in the test suite (arbitrary example), I'd think we would want to backport that to 23.1, at the risk of introducing a flake.

incorrect that this PR was merged as is

I agree with this, we should have grouped them, sorry about that. If the nightly kicks in and you see failures in these tests, please reassign them to us straight away.

renatolabs added a commit to renatolabs/cockroach that referenced this pull request Jan 12, 2024
…ort23.1-115599"

This reverts commit cac67a7, reversing
changes made to fb921d2.
@renatolabs
Copy link
Contributor

FWIW, we're reverting this PR on 23.1 until we're able to merge the corresponding fix: #117742.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Label PR's that are backports to older release branches
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants