release-23.1: roachtest: update multitenant/distsql to use new roachprod service APIs #116526
Epic: None Release Note: None
Previously the cluster interface only exposed a method to start storage nodes, but that is insufficient to start virtual clusters that have a separate method on the `roachprod` API (for starting). This change adds a new method `StartServiceForVirtualCluster` to the cluster interface to enable roachtests to start virtual clusters. Some refactoring was required to enable different sets of cluster settings, depending on what service type is going to be started. There are now two sets of cluster settings that can be utilised in `test_runner`. For virtual clusters `virtualClusterSettings` will be used, and for storage clusters `clusterSettings` will be utilised. Epic: None Release Note: None
Previously the multitenant distsql roachtest relied on an internal util in `roachtest` to start virtual clusters. This change updates the test to use the new official `roachtest` and `roachprod` APIs for starting virtual clusters. Fixes: cockroachdb#116019 Epic: None Release Note: None
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this
Epic: None Release Note: None
Epic: None Release Note: None
Add a convenience function to return `StartOpts` for starting an external process virtual cluster. Epic: None Release Note: None
5a3956e to e1d7050
What's the benefit of backporting such changes to older branches? At a quick glance this shouldn't increase the test coverage, but it might introduce more flakiness, like having to include a backport of #117505.
Mostly to make other future backports easier. But for this case I'm also wondering if it's more trouble than it's worth. I'll close it if we feel it has no real value for future backports. @renatolabs, @DarrylWong?
I'm not sure. If we backport both PRs, is there a risk of introducing flakiness? I believe the failure we were seeing was quite deterministic, so if we are able to run those tests successfully on this branch, that should be enough evidence that things are working fine, right? While this PR on its own might not directly increase coverage, it might make other more impactful backports possible or at least a lot less painful. To me, this is quite valuable given that 23.1 will continue to be supported for a long time.
I don't think backporting my PR will be hard at all. My comment was mostly just a note to self to do it promptly to avoid more failures.
Are there concrete improvements that we plan to backport to 23.1 that would need this PR as a prerequisite? Generally speaking, I think we should default to a "no backport" decision for any PR, so there must be a good reason for something to be backported, even if it's test-only. Clearly, bug fixes or fixes to test failures / test flakes should be backported. The category of increasing test coverage is questionable IMO, but I could see an argument for backporting such changes too. Simply "this backport might make backporting some possible improvements in the future easier" doesn't seem justified to me. If there are concrete things (issues, PRs, epics) that we will want to backport, and those in turn need such an "API improvement" PR, they could all be backported together. My hesitation about backporting changes like this is that it has a non-zero chance of creating extra work / noise (I used "flakes" when I meant "noise" / "failures"). More concretely, this PR has already been merged, and if #117737 doesn't get merged before the nightly run, then SQL Queries will get 4 new issues like #117601. I think it's straight up incorrect that this PR was merged as is, without including commits from #117737, but I also currently don't see the reason why we backported these changes at all (especially to the 23.1 branch, which is quite stable at this point).
We're actively working on better multi-tenant support in roachtest/roachprod, so I think there's a good chance of conflicts if we didn't backport this.
I disagree with this. Speaking specifically for Test Eng and roachtest work: we used to have this approach, and it made our life much harder when the time came that we had to backport something. Nothing would merge cleanly and resolving conflicts / figuring out what got merged between two points in time was very painful. It's also very helpful to have a single(-ish) mental model of the test infrastructure instead of having to remember how X works in branch Y when debugging.
That said, I understand this pain point for teams (including our own). I'd like to think that this is the exception, not the norm. We have been backporting infrastructure improvements to 23.1 quite actively and they don't introduce regressions that often (I'm curious if your impression is different). Maybe my main point is: while flakes are a pain, I don't think the correct response is to stop backporting anything that is not a bug fix (speaking from test-infrastructure standpoint; for DB code, the story is very different naturally). If we introduce a chaos event in the test suite (arbitrary example), I'd think we would want to backport that to 23.1, at the risk of introducing a flake.
I agree with this, we should have grouped them, sorry about that. If the nightly kicks in and you see failures in these tests, please reassign them to us straight away.
FWIW, we're reverting this PR on 23.1 until we're able to merge the corresponding fix: #117742.
Backport 6/6 commits from #115599.
/cc @cockroachdb/release
Previously the multitenant distsql roachtest relied on an internal util in `roachtest` to start virtual clusters. This PR updates the test to use the new official `roachtest` and `roachprod` APIs for starting virtual clusters.
Some additional changes were required to support upgrading the test. The cluster interface only exposed a method to start storage nodes, but that is insufficient to start virtual clusters that have a separate method on the `roachprod` API (for starting).
This change adds a new method `StartServiceForVirtualCluster` to the cluster interface to enable roachtests to start virtual clusters. Some refactoring was required to enable different sets of cluster settings, depending on what service type is going to be started.
There are now two sets of cluster settings that can be utilised in `test_runner`. For virtual clusters `virtualClusterSettings` will be used, and for storage clusters `clusterSettings` will be utilised.
Fixes: #116019
Release Note: None
Epic: CRDB-31933
Release justification: Test only change.
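To make the shape of the change easier to picture, here is a minimal, self-contained Go sketch of a cluster interface with separate entry points for storage nodes and virtual clusters, each using its own set of settings. Only the names `StartServiceForVirtualCluster`, `StartOpts`, `clusterSettings`, and `virtualClusterSettings` come from the description above. Everything else (the signatures, `ServiceType`, `Settings`, `ExternalProcessStartOpts`, `fakeCluster`, and the setting keys) is an assumption made for illustration and is not the actual roachtest or roachprod API.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceType is a hypothetical enum for the kind of service being started.
type ServiceType int

const (
	ServiceTypeStorage ServiceType = iota
	ServiceTypeVirtualCluster
)

// Settings is a hypothetical stand-in for a set of cluster settings.
type Settings map[string]string

// StartOpts is a hypothetical, trimmed-down stand-in for the real start options.
type StartOpts struct {
	Service            ServiceType
	VirtualClusterName string
}

// ExternalProcessStartOpts is a hypothetical convenience constructor that
// returns StartOpts for an external process virtual cluster.
func ExternalProcessStartOpts(name string) StartOpts {
	return StartOpts{Service: ServiceTypeVirtualCluster, VirtualClusterName: name}
}

// Cluster sketches an interface with one method per service type.
// The signatures here are assumptions, not the real ones.
type Cluster interface {
	Start(opts StartOpts) error
	StartServiceForVirtualCluster(opts StartOpts) error
}

// fakeCluster records what was started instead of launching processes.
type fakeCluster struct {
	clusterSettings        Settings // used for storage nodes
	virtualClusterSettings Settings // used for virtual clusters
	started                []string
}

func newFakeCluster() *fakeCluster {
	return &fakeCluster{
		clusterSettings:        Settings{"example.storage.setting": "on"}, // made-up key
		virtualClusterSettings: Settings{"example.virtual.setting": "on"}, // made-up key
	}
}

// Start handles storage nodes only and applies clusterSettings.
func (c *fakeCluster) Start(opts StartOpts) error {
	if opts.Service != ServiceTypeStorage {
		return errors.New("Start only handles storage nodes")
	}
	c.started = append(c.started, fmt.Sprintf("storage %v", c.clusterSettings))
	return nil
}

// StartServiceForVirtualCluster handles virtual clusters only and applies
// virtualClusterSettings.
func (c *fakeCluster) StartServiceForVirtualCluster(opts StartOpts) error {
	if opts.Service != ServiceTypeVirtualCluster || opts.VirtualClusterName == "" {
		return errors.New("a named virtual cluster is required")
	}
	c.started = append(c.started,
		fmt.Sprintf("virtual cluster %q %v", opts.VirtualClusterName, c.virtualClusterSettings))
	return nil
}

func main() {
	fake := newFakeCluster()
	var c Cluster = fake

	if err := c.Start(StartOpts{Service: ServiceTypeStorage}); err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := c.StartServiceForVirtualCluster(ExternalProcessStartOpts("tenant-a")); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range fake.started {
		fmt.Println(s)
	}
}
```

Keeping the two start paths as separate methods means a test cannot accidentally apply storage settings to a virtual cluster; whether the real implementation enforces this the same way is not stated in the PR.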