Test that qps doesn't dip when gracefully draining a node #23274
+1, this seems pretty important. One requirement here is being able to programmatically obtain the load generator statistics (ideally while the load runs). I wonder if workload should export an HTTP interface for that. Alternatively, we could query the cluster statement statistics (which is nice because users should be able to access this information for their workloads, too), or workload could periodically insert into a statistics table that we then query.
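As a sketch of the statement-statistics option above: the statistics are cumulative counters, so deriving a QPS figure means taking two snapshots and dividing the delta by the elapsed time. The function below is purely illustrative; it does not reflect an actual CockroachDB schema or API.

```python
# Hypothetical sketch: derive average QPS from two snapshots of a
# cumulative statement-execution count (e.g. as one might read out of
# cluster statement statistics). The snapshot mechanism itself is assumed.

def qps_between_snapshots(count_before, count_after, seconds_elapsed):
    """Average queries per second between two cumulative-count snapshots."""
    if seconds_elapsed <= 0:
        raise ValueError("snapshots must be taken at distinct times")
    return (count_after - count_before) / seconds_elapsed

# Example: the cumulative count grew from 12,000 to 18,000 over 10 seconds.
print(qps_between_snapshots(12_000, 18_000, 10.0))  # 600.0
```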
Do you need cluster statement statistics, or access to some of the internal time series metrics? For a specific metric, the time series are already programmatically available (though the specific magic incantation is a bit involved).
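For reference, the "magic incantation" is roughly a JSON POST to the node's internal time-series endpoint. The sketch below only builds the request body; the endpoint path and field names follow the protobuf/JSON shape as I understand it, and should be treated as an assumption to verify against the actual `tspb` definitions.

```python
import json

# Hedged sketch: build the JSON body for a single-metric query against
# CockroachDB's internal time-series HTTP endpoint (POST /ts/query).
# Field names ("start_nanos", "end_nanos", "queries", "name") are assumed.

def make_ts_query(metric, start_nanos, end_nanos):
    """Build the JSON request body for one time-series metric query."""
    return json.dumps({
        "start_nanos": start_nanos,
        "end_nanos": end_nanos,
        "queries": [{"name": metric}],
    })

body = make_ts_query("cr.node.sql.query.count", 0, 60 * 10**9)
# e.g. requests.post("https://<node>:8080/ts/query", data=body, ...)
print(json.loads(body)["queries"][0]["name"])
```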
I think we'd want to be able to run a few different load gens eventually, and some of them might be so low in QPS that their dip could be shadowed by a faster one (say, kv). Maybe statement statistics can do well enough for starters.
Good point, though for an initial test a single QPS metric would suffice.
Yeah. I've forgotten the specifics of when these are reset and the info they contain, but it certainly seems possible they could work.
Most of this was addressed by #26542, which gracefully shuts down a third of a cluster and watches the QPS of
I think that test is sufficient.
I don't think that test actually tests this? That test is basically:
It is explicitly not trying to measure how an ongoing load is affected by the process of draining a node, and it almost certainly would not have found the various bugs in the node draining logic that motivated this issue.
The test verifies that QPS isn't affected by a node being gracefully drained and shut down. Fixes cockroachdb#23274 Release note: None
33188: roachtest: Add test of graceful draining during shutdown r=a-robinson a=a-robinson

The test verifies that QPS isn't affected by a node being gracefully drained and shut down. Fixes #23274

Release note: None

Co-authored-by: Alex Robinson <[email protected]>
Follow-up to cockroachdb#33188, which fixed cockroachdb#23274 Release note: None
This is an important scenario that could really use some regression test coverage, as indicated by the fact that nobody noticed or followed up on #22573 until more than 4 months after 1.1 was released.
This seems like a good fit to be a workload test -- run something like `kv` with its `--max-rate` flag set, then gracefully stop a node and expect QPS to not dip more than a few percent below the specified `--max-rate`. If we wanted to make this extra rigorous, #23202 could be used to pin all leases on the node that we stop before we stop it.

cc @asubiotto
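The acceptance criterion described in the comment above (QPS should not dip more than a few percent below the configured `--max-rate` while a node drains) can be sketched as a simple check over sampled QPS values. The 5% tolerance below is an illustrative choice, not a value from this issue.

```python
# Sketch of the acceptance check: during a graceful drain, no QPS sample
# should dip more than `tolerance` below the workload's configured
# --max-rate. Sampling of QPS itself is assumed to happen elsewhere.

def qps_never_dips(samples, max_rate, tolerance=0.05):
    """Return True if every QPS sample stays within tolerance of max_rate."""
    floor = (1.0 - tolerance) * max_rate
    return all(s >= floor for s in samples)

# Samples taken while a node was drained, against --max-rate=1000:
print(qps_never_dips([998, 990, 973, 985], 1000))  # True
print(qps_never_dips([998, 990, 700, 985], 1000))  # False: a visible dip
```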