roachtest: basic multi-day roachtest #31223

Closed
petermattis opened this issue Oct 10, 2018 · 1 comment
Labels
A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) X-stale

Comments

@petermattis
Collaborator

Roachtests currently run nightly and are expected to run for a few hours at most. The expectation when roachtests were being created was that we'd eventually add longer-running roachtests. The time for doing so has arrived. We should get started with a very basic roachtest that is automatically run weekly (we'll need a "Roachtest Weekly" TeamCity config).

My current thinking around a multi-day roachtest is that it should run a well-known load and perform a series of normal admin operations. For example, we could create a 7-node cluster, restore TPC-C 1K (for which this cluster would be overprovisioned), run load for a day, perform a rolling restart (to simulate a rolling upgrade), add an index to a table, drop an index from a table, etc.
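
To make the shape of that plan concrete, here is a minimal sketch of it as an ordered schedule of steps. Everything in it is an assumption for illustration: `Step`, `run_schedule`, `noop`, and `PLAN` are hypothetical names, not part of roachtest, and the actions are stubs standing in for real cluster operations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str                   # human-readable label for logs
    action: Callable[[], None]  # stub standing in for a real cluster operation


def run_schedule(steps: List[Step]) -> List[str]:
    """Run each step in order and return the names of the completed steps.

    An exception raised by any step propagates and aborts the remaining
    steps, i.e. the run fails on the first unexpected error.
    """
    completed = []
    for step in steps:
        step.action()
        completed.append(step.name)
    return completed


def noop() -> None:
    """Placeholder; a real test would drive the cluster here."""


# Hypothetical ordering, taken from the example in the issue text.
PLAN = [
    Step("create 7-node cluster", noop),
    Step("restore TPC-C 1K", noop),
    Step("run load for a day", noop),
    Step("rolling restart", noop),
    Step("add index", noop),
    Step("drop index", noop),
]

if __name__ == "__main__":
    for name in run_schedule(PLAN):
        print("done:", name)
```

Keeping the plan as data would make it easy to reorder or repeat operations across the multi-day run without touching the runner.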

Success criteria would be monitoring the load and verifying that it never stalls and that there are no unexpected crashes.
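
The "never stalls" criterion could be checked roughly as follows. This is a sketch under stated assumptions: it assumes the load generator exposes periodic samples of a cumulative operation count, and `find_stalls` and `max_gap_seconds` are hypothetical names, not an existing roachtest or workload API.

```python
from typing import List, Sequence, Tuple


def find_stalls(
    samples: Sequence[Tuple[float, int]], max_gap_seconds: float
) -> List[Tuple[float, float]]:
    """Return (start, end) intervals in which the cumulative op count did
    not advance for longer than max_gap_seconds.

    samples are (timestamp_seconds, cumulative_ops) pairs sorted by timestamp.
    """
    stalls: List[Tuple[float, float]] = []
    if not samples:
        return stalls

    last_progress_ts, last_ops = samples[0]
    for ts, ops in samples[1:]:
        if ops > last_ops:
            # Progress resumed; record the gap if it was too long.
            if ts - last_progress_ts > max_gap_seconds:
                stalls.append((last_progress_ts, ts))
            last_progress_ts, last_ops = ts, ops

    # A stall that is still ongoing at the final sample also counts.
    final_ts = samples[-1][0]
    if final_ts - last_progress_ts > max_gap_seconds:
        stalls.append((last_progress_ts, final_ts))
    return stalls


if __name__ == "__main__":
    samples = [(0, 0), (10, 100), (20, 100), (30, 100), (40, 150), (50, 200)]
    print(find_stalls(samples, max_gap_seconds=15))
```

A test would fail if this returned a non-empty list; the threshold is a judgment call and would need to tolerate brief dips during operations such as the rolling restart.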

@petermattis petermattis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-testing Testing tools and infrastructure labels Oct 10, 2018
@danhhz
Contributor

danhhz commented Nov 13, 2018

I'd like to propose, as an intermediate state (or possibly a second test), simply running our max-warehouse tpcc configuration for some hardware for multiple days. I've been seeing at least two issues while trying to test cdc over tpcc, which I strongly suspect would also be present without cdc: 1) I get "error in newOrder: missing stock row" frequently enough that the cdc tests now have to run with --tolerate-errors, and 2) #32058.

I haven't heard of us seeing anything like these on the release cluster that's running tpcc, but perhaps there's some difference between it and a roachprod cluster.

In particular, this could just be a version of tpcc/nodes=3/w=max that runs for a longer time, which means the setup for getting this going is more or less just teamcity stuff and could be done very quickly. Then the more complicated test suggested in the issue text could be worked on while we're already starting to get data.
