roachtest: ycsb/E/nodes=3/cpu=32 failed #106474
Comments
@cockroachdb/kv should this error have bubbled up to the client (workload)? I'll let you decide what to do with this failure.
@arulajmani can you take a quick look at this? Looking into it further, there are some confusing things:
Nothing indicates that the node is having any problems other than the clock jump. I believe that jump is causing the request to fail, but I'm unsure how.
This can also mean the node was overloaded, btw. That would be my default assumption here, especially in conjunction with the […]. The workload here doesn't tolerate errors, though technically it should always have to tolerate this one. SQL users "always" have to be prepared to handle it, though admittedly there shouldn't be a reason to see it in a healthy system. Still, the system in these benchmarks is not "healthy", since we're driving it as close to the limit as possible.
Yeah, this isn't a clock jump, it's request latency.
Is there something to do about this and related failures?
Not sure we have a good way to handle this in the workload in the general case (i.e., it's not safe to always retry). What would we prefer
We basically have to implement the same client-side error handling that we ask all of our users to implement in their applications that use CRDB:
https://www.cockroachlabs.com/docs/stable/transaction-retry-error-reference
https://www.cockroachlabs.com/docs/stable/common-errors#result-is-ambiguous
Created #107571 for us to keep track of this. I'm going to close this issue in favour of that one, as it describes the issue more directly.
roachtest.ycsb/E/nodes=3/cpu=32 failed with artifacts on master @ 43c26aec0072f76e02e6d5ffc1b7079026b24630:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=aws
ROACHTEST_cpu=32
ROACHTEST_encrypted=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-29579