roachtest: schemachange/index/tpcc/w=800 failed [deadlock in kvflowhandle] #106078
Comments
Requesting KV help on this. There are roughly 11,000 messages like this in node 3's logs:
Seems to be related to #106285?
@irfansharif can you take a look?
From n3 stacks: google.golang.org/grpc/internal/transport.(*writeQuota).get(...) [select, 354 minutes]
I am guessing that some peer has stopped consuming messages. On n1, quotapool.(*AbstractPool).Acquire() [select, 233 minutes]
I found what looks like a deadlock in Details
There are a few goroutines trying to get that mutex. There's also this log msg:
That error message actually straight up leaks the mutex: cockroach/pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go Lines 98 to 102 in 3da5c73
@irfansharif I assume you'll take it from here.
This is the same problem as #106349 I assume?
Bor-sed a fix, I'll comb through these open CI issues. I don't understand why this tripped up now though, things have been enabled for a few weeks. Sorry for the noise!
106206: prereqs: delete tests r=rail a=rickystewart

These tests have always been skipped under Bazel because the implementation doesn't work in a Bazel world due to the dependency on `"golang.org/x/tools/go/packages"`. Since the command is only useful/used in `make`, which is going to be deleted shortly, just delete the tests rather than waste time getting it working.

Also this is the last `broken_in_bazel` test, so rip out all the corresponding logic too.

Epic: none
Release note: None
Closes: #61924
Closes: #92814

106343: ci: don't do retries on tests in CI on master, release branches r=rail a=rickystewart

First of all, test retries don't even have the correct behavior: #103042

This means that a successfully-retried test tramples the logs of previously-failed tests, which is very confusing and erases your ability to debug the test.

Also, we are focusing on quality and wiping out flaky and skipped tests. This to me suggests we should not be retrying tests to let already-flaky tests through. Rather, we should be surfacing real failures immediately.

For both of these reasons I turn off test retries for unit tests on `master` and release branches. We keep it for `staging` so `bors` is unaffected.

Epic: none
Release note: None

106411: kvflowhandle: fix mutex leak r=irfansharif a=irfansharif

Fixes #106078. We were forgetting to unlock the mutex on the error path.

Release note: None

Co-authored-by: Ricky Stewart
Co-authored-by: irfan sharif
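For readers unfamiliar with this class of bug, here is a minimal sketch of a mutex leaked on an error path and one common way to avoid it. It is not the code from kvflowhandle.go: the `handle` type, `errClosed`, `admitLeaky`, `admitFixed`, and `locked` are all hypothetical names made up for illustration, and the thread does not say whether the actual fix used `defer` or an explicit `Unlock`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// handle is a hypothetical stand-in; it is NOT the real kvflowhandle type.
type handle struct {
	mu     sync.Mutex
	closed bool
}

// errClosed is an illustrative error, not one taken from CockroachDB.
var errClosed = errors.New("handle closed")

// admitLeaky returns early on the error path without unlocking, so every
// later caller that tries to take h.mu blocks forever.
func (h *handle) admitLeaky() error {
	h.mu.Lock()
	if h.closed {
		return errClosed // BUG: h.mu is still held here
	}
	h.mu.Unlock()
	return nil
}

// admitFixed defers the unlock so that every return path releases h.mu.
func (h *handle) admitFixed() error {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.closed {
		return errClosed
	}
	return nil
}

// locked reports whether h.mu is currently held, using TryLock (Go 1.18+).
func (h *handle) locked() bool {
	if h.mu.TryLock() {
		h.mu.Unlock()
		return false
	}
	return true
}

func main() {
	leaky := &handle{closed: true}
	_ = leaky.admitLeaky()
	fmt.Println("leaky path leaves mutex held:", leaky.locked())

	fixed := &handle{closed: true}
	_ = fixed.admitFixed()
	fmt.Println("fixed path leaves mutex held:", fixed.locked())
}
```

Once the leaky path has run, any goroutine calling `h.mu.Lock()` parks indefinitely, which is consistent with the long-blocked goroutines reported in the stacks above.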
roachtest.schemachange/index/tpcc/w=800 failed with artifacts on master @ aacba20d325e5702836e9a76be646b5f1bd922af:
Parameters: ROACHTEST_arch=amd64, ROACHTEST_cloud=gce, ROACHTEST_cpu=16, ROACHTEST_encrypted=false, ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-29388