roachtest: schemachange/index/tpcc/w=1000 failed #36094
SHA: https://github.com/cockroachdb/cockroach/commits/25398c010b2af75b11fed189680ea6b9645f0cf5 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1199659&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/23f9707873abbd2de91a42055535529d7ff296ce Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1209900&tab=buildLog
|
This test seems to have passed, but I don't understand the error above. The test seems to have moved forward after hitting the error. |
SHA: https://github.com/cockroachdb/cockroach/commits/5921cf0dcc76548931cc85500c0fa2186a82142f Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1212185&tab=buildLog
|
Node 3 died of an OOM. |
SHA: https://github.com/cockroachdb/cockroach/commits/5267932f6fec0405b31328c1ad43711b0bb013e5 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1220238&tab=buildLog
|
It looks like a node died around an hour into the test, and then the test never returned until it timed out (at the 6h mark):
The timeout wasn't handled properly and prevented debug info from being collected (the contexts had expired, so all the commands failed), so we don't know why a node died. I'm honestly not quite sure why the contexts were canceled; from the looks of it, we should've only canceled the one given to the test, not the one used for collecting the debug info. The main problem, though, is that we were destroying the cluster the very moment the timeout hit, so there wasn't a chance in the world that the debugging code would've grabbed something useful from it. |
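To make the ordering problem described above concrete, here is a minimal Go sketch. It is not the roachtest harness code: `runThenCollect`, `demo`, and the callback names are hypothetical and exist only for illustration. It assumes the intended behavior is that the test gets its own deadline-bound context, debug collection gets a separate context derived from the parent, and the cluster is torn down only after collection has been attempted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runThenCollect is a hypothetical helper (not the real roachtest API).
// It runs test under its own deadline, then collects debug info with a
// fresh context, and only then destroys the cluster.
func runThenCollect(
	parent context.Context,
	testTimeout, debugTimeout time.Duration,
	test, collect, destroy func(context.Context) error,
) (testErr, collectErr error) {
	// The test gets a context that expires at the test timeout.
	testCtx, cancelTest := context.WithTimeout(parent, testTimeout)
	testErr = test(testCtx)
	cancelTest()

	// Debug collection derives from parent, not from testCtx, so an
	// expired test deadline does not cancel the collection commands.
	debugCtx, cancelDebug := context.WithTimeout(parent, debugTimeout)
	defer cancelDebug()
	collectErr = collect(debugCtx)

	// Tear the cluster down only after collection has been attempted.
	_ = destroy(parent)
	return testErr, collectErr
}

// demo simulates a test that hangs until its deadline. It reports whether
// the test timed out and whether debug collection still succeeded.
func demo() (timedOut, collected bool) {
	destroyed := false
	hang := func(ctx context.Context) error {
		<-ctx.Done()
		return ctx.Err()
	}
	collect := func(ctx context.Context) error {
		if destroyed {
			return errors.New("cluster already destroyed")
		}
		return ctx.Err() // nil while the debug context is still live
	}
	destroy := func(context.Context) error {
		destroyed = true
		return nil
	}
	testErr, collectErr := runThenCollect(
		context.Background(), 10*time.Millisecond, time.Second,
		hang, collect, destroy,
	)
	return errors.Is(testErr, context.DeadlineExceeded), collectErr == nil
}

func main() {
	timedOut, collected := demo()
	fmt.Println("test timed out:", timedOut, "debug info collected:", collected)
}
```

If `collect` were handed `testCtx` instead, or if `destroy` ran first, the simulated collection would fail in the same way the comment above describes.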
SHA: https://github.com/cockroachdb/cockroach/commits/1a5eabad4511a3371a6b2809d2bfc29e8aff66a6 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1224702&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/6da68d7fe2c9a29b85e2ec0c7e545a0d6bdc4c5c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1226521&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/58c458efeaa3b38c8c982f23a36381aac1b1004b Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1226503&tab=buildLog
|
New failures here over the weekend that look like overloaded nodes. @vivekmenezes can you triage? |
This is not looking like a legit failure. Will continue to keep an eye on it for now. |
Can you help me understand why these failures are not legit? I don't think nodes should be dying during this test. |
SHA: https://github.com/cockroachdb/cockroach/commits/682c2f2f466bbf768545ca4687822206a63983ad Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1231772&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/bf399d2677783dc1eea7f5ede6d4561f95c0ea10 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1234662&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/509c5b130fb1ad0042beb74e083817aa68e4fc92 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1237068&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/0c83360778c511ab79103aefd8f5e3a115990144 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1237179&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/9e6ae3cc37e7691147bb6f5d1a156ebe4c5cf7f9 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1245443&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/c65b71a27e4d0941bf9427b5dec1ff7f096bba7b Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1245461&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/83de585d331b05a4aa02a65b353bed6bf829b696 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1247383&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/4b3a1216e3a387aad900e70fde65b97b0fa17a8c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1251417&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/99306ec3e9fcbba01c05431cbf496e8b5b8954b4 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1260033&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/7b2651400b2003d0a381cba9dbfc0b7bc0dfee00 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1293898&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/923a3b2a6f4a6492883141092280d1041de1381a Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1295056&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/cab299a0ef983f8b4ffe5d724e44587d9665d3a3 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1295811&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/58c567a325056033b326cb9c4ed9ba490e8956da Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1296592&tab=buildLog
|
Previous two issues addressed by #37701. |
SHA: https://github.com/cockroachdb/cockroach/commits/c9301cf71ea69da451fe5e5ba2c3074a4fe53831 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1303699&tab=buildLog
|
Same failure as #37488 (comment). |
SHA: https://github.com/cockroachdb/cockroach/commits/630a6e9cb3771912cd138f9aa3bea1f0ca9fa7c9 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1306250&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/fc7e48295cd05f94fd2883498d96d91ad538e559 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308263&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/6bc296955cbbc4313d91b94ee129b73b81ab12f4 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1337184&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/5f358ed804af05f8c4b404efc4d8a282d8e0916c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1360435&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/5f358ed804af05f8c4b404efc4d8a282d8e0916c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1361643&tab=buildLog
|
@andreimatei it looks like this old friend is back. Interestingly, this test failed here and in #36024 (comment) on the same night after not showing up for a long time. I wonder if something changed recently to make this possible again. |
SHA: https://github.com/cockroachdb/cockroach/commits/90841a6559df9d9a4724e1d30490951bbdb811b4 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1364443&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/537767ac9daa52b0026bb957d7010e3b88b61071 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1364821&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/86154ae6ae36e286883d8a6c9a4111966198201d Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1367379&tab=buildLog
|
Fixes cockroachdb#36024. Fixes cockroachdb#36094. 8b5bafb ensured that all transaction state was propagated by DistSender on errors. In doing so, it touched on the fact that DistSender drops all but the first error that it sees. It ensured that even though this was the case, the error metadata from these dropped errors would still be propagated (see `pErr.UpdateTxn(resp.pErr.GetTxn())`). This had the unintended consequence of making it possible for a non-aborting transaction retry error to be updated with an ABORTED transaction proto. This caused confusion in the TxnCoordSender, triggering panics like the ones we see in cockroachdb#36024 and cockroachdb#36094. This change fixes this by being smarter about which errors get dropped when concurrent partial batches each hit an error in DistSender. It does this by prioritizing the most severe errors and merging transaction state into those. In a lot of ways, this is the DistSender equivalent of 574e805, which is why they now share code. Release note: None
38579: kv: prioritize severe errors when merging partial batches in DistSender r=andreimatei a=nvanbenschoten Fixes #36024. Fixes #36094. 8b5bafb ensured that all transaction state was propagated by `DistSender` on errors. In doing so, it touched on the fact that `DistSender` drops all but the first error that it sees. It ensured that even though this was the case, the error metadata from these dropped errors would still be propagated (see `pErr.UpdateTxn(resp.pErr.GetTxn())`). This had the unintended consequence of making it possible for a non-aborting transaction retry error to be updated with an ABORTED transaction proto. This caused confusion in the `TxnCoordSender`, triggering panics like the ones we see in #36024 and #36094. This change fixes this by being smarter about which errors get dropped when concurrent partial batches each hit an error in `DistSender`. It does this by prioritizing the most severe errors and merging transaction state into those. In a lot of ways, this is the `DistSender` equivalent of 574e805, which is why they now share code. Co-authored-by: Nathan VanBenschoten <[email protected]>
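The difference between "first error wins" and "most severe error wins" can be sketched in Go as follows. This is a simplified model, not the code from the PR: `errKind`, `txnState`, `batchErr`, `firstWins`, and `combine` are all hypothetical names, and the two-level severity ranking is an assumption made for illustration.

```go
package main

import "fmt"

// errKind is a hypothetical classification, ordered least to most severe.
type errKind int

const (
	kindRetry   errKind = iota // the transaction can retry
	kindAborted                // the transaction must be abandoned
)

// txnState stands in for the transaction metadata an error carries.
type txnState struct {
	aborted   bool
	timestamp int64
}

// batchErr models an error returned by one partial batch.
type batchErr struct {
	kind errKind
	txn  txnState
}

// merge folds other into t: the timestamp only moves forward and the
// aborted flag is sticky.
func (t txnState) merge(other txnState) txnState {
	if other.timestamp > t.timestamp {
		t.timestamp = other.timestamp
	}
	t.aborted = t.aborted || other.aborted
	return t
}

// firstWins models the problematic behavior: keep the first error but
// still merge in the dropped error's transaction state.
func firstWins(first, second batchErr) batchErr {
	first.txn = first.txn.merge(second.txn)
	return first
}

// combine keeps the more severe error and merges the other error's
// transaction state into it, so kind and state stay consistent.
func combine(a, b batchErr) batchErr {
	winner, loser := a, b
	if b.kind > a.kind {
		winner, loser = b, a
	}
	winner.txn = winner.txn.merge(loser.txn)
	return winner
}

func main() {
	retry := batchErr{kind: kindRetry, txn: txnState{timestamp: 5}}
	aborted := batchErr{kind: kindAborted, txn: txnState{aborted: true, timestamp: 3}}

	old := firstWins(retry, aborted)
	fmt.Printf("first wins: kind=%d aborted=%v\n", old.kind, old.txn.aborted)

	fixed := combine(retry, aborted)
	fmt.Printf("most severe wins: kind=%d aborted=%v\n", fixed.kind, fixed.txn.aborted)
}
```

In this model, `firstWins` produces a retry error that carries aborted transaction state, which is the inconsistent combination the description blames for the panics, while `combine` surfaces the aborted error with the merged state.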
SHA: https://github.com/cockroachdb/cockroach/commits/5a746073c3f8ede851f37dd895cf1a91d6dcc3cf
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1195714&tab=buildLog