-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: cdc/crdb-chaos/rangefeed=true failed #34600
Comments
SHA: https://github.com/cockroachdb/cockroach/commits/40e403544d60b1a44b8b1ed961a817c77d67aa31 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1131411&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/b80d241d5693d6ba2bde01ae9167b49b04b47226 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1133193&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/10f8010fa5778e740c057905e2d7664b5fd5d647 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1135549&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/acba091f04f3d8ecabf51009bf394951fbd3643c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1137872&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/d7c56dcb87c8f187e50303c8e32a64836c42515c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1139797&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/4fc4d63ddddeb507564bfe53c6cd79549ff2fd27 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1141650&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/8e9a1e310e3e8e37f091b7ca8bd204084ad9e2e5 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1142461&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/bd80a74f882a583d6bb2a04dfdb57b49254bc7ba Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1143393&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/2e16b7357d5a15d87cd284d5f2c12e424ed29ca1 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1146277&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/763e90b043b5d732856d3ecf1d7b0d6aa33e3b26 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1147715&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/46d1ab4c06edfd37d875114972c55b44105acd83 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1147903&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/d0a93d286ee42ceb94f986b6a06a1afd52cf914e Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1149020&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/eaad50f0ea64985a0d7da05abb00cc9f321c5fa9 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1149743&tab=buildLog
|
Recent failures are like #35114 (comment). |
SHA: https://github.com/cockroachdb/cockroach/commits/71681f60031c101f17339236e2ba75f7a684d1a1 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1155904&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/71681f60031c101f17339236e2ba75f7a684d1a1 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1155867&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/032c4980720abc1bdd71e4428e4111e6e6383297 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1158877&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/de1793532332fb64fca27cafe92d2481d900a5a0 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1160394&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/959dcf7de0f94cfcfa0062387b109adebd1f11da Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1163702&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/5f7de3c35348e847dc0d4ab1ba8d76574e1706c2 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1169980&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/a119a3a158725c9e3f9b8084d9398601c0e67007 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1170795&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/9d058d53c8a82fceb2205f1827c26f1bf36c32ba Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1172386&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/a512e390f7f2f2629f3f811bab5866c46e3e5713 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1174765&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/57e825a7940495b67e0cc7213a5fabc24e12be0e Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1176948&tab=buildLog
|
cc @nvanbenschoten / @danhhz
|
Woah, interesting |
I just saw this on a run of cdc/tpcc-1000 I was doing with #35745. @ajwerner any chance this could be #35130?
|
Hmmm the last failure claims to have happened at 57e825a which doesn't include that PR. |
And mine doesn't either. Okay no clue what's going on here, but I've never seen this panic before 24h ago |
Yeah I just landed the referenced change. I don't know all that much about the resolved timestamp, how does it relate to the closed timestamp? Before that change it was absolutely possible for a closed timestamp to "regress" for a given range in the face of lease transfer but any consumer of closed timestamps can safely use the maximum seen value. |
Resolved timestamp = closed timestamp + no intents outstanding. RangeFeeds do a scan of all unresolved intents on the range when they start up, and then when they get a "Logical Op" that resolves an intent, they remove it from being tracked. Similarly when they get a logical op that creates an intent or advances the timestamp of an intent, they put it in the min heap or move it. This or a closed timestamp can advance the resolved timestamp, at which time RangeFeed will emit it. I'd start with https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/rangefeed/resolved_timestamp.go which is pretty well factored out. |
Informs cockroachdb#34600. It is critical that `unresolvedIntentQueue` properly handle negative reference counts to avoid leaking references before a rangefeed has finished its resolved timestamp initialization scan. However, once this scan is complete, reference counts on transactions should never drop below zero. Such an occurrence would indicate that an intent was lost either during the initial scan or somewhere in the logical ops stream. Based on the stacktraces in cockroachdb#34600, I believe this is what is happening because we can see that an `MVCCCommitIntentOp` is triggering the assertion. This commit will make this more explicit and should hopefully also fire more because it won't rely on the intent with a negative refcount being the oldest intent tracked to fire. Release note: None
Fixes cockroachdb#34600. Before this commit, rangefeed made the assumption that an aborted intent implied an aborted transaction (after the rangefeed was in a steady-state). It used this assumption to eagerly discard the bookkeeping for a transaction after its first MVCCAbortIntent operation was seen as an optimization. This was incorrect. A transaction can "abort" an intent even if it commits. This is possible if the intent was only written at a previous epoch and not written in the epoch that ended up finalizing the transaction. This commit removes this assumption, which I confirmed fixes the resolved timestamp regression using the new assertion from cockroachdb#35772. I believe this could have two effects: - it could trigger the assertion failure from cockroachdb#34600 if the transaction experiencing the incorrect assumption was the oldest being tracked at the time of the assumption. - it could cause resolved timestamp stalls and prevent all future forward progress of resolved timestamps if the transaction experiencing the incorrect assumption was not the oldest being tracked at the time of the assumption. Release note: None
SHA: https://github.com/cockroachdb/cockroach/commits/70e3468e7ed5fb30b10eaaf972acbb0f16099d01 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1178890&tab=buildLog
|
Here it goes again, but looks like @nvanbenschoten already figured this out: #35777
|
35772: rangefeed: improve assertion when txn refcount becomes negative r=nvanbenschoten a=nvanbenschoten Informs #34600. It is critical that `unresolvedIntentQueue` properly handle negative reference counts to avoid leaking references before a rangefeed has finished its resolved timestamp initialization scan. However, once this scan is complete, reference counts on transactions should never drop below zero. Such an occurrence would indicate that an intent was lost either during the initial scan or somewhere in the logical ops stream. Based on the stacktraces in #34600, I believe this is what is happening because we can see that an `MVCCCommitIntentOp` is triggering the assertion. This commit will make this more explicit and should hopefully also fire more because it won't rely on the intent with a negative refcount being the oldest intent tracked to fire. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
Fixes cockroachdb#34600. Before this commit, rangefeed made the assumption that an aborted intent implied an aborted transaction (after the rangefeed was in a steady-state). It used this assumption to eagerly discard the bookkeeping for a transaction after its first MVCCAbortIntent operation was seen as an optimization. This was incorrect. A transaction can "abort" an intent even if it commits. This is possible if the intent was only written at a previous epoch and not written in the epoch that ended up finalizing the transaction. This commit removes this assumption, which I confirmed fixes the resolved timestamp regression using the new assertion from cockroachdb#35772. I believe this could have two effects: - it could trigger the assertion failure from cockroachdb#34600 if the transaction experiencing the incorrect assumption was the oldest being tracked at the time of the assumption. - it could cause resolved timestamp stalls and prevent all future forward progress of resolved timestamps if the transaction experiencing the incorrect assumption was not the oldest being tracked at the time of the assumption. Release note: None
35777: rangefeed: don't discard ref count for txn after intent abort r=nvanbenschoten a=nvanbenschoten Fixes #34600. Before this commit, rangefeed made the assumption that an aborted intent implied an aborted transaction (after the rangefeed was in a steady-state). It used this assumption to eagerly discard the bookkeeping for a transaction after its first MVCCAbortIntent operation was seen as an optimization. This was incorrect. A transaction can "abort" an intent even if it commits. This is possible if the intent was only written at a previous epoch and not written in the epoch that ended up finalizing the transaction. This commit removes this assumption, which I confirmed fixes the resolved timestamp regression using the new assertion from #35772. I believe this could have two effects: - it could trigger the assertion failure from #34600 if the transaction experiencing the incorrect assumption was the oldest being tracked at the time of the assumption. - it could cause resolved timestamp stalls and prevent all future forward progress of resolved timestamps if the transaction experiencing the incorrect assumption was not the oldest being tracked at the time of the assumption. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
SHA: https://github.com/cockroachdb/cockroach/commits/82026117d83262e87873aad52b8eca2dd0bea335
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1126329&tab=buildLog
The text was updated successfully, but these errors were encountered: