kvserver: v21.1.7: raft closed timestamp regression in cmd #78419

Closed
cockroach-teamcity opened this issue Mar 24, 2022 · 4 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report.

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Mar 24, 2022

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/3128226386/?referrer=webhooks_plugin

Panic message:

store_raft.go:524: log.Fatal: ???: raft closed timestamp regression in cmd: "\x05\xb7\xb9\xedP%7\xc9" (term: 62, index: 1023824); batch state: 1648059645.360138000,0, command: 1648050197.443306000,0, lease: repl=(n1,s1):1 seq=55 start=1647835210.740091000,0 exp=1648060927.990359000,0 pro=1648060918.990359000,0, req: <unknown; not leaseholder>, applying at LAI: 10066.
(1) assertion failure
Wraps: (2) attached stack trace
-- stack trace:
| github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaAppBatch).assertNoCmdClosedTimestampRegression
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_application_state_machine.go:1099
| github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaAppBatch).Stage
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_application_state_machine.go:465
| github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.mapCmdIter
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/cmd.go:175
| github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).applyOneBatch
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:280
| github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).ApplyCommittedEntries
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:247
| github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:803
| github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:466
| github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:523
| github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker
| /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:284
| github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
| /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1374
Wraps: (3) raft closed timestamp regression in cmd: "\x05\xb7\xb9\xedP%7\xc9" (term: 62, index: 1023824); batch state: 1648059645.360138000,0, command: 1648050197.443306000,0, lease: repl=(n1,s1):1 seq=55 start=1647835210.740091000,0 exp=1648060927.990359000,0 pro=1648060918.990359000,0, req: <unknown; not leaseholder>, applying at LAI: 10066.
| Closed timestamp was set by req: <unknown; not leaseholder or not lease request> under lease: %!s(PANIC=SafeFormatter method: value method github.com/cockroachdb/cockroach/pkg/roachpb.Lease.SafeFormat called using nil *Lease pointer); applied at LAI: 0. Batch idx: 0.
| This assertion will fire again on restart; to ignore run with env var COCKROACH_RAFT_CLOSEDTS_ASSERTIONS_ENABLED=true
| Raft log tail:
| × (every entry of the raft log tail is redacted in this report)
Error types: (1) *assert.withAssertionFailure (2) *withstack.withStack (3) *errutil.leafError
--
*errutil.leafError: log.Fatal: ???: raft closed timestamp regression in cmd: "\x05\xb7\xb9\xedP%7\xc9" (term: 62, index: 1023824); batch state: 1648059645.360138000,0, command: 1648050197.443306000,0, lease: repl=(n1,s1):1 seq=55 start=1647835210.740091000,0 exp=1648060927.990359000,0 pro=1648060918.990359000,0, req: <unknown; not leaseholder>, applying at LAI: 10066. (1)
*secondary.withSecondaryError: details for github.com/cockroachdb/errors/withstack/*withstack.withStack::: (2)
store_raft.go:524: *withstack.withStack (top exception)
(check the extra data payloads)

Stacktrace (expand for inline code snippets):

stats, expl, err := r.handleRaftReady(ctx, noSnap)
removed := maybeFatalOnRaftReadyErr(ctx, expl, err)
elapsed := timeutil.Since(start)
in pkg/kv/kvserver.(*Store).processReady
if state.flags&stateRaftReady != 0 {
s.processor.processReady(ctx, id)
}
in pkg/kv/kvserver.(*raftScheduler).worker
f(ctx)
}()
in pkg/util/stop.(*Stopper).RunAsyncTask.func1
/usr/local/go/src/runtime/asm_amd64.s#L1373-L1375 in runtime.goexit

pkg/kv/kvserver/store_raft.go in pkg/kv/kvserver.(*Store).processReady at line 524
pkg/kv/kvserver/scheduler.go in pkg/kv/kvserver.(*raftScheduler).worker at line 284
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunAsyncTask.func1 at line 351
/usr/local/go/src/runtime/asm_amd64.s in runtime.goexit at line 1374
| Tag | Value |
| --- | --- |
| Cockroach Release | v21.1.7 |
| Cockroach SHA | 1fac61a |
| Platform | darwin amd64 |
| Distribution | CCL |
| Environment | development |
| Command | start-single-node |
| Go Version | |
| # of CPUs | |
| # of Goroutines | |

Jira issue: CRDB-14122

@cockroach-teamcity cockroach-teamcity added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Mar 24, 2022
@tbg
Member

tbg commented Mar 24, 2022

development + darwin, so not a credible report. I'm surprised these even get forwarded to this issue tracker.

@yuzefovich
Member

yuzefovich commented Mar 30, 2022

I think we had a bug where all binaries were marked as "development", and it was only fixed in the last few weeks.

@yuzefovich yuzefovich changed the title sentry: store_raft.go:524: log.Fatal: ???: raft closed timestamp regression in cmd: "\x05\xb7\xb9\xedP%7\xc9" (term: 62, index: 1023824); batch state: 1648059645.360138000,0, command: 1648050197.443306000,0, ... kvserver: v21.1.7: raft closed timestamp regression in cmd Mar 30, 2022
@erikgrinaker
Contributor

Seems plausible that this could be caused by a clock jump, e.g. due to OS suspend/resume, which is being looked into in #70894 with a PR in #75298.
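To put a number on the regression, the two timestamps in the panic message (batch state `1648059645.360138000,0` and command `1648050197.443306000,0`) can be subtracted. The following is a minimal sketch, assuming the printed form is `<unix seconds>.<nanoseconds>,<logical>`; the helpers `parseWallTime` and `closedTimestampRegression` are hypothetical names made up for this illustration and are not part of the CockroachDB code base.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseWallTime parses a timestamp printed as "<seconds>.<nanos>,<logical>"
// (the shape seen in the panic message) and returns its wall-clock part.
// Hypothetical helper; the logical component is ignored.
func parseWallTime(s string) (time.Time, error) {
	wall := s
	if i := strings.IndexByte(s, ','); i >= 0 {
		wall = s[:i]
	}
	parts := strings.SplitN(wall, ".", 2)
	if len(parts) != 2 || len(parts[1]) != 9 {
		return time.Time{}, fmt.Errorf("unexpected timestamp format: %q", s)
	}
	sec, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return time.Time{}, err
	}
	nanos, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return time.Time{}, err
	}
	return time.Unix(sec, nanos).UTC(), nil
}

// closedTimestampRegression returns how far the command's closed timestamp
// lies behind the closed timestamp already held in the batch state.
// A positive result means the command would move the closed timestamp backwards.
func closedTimestampRegression(batchState, command string) (time.Duration, error) {
	b, err := parseWallTime(batchState)
	if err != nil {
		return 0, err
	}
	c, err := parseWallTime(command)
	if err != nil {
		return 0, err
	}
	return b.Sub(c), nil
}

func main() {
	// Values copied from the panic message in this issue.
	d, err := closedTimestampRegression("1648059645.360138000,0", "1648050197.443306000,0")
	if err != nil {
		panic(err)
	}
	fmt.Println("command is behind batch state by", d) // 2h37m27.916832s
}
```

A gap of roughly two and a half hours fits a multi-hour clock jump such as a laptop suspend/resume, but the arithmetic alone does not confirm that cause.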

@erikgrinaker
Contributor

I'm going to close this, tracked by #70894.
