storage: assert around snapshot sending/receiving #42011
f7d1a40 to 31929d3
@tbg: I'm likely missing other spots where asserts would be useful. If you know of any I missed, let me know.
31929d3 to 60ec8ce
I can't think of other places to put these. There's some more information that we'll generally want to put into these errors plus we'll want to create RocksDB checkpoints whenever they occur, but other than that this looks good!
pkg/storage/replica_raftstorage.go
Outdated
@@ -936,6 +936,12 @@ func (r *Replica) applySnapshot(
			s.RaftAppliedIndex, snap.Metadata.Index)
	}

	if expLen := (s.RaftAppliedIndex - s.TruncatedState.Index); expLen != uint64(len(logEntries)) {
		log.Fatalf(ctx,
			"received inconsistent number of log entries: got %d entries, expected %d entries",
Print the raft applied index, truncated state, HardState and actual range ([a..b]) of indexes read.
For all of the fatals, also create a RocksDB checkpoint (CreateCheckpoint), see cockroach/pkg/storage/replica_proposal.go, line 222 in a769be1:
if err := r.store.engine.CreateCheckpoint(checkpointDir); err != nil {
It seems that maybe you want to extract a helper that you then call in all the places.
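One possible shape for such a helper, as a rough sketch only. The `checkpointer` interface, `assertionFailure`, `fakeEngine`, and the `checkpoints/<tag>` directory layout are all hypothetical; the only thing taken from the thread is a `CreateCheckpoint(dir) error` method. The helper returns the message rather than calling `log.Fatalf` itself, so each call site stays in control of the fatal.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// checkpointer is a hypothetical stand-in for the storage engine. The only
// method assumed is CreateCheckpoint(dir) error, as in the line quoted above.
type checkpointer interface {
	CreateCheckpoint(dir string) error
}

// assertionFailure is a hypothetical helper: it tries to preserve the on-disk
// state in a checkpoint and returns the message to hand to log.Fatalf.
// A failed checkpoint must not hide the original assertion, so its error is
// appended to the message instead of replacing it.
func assertionFailure(eng checkpointer, auxDir, tag, format string, args ...interface{}) string {
	msg := fmt.Sprintf(format, args...)
	dir := filepath.Join(auxDir, "checkpoints", tag) // assumed layout
	if err := eng.CreateCheckpoint(dir); err != nil {
		return fmt.Sprintf("%s (unable to create checkpoint %s: %v)", msg, dir, err)
	}
	return fmt.Sprintf("%s (checkpoint created at %s)", msg, dir)
}

// fakeEngine records the requested directory; it exists only to exercise
// the helper without a real engine.
type fakeEngine struct {
	dir  string
	fail bool
}

func (f *fakeEngine) CreateCheckpoint(dir string) error {
	f.dir = dir
	if f.fail {
		return errors.New("injected failure")
	}
	return nil
}
```

Each assertion site would then reduce to something like `log.Fatalf(ctx, "%s", assertionFailure(eng, auxDir, tag, "...", args...))`, which keeps the checkpoint logic in one place.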
CreateCheckpoint is not available in v2.1, it was only added in #36867.
Print the raft applied index, truncated state, HardState and actual range ([a..b]) of indexes read.
Done.
60ec8ce to ab8683d
so far, thanks! Curious if we can get CreateCheckpoint onto release-2.1. It's really our biggest punch we can land if this bug comes up again
Reviewed 2 of 7 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)
pkg/storage/raft_log_queue.go, line 354 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Done (but we don't modify decision.Input, nor does it seem that it should, which is why I put this up).
I know, just de-risking the change regardless. As a reviewer, I can't tell whether Input is embedded in truncateDecision. If, for example, ChosenVia were really in Input, we'd have introduced some weirdness. I know nothing like that happened, but I like to keep the diffs braindead on release branches.
pkg/storage/raft_log_queue.go, line 397 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I certainly wouldn't want it to return a truncate decision that has NewFirstIndex > LastIndex
This already does happen (and why my first revision failed teamcity). input.FirstIndex is set to TruncatedState.Index + 1, so 11 for uninit'ed replicas, whereas LastIndex is 10.
The code in line 384 above seems like it would set NewFirstIndex := 10 in this case. Is that not what happens?
It does, but given 10 < 11 (input.FirstIndex), it's brought back up to 11. And thus we have NewFirstIndex > LastIndex. So this is funky, only for uninit'ed replicas can we have input.FirstIndex > input.LastIndex (input.FirstIndex = input.LastIndex + 1).
Add that in a comment, please.
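For anyone following along, the arithmetic in this exchange can be worked through with a small model. This is a simplified sketch, not the code in raft_log_queue.go: `truncateInput`, `clampNewFirstIndex`, and `numTruncatable` are hypothetical names, and only the FirstIndex/LastIndex relationship described above is modeled.

```go
package main

// truncateInput is a simplified, hypothetical stand-in for the input to the
// truncation decision; only the two indexes discussed above are modeled.
type truncateInput struct {
	FirstIndex uint64 // TruncatedState.Index + 1
	LastIndex  uint64
}

// clampNewFirstIndex models the behavior described in the thread: a candidate
// below FirstIndex is brought back up to FirstIndex.
func clampNewFirstIndex(in truncateInput, candidate uint64) uint64 {
	if candidate < in.FirstIndex {
		return in.FirstIndex
	}
	return candidate
}

// numTruncatable returns how many entries a decision would remove. It is zero
// whenever the new first index does not move past the current one.
func numTruncatable(in truncateInput, newFirstIndex uint64) uint64 {
	if newFirstIndex <= in.FirstIndex {
		return 0
	}
	return newFirstIndex - in.FirstIndex
}
```

With `truncateInput{FirstIndex: 11, LastIndex: 10}` and a candidate of 10, the clamp yields 11, so NewFirstIndex > LastIndex while nothing is actually truncatable. Under this model that is the one case where the "odd" ordering is harmless.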
pkg/storage/replica_raftstorage.go, line 941 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
CreateCheckpoint is not available in v2.1, it was only added in #36867.
Print the raft applied index, truncated state, HardState and actual range ([a..b]) of indexes read.
Done.
Ah, that's a real bummer because some deployments tend to auto-restart crashing nodes which will wipe the evidence. Does c1d8a2e backport to release-2.1 somewhat cleanly?
pkg/storage/replica_raftstorage.go, line 944 at r2 (raw file):
"(RaftAppliedIndex=%d, TruncatedState.Index=%d, HardState=%s, ReceivedLogEntries=[%d,%d])", len(logEntries), expLen, s.RaftAppliedIndex, s.TruncatedState.Index, hs.String(), logEntries[0].Index, logEntries[len(logEntries)-1].Index)
logEntries will be empty if we see the same bug again, so make sure the assertion doesn't panic in that case.
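One way to keep the message from indexing into an empty slice is to format the range separately. A minimal sketch, where `entry` and `describeIndexRange` are hypothetical names (the real code would operate on the raft entry type):

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a raft log entry; only Index is used.
type entry struct{ Index uint64 }

// describeIndexRange formats the first and last index of the received
// entries without touching entries[0] when the slice is empty, which is
// exactly the case the assertion is expected to catch.
func describeIndexRange(entries []entry) string {
	if len(entries) == 0 {
		return "[]"
	}
	return fmt.Sprintf("[%d,%d]", entries[0].Index, entries[len(entries)-1].Index)
}
```

The fatal message could then take the result through a single `%s` verb in place of the two `%d` verbs for the first and last index.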
pkg/storage/store_snapshot.go, line 283 at r2 (raw file):
// snapshot) and the truncated index should equal the number of log entries
// shipped over.
expLen := endIndex - firstIndex
This is the assertion that we expect to fire - the snapshot that was sent had zero entries. Make sure this gets an engine snapshot if you find that CreateCheckpoint does backport well enough. Without the snapshot I think we'll be unlikely to have much evidence left by the time we get to take a look. Like in the other place, also print the actual range of indexes we got here.
ab8683d to aa2b5a9
TFTR. I'll try backporting CreateCheckpoint in a separate PR (also it looks like it's missing from 19.1). I'll ping back here if successful 🤞
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/storage/raft_log_queue.go, line 354 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I know, just de-risking the change regardless. As a reviewer, I can't tell whether Input is embedded in truncateDecision. If, for example, ChosenVia were really in Input, we'd have introduced some weirdness. I know nothing like that happened, but I like to keep the diffs braindead on release branches.
Gotcha, that makes sense. I'll keep this heuristic in mind going forward.
pkg/storage/raft_log_queue.go, line 397 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Add that in a comment, please.
Done.
pkg/storage/replica_raftstorage.go, line 941 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ah, that's a real bummer because some deployments tend to auto-restart crashing nodes which will wipe the evidence. Does c1d8a2e backport to release-2.1 somewhat cleanly?
Trying this in a separate PR.
pkg/storage/replica_raftstorage.go, line 944 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
logEntries will be empty if we see the same bug again, so make sure the assertion doesn't panic in that case.
Whoops, fixed.
pkg/storage/store_snapshot.go, line 283 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This is the assertion that we expect to fire - the snapshot that was sent had zero entries. Make sure this gets an engine snapshot if you find that CreateCheckpoint does backport well enough. Without the snapshot I think we'll be unlikely to have much evidence left by the time we get to take a look. Like in the other place, also print the actual range of indexes we got here.
Trying this in a separate PR.
Reviewed 3 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
aa2b5a9 to 3feb097
Rebased atop #42042, PTA(brief)L.
Minor comments, but I'm confused about the go version check.
Reviewed 5 of 5 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif)
pkg/storage/replica_command.go, line 1033 at r4 (raw file):
sent,
); err != nil {
	if _, ok := err.(*MalformedSnapshotError); ok {
This looks brittle, I'd go for errors.Cause to make sure that an intermittent errors.Wrap isn't letting the error bypass this check. Also just test it manually (by always returning this error) and make sure that any test that involves snapshots fails with the proper fatal.
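To illustrate why the bare type assertion is brittle, here is a small sketch. Note the substitution: it uses the standard library's `errors.As` and `fmt.Errorf` with `%w` (Go 1.13+) in place of `errors.Cause` and `errors.Wrap`, purely so the example has no external dependencies; the point about wrapping is the same. The `malformedSnapshotError` type, its message, and the helper names are hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// malformedSnapshotError is a hypothetical stand-in for the
// MalformedSnapshotError in the diff; the message text is invented.
type malformedSnapshotError struct{}

func (*malformedSnapshotError) Error() string { return "malformed snapshot generated" }

// isMalformedDirect is the brittle check: a plain type assertion only
// matches when the error has not been wrapped on its way up.
func isMalformedDirect(err error) bool {
	_, ok := err.(*malformedSnapshotError)
	return ok
}

// isMalformedUnwrapped walks the wrap chain, so added context does not let
// the error slip past the check.
func isMalformedUnwrapped(err error) bool {
	var target *malformedSnapshotError
	return errors.As(err, &target)
}

// wrap adds context the way an intermediate caller might.
func wrap(err error) error {
	return fmt.Errorf("sending snapshot: %w", err)
}
```

Once the error passes through `wrap`, `isMalformedDirect` no longer recognizes it while `isMalformedUnwrapped` still does, which is the failure mode the review comment is guarding against.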
3feb097 to eebb3f3
In v2.1 log entries are shipped alongside snapshots. The log entries included in snapshots (which also include the truncated state and the applied index) cover all indexes in the range [truncated-state.index + 1, applied-state]. We simply assert that this is always the case. We also assert during log truncations that the number of deleted entries is no more than what we expect (last index - first index).

Additionally rename PendingPreemptiveSnapshotIndex to PendingSnapshotIndex, as it applies to both raft snapshots and pre-emptive snapshots.

Release note: None
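The two invariants in the commit message can be sketched as standalone checks. This is an illustration under assumptions, not the code in this PR: `checkSnapshotEntries` and `checkTruncation` are hypothetical names, entries are reduced to their indexes, and the per-entry contiguity check goes one step beyond the count comparison the commit message describes. Treating last < first as "nothing may be deleted" is also an assumption, made to avoid unsigned underflow for uninitialized replicas.

```go
package main

import "fmt"

// checkSnapshotEntries verifies that the shipped entries cover exactly
// [truncatedIndex+1, appliedIndex]: the right count and, as an extra step
// not stated in the commit message, no gaps.
func checkSnapshotEntries(truncatedIndex, appliedIndex uint64, indexes []uint64) error {
	if appliedIndex < truncatedIndex {
		return fmt.Errorf("applied index %d is below truncated index %d",
			appliedIndex, truncatedIndex)
	}
	expLen := appliedIndex - truncatedIndex
	if uint64(len(indexes)) != expLen {
		return fmt.Errorf("got %d entries, expected %d entries", len(indexes), expLen)
	}
	for i, idx := range indexes {
		if want := truncatedIndex + 1 + uint64(i); idx != want {
			return fmt.Errorf("entry %d has index %d, expected %d", i, idx, want)
		}
	}
	return nil
}

// checkTruncation verifies that a truncation deleted no more entries than
// (last index - first index). When lastIndex < firstIndex the bound is taken
// to be zero (an assumption, see the lead-in).
func checkTruncation(firstIndex, lastIndex, numDeleted uint64) error {
	var maxDeleted uint64
	if lastIndex > firstIndex {
		maxDeleted = lastIndex - firstIndex
	}
	if numDeleted > maxDeleted {
		return fmt.Errorf("deleted %d entries, expected at most %d", numDeleted, maxDeleted)
	}
	return nil
}
```

For example, with a truncated index of 10 and an applied index of 13, only the entries 11, 12, 13 pass; an empty slice (the suspected bug) fails the count check.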
eebb3f3 to 93db76c
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/storage/replica_command.go, line 1033 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This looks brittle, I'd go for errors.Cause to make sure that an intermittent errors.Wrap isn't letting the error bypass this check. Also just test it manually (by always returning this error) and make sure that any test that involves snapshots fails with the proper fatal.
Done. Also checked manually that checkpoints are created (tests create in mem rocksdb instances, which don't create checkpoints it seems).
Reviewed 3 of 3 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
</god merge>