-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: introduce LogSnapshot API #130967
raft: introduce LogSnapshot API #130967
Conversation
@tbg @nvanbenschoten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good the (lo,hi]
clean-up is great. I tried to scrutinize all of the index arithmetic and didn't see anything that looked off.
Some minor comments and questions about testing but I'll approve this pending some reaction to the comments, happy to do another quick pass on the resulting diff.
pkg/raft/log.go
Outdated
// first is the first available log index. | ||
first uint64 | ||
// storage contains the stable log entries. | ||
storage Storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably saw my comment on the prototype. LogSnapshot
is definitely a step in the right direction, so I don't think it would make sense to try to take Storage
out of it at this point. In other words, ignore that comment as far as making code changes now goes. In spirit, I still think there's something there, but it may be for another day (which may never come).
// entries (e.g. the committed entries subject to state machine application), | ||
// allow the application to read them from LogSnapshot in the Ready handler. | ||
// This gives the application direct control on resource allocation, and | ||
// flexibility to do raft log IO without blocking RawNode operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➕
4c0e9a0
to
497926b
Compare
130980: raft: clean up raftLog entry slice indexing r=tbg a=pav-kv This PR cleans up various `raftLog/unstable` methods, and converts the sub-slicing methods to `(lo, hi]` semantics instead of `[lo, hi)`. All raft log slices are constructed in context of being appended after a certain index, so `(lo, hi]` addressing makes more sense, and simplifies a bunch of +-1 arithmetics. Related to #130967, #130955 Co-authored-by: Pavel Kalinnikov <[email protected]>
7854e24
to
f41ae68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r9, 5 of 5 files at r10.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @pav-kv and @tbg)
pkg/raft/log.go
line 30 at r9 (raw file):
// // To access it safely, the user must not mutate the underlying raft log storage // between the snapshot is obtained and the reads are done.
"between when the ..."
pkg/raft/rawnode.go
line 123 at r9 (raw file):
// long as the application guarantees immutability of the underlying log storage // while the snapshot is being used. Typically, this means that the application // does not run a Ready() handling cycle until the snapshot is released.
It also means that the application does not Step the RawNode with any incoming messages, right?
pkg/raft/storage.go
line 89 at r10 (raw file):
// LogStorage is a read API for the raft log. type LogStorage interface {
The distinction here between LogStorage
and LogStorageRO
doesn't feel very strong to me, and it's not clear why it is valuable to users for them to be split into two separate interfaces. What about establishing a snapshot makes LogStorage
any less "read-only"? Is there a benefit to separating these out, given that any LogStorage
is also a LogStorageRO
? If anything, I would think we would want to go the other way around and specialize a LogStorageSnapshot
interface which extends LogStorage
with a guarantee of immutability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/raft/rawnode.go
line 123 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It also means that the application does not Step the RawNode with any incoming messages, right?
No, RawNode
can be stepped - this is the key idea. With LogSnapshot()
method, we effectively return a consistent RawNode
snapshot (only the subset of it that's interesting to us in this case - the log). Once Replica.mu
is released, the RawNode
can run forward, but that will not invalidate our snapshot. What invalidates the snapshot is when we mutate the storage (and so the storage-related queries to the snapshot read updated state which can be inconsistent with our RawNode
snapshot). So while raftMu
is held, our snapshot is good, despite steps that can occur under Replica.mu
only.
The LogSnapshot
idea can be extended to a whole RawNode
snapshot. This is almost what Ready()
returns, but it mostly returns a delta today. Instead it could return a whole snapshot + indicators of what the delta is. Then the Ready
handler would iterate the delta and send it to storage/peers (under raftMu
only). Note that storage can mutate under RawNode
's nose, but that is fine - RawNode
always keeps the yet unstable part of the state in memory, so the storage can do the updates asynchronously, and RawNode
will never try to read those parts.
However, RawNode
may try to read from storage while only Replica.mu
is held, and it can be racing with raftMu
-only writes. There is no conflict in which keys are read/written, but there might be some conflicts/races w.r.t. in-memory-data structures that are involved in these paths.
I think this all exists today, we're just rethinking it.
pkg/raft/storage.go
line 89 at r10 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The distinction here between
LogStorage
andLogStorageRO
doesn't feel very strong to me, and it's not clear why it is valuable to users for them to be split into two separate interfaces. What about establishing a snapshot makesLogStorage
any less "read-only"? Is there a benefit to separating these out, given that anyLogStorage
is also aLogStorageRO
? If anything, I would think we would want to go the other way around and specialize aLogStorageSnapshot
interface which extendsLogStorage
with a guarantee of immutability.
Yeah, I like your idea in saying that LogStorageSnapshot
extends LogStorage
. I tried to play with this, it looks like:
// LogStorageRO is a read-only API for the raft log.
type LogStorageRO interface {
// ...
}
// LogStorageSnapshot is a read-only API for the raft log. It guarantees
// immutability of outside the raft package while the user holds this snapshot.
type LogStorageSnapshot LogStorageRO
// LogStorage is the API for the raft log.
type LogStorage interface {
LogStorageRO
// LogSnapshot returns an immutable point-in-time log storage snapshot.
LogSnapshot() LogStorageSnapshot
}
Which is almost identical to what's in this PR, modulo the type alias. Not sure if the type alias is worth it in this case.
Alternatively, we could structure it like below, to avoid one extra type:
// LogStorage is a read-only API for the raft log.
type LogStorage interface {
// ...
}
// LogStorageSnapshot is a read-only API for the raft log. It guarantees
// immutability outside the raft package while the user holds this snapshot.
type LogStorageSnapshot LogStorage
type Storage interface {
LogStorage
StateStorage
// LogSnapshot returns an immutable point-in-time log storage snapshot.
LogSnapshot() LogStorageSnapshot
}
Or simply:
// LogStorage is a read-only API for the raft log.
type LogStorage interface {
// ...
}
type Storage interface {
LogStorage
StateStorage
// LogSnapshot returns an immutable point-in-time log storage snapshot.
// The returned snapshot guarantees immutability outside raft package while
// the user holds this snapshot.
LogSnapshot() LogStorage
}
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/raft/rawnode.go
line 123 at r9 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
No,
RawNode
can be stepped - this is the key idea. WithLogSnapshot()
method, we effectively return a consistentRawNode
snapshot (only the subset of it that's interesting to us in this case - the log). OnceReplica.mu
is released, theRawNode
can run forward, but that will not invalidate our snapshot. What invalidates the snapshot is when we mutate the storage (and so the storage-related queries to the snapshot read updated state which can be inconsistent with ourRawNode
snapshot). So whileraftMu
is held, our snapshot is good, despite steps that can occur underReplica.mu
only.The
LogSnapshot
idea can be extended to a wholeRawNode
snapshot. This is almost whatReady()
returns, but it mostly returns a delta today. Instead it could return a whole snapshot + indicators of what the delta is. Then theReady
handler would iterate the delta and send it to storage/peers (underraftMu
only). Note that storage can mutate underRawNode
's nose, but that is fine -RawNode
always keeps the yet unstable part of the state in memory, so the storage can do the updates asynchronously, andRawNode
will never try to read those parts.However,
RawNode
may try to read from storage while onlyReplica.mu
is held, and it can be racing withraftMu
-only writes. There is no conflict in which keys are read/written, but there might be some conflicts/races w.r.t. in-memory-data structures that are involved in these paths.I think this all exists today, we're just rethinking it.
So my mental model is:
- raft storage reads can occur when either
Replica.mu
orraftMu
is held (for reads or writes) - raft storage writes occur only when
raftMu
is held for writes?
Is this accurate? If not, would it make sense to make it accurate - are there gaps in this thinking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @pav-kv and @tbg)
pkg/raft/rawnode.go
line 123 at r9 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
No,
RawNode
can be stepped - this is the key idea. WithLogSnapshot()
method, we effectively return a consistentRawNode
snapshot (only the subset of it that's interesting to us in this case - the log). OnceReplica.mu
is released, theRawNode
can run forward, but that will not invalidate our snapshot. What invalidates the snapshot is when we mutate the storage (and so the storage-related queries to the snapshot read updated state which can be inconsistent with ourRawNode
snapshot). So whileraftMu
is held, our snapshot is good, despite steps that can occur underReplica.mu
only.The
LogSnapshot
idea can be extended to a wholeRawNode
snapshot. This is almost whatReady()
returns, but it mostly returns a delta today. Instead it could return a whole snapshot + indicators of what the delta is. Then theReady
handler would iterate the delta and send it to storage/peers (underraftMu
only). Note that storage can mutate underRawNode
's nose, but that is fine -RawNode
always keeps the yet unstable part of the state in memory, so the storage can do the updates asynchronously, andRawNode
will never try to read those parts.However,
RawNode
may try to read from storage while onlyReplica.mu
is held, and it can be racing withraftMu
-only writes. There is no conflict in which keys are read/written, but there might be some conflicts/races w.r.t. in-memory-data structures that are involved in these paths.I think this all exists today, we're just rethinking it.
I'm confused by the comment that "as long as the application guarantees immutability of the underlying log storage while the snapshot is being used" then. Can't a Step due to an incoming message mutate the log storage? We might get incoming MsgApps from a new leader.
Or are you specifically referring to the raft.Storage
persistent storage interface and distinguishing that from the in-memory storage of log entries within the raft library? If so, what you're saying makes more sense because only a Ready handling cycle will mutate the persistent log storage, but then this comment could make that more clear. Maybe by replacing "the underlying log storage"
with "the underlying persistent log storage exposed through the Storage interface"
, or something along those lines.
pkg/raft/storage.go
line 89 at r10 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
Yeah, I like your idea in saying that
LogStorageSnapshot
extendsLogStorage
. I tried to play with this, it looks like:// LogStorageRO is a read-only API for the raft log. type LogStorageRO interface { // ... } // LogStorageSnapshot is a read-only API for the raft log. It guarantees // immutability of outside the raft package while the user holds this snapshot. type LogStorageSnapshot LogStorageRO // LogStorage is the API for the raft log. type LogStorage interface { LogStorageRO // LogSnapshot returns an immutable point-in-time log storage snapshot. LogSnapshot() LogStorageSnapshot }Which is almost identical to what's in this PR, modulo the type alias. Not sure if the type alias is worth it in this case.
Alternatively, we could structure it like below, to avoid one extra type:
// LogStorage is a read-only API for the raft log. type LogStorage interface { // ... } // LogStorageSnapshot is a read-only API for the raft log. It guarantees // immutability outside the raft package while the user holds this snapshot. type LogStorageSnapshot LogStorage type Storage interface { LogStorage StateStorage // LogSnapshot returns an immutable point-in-time log storage snapshot. LogSnapshot() LogStorageSnapshot }Or simply:
// LogStorage is a read-only API for the raft log. type LogStorage interface { // ... } type Storage interface { LogStorage StateStorage // LogSnapshot returns an immutable point-in-time log storage snapshot. // The returned snapshot guarantees immutability outside raft package while // the user holds this snapshot. LogSnapshot() LogStorage }WDYT?
I was thinking something closer to:
type LogStorage interface {
...
LogSnapshot() LogStorageSnapshot
}
type LogStorageSnapshot interface {
LogStorage
IsImmutableSnapshot() // annotation method
}
This allows users of the interface which require an immutable snapshot to use LogStorageSnapshot
, and others that don't to use LogStorage
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/raft/rawnode.go
line 123 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm confused by the comment that "as long as the application guarantees immutability of the underlying log storage while the snapshot is being used" then. Can't a Step due to an incoming message mutate the log storage? We might get incoming MsgApps from a new leader.
Or are you specifically referring to the
raft.Storage
persistent storage interface and distinguishing that from the in-memory storage of log entries within the raft library? If so, what you're saying makes more sense because only a Ready handling cycle will mutate the persistent log storage, but then this comment could make that more clear. Maybe by replacing"the underlying log storage"
with"the underlying persistent log storage exposed through the Storage interface"
, or something along those lines.
Correction:
- raft storage reads under
Replica.mu
are valid only because they never try to read keys thatraftMu
protects (the stable part as seen byRawNode
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/raft/rawnode.go
line 123 at r9 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
Correction:
- raft storage reads under
Replica.mu
are valid only because they never try to read keys thatraftMu
protects (the stable part as seen byRawNode
)
NB: raft.Storage
exposes more than the stable state. It may contain an in-flight suffix that's not yet synced. But raft will never try to read it until it's reported back as stable.
So what's required from LogSnapshot()
is not exactly immutability of storage (though we can implement it this way, it's stronger/simpler), but it requires immutability of keys that RawNode
would read. For example, if we take a LogSnapshot()
when RawNode
thinks that the stable log is <= (term, index)
, then there must be no in-flight or future writes below this key until the snapshot is "released".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/raft/log.go
line 41 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
You probably saw my comment on the prototype.
LogSnapshot
is definitely a step in the right direction, so I don't think it would make sense to try to takeStorage
out of it at this point. In other words, ignore that comment as far as making code changes now goes. In spirit, I still think there's something there, but it may be for another day (which may never come).
Yeah, I think it's possible to move the "snapshot" concept out of RawNode
entirely. In some end state, I imagine the replica_raftstorage.go
and logstore
packages merged into the canonical LogStorage
implementation which observes all raft writes.
pkg/raft/storage.go
line 89 at r10 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I was thinking something closer to:
type LogStorage interface { ... LogSnapshot() LogStorageSnapshot } type LogStorageSnapshot interface { LogStorage IsImmutableSnapshot() // annotation method }This allows users of the interface which require an immutable snapshot to use
LogStorageSnapshot
, and others that don't to useLogStorage
.
Looks a bit recursive (which I was trying to avoid), but otherwise looks ok to me.
Let's consider the Storage.Entries
call. We basically want it to be flexible w.r.t. which mutex is the "main" one being held. Today, it's always r.mu
, so we have r.mu.AssertHeld()
in the Entries()
implementation (and it then uses the r.mu.stateLoader
etc). But we also want to be able to hold raftMu
only (which, in this implementation, requires using r.raftMu.stateLoader
instead; I think we don't need the stateloader in this call in the first place and have an idea how to avoid it, but will address this separately).
With your interface, it would be something like:
if IsImmutableSnapshot() {
r.raftMu.AssertHeld()
return logstore.LoadEntries(..., r.raftMu.stateLoader, ...)
}
r.mu.AssertHeld()
return logstore.LoadEntries(..., r.mu.stateLoader, ...)
With my interface, the snapshot-ness would be a static parameter of the type:
func (replicaRaftStorage) Entries(...) {
r.mu.AssertHeld()
return logstore.LoadEntries(...)
}
func (replicaRaftStorageSnap) Entries(...) {
r.raftMu.AssertHeld()
return logstore.LoadEntries(...)
}
Longer-term, I prefer the second approach, but the first one makes sense in the short term while we need more flexibility before arriving at the right abstractions (or eliminating all read paths with only mu
held).
f41ae68
to
5f682a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/raft/log.go
line 30 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"between when the ..."
Done
pkg/raft/rawnode.go
line 123 at r9 (raw file):
@nvanbenschoten
If so, what you're saying makes more sense because only a Ready handling cycle will mutate the persistent log storage, but then this comment could make that more clear.
Clarified "the underlying log storage" with a note that it is what's behind the Storage
interface.
5f682a4
to
14bd4e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/raft/storage.go
line 89 at r10 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
Looks a bit recursive (which I was trying to avoid), but otherwise looks ok to me.
Let's consider the
Storage.Entries
call. We basically want it to be flexible w.r.t. which mutex is the "main" one being held. Today, it's alwaysr.mu
, so we haver.mu.AssertHeld()
in theEntries()
implementation (and it then uses ther.mu.stateLoader
etc). But we also want to be able to holdraftMu
only (which, in this implementation, requires usingr.raftMu.stateLoader
instead; I think we don't need the stateloader in this call in the first place and have an idea how to avoid it, but will address this separately).With your interface, it would be something like:
if IsImmutableSnapshot() { r.raftMu.AssertHeld() return logstore.LoadEntries(..., r.raftMu.stateLoader, ...) } r.mu.AssertHeld() return logstore.LoadEntries(..., r.mu.stateLoader, ...)With my interface, the snapshot-ness would be a static parameter of the type:
func (replicaRaftStorage) Entries(...) { r.mu.AssertHeld() return logstore.LoadEntries(...) } func (replicaRaftStorageSnap) Entries(...) { r.raftMu.AssertHeld() return logstore.LoadEntries(...) }Longer-term, I prefer the second approach, but the first one makes sense in the short term while we need more flexibility before arriving at the right abstractions (or eliminating all read paths with only
mu
held).
Made it even simpler for now, with no IsImmutableSnapshot
annotation. We're going to need a replicaRaftStorage
wrapper anyway (which will know that this is a snapshot), so exposing the method is unnecessary.
14bd4e2
to
153a169
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @pav-kv and @tbg)
pkg/raft/log.go
line 662 at r12 (raw file):
return LogSnapshot{ first: l.firstIndex(), storage: storage,
Should we be calling storage.LogSnapshot()
here?
pkg/raft/storage.go
line 89 at r10 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
Made it even simpler for now, with no
IsImmutableSnapshot
annotation. We're going to need areplicaRaftStorage
wrapper anyway (which will know that this is a snapshot), so exposing the method is unnecessary.
The annotation method was a way for this library to require that implementors of LogStorage
also opt-in to the stronger requirements of LogStorageSnapshot
before unintentionally implementing both. It would also serve as a convenient place to hang off commentary about how application-layer immutability is being provided.
It's not particularly useful right now when there are no separate LogStorageSnapshot
implementations, but would be useful once we add some (e.g. MemoryStorageSnapshot
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pav-kv)
pkg/raft/rawnode.go
line 124 at r12 (raw file):
// (exposed through the Storage interface) while the snapshot is being used. // Typically, this means that the application does not run a Ready() handling // cycle until the snapshot is released.
Would we gain anything if we instead used a consistent iterator? It would act like a snapshot, and could simplify the locking story even further - you could basically get a LogSnapshot
under replicaMu
and it would stay valid until you're done with it and is explicitly released. (Just a thought experiment with the aim to understand the constraints better, and not in the way of merging this PR).
pkg/raft/storage.go
line 89 at r10 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The annotation method was a way for this library to require that implementors of
LogStorage
also opt-in to the stronger requirements ofLogStorageSnapshot
before unintentionally implementing both. It would also serve as a convenient place to hang off commentary about how application-layer immutability is being provided.It's not particularly useful right now when there are no separate
LogStorageSnapshot
implementations, but would be useful once we add some (e.g.MemoryStorageSnapshot
).
Just caught up on this thread - suggestion to wrap up this discussion, as it seems to be what's holding this PR open and there'll be plenty of opportunity to revisit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/raft/rawnode.go
line 124 at r12 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Would we gain anything if we instead used a consistent iterator? It would act like a snapshot, and could simplify the locking story even further - you could basically get a
LogSnapshot
underreplicaMu
and it would stay valid until you're done with it and is explicitly released. (Just a thought experiment with the aim to understand the constraints better, and not in the way of merging this PR).
Part of the log storage is the ad-hoc Sideloaded
storage which does not support consistent iterators/snapshots, so we should hold raftMu
or mu
to prevent a ready handling section messing with the log while we read it.
If not for this quirk, we could use a storage iterator/snapshot, though atm we are capable of doing this without support from the storage engine, at no extra cost, which seems nice.
We are [ab]using the fact that raftMu
is locked anyway, and we don't need more than one snapshot at a time. We would need > 1 only if there were multiple independent threads using a log snapshot to do some work: e.g. if we separate the proposer vs acceptor vs learner jobs, then the proposer needs a snapshot to send entries to followers, while the learner needs it to send committed entries to the state machine apply stack; in the meantime, the acceptor job may be appending entries to the log storage.
This snapshotting should be the application-side implementation detail. API-wise, raft should only say "this is a snapshot of the log storage" (and it cares only that it's immutable up to a particular log mark), and then the application implements it one way or another. In some future, raft log storage will be in its own engine, and taking an engine iterator/snapshot approach might be cheaper / more relevant.
The API allows reading from raft log while RawNode continues making progress. Practically, this means we can unlock Replica.mu after having obtained the log snapshot, and can safely read from it while only holding Replica.raftMu. Epic: none Release note: none
153a169
to
3fdfbef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/raft/log.go
line 662 at r12 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we be calling
storage.LogSnapshot()
here?
Not sure how to best implement it, but the idea was:
- The change should be a no-op. There is existing code in
RawNode
calling log storage methods, so we want these to be served from the original log storage impl rather than a snapshot. So by default we don't callLogSnapshot()
. - Only when we want to use the snapshot with extended guarantees we want to call the new
LogSnapshot()
. Initially it'll be RACv2 only.
Since LogStorageSnapshot
type "extends" LogStorage
, it's fine to use the GCD of the two (the LogStorage
type) to implement the common functionality.
Changed the type of storage:
field back to LogStorage
, to this effect.
pkg/raft/storage.go
line 89 at r10 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Just caught up on this thread - suggestion to wrap up this discussion, as it seems to be what's holding this PR open and there'll be plenty of opportunity to revisit.
Leaving as is for now, and will adjust API (if needed) in the next PR that implements LogSnapshot()
for our raft storage.
TFTRs! Will follow up with the interface implementation soon. bors r=tbg |
The API allows reading from raft log while
RawNode
continues making progress. Practically, this means we can unlockReplica.mu
after having obtained the log snapshot, and can safely read from it while only holdingReplica.raftMu
.To be used for:
Ready
handler while only holdingraftMu
(instead of bothmu
andraftMu
).MsgApp
messages on demand, driven by RACv2. Today these messages can be constructed (and incur IO) eagerly on anyStep
while holdingmu
.Part of #128779
Part of #130955