raft: add MaxInflightBytes limit to the outgoing messages flow #14624
Codecov Report
@@ Coverage Diff @@
## main #14624 +/- ##
==========================================
- Coverage 75.47% 75.45% -0.02%
==========================================
Files 457 457
Lines 37320 37330 +10
==========================================
+ Hits 28166 28168 +2
- Misses 7376 7385 +9
+ Partials 1778 1777 -1
Have only skimmed this now, I think this will be close to ready when the precursor PR is in. 👍🏽
cc @ahrtr you probably want to take a look as well.
Overall looks good to me. Probably we should add a check: the
Please also add two changelog items, one for this feature, and the other for #14633.
@ahrtr I'm on the fence here, since both are soft limits that can be violated slightly anyway, and both are used independently. If they were hard limits, it would make more sense, because it would be possible to not send anything if the sizes are misconfigured. Other than this, I addressed all comments. Want to take another look?
Maybe we should not overload Lines 228 to 230 in 215b799
Sounds good to me as a pattern to follow, and since there's precedent we should be consistent.
@pavelkalinnikov while you're here, mind giving MaxSizePerMsg a similar comment to the one we wrote for MaxInflightBytes (bandwidth-delay product etc.), and making sure they cross-reference each other?
I think @ahrtr has a point. Lines 460 to 462 in 215b799
It is very likely a misconfiguration to have
The code above could be improved: rather than pulling entries until the sum is >=
    var maxBytes uint64
    if pr.State == tracker.StateReplicate {
        maxBytes = min(r.maxMsgSize, pr.Inflights.BytesBudget())
    } else {
        // When follower is being probed, send only one entry.
        // TODO: should we send *no* entries? Or maybe the optimistic
        // strategy of sending maxMsgSize worth of entries is actually
        // better in practice? We've seen some *very divergent* raft logs
        // in CRDB incidents so I'm inclined to be conservative here and
        // send as little data as possible.
        maxBytes = 1
    }
    var ents []pb.Entry
    var erre error
    // In a throttled StateReplicate only send empty MsgApp, to ensure progress.
    // Otherwise, if we had a full Inflights and all inflight messages were in
    // fact dropped, replication to that follower would stall. Instead, an empty
    // MsgApp will eventually reach the follower (heartbeat responses prompt the
    // leader to send an append), allowing it to be acked or rejected, both of
    // which will clear out Inflights.
    if maxBytes > 0 {
        // Pull entries up to the computed budget, not the full message size.
        ents, erre = r.raftLog.entries(pr.Next, maxBytes)
    }
    if len(ents) == 0 && !sendIfEmpty {
        return false
    }
and at that point it should be clear that it makes no sense to set
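To make the budget selection discussed above concrete, here is a minimal, self-contained sketch in Go. It is not the raft package's code: progressState, appendBudget and limitSize are hypothetical names introduced for illustration, and limitSize only models the soft-limit behaviour described in this PR (an oversized first entry is still sent).

```go
package main

import "fmt"

// progressState is a hypothetical stand-in for tracker.StateType.
type progressState int

const (
	stateProbe progressState = iota
	stateReplicate
)

// appendBudget returns the byte budget for the next append message.
// In the replicate state it is the smaller of the per-message limit and
// the remaining inflight budget. Otherwise it is 1, which the soft limit
// in limitSize turns into "exactly one entry".
func appendBudget(state progressState, maxMsgSize, inflightBudget uint64) uint64 {
	if state != stateReplicate {
		return 1
	}
	if inflightBudget < maxMsgSize {
		return inflightBudget
	}
	return maxMsgSize
}

// limitSize returns the longest prefix of sizes whose sum fits in maxBytes.
// It is a soft limit: the first entry is always included, even if oversized.
func limitSize(sizes []uint64, maxBytes uint64) []uint64 {
	if len(sizes) == 0 {
		return sizes
	}
	total := sizes[0]
	n := 1
	for ; n < len(sizes); n++ {
		total += sizes[n]
		if total > maxBytes {
			break
		}
	}
	return sizes[:n]
}

func main() {
	sizes := []uint64{400, 400, 400} // entry sizes in bytes

	// Plenty of inflight budget: the per-message limit (1000) applies.
	fmt.Println(len(limitSize(sizes, appendBudget(stateReplicate, 1000, 5000)))) // 2
	// Tight inflight budget (500): only one entry fits.
	fmt.Println(len(limitSize(sizes, appendBudget(stateReplicate, 1000, 500)))) // 1
	// Probing: a budget of 1 byte still yields one entry.
	fmt.Println(len(limitSize(sizes, appendBudget(stateProbe, 1000, 5000)))) // 1
	// Exhausted budget: the caller would send an empty append instead.
	fmt.Println(appendBudget(stateReplicate, 1000, 0)) // 0
}
```

A zero budget is kept distinct from the probing budget of 1 so that a throttled follower gets an empty message while a probed follower gets a single entry, mirroring the `maxBytes > 0` guard in the snippet above.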
@tbg Added a couple of checks in the config, as suggested by you and @ahrtr.
@tbg I like this bit:
    if pr.State == tracker.StateReplicate {
        maxBytes = min(r.maxMsgSize, pr.Inflights.BytesBudget())
    } else {
The
    maxBytes := r.maxMsgSize
    if pr.State == tracker.StateReplicate {
        maxBytes = min(maxBytes, pr.Inflights.BytesBudget())
    }
    // ...
    if maxBytes > 0 {
        ents, erre = r.raftLog.entries(pr.Next, maxBytes)
    }
Although I think it would break the edge case of
Could we map
Sounds good, let's do a follow-up soon while we remember this piece of code.
The Inflights type has limits on the message size and the number of inflight messages. However, a single large entry that exceeds the size limit can still be sent. In combination with the max messages count limit, many large messages can be sent in a row and overflow the receiver. In effect, the "max" values act as "target" rather than hard limits. This commit adds an additional soft limit on the total size of inflight messages, which catches such situations and prevents the receiver overflow. Signed-off-by: Pavel Kalinnikov <[email protected]>
This commit plumbs the max total byte size of the Inflights type higher up the stack to the ProgressTracker. Signed-off-by: Pavel Kalinnikov <[email protected]>
This commit introduces the max inflight bytes setting at the Config level, and tests that raft flow control honours it. Signed-off-by: Pavel Kalinnikov <[email protected]>
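As an illustration of the Config-level setting and the kind of consistency check discussed in the review, here is a hedged sketch. The config type below is a hypothetical, trimmed-down stand-in rather than raft's actual Config; treating a zero MaxInflightBytes as "no limit" and the wording of the error messages are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// noLimit marks an unlimited byte budget (assumption for this sketch).
const noLimit uint64 = math.MaxUint64

// config is a hypothetical, trimmed-down stand-in for a raft Config.
type config struct {
	MaxSizePerMsg    uint64 // target size of a single append message
	MaxInflightMsgs  int    // max number of unacknowledged append messages
	MaxInflightBytes uint64 // soft limit on total unacknowledged bytes
}

// validate normalizes and checks the flow-control settings.
func (c *config) validate() error {
	if c.MaxInflightMsgs <= 0 {
		return errors.New("max inflight messages must be greater than 0")
	}
	if c.MaxInflightBytes == 0 {
		// Assumption: zero means "do not limit inflight bytes".
		c.MaxInflightBytes = noLimit
	} else if c.MaxInflightBytes < c.MaxSizePerMsg {
		// A byte budget below the per-message size would throttle
		// after every message, which is very likely a misconfiguration.
		return errors.New("max inflight bytes must be >= max message size")
	}
	return nil
}

func main() {
	ok := config{MaxSizePerMsg: 1 << 20, MaxInflightMsgs: 256, MaxInflightBytes: 8 << 20}
	fmt.Println(ok.validate()) // <nil>

	bad := config{MaxSizePerMsg: 1 << 20, MaxInflightMsgs: 256, MaxInflightBytes: 1 << 10}
	fmt.Println(bad.validate()) // max inflight bytes must be >= max message size
}
```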
LGTM
Thank you @pavelkalinnikov
Raft flow control limits the message size and the number of inflight messages. However, a single large entry that exceeds the size limit can still be sent. In combination with the max messages count limit, many large messages can be sent in a row and overflow the receiver. In effect, the "max" size value acts as a "target" that can be overshot.
This PR adds an additional soft limit on the total size of inflight messages, which catches such situations and prevents the receiver overflow.
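The soft-limit behaviour described above can be sketched as follows. This is a simplified illustration, not the tracker package's Inflights type: the inflights struct, its FIFO slice, and the convention that maxBytes == 0 disables the byte limit are all assumptions made for the example.

```go
package main

import "fmt"

// inflights is a hypothetical, simplified tracker of unacknowledged messages.
type inflights struct {
	maxCount int      // max number of inflight messages
	maxBytes uint64   // soft limit on total inflight bytes; 0 disables it (assumption)
	sizes    []uint64 // sizes of inflight messages, oldest first
	bytes    uint64   // sum of sizes
}

// full reports whether no more messages should be sent right now.
func (in *inflights) full() bool {
	return len(in.sizes) >= in.maxCount ||
		(in.maxBytes != 0 && in.bytes >= in.maxBytes)
}

// add records a sent message and reports whether it was admitted.
// The byte limit is soft: a message is admitted whenever the tracker is
// not yet full, so the total may overshoot maxBytes by one message.
func (in *inflights) add(size uint64) bool {
	if in.full() {
		return false
	}
	in.sizes = append(in.sizes, size)
	in.bytes += size
	return true
}

// ackOldest frees the oldest inflight message, if any.
func (in *inflights) ackOldest() {
	if len(in.sizes) == 0 {
		return
	}
	in.bytes -= in.sizes[0]
	in.sizes = in.sizes[1:]
}

func main() {
	in := &inflights{maxCount: 4, maxBytes: 1000}
	fmt.Println(in.add(600)) // true
	fmt.Println(in.add(600)) // true: total is now 1200, overshooting the soft limit
	fmt.Println(in.add(1))   // false: byte limit reached although only 2 of 4 slots are used
	in.ackOldest()
	fmt.Println(in.add(1)) // true: 600 bytes inflight, below the limit again
}
```

Without the byte limit, the count limit alone would have admitted four 600-byte messages here; the extra check stops the sender after the second one.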