kvserver: investigate exceeding uncommitted entry size #100096
cc @cockroachdb/replication
Hi @erikgrinaker, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Surprise plot twist - I added logging that would fire whenever we'd drop a proposal this way:
This smells like a buglet. I think we propose a big batch of entries (which goes in, but now there's negative uncommitted size left), and then - for some reason, likely accidentally - we hit the "unquiesce" path here: cockroach/pkg/kv/kvserver/replica_raft.go Lines 1955 to 1956 in c2460f1
and try to propose an empty entry, which gets rejected (because there is not a scrap of budget left). I wouldn't be surprised if we hit this path on each raft iteration. Possibly
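For context, here is a minimal sketch of the admission logic at play - a paraphrase of the uncommitted-size check in etcd-io/raft, not the exact code; the function name and plain-integer signature are made up for illustration:

```go
package raftbudget

// admitProposal mirrors (approximately) raft's uncommitted-size check: once
// the budget is exhausted, any proposal with a nonzero payload is dropped,
// which surfaces upstream as ErrProposalDropped. Zero-payload entries (such
// as the empty entry a leader proposes on election) are always admitted.
func admitProposal(uncommittedSize, payloadSize, maxUncommittedSize uint64) bool {
	if uncommittedSize > 0 && payloadSize > 0 && uncommittedSize+payloadSize > maxUncommittedSize {
		return false
	}
	return true
}
```

Note that when nothing is tracked yet, even an oversized proposal is admitted, which is how a single huge batch "goes in" and leaves the budget exhausted as described above.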
Looking a bit more, we can only hit this unquiesce path if quiesced¹, so it must be a little more subtle than that. How can the replica be quiesced if it refuses an empty entry (meaning it has uncommitted log entries)? It should fail this check: cockroach/pkg/kv/kvserver/replica_raft_quiesce.go Lines 289 to 294 in 76afb00
or this check: cockroach/pkg/kv/kvserver/replica_raft_quiesce.go Lines 318 to 323 in 76afb00
or this one: cockroach/pkg/kv/kvserver/replica_raft_quiesce.go Lines 385 to 391 in 76afb00
You get the idea. Given how easy this is to reproduce, probably best to slap it with a bit of printf and have another go.
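Roughly, the condition those checks express is something like the following simplified sketch (hypothetical helper and parameters, not the actual CRDB code):

```go
package quiesce

// canQuiesce is a simplified, hypothetical stand-in for the checks in
// replica_raft_quiesce.go: a replica must not quiesce while raft still has
// work in flight.
func canQuiesce(lastIndex, commitIndex, appliedIndex uint64, numPendingProposals int) bool {
	if numPendingProposals > 0 {
		return false // outstanding proposals haven't committed/applied yet
	}
	if appliedIndex != commitIndex {
		return false // committed entries not yet applied
	}
	if lastIndex != commitIndex {
		return false // appended but uncommitted entries remain in the log
	}
	return true
}
```

If a replica really had uncommitted log entries, the last condition should already keep it from quiescing, which is what makes the observed behavior surprising.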
The bytes printed after "wrote" were the append bytes only, which was confusing. Consolidate. Also, no need to print whether it's sync or non-blocking-sync again because we already printed that in the timing section. Found in cockroachdb#100096. Epic: none Release note: None
My current thinking focuses on this kind of logging, which we see frequently:
First of all, how can we have written 49 B when apply is 61 MiB (in a single entry, no less)? The 49 B is the append batch¹, which is ~empty here; this is a logging buglet that I'll send a PR for.

Second, I think that because applying this giant 61 MiB batch (this is not an outlier) takes ~1s, there is plenty of time for the tick loop to come around and decide to quiesce the range - this does not acquire

What follows is my best attempt at a timeline (but there never is a point in time at which the group could quiesce while having uncommitted bytes):
Perhaps there are additional interleavings between tick and raft handling that I'm not seeing. It is not great that interleavings are even possible here. It would be a sight easier if quiescence happened on the raft goroutine. Time for another run with more logging...
Hmm, the additional logging paints the picture that there might be something wrong with the uncommitted size tracking. First, we see the replica quiesce at index 33 (stack trace shows that this is, unsurprisingly, via a tick)
then, ~400ms later, it unquiesces (from the raft handling loop) and logs that it dropped the empty entry proposed as a result:
and then we append 30 MB worth of entries, which for some reason is not dropped:
and which then immediately gets applied:
The mystery is that we're refusing to append an empty entry but then, without applying anything, accept 30 MB of entries. When we apply entries, we step the

Needs more digging, probably via a raft fork that dumps extra info while we have uncommitted entry size tracked.
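For illustration, the kind of instrumentation a throwaway raft fork might add around the size accounting (the type, field, and method names here are invented, not etcd-io/raft's):

```go
package rafttrace

import "log"

// uncommittedTracker is a hypothetical stand-in for the uncommitted-size
// accounting one might instrument in a throwaway fork.
type uncommittedTracker struct {
	size uint64
}

func (t *uncommittedTracker) add(bytes uint64) {
	t.size += bytes
	log.Printf("uncommitted size +%d -> %d", bytes, t.size)
}

func (t *uncommittedTracker) reduce(bytes uint64) {
	if bytes > t.size {
		// This is the kind of anomaly the extra logging is meant to surface:
		// reducing by more than we think is outstanding.
		log.Printf("uncommitted size underflow: -%d with only %d tracked", bytes, t.size)
		bytes = t.size
	}
	t.size -= bytes
	log.Printf("uncommitted size -%d -> %d", bytes, t.size)
}
```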
Can this be due to etcd-io/raft#11? The uncommitted log size can be reset upon election. UPD: enabled logging and did not see "campaigning" messages around these dropped proposals and appends.
This doesn't look like a new 30 MB proposal being accepted. This 30 MB is probably what the uncommitted buffer contained, and the reason why it rejected the unquiescence proposal. What we see in this log message is that the 30 MB finally gets flushed to storage (via raft Ready handling).
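That matches the shape of a raft Ready loop - here is a simplified sketch against the upstream etcd-io/raft API (CRDB's real handler in replica_raft.go is far more involved). The relevant point: rd.Entries were accepted into raft's unstable log earlier, and charged against the uncommitted-size budget at that point; they are only persisted here.

```go
package readyloop

import (
	"go.etcd.io/raft/v3"
	"go.etcd.io/raft/v3/raftpb"
)

// handleReady drains a RawNode's Ready structs: persist newly accepted
// entries, ship messages to peers, and apply whatever has reached quorum.
func handleReady(rn *raft.RawNode, ms *raft.MemoryStorage,
	send func([]raftpb.Message), apply func([]raftpb.Entry)) error {
	for rn.HasReady() {
		rd := rn.Ready()
		if err := ms.Append(rd.Entries); err != nil { // flush previously accepted entries
			return err
		}
		send(rd.Messages)          // distribute to followers / respond
		apply(rd.CommittedEntries) // apply entries that have reached quorum
		rn.Advance(rd)
	}
	return nil
}
```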
Pass the proposals corresponding to the `ents` slice into `proposeBatch`. Log into each proposal's context and also log a message whenever we're dropping proposals on the floor. See cockroachdb#100096. Epic: none Release note: None
100270: kvserver: touch up raft ready handling log r=erikgrinaker a=tbg The bytes printed after "wrote" were the append bytes only, this was confusing. Consolidate. Also, no need to print whether it's sync or non-blocking-sync again because we already printed that in the timing section. Found in #100096. Epic: none Release note: None Co-authored-by: Tobias Grieger <[email protected]>
I think I understand this better now. This test writes lots of large blobs, and leaves enough time between blobs to allow the range to quiesce. So we start with a quiesced replica, which now gets a ~50 MiB proposal. First, this goes into the proposal buffer, which queues an update check: cockroach/pkg/kv/kvserver/replica_proposal_buf.go Lines 335 to 346 in 1cd507a
This triggers raft processing. Peeking into cockroach/pkg/kv/kvserver/replica_raft.go Lines 1968 to 1992 in 1cd507a
So we first flush the proposal buffer, then unquiesce. But flushing a 50 MiB proposal from the buffer into the unstable raft log will consume the unstable log budget in raft. So by the time we unquiesce, there is no more space and the unquiesce's append gets rejected. This isn't a problem - after all, if there's unstable log, raft already needs to distribute something to the followers and is thus going to wake them up. I don't think there's anything new here in this cycle other than that we made the

I tried out this little patch, which proposes a "true" empty command (instead of a nonempty command containing a "nil" CRDB payload) - true empty commands are exempt from uncommitted log size tracking - like this:

```diff
@@ -89,11 +88,10 @@ func (r *Replica) maybeUnquiesceAndWakeLeaderLocked() bool {
 	r.store.unquiescedReplicas.Unlock()
 	r.maybeCampaignOnWakeLocked(ctx)
 	// Propose an empty command which will wake the leader.
-	data := raftlog.EncodeRaftCommand(raftlog.EntryEncodingStandardWithoutAC, makeIDKey(), nil)
-	_ = r.mu.internalRaftGroup.Propose(data)
+	_ = r.mu.internalRaftGroup.Propose(nil /* data */)
 	return true
 }
```

and voila, I'm (unsurprisingly) no longer seeing the messages. I'll send a PR for that patch. @aliher1911 I believe you mentioned also having seen this message "randomly". Do you have something I can look at? Maybe there are multiple things going on.
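Plugging illustrative numbers into the admission check shows why the patch works: only payload bytes are charged against the budget, so a nil proposal is always admitted. All values below are hypothetical, and `admitted` restates the check approximately so the example is self-contained.

```go
package main

import "fmt"

// admitted approximates raft's uncommitted-size check; see the sketch
// earlier in the thread.
func admitted(uncommitted, payload, max uint64) bool {
	return !(uncommitted > 0 && payload > 0 && uncommitted+payload > max)
}

func main() {
	const uncommitted = 50 << 20 // 50 MiB already sitting in the unstable log
	const max = 16 << 20         // illustrative budget; the real setting may differ

	fmt.Println(admitted(uncommitted, 0, max))  // true: the nil proposal has no payload
	fmt.Println(admitted(uncommitted, 32, max)) // false: even a small encoded "noop" is rejected
}
```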
We used to unquiesce via a "noop" (but non-nil) log entry, but it turns out that raft can be out of budget for nontrivial log entries when unquiescing. So, use a nil one, which is identical to what raft proposes when leadership changes. Closes cockroachdb#100096. Epic: none Release note: None
Chatted with Erik, removing GA-blocker since there's nothing new/bad here. Will check in with Oleg when he's back to see where else he saw this message.
100083: kvserver: record metrics for ErrProposalDropped r=pavelkalinnikov a=tbg

Touches #100096. Epic: none Release note: None

105093: sql: use datum alloc for crdb_internal stmt stats rows r=dt a=dt

Happened to observe a cluster running a customer test suite which included a query that inspected stmt stats often, causing the CRDB node to spend a considerable amount of CPU time in production of the stmt stats vtable, in particular allocating (and then GC'ing) individual datums, especially given how wide this table has become with the addition of storage stats. This change uses a datum allocator to produce those rows to reduce the number of separate allocations from the runtime. Release note: none. Epic: none.

105197: statusccl: skip flaky TenantStatusAPI tests r=zachlite a=zachlite

Informs #92382, #99770, #99559 Epic: none Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Zach Lite <[email protected]>
Describe the problem
See #99464. We were seeing lots of these messages:
That's not good! These proposals will have to be picked up by `refreshProposalsLocked`, which takes ~200ms. If something changed recently that causes us to hit this more frequently (as seems to be the case, anecdotally), we need to add at least rudimentary flow control here.
It is somewhat expected to hit this in general. To generate proposals and to put them into raft requires only disk reads, but getting them committed requires a durable write, including likely round-trips to followers (to obtain quorum). The former can be done at much higher throughput than the latter.
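As a back-of-the-envelope illustration of that asymmetry (the numbers below are made up; only the relationship matters): if proposals can be generated and appended faster than quorum commits drain them, the uncommitted log grows until the budget trips.

```go
package main

import "fmt"

func main() {
	// Hypothetical rates, for illustration only.
	const proposeMBps = 200.0 // generating proposals needs only reads + an in-memory append
	const commitMBps = 50.0   // committing needs durable writes and follower round-trips
	const budgetMB = 16.0     // uncommitted-size budget (illustrative)

	growthMBps := proposeMBps - commitMBps
	fmt.Printf("budget exhausted after ~%.2fs\n", budgetMB/growthMBps)
}
```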
To Reproduce
Running the test in #99464 reproduces this in 100% of all runs I did, averaging ~70 related log messages per run.
I also seem to hit this reliably with the unit test BenchmarkReplicaProposal (edit: not true, we're dropping proposals, but not because of uncommitted entry size - because of no leader).
Expected behavior
Some kind of flow control, similar to the quota pool but local. Ideally we would delay grabbing a latch for a write if we can tell that raft is not currently accepting more proposals. Maybe we want to avoid doing most of this buffering inside of RawNode anyway, and instead improve our proposal buffer to provide this backpressure signal. Unsure, needs more thought.
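A minimal sketch of what such a local gate could look like - the type and names are hypothetical (not an existing CRDB API), and byte-granularity accounting is omitted for brevity:

```go
package flowcontrol

import "context"

// proposalGate sketches the "quota pool, but local" idea: writers reserve a
// slot before proposing, and the slot is released once the corresponding
// entries have committed and applied. Acquiring before latching is what
// provides the backpressure signal.
type proposalGate struct {
	sem chan struct{} // one token per fixed-size slice of the uncommitted budget
}

func newProposalGate(slots int) *proposalGate {
	return &proposalGate{sem: make(chan struct{}, slots)}
}

// acquire blocks until budget is available or the context is canceled.
func (g *proposalGate) acquire(ctx context.Context) error {
	select {
	case g.sem <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release returns the reservation once the proposal's entries have applied.
func (g *proposalGate) release() { <-g.sem }
```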
Additional data / screenshots
Adding better visibility into this issue in #100083.
Jira issue: CRDB-26247
Epic CRDB-39898