kvserver: investigate exceeding uncommitted entry size #100096

Open
tbg opened this issue Mar 30, 2023 · 11 comments
Labels
branch-master (Failures and bugs on the master branch) · branch-release-23.1 (Used to mark GA and release blockers, technical advisories, and bugs for 23.1) · C-investigation (Further steps needed to qualify. C-label will change.) · T-kv (KV Team)

Comments

tbg (Member) commented Mar 30, 2023

Describe the problem

See #99464. We were seeing lots of these messages:

appending new entries to log would exceed uncommitted entry size limit; dropping proposal

That's not good! These proposals will have to be picked up by refreshProposalsLocked, which takes ~200ms.

If something changed recently that causes us to hit this more frequently (as seems to be the case, anecdotally), we need to add at least rudimentary flow control here.

It is somewhat expected to hit this in general. Generating proposals and putting them into raft requires only disk reads, but getting them committed requires a durable write, likely including round-trips to followers (to obtain quorum). The former can be done at much higher throughput than the latter.
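
For context, the limit in that message is raft's budget for uncommitted entry payload bytes, i.e. bytes appended on the leader but not yet applied. Here is a toy sketch of how such a budget gate behaves - the names and the 16 MiB figure are illustrative, and this is not the actual raft implementation:

package main

import "fmt"

// Toy model of an uncommitted-entry budget; not the real raft code.
type budget struct {
    uncommitted uint64 // payload bytes appended but not yet applied on the leader
    max         uint64
}

// tryAppend admits a proposal unless the budget is already exhausted. A single
// proposal may overshoot the limit when the budget is empty, which is how one
// very large entry can still get in; zero-size payloads are always admitted.
func (b *budget) tryAppend(payload uint64) bool {
    if b.uncommitted > 0 && payload > 0 && b.uncommitted+payload > b.max {
        return false // "dropping proposal"
    }
    b.uncommitted += payload
    return true
}

func main() {
    b := &budget{max: 16 << 20}        // illustrative 16 MiB budget
    fmt.Println(b.tryAppend(60 << 20)) // true: a single proposal overshoots from zero
    fmt.Println(b.tryAppend(1 << 10))  // false: budget exhausted, proposal dropped
}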

To Reproduce

Running the test in #99464 reproduces this in 100% of the runs I did, averaging ~70 related log messages per run.

I also seem to hit this reliably with the unit test BenchmarkReplicaProposal (edit: not true; we are dropping proposals there, but because there is no leader, not because of the uncommitted entry size)

Expected behavior

Some kind of flow control, similar to the quota pool but local. Ideally we would delay grabbing a latch for a write if we can tell that raft is not currently accepting more proposals. Maybe we want to avoid doing most of this buffering inside of RawNode anyway, and instead improve our proposal buffer to provide this backpressure signal. Unsure; needs more thought.
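
As a purely hypothetical sketch of what such a backpressure signal could look like if it lived in the proposal buffer (none of these types or methods exist in CockroachDB; they only illustrate the idea):

package kvserver

import "context"

// propBudget is a hypothetical token bucket sized roughly to raft's
// uncommitted-entry budget. Writers would acquire a token before grabbing
// latches and block while raft is not accepting more proposals, instead of
// having their proposals dropped and re-proposed ~200ms later.
type propBudget struct {
    sem chan struct{}
}

func newPropBudget(tokens int) *propBudget {
    return &propBudget{sem: make(chan struct{}, tokens)}
}

// acquire blocks until there is room for another proposal or ctx is done.
func (b *propBudget) acquire(ctx context.Context) error {
    select {
    case b.sem <- struct{}{}:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

// release returns a token once the corresponding entry has been applied.
func (b *propBudget) release() {
    <-b.sem
}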

Additional data / screenshots

Adding better visibility into this issue in #100083.

Jira issue: CRDB-26247

Epic CRDB-39898

tbg added the C-investigation and T-kv-replication labels Mar 30, 2023
blathers-crl bot commented Mar 30, 2023

cc @cockroachdb/replication

blathers-crl bot commented Mar 30, 2023

Hi @erikgrinaker, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Mar 30, 2023

cc @cockroachdb/replication

erikgrinaker added the branch-master and branch-release-23.1 labels Mar 30, 2023
tbg (Member, Author) commented Mar 30, 2023

Surprise plot twist - I added logging that would fire whenever we'd get ErrProposalDropped from rawNode.Propose in the main proposal path. The logging never fired when the warning that triggered this investigation did. So I hacked a debug.PrintStack into the warning, and lo and behold, it's always from a "nonstandard" stack: via maybeUnquiesceAndWakeLeaderLocked, not via actual proposals to the raft group 😕

Details
W230330 15:16:11.208906 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 381  1 appending new entries to log would exceed uncommitted entry size limit; dropping proposal
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382  stack: ‹goroutine 216 [running]:›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹runtime/debug.Stack()›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    GOROOT/src/runtime/debug/stack.go:24 +0x65›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftLogger).Warningf(0xc003ac2780, {0x5c20e7d?, 0x0?}, {0xc0034601a0?, 0x13?, 0x13?})›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/raft.go:92 +0x6b›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹go.etcd.io/raft/v3.(*raft).appendEntry(0xc001fab080, {0xc002fc37a0?, 0x1, 0x1})›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/raft.go:767 +0x178›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹go.etcd.io/raft/v3.stepLeader(0xc001fab080, {0x2, 0x0, 0x1, 0x0, 0x0, 0x0, {0xc002fc37a0, 0x1, 0x1}, ...})›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/raft.go:1239 +0x256e›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹go.etcd.io/raft/v3.(*raft).Step(0xc001fab080, {0x2, 0x0, 0x1, 0x0, 0x0, 0x0, {0xc002fc37a0, 0x1, 0x1}, ...})›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/raft.go:1156 +0xed5›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹go.etcd.io/raft/v3.(*RawNode).Propose(...)›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    go.etcd.io/raft/v3/external/io_etcd_go_raft_v3/rawnode.go:89›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).maybeUnquiesceAndWakeLeaderLocked(0xc001609900)›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft_quiesce.go:92 +0x358›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).withRaftGroupLocked(0xc001609900, 0x1?, 0xc002d2ad20?)›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1975 +0x347›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked(_, {_, _}, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...})›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:753 +0x2fc›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady(_, {_, _}, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...})›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹    github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:709 +0x17b›
W230330 15:16:11.209204 216 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r63/1:‹/Table/104{-/8/8359…}›] 382 +‹github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc0016a6000, 0xc000adbbf0?)›

This smells like a buglet. I think we propose a big batch of entries (which goes in, but now there's negative uncommitted size left), and then - for some reason, likely accidentally - we hit the "unquiesce" path here:

if r.mu.internalRaftGroup.BasicStatus().Lead == 0 {
// If we don't know the leader, unquiesce unconditionally. As a

and try to propose an empty entry, which gets rejected (because there is not a scrap of budget left). I wouldn't be surprised if we hit this path on each raft iteration. Possibly Lead is just not properly maintained in the "edge" case of a single-voter RawNode. That should be easy to verify.

tbg (Member, Author) commented Mar 30, 2023

Looking a bit more, we can only hit this unquiesce path if quiesced [1], so it must be a little more subtle than that. How can the replica be quiesced if it refuses an empty entry (meaning it has uncommitted log entries)?

It should fail this check:

log.Infof(ctx, "not quiescing: proposals pending")
}
return nil, nil, false
}
// Don't quiesce if there's outstanding quota - it can lead to deadlock. This
// condition is largely subsumed by the upcoming replication state check,

or this check:

log.Infof(ctx, "not quiescing: raft ready")
}
return nil, nil, false
}
status := q.raftSparseStatusRLocked()

or this one:

log.Infof(ctx, "not quiescing: commit (%d) != lastIndex (%d)",
status.Commit, lastIndex)
}
return nil, nil, false
}
if len(pausedFollowers) > 0 {

You get the idea.

Given how easy this is to reproduce, probably best to slap it with a bit of printf and have another go.

Footnotes

  1. https://github.com/cockroachdb/cockroach/blob/76afb004ce71fac3c2bb19f33ef8c247e2775211/pkg/kv/kvserver/replica_raft_quiesce.go#L96-L97

tbg added a commit to tbg/cockroach that referenced this issue Mar 31, 2023
The bytes printed after "wrote" were the append bytes only, which was confusing. Consolidate. Also, there is no need to print whether it's sync or non-blocking-sync again, because we already printed that in the timing section.

Found in cockroachdb#100096.

Epic: none
Release note: None
tbg (Member, Author) commented Mar 31, 2023

My current thinking focuses on this kind of logging, which we see frequently:

I230330 15:15:06.798493 197 kv/kvserver/store_raft.go:655 ⋮ [T1,n1,s1,r62/1:‹/{Table/61-Max}›,raft] 231 raft ready handling: 1.10s [append=0.00s, apply=0.91s, commit-batch=0.00s, other=0.19s], wrote 49 B [apply=61 MiB (1)]; node might be overloaded

First of all, how can we have written 49 B when apply is 61 MiB (in a single entry, no less)? The 49 B is the append batch [1], which is ~empty here; this is a logging buglet that I'll send a PR for.

Second, I think that because applying this giant 61 MiB batch (not an outlier) takes ~1s, there is plenty of time for the tick loop to come around and decide to quiesce the range - this does not acquire raftMu, i.e. it can happen while raft processing is ongoing. It turns out that what raft is tracking here is really "entries not applied on the leader" rather than uncommitted bytes; we decrement the counter in reaction to an MsgStorageApplyResp. Still, we only consider quiescing if there aren't any entries in the proposal map, and the entry should be in the map until the corresponding entry is applied, right? The answer is actually no, as I happen to know thanks to countless hours spent on CRDB-25287 - log application is weird in that it takes the entries out of the map at the beginning of application. But I still can't get it to pan out, because we also check that Commit == Applied.

What follows is my best attempt at a timeline (though it never produces a point in time at which the group could quiesce while having uncommitted bytes):

  • proposal is pending
  • handling cycle 1: 60 MB entries A and B arrive and are appended jointly to the log; raft's uncommitted size is now "very large" (far beyond budget) - raft will allow one MsgProp to overshoot, and A and B are in the same one, so they get in jointly.
  • handling cycle 2: A and B are marked as committed (i.e. they can in principle be applied), but only A comes up for application because we limit [2] the number of bytes that can be applied at any given point in time. The "uncommitted log" is still above the threshold.
  • if a tick happens now, replica will not quiesce (because proposal is pending)
  • application of A begins
    • remove entry from proposals map
    • if a tick happens now, still can't quiesce (because Commit != Applied)
  • apply finishes
  • if a tick happens now, still can't quiesce because Commit != Applied (B is not applied)

Perhaps there are additional interleavings between tick and raft handling that I'm not seeing. It is not great that interleavings are even possible here. It would be a sight easier if quiescence happened on the raft goroutine.
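
For reference, here is the quiescence gate the timeline reasons about, collapsed into an illustrative sketch (not the actual Replica code; the real checks are the ones quoted in the previous comment):

package kvserver

// Illustrative collapse of the quiescence preconditions discussed above; not
// the actual Replica code. Given these checks, a replica should never quiesce
// while raft still tracks uncommitted entry bytes: one of them ought to fail.
func canQuiesce(proposalsPending, raftReady bool, commit, applied, lastIndex uint64) bool {
    if proposalsPending { // "not quiescing: proposals pending"
        return false
    }
    if raftReady { // "not quiescing: raft ready"
        return false
    }
    if commit != lastIndex { // "not quiescing: commit (%d) != lastIndex (%d)"
        return false
    }
    if applied != commit { // unapplied entries remain
        return false
    }
    return true
}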

Time for another run with more logging...

Footnotes

  1. https://github.com/cockroachdb/cockroach/blob/c2460f1b0fc94c8fcbd98273cec35061494fc2dd/pkg/kv/kvserver/replica_raft.go#L653

  2. https://github.com/cockroachdb/cockroach/blob/5e6698e5aa8d2904409a72e0fa174b98c8b89a66/pkg/base/config.go#L261-L262

tbg (Member, Author) commented Mar 31, 2023

Hmm, the additional logging paints the picture that there might be something wrong with the uncommitted size tracking.

First, we see the replica quiesce at index 33 (the stack trace shows that this is, unsurprisingly, via a tick):

I230331 08:10:47.227906 200 kv/kvserver/replica_raft_quiesce.go:194 ⋮ [T1,n1,s1,r62/1:‹/{Table/61-Max}›,raft] 233  XXX SHOULD QUIESCE c=33 a=33 li=33

then, ~400ms later, it unquiesces (from the raft handling loop) and logs that it dropped the empty entry proposed as a result:

I230331 08:10:47.610231 205 kv/kvserver/replica_raft_quiesce.go:94 ⋮ [T1,n1,s1,r62/1:‹/{Table/61-Max}›] 234  XXX UNQUIESCE li=33
W230331 08:10:47.610324 205 go.etcd.io/raft/v3/raft.go:767 ⋮ [T1,n1,s1,r62/1:‹/{Table/61-Max}›] 235  1 appending new entries to log would exceed uncommitted entry size limit; dropping proposal

and then we append 30 MB worth of entries, which for some reason is not dropped:

I230331 08:10:47.722485 205 kv/kvserver/store_raft.go:655 ⋮ [T1,n1,s1,r62/1:‹/{Table/61-Max}›,raft] 237  raft ready handling: 0.11s [append=0.05s, apply=0.00s, commit-append-non-blocking-sync=0.06s, other=0.00s], wrote [append-batch=30 MiB, append-ent=30 MiB (1), ]; node might be overloaded

and which then immediately gets applied:

I230331 08:10:48.178266 203 kv/kvserver/store_raft.go:655 ⋮ [T1,n1,s1,r62/1:‹/{Table/61-Max}›,raft] 238  raft ready handling: 0.45s [append=0.00s, apply=0.36s, commit-append-=0.00s, other=0.09s], wrote [append-batch=49 B, apply=30 MiB (1)]; node might be overloaded

The mystery is that we're refusing to append an empty entry but then, without applying anything, accepting 30 MB of entries. When we apply entries, we step the MsgStorageApplyResp directly into raft, so at the end of the last application cycle that preceded the quiesce, the uncommitted entry size ought to have been zero. And even if it hadn't been, I don't know of anything that would have reset the uncommitted entry size between the rejected proposal and the subsequently successful one.

Needs more digging, probably via a raft fork that dumps extra info while we have uncommitted entry size tracked.
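
To make the expected bookkeeping concrete, here is a toy model of the accounting assumed above - appends increase the tracked size and applications decrease it, so a fully applied log should leave the counter at zero (illustrative only, not the raft library's code):

package kvserver

// Toy model of the uncommitted-size accounting reasoned about above; the real
// tracking lives inside go.etcd.io/raft/v3.
type uncommittedTracker struct {
    bytes uint64
}

// onAppend runs when entry payloads are appended on the leader.
func (t *uncommittedTracker) onAppend(payload uint64) {
    t.bytes += payload
}

// onApplied mirrors the decrement described above (in reaction to
// MsgStorageApplyResp): once everything is applied, the counter should be
// back at zero, which is why a rejected empty proposal followed by an
// accepted 30 MB append is surprising.
func (t *uncommittedTracker) onApplied(payload uint64) {
    if payload >= t.bytes {
        t.bytes = 0
        return
    }
    t.bytes -= payload
}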

pav-kv (Collaborator) commented Apr 6, 2023

The mystery is that we're refusing to append an empty entry but then, without applying anything, accepting 30 MB of entries.

Can this be due to etcd-io/raft#11? The uncommitted log size can be reset upon election.

Update: I enabled logging and did not see "campaigning" messages around these dropped proposals and appends.

pav-kv (Collaborator) commented Apr 6, 2023

@tbg

and then we append 30 MB worth of entries, which for some reason is not dropped:

I230331 08:10:47.722485 205 kv/kvserver/store_raft.go:655 ⋮ [T1,n1,s1,r62/1:‹/{Table/61-Max}›,raft] 237  raft ready handling: 0.11s [append=0.05s, apply=0.00s, commit-append-non-blocking-sync=0.06s, other=0.00s], wrote [append-batch=30 MiB, append-ent=30 MiB (1), ]; node might be overloaded

This doesn't look like a new 30 MB proposal being accepted. This 30 MB is probably what the uncommitted buffer already contained, and the reason why it rejected the unquiescence proposal. What we see in this log message is that the 30 MB finally gets flushed to storage (via raft Ready handling).

tbg added a commit to tbg/cockroach that referenced this issue Apr 12, 2023
Pass the proposals corresponding to the `ents` slice
into `proposeBatch`. Log into each proposal's context
and also log a message whenever we're dropping proposals
on the floor.

See cockroachdb#100096.

Epic: none
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Apr 14, 2023
The bytes printed after "wrote" were the append bytes only, which was confusing. Consolidate. Also, there is no need to print whether it's sync or non-blocking-sync again, because we already printed that in the timing section.

Found in cockroachdb#100096.

Epic: none
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Apr 14, 2023
Pass the proposals corresponding to the `ents` slice
into `proposeBatch`. Log into each proposal's context
and also log a message whenever we're dropping proposals
on the floor.

See cockroachdb#100096.

Epic: none
Release note: None
craig bot pushed a commit that referenced this issue Apr 14, 2023
100270: kvserver: touch up raft ready handling log r=erikgrinaker a=tbg

The bytes printed after "wrote" were the append bytes only, which was confusing. Consolidate. Also, there is no need to print whether it's sync or non-blocking-sync again, because we already printed that in the timing section.

Found in #100096.

Epic: none
Release note: None


Co-authored-by: Tobias Grieger <[email protected]>
tbg (Member, Author) commented Apr 18, 2023

I think I understand this better now. This test writes lots of large blobs, and leaves enough time between blobs to allow the range to quiesce.

So we start with a quiesced replica, which now gets a ~50 MiB proposal. First, this goes into the proposal buffer, which queues an update check:

func (b *propBuf) insertIntoArray(p *ProposalData, idx int) {
    b.arr.asSlice()[idx] = p
    if idx == 0 {
        // If this is the first proposal in the buffer, schedule a Raft update
        // check to inform Raft processing about the new proposal. Everyone else
        // can rely on the request that added the first proposal to the buffer
        // having already scheduled a Raft update check.
        b.p.enqueueUpdateCheck()
    }
}

func (b *propBuf) flushRLocked(ctx context.Context) error {

This triggers raft processing, which calls withRaftGroupLocked (which in turn passes a closure that flushes the proposal buffer):

https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvserver/replica_raft.go#L749-L752

Peeking into withRaftGroupLocked, we see that it invokes the closure before unquiescing:

unquiesce, err := func(rangeID roachpb.RangeID, raftGroup *raft.RawNode) (bool, error) {
    return f(raftGroup)
}(r.RangeID, r.mu.internalRaftGroup)
if r.mu.internalRaftGroup.BasicStatus().Lead == 0 {
    // If we don't know the leader, unquiesce unconditionally. As a
    // follower, we can't wake up the leader if we don't know who that is,
    // so we should find out now before someone needs us to unquiesce.
    //
    // This situation should occur rarely or never (ever since we got
    // stricter about validating incoming Quiesce requests) but it's good
    // defense-in-depth.
    //
    // Note that maybeUnquiesceAndWakeLeaderLocked won't manage to wake up the
    // leader since it's unknown to this replica, and at the time of writing the
    // heuristics for campaigning are defensive (won't campaign if there is a
    // live leaseholder). But if we are trying to unquiesce because this
    // follower was asked to propose something, then this means that a request
    // is going to have to wait until the leader next contacts us, or, in the
    // worst case, an election timeout. This is not ideal - if a node holds a
    // live lease, we should direct the client to it immediately.
    unquiesce = true
}
if unquiesce {
    r.maybeUnquiesceAndWakeLeaderLocked()
}

So we first flush the proposal buffer, then unquiesce. But flushing a 50 MiB proposal from the buffer into the unstable raft log consumes raft's uncommitted-entry budget. So by the time we unquiesce, there is no more space and the unquiesce's append gets rejected. This isn't a problem - after all, if there are unstable log entries, raft already needs to distribute something to the followers and is thus going to wake them up.
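
As a toy reproduction of that ordering, using the same simplified budget behavior sketched in the issue description (illustrative only; the 64-byte envelope size is made up):

package main

import "fmt"

func main() {
    const maxUncommitted = 16 << 20 // illustrative budget
    var uncommitted uint64

    // Simplified gate: one proposal may overshoot from zero, and zero-size
    // payloads are never rejected. Not the real raft code.
    tryAppend := func(payload uint64) bool {
        if uncommitted > 0 && payload > 0 && uncommitted+payload > maxUncommitted {
            return false
        }
        uncommitted += payload
        return true
    }

    // withRaftGroupLocked flushes the proposal buffer first: the ~50 MiB
    // proposal overshoots from zero and is admitted.
    fmt.Println(tryAppend(50 << 20)) // true

    // maybeUnquiesceAndWakeLeaderLocked then proposes its "empty" command,
    // whose encoded CRDB envelope is small but nonzero, so it is dropped and
    // the warning fires.
    fmt.Println(tryAppend(64)) // false

    // A truly nil-payload proposal, as in the patch below, is exempt.
    fmt.Println(tryAppend(0)) // true
}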

I don't think there's anything new here in this cycle other than that we made the copy roachtest more aggressive and able to hit this reliably (or maybe it always hit this and we just never looked at the logs).

I tried out this little patch, which proposes a "true" empty command (instead of a nonempty command containing a "nil" CRDB payload) - true empty commands are exempt from uncommitted log size tracking - like this:

@@ -89,11 +88,10 @@ func (r *Replica) maybeUnquiesceAndWakeLeaderLocked() bool {
        r.store.unquiescedReplicas.Unlock()
        r.maybeCampaignOnWakeLocked(ctx)
        // Propose an empty command which will wake the leader.
-       data := raftlog.EncodeRaftCommand(raftlog.EntryEncodingStandardWithoutAC, makeIDKey(), nil)
-       _ = r.mu.internalRaftGroup.Propose(data)
+       _ = r.mu.internalRaftGroup.Propose(nil /* data */)
        return true
 }

and voila, I'm (unsurprisingly) no longer seeing the messages. I'll send a PR for that patch.

@aliher1911 I believe you mentioned also having seen this message "randomly". Do you have something I can look at? Maybe there are multiple things going on.

tbg added a commit to tbg/cockroach that referenced this issue Apr 18, 2023
We used to unquiesce via a "noop" but non-nil log entry, but it turns out that raft
can be out of budget for nontrivial log entries when unquiescing. So, use a nil
entry, which is identical to what raft proposes when leadership changes.

Closes cockroachdb#100096.

Epic: none
Release note: None
tbg removed the GA-blocker label Apr 18, 2023
tbg (Member, Author) commented Apr 18, 2023

Chatted with Erik, removing GA-blocker since there's nothing new/bad here. Will check in with Oleg when he's back to see where else he saw this message.

craig bot pushed a commit that referenced this issue Jun 20, 2023
100083: kvserver: record metrics for ErrProposalDropped r=pavelkalinnikov a=tbg

Touches #100096.

Epic: none
Release note: None


105093: sql: use datum alloc for crdb_internal stmt stats rows r=dt a=dt

I happened to observe a cluster running a customer test suite that included a query inspecting stmt stats often, causing the CRDB node to spend a considerable amount of CPU time producing the stmt stats vtable - in particular allocating (and then GC'ing) individual datums - especially given how wide this table has become with the addition of storage stats.

This change uses a datum allocator to produce those rows to reduce the number of separate allocations from the runtime.

Release note: none.
Epic: none.

105197: statusccl: skip flaky TenantStatusAPI tests r=zachlite a=zachlite

Informs #92382, #99770, #99559
Epic: none
Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Zach Lite <[email protected]>
exalate-issue-sync bot added the T-kv label and removed the T-kv-replication label Jun 28, 2024
github-project-automation bot moved this to Incoming in KV Aug 28, 2024