
kvserver: record metrics for ErrProposalDropped #100083

Merged: 15 commits merged on Jun 20, 2023


@tbg (Member) commented Mar 30, 2023

Touches #100096.

Epic: none
Release note: None

@blathers-crl bot commented Mar 30, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member): This change is Reviewable

@tbg force-pushed the log-err-prop-dropped branch from 0bcffbd to 7329261 April 12, 2023 13:37
@tbg marked this pull request as ready for review April 12, 2023 13:44
@tbg requested a review from a team April 12, 2023 13:44
@tbg force-pushed the log-err-prop-dropped branch from 7329261 to 8f57211 April 12, 2023 13:57
@tbg force-pushed the log-err-prop-dropped branch from 8f57211 to 2833d47 April 14, 2023 10:56
@pav-kv (Collaborator) left a comment

Dumping some comments for now. At first glance, the new version seems confusing too. I proposed a simpler solution, if I understood the problem correctly. PTAL.

```go
	props []*ProposalData, // must match ents slice
) error {
	if len(ents) != len(props) {
		return errors.AssertionFailedf("ents and props don't match up: %v and %v", ents, props)
```
Collaborator:

This error would contain quite a lot of data, including proposals. How about only reporting len(ents) and len(props)?

Member Author:

Maybe, but it's a clear programming error, so I don't want to go crazy, and I definitely don't want to under-report information in case the error is rare. I'll leave it as is unless you feel strongly.

Collaborator:

I don't know how far up the stack this error can make it, or whether it can make it into text logs or get outside the process. It can contain user data, right? So that's a bit sensitive. Maybe err on the safe side and include only the necessary info in the error message. See a similar panic in maybeDeductFlowTokens.
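A minimal sketch of the reviewer's suggestion, assuming the check only needs the slice lengths: the mismatch error reports `len(ents)` and `len(props)` instead of formatting the slices with `%v`, so no command payloads reach the message. The `entry` and `proposal` types and the `checkAligned` function are hypothetical stand-ins, and the sketch uses the standard library's `fmt.Errorf` in place of `errors.AssertionFailedf` so that it stays self-contained.

```go
package main

import "fmt"

// entry and proposal are hypothetical stand-ins for the raft entries and
// *ProposalData in the excerpt; each carries a payload that may hold user data.
type entry struct{ data []byte }
type proposal struct{ cmd []byte }

// checkAligned returns an error if ents and props differ in length.
// The message includes only the two lengths, never the payloads.
func checkAligned(ents []entry, props []*proposal) error {
	if len(ents) != len(props) {
		return fmt.Errorf("ents and props don't match up: len(ents)=%d, len(props)=%d",
			len(ents), len(props))
	}
	return nil
}

func main() {
	ents := []entry{{data: []byte("user-key=secret")}, {data: []byte("k=v")}}
	props := []*proposal{{cmd: []byte("user-key=secret")}}
	// Prints: ents and props don't match up: len(ents)=2, len(props)=1
	fmt.Println(checkAligned(ents, props))
}
```

The trade-off the author raises still applies: lengths alone may be too little to debug a rare programming error, which is why this is a judgment call rather than a strict rule.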

(Six resolved review threads on pkg/kv/kvserver/replica_proposal_buf.go, five of them outdated.)
@tbg force-pushed the log-err-prop-dropped branch from 2833d47 to 263ef05 June 9, 2023 12:16
@tbg requested a review from a team as a code owner June 9, 2023 14:14
@tbg marked this pull request as draft June 9, 2023 14:17
@tbg changed the title from "kvserver: log on ErrProposalDropped" to "kvserver: record metrics for ErrProposalDropped" Jun 9, 2023
@tbg (Member Author) commented Jun 9, 2023

Starting to pick this back up, no need to re-review yet so I converted to draft. I'll reach out when it's ready.

@tbg force-pushed the log-err-prop-dropped branch from 098c6e7 to 5f00a81 June 12, 2023 07:15
@tbg marked this pull request as ready for review June 12, 2023 12:13
@tbg requested a review from pav-kv June 12, 2023 12:13
@tbg (Member Author) commented Jun 12, 2023

@pavelkalinnikov this is now ready.

@pav-kv (Collaborator) left a comment

Super clean PR, thanks.

(Resolved review threads on pkg/kv/kvserver/replica_proposal_buf.go, outdated, and pkg/kv/kvserver/metrics.go.)
tbg added 6 commits June 20, 2023 16:42
Enables a future change.

Epic: none
Release note: None

Epic: none
Release note: None

Epic: none
Release note: None

Epic: none
Release note: None

Epic: none
Release note: None

Epic: none
Release note: None
tbg added 8 commits June 20, 2023 16:42
With noop impl for now.

Epic: none
Release note: None

One step toward removing repetitive handling at the caller.

Epic: none
Release note: None

We'll pass the small part into `proposeBatch`.

Epic: none
Release note: None

The impl is still a no-op, so this change is a no-op, too.

Epic: none
Release note: None

This counts the number of entries raft drops on the floor. There are two reasons raft drops them:

- uncommitted log size exceeded
- no leader known

They're not currently discriminated. Either way, we want neither of these to
occur, but we know from experience that they do. This metric will give us a
sense of their frequency and allow us to more conclusively correlate them with
splits, etc.

Epic: none
Release note: None
Epic: none
Release note: None

This allows us to discriminate between MsgProps that are dropped because no
leader is known and those for which the leader is unwilling to accept more
proposals (uncommitted log size being the relevant reason today[^1]).

[^1]: raft will also drop a conf change while another conf change is still
pending, but since our conf changes are in flight in raft exactly when the
RangeDescriptor on the Replica has an intent, we don't hit this case in CRDB.

Epic: none
Release note: none
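A sketch of splitting the drop count by reason, under the assumption that the proposer can tell at drop time whether a leader is known. The `dropReason` values, `classify`, and `dropMetrics` are hypothetical names, not the actual metric names; conf-change drops are not modeled because, per the footnote, CRDB does not hit that case.

```go
package main

import "fmt"

// dropReason records why raft dropped a MsgProp (hypothetical names).
type dropReason int

const (
	// dropNoLeader: no leader is known, so the proposal has nowhere to go.
	dropNoLeader dropReason = iota
	// dropLeaderRefused: the leader is unwilling to accept more proposals,
	// e.g. because the uncommitted log size limit was exceeded.
	dropLeaderRefused
)

// classify is a hypothetical helper mapping the state observed at drop time
// to a reason. Conf-change drops are deliberately not modeled.
func classify(leaderKnown bool) dropReason {
	if !leaderKnown {
		return dropNoLeader
	}
	return dropLeaderRefused
}

// dropMetrics keeps one counter per reason so the two cases can be told
// apart. Plain int64 fields are not safe for concurrent use; real metrics
// would be.
type dropMetrics struct {
	noLeader      int64
	leaderRefused int64
}

// record adds n dropped entries to the counter for reason r.
func (m *dropMetrics) record(r dropReason, n int) {
	switch r {
	case dropNoLeader:
		m.noLeader += int64(n)
	case dropLeaderRefused:
		m.leaderRefused += int64(n)
	}
}

func main() {
	var m dropMetrics
	m.record(classify(false), 3)             // dropped while leaderless
	m.record(classify(true), 1)              // dropped by an overloaded leader
	fmt.Println(m.noLeader, m.leaderRefused) // prints 3 1
}
```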
@tbg force-pushed the log-err-prop-dropped branch from a6dfcb1 to d0f7aff June 20, 2023 14:46
@tbg (Member Author) commented Jun 20, 2023

bors r=pavelkalinnikov
TFTR! Addressed your comments in the affirmative.

@craig bot (Contributor) commented Jun 20, 2023

Build failed (retrying...):

@tbg (Member Author) commented Jun 20, 2023

bors r-

@craig bot (Contributor) commented Jun 20, 2023

Canceled.

@tbg force-pushed the log-err-prop-dropped branch from d0f7aff to d163b05 June 20, 2023 15:02
@tbg (Member Author) commented Jun 20, 2023

bors r=pavelkalinnikov

Removing the `dropped` param set off the linter, since it could now see an unhandled error that it couldn't see before.

@craig bot (Contributor) commented Jun 20, 2023

Build succeeded:

@craig bot merged commit 91f0de4 into cockroachdb:master Jun 20, 2023