-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: record metrics for ErrProposalDropped #100083
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
0bcffbd
to
7329261
Compare
7329261
to
8f57211
Compare
8f57211
to
2833d47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumping some comments for now. The new version seems confusing too at the first glance. Proposed a simpler solution, if I understood the problem correctly. PTAL.
props []*ProposalData, // must match ents slice | ||
) error { | ||
if len(ents) != len(props) { | ||
return errors.AssertionFailedf("ents and props don't match up: %v and %v", ents, props) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error would contain quite a lot of data, including proposals. How about only reporting len(ents)
and len(props)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe - but it's a clear programming error, so I don't want to go crazy and definitely don't want to under-report information, in case the error is rare. I'll leave as is unless you feel strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how far up the stack this error can make, whether it can make into text logs or get outside the process. It can contain user data, right? So that's a bit sensitive. Maybe err on the safe side, and include only necessary info in the error message. See a similar panic
in maybeDeductFlowTokens
.
Starting to pick this back up, no need to re-review yet so I converted to draft. I'll reach out when it's ready. |
098c6e7
to
5f00a81
Compare
@pavelkalinnikov this is now ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super clean PR, thanks.
Enables a future change. Epic: none Release note: None
Epic: none Release note: None
Epic: none Release note: None
Epic: none Release note: None
Epic: none Release note: None
Epic: none Release note: None
With noop impl for now. Epic: none Release note: None
One step to removing repetitive handling at the caller. Epic: none Release note: None
We'll pass the small part into `proposeBatch`. Epic: none Release note: None
Epic: none Release note: None
The impl is still a no-op, so this change is a no-op, too. Epic: none Release note: None
This counts the number of entries raft drops on the floor. There are two reasons for this: - uncommitted log size exceeded - no leader known They're not currently discriminated. Either way, we want neither of these to occur, but we know from experience that they do. This metric will give us a sense of the frequency and allow us to more conclusively correlate with splits, etc. Epic: none Release note: None
Epic: none Release note: None
This allows us to discriminate between MsgProp that are dropped because no leader is known vs those for which the leader is unwilling to accept more proposals (uncommitted log size being the relevant reason today[^1]). [^1]: raft will also drop conf changes if a pending conf change is still pending, but since our conf changes are raft-inflight exactly when the RangeDescriptor on the Replica has an intent, we don't hit this case in CRDB. Epic: none Release note: none
a6dfcb1
to
d0f7aff
Compare
bors r=pavelkalinnikov |
Build failed (retrying...): |
bors r- |
Canceled. |
d0f7aff
to
d163b05
Compare
bors r=pavelkalinnikov Removing the |
Build succeeded: |
Touches #100096.
Epic: none
Release note: None