Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: make the entries encoding maintainable #131561

Open
pav-kv opened this issue Sep 28, 2024 · 2 comments
Open

raft: make the entries encoding maintainable #131561

pav-kv opened this issue Sep 28, 2024 · 2 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@pav-kv
Copy link
Collaborator

pav-kv commented Sep 28, 2024

Entries in raft contain the user-defined commands in a []byte slice, which our application layer has freedom to define. Today, we have the ever-growing encodings list + ad-hoc entry entry.Data parsing sprinkled in random places. This gets harder to maintain and reason about.

The pattern is that the first byte of Data contains the "encoding", and we then parse the entry differently, depending on this byte. We often want to sneak peek into the entry's "header" without parsing it entirely, because in most cases it is a protobuf with non-zero unmarshaling cost.

It would be great to replace this ad-hoc encoding with a clean/maintainable solution that allows:

  1. Zero/low cost of unmarshaling.
  2. Especially in the cases when we only need to check certain things about the entry, e.g. "is this a sideloaded entry?".

One approach to this is replacing the ad-hoc encoding with flatbuffers. We would treat the raftpb.Entry.Data field as a flatbuffer, with a well-defined / maintainable schema. It would be composed of “header” + data. The header would contain things like:

  • “is sideloaded” bit for routing the entries load/store to the right sub-storage
  • “sideloaded size”, for size accounting in log truncations stack (at the moment it needs to look at the file to know its size, which is inconvenient and e.g. the decoupled log trunc stack omits sideloaded files accounting in some cases)
  • the AC/RACv2 headers
  • maybe more

Then all the header checks (like “do we have a RACv2 header?”) would be cheap. As a bonus, we would have no allocations in the entries unmarshaling stack.

As a flip-side, this doesn’t integrate super well with protos, so maybe we would need some wrappers around these flatbuffers to convert to custom types or protos. Though at the scale of just one entries type this doesn’t sound too bad, and is probably better anyway than all the bug-prone parsing we have today. Another flip-side is that this requires a one-time migration that overwrites all raft logs.

However, this logs scan/migration is an opportunity to:

Jira issue: CRDB-42604

@pav-kv pav-kv added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-kv-replication Relating to Raft, consensus, and coordination. labels Sep 28, 2024
@sumeerbhola
Copy link
Collaborator

Another flip-side is that this requires a one-time migration that overwrites all raft logs.

That seems worrisome.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Nov 18, 2024

Yes, but:

  • it pays down some tech debt
  • it reduces cost
  • we have some other work that may require a raft log migration, e.g. ones mentioned in the issue description and kvserver: investigate SingleDelete for raft log truncation #8979. So this migration cost could be paid once if timed with other changes.
  • raft logs are small since they are compacted eagerly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

2 participants