Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding checksum to each page #492

Open
cenkalti opened this issue May 15, 2023 · 7 comments
Open

Adding checksum to each page #492

cenkalti opened this issue May 15, 2023 · 7 comments

Comments

@cenkalti
Copy link
Member

This may help detecting corruptions. I'm creating this issue to discuss:

  • feasibility
  • backwards compatibility
  • performance
    implications.
@cenkalti
Copy link
Member Author

Copying my comment from #505 (comment).


There are many sources of corruption that we cannot control:

  • Faulty hardware (disk, memory, CPU)
  • Unsafe filesystems, misconfiguration
  • Other faulty/malicious processes on the same host
  • Cosmic rays

Most databases put checksum to each page.

@cenkalti cenkalti self-assigned this May 19, 2023
@serathius
Copy link
Member

serathius commented May 19, 2023

This goes into Merkele tree proposal.
For etcd project we would like to have a mechanism for fast calculation of checksum on any range of keys. So it's not only needs to be calculated per page but also for subtree https://en.wikipedia.org/wiki/Merkle_tree.

If someone is interested in this issue, please reachout to me, I'm happy to provide requirements and help with the design.

@cenkalti
Copy link
Member Author

cenkalti commented May 19, 2023

Merkle-tree makes sense where content is mostly read-only such as Bittorrent V2 or IPFS but I don't recommend using it for a database where write/read ratio can be high. With Merkle-tree, updating a leaf page would cause updating every page up to the root, causing very high write amplification which affects performance dramatically.

My proposal is about ensuring physical integrity of database pages at low level (protection against faulty hardware, bitflips, etc.).

etcd's problem looks related but not exactly overlaps with this bbolt's problem.

  • bbolt needs a mechanism for detecting corruption in any part of the database.
  • etcd needs a mechanism for having an identifier to represent the state of database in time.

Because etcd uses FreelistMapType (randomized iteration of maps in Go), replicas already have different B+ trees on disk which makes it impossible to use Merkle-tree hashes in page level. etcd's problem should be addressed in key space rather than page level.

@serathius
Copy link
Member

serathius commented May 19, 2023

With Merkle-tree, updating a leaf page would cause updating every page up to the root, causing very high write amplification which affects performance dramatically.

Correct me if I'm wrong, but as a MVCC bbolt is already updating every page up the root during a write?

My proposal is about ensuring physical integrity of database pages at low level (protection against faulty hardware, bitflips, etc.).

Database integrity not only depends on integrity of data, but also on integrity of executed instructions. As terrifying as it seems a bitflip can happen in your if branch logic and change execution path. :P

Because etcd uses FreelistMapType (randomized iteration of maps in Go), replicas already have different B+ trees on disk which makes it impossible to use Merkle-tree hashes in page level.

Looks like a blocker for implementing it on bbolt level. Still happy to collaborate and work on design to address all the action items in https://github.com/etcd-io/etcd/blob/main/Documentation/postmortems/v3.5-data-inconsistency.md#action-items

@cenkalti
Copy link
Member Author

Correct me if I'm wrong, but as a MVCC bbolt is already updating every page up the root during a write?

You're right. I was wrong about that.

@ahrtr
Copy link
Member

ahrtr commented May 20, 2023

Adding checksum needs super careful consideration. The impact on performance is also a big concern to consider. It needs an overall design, recovery is the point, so it must be considered in the first place. Note that the existing Check can already detect most of the data corruption.

Again, I'd like us to spend more effort to analyze all the existing data corruption cases before we move on adding any new big features; otherwise, it will add more technical debts.

@github-actions github-actions bot added the stale label Apr 17, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 9, 2024
@serathius serathius reopened this May 9, 2024
@serathius
Copy link
Member

@serathius serathius removed the stale label May 9, 2024
@github-actions github-actions bot added the stale label Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants