-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(redb-store): Optimization for small file import in redb store #2062
Conversation
Hash HashAndFormat BlobFormat
Use Self::POSTCARD_MAX_SIZE for fixed size
...to allow for stores that use io even for the lookup, such as a redb store. Also use iroh_base::Hash instead of blake3::Hash, and make gc delete use batches.
# Conflicts: # iroh-bytes/src/get/db.rs # iroh/src/downloader/get.rs
# Conflicts: # iroh/src/client.rs # iroh/src/node.rs # iroh/src/rpc_protocol.rs
Also align existing comments with reality
The split is now an impl detail of the batch writer
# Conflicts: # iroh-bytes/src/store/file.rs
## Description refactor(iroh-bytes): Weak entry map Get rid of some of the drop complexity by storing a weak handle map. The handles themselves have a strong reference, so they are still easy to use. ## Notes & open questions <!-- Any notes, remarks or open questions you have to make about the PR. --> ## Change checklist - [ ] Self-review. - [ ] Documentation updates if relevant. - [ ] Tests if relevant.
# Conflicts: # iroh-bytes/src/store/bao_file.rs # iroh-bytes/src/store/file.rs # iroh-bytes/src/store/file/tests.rs # iroh-bytes/src/store/traits.rs
So of course there is a downside. What you report as true is always up to 500ms or whatever ahead of what is actually persisted. I feel that this is acceptable in this case, but you need to be aware of it. Especially if we use the same approach for the doc store, you could get any combination of the doc store being ahead of the blob store or vice versa. So the doc store must not assume any entry state. E.g. doc store inserts some data, reports entry complete, crash after 10ms, entry is no longer complete. |
# Conflicts: # iroh-cli/src/commands/blob.rs # iroh-cli/src/commands/doctor.rs # iroh-cli/src/commands/start.rs # iroh-cli/tests/cli.rs # iroh/Cargo.toml
# Conflicts: # iroh-bytes/src/store/file.rs
# Conflicts: # iroh-cli/src/commands/start.rs # iroh/examples/rpc.rs # iroh/src/node.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are all addressed, so approving this!
If there are corner cases or unexpected situations, we hopefully catch them during dogfooding, so better to get this in now than later IMO.
Edit: Needs rebase
# Conflicts: # iroh-bytes/src/store/file.rs # iroh-bytes/src/store/file/import_flat_store.rs # iroh-bytes/src/store/file/tables.rs # iroh-bytes/src/store/file/test_support.rs # iroh-bytes/src/store/file/tests.rs # iroh-bytes/src/store/file/util.rs # iroh-bytes/src/store/file/validate.rs # iroh/tests/gc.rs
Description
Optimization for small file import in redb store.
This batches writes and reads up to some delay and vastly improves performance especially for writes. Linux kernel import goes from minutes to seconds.
Notes & open questions
Currently delete is just a normal operation. I think it should be changed to a toplevel operation because crashing with a delete transaction being not committed would lead to inconsistent state: files would be gone on disk and there in the db.File deletions of any of the "owned" files are now handled outside the transaction. That requires keeping track of deletes and undeletes within a transaction. E.g. if within a single transaction it is determined that hash 0xabcd is to be deleted, it gets marked for deletion, then it gets created again. We must not delete the file at the end of the transaction.
Change checklist