[WIP] feat: iroh-sync #1216
Closed
Frando changed the title from "[WIP] feat: integration of sync and bytes" to "[WIP] feat: integrate sync with iroh-bytes" (Jul 12, 2023)
Frando force-pushed the `sync-gossip-bytes` branch from 7c62c02 to 6a81447 (July 12, 2023 15:29)
Review comment on `iroh-sync/src/ranger.rs` (outdated):

```rust
let mut out = Vec::new();

// TODO: can these allocs be avoided?
let mut items = Vec::new();
let mut inserted = Vec::new();
```
this seems potentially expensive, should probably be a callback instead
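To illustrate the reviewer's suggestion, here is a minimal sketch contrasting the two styles. `ToyStore`, `insert_all_collect` and `insert_all_with` are hypothetical names on a `BTreeMap`-backed stand-in, not the actual `ranger` types: one variant collects newly inserted keys into a `Vec`, the other reports each one to a caller-supplied closure so no intermediate allocation is needed.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a range-reconciliation store: a sorted key-value map.
#[derive(Default)]
pub struct ToyStore {
    map: BTreeMap<String, u64>,
}

impl ToyStore {
    /// Allocating style: collect every key that was not present before into a
    /// `Vec` that the caller then has to iterate over.
    pub fn insert_all_collect(&mut self, items: &[(String, u64)]) -> Vec<String> {
        let mut inserted = Vec::new();
        for (key, value) in items {
            // `insert` returns `None` if the key was not present yet.
            if self.map.insert(key.clone(), *value).is_none() {
                inserted.push(key.clone());
            }
        }
        inserted
    }

    /// Callback style: report each newly inserted key as it happens, so no
    /// intermediate `Vec` is allocated.
    pub fn insert_all_with<F>(&mut self, items: &[(String, u64)], mut on_insert: F)
    where
        F: FnMut(&str, u64),
    {
        for (key, value) in items {
            if self.map.insert(key.clone(), *value).is_none() {
                on_insert(key.as_str(), *value);
            }
        }
    }
}
```

With the callback version the caller decides whether to allocate at all; the trade-off is that the closure runs while the store is mutably borrowed, so it cannot call back into the store.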
Frando force-pushed the `sync-gossip-bytes` branch from c4b38e1 to 088f36e (July 26, 2023 21:46)
Frando changed the title from "[WIP] feat: integrate sync with iroh-bytes" to "[WIP] feat: iroh-sync" (Jul 26, 2023)
Frando force-pushed the `sync-gossip-bytes` branch 3 times, most recently from de61c04 to 22ad89a (July 28, 2023 16:49)
Frando force-pushed the `Frando/gossip` branch 2 times, most recently from 44ae5a6 to 19eef0f (August 4, 2023 13:40)
* removes content support from iroh-sync
* adds a quick-and-dirty writable database to iroh-bytes (will be replaced with a better generic writable database soon)
* adds a `Downloader` to queue get requests for individual hashes from individual peers
* adds a `BlobStore` that combines the writable db with the downloader
* adds a `Doc` abstraction that combines an iroh-sync `Replica` with a `BlobStore` to download content from peers on-demand
* updates the sync repl example to plug it all together
* also adds very basic persistence to `Replica` (encode to byte string) and uses this in the repl example

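The commit message only says that the `Downloader` queues get requests for individual hashes from individual peers. The following is a minimal sketch of such a queue under stated assumptions: `Hash` and `PeerId` are placeholder aliases, `DownloadQueue` is a hypothetical name, deduplication of identical pending requests is an assumed behavior, and the networking side is left out entirely.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-ins for the real content hash and peer id types.
pub type Hash = [u8; 32];
pub type PeerId = u64;

/// FIFO queue of "get `hash` from `peer`" requests that ignores duplicates
/// of a request that is still waiting in the queue.
#[derive(Default)]
pub struct DownloadQueue {
    queue: VecDeque<(Hash, PeerId)>,
    pending: HashSet<(Hash, PeerId)>,
}

impl DownloadQueue {
    /// Queue a request. Returns `false` if the same (hash, peer) pair is already queued.
    pub fn push(&mut self, hash: Hash, peer: PeerId) -> bool {
        if self.pending.insert((hash, peer)) {
            self.queue.push_back((hash, peer));
            true
        } else {
            false
        }
    }

    /// Take the oldest queued request. A real downloader would now connect to
    /// `peer` and fetch `hash`; that part is omitted here.
    pub fn pop(&mut self) -> Option<(Hash, PeerId)> {
        let request = self.queue.pop_front()?;
        self.pending.remove(&request);
        Some(request)
    }
}
```

Keying requests by the (hash, peer) pair rather than by hash alone keeps the "individual hashes from individual peers" granularity, so the same blob can still be requested from a second peer.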
* make the REPL in the sync example work properly with rustyline for editing and reading input, shell-style argument parsing and clap for parsing commands
* add a docs store for opening and closing docs
* add author to doc struct

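The commit doesn't say how the shell-style splitting is implemented, and in practice an existing crate is the more likely choice. Purely to show what "shell-style" means here, a simplified, std-only splitter follows; the name `split_args` is hypothetical and the quoting rules are a subset of POSIX shell rules.

```rust
/// Split one REPL input line into arguments, shell-style (simplified):
/// whitespace separates arguments, single quotes keep their content literally,
/// double quotes allow backslash escapes, and a backslash outside quotes
/// escapes the next character. Returns `None` on an unterminated quote or a
/// trailing backslash.
pub fn split_args(line: &str) -> Option<Vec<String>> {
    let mut args = Vec::new();
    let mut current = String::new();
    // Tracks whether an argument has started, so that `""` yields an empty argument.
    let mut in_arg = false;
    let mut chars = line.chars();
    while let Some(c) = chars.next() {
        match c {
            c if c.is_whitespace() => {
                if in_arg {
                    args.push(std::mem::take(&mut current));
                    in_arg = false;
                }
            }
            '\'' => {
                in_arg = true;
                loop {
                    match chars.next()? {
                        '\'' => break,
                        ch => current.push(ch),
                    }
                }
            }
            '"' => {
                in_arg = true;
                loop {
                    match chars.next()? {
                        '"' => break,
                        '\\' => current.push(chars.next()?),
                        ch => current.push(ch),
                    }
                }
            }
            '\\' => {
                in_arg = true;
                current.push(chars.next()?);
            }
            ch => {
                in_arg = true;
                current.push(ch);
            }
        }
    }
    if in_arg {
        args.push(current);
    }
    Some(args)
}
```

The resulting arguments could then be handed to clap, for example via `Parser::try_parse_from`, which treats the first item as the binary name, so a REPL would prepend a placeholder.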
* start refactoring store into its own module
* implement more details
* works again
* draft fs db and integrate error handling
* fill out more of the implementation
* lifetime sadness
* self referential fight: Rust 0 - Dig 1
* basic tests and range fixes
* introduce Store trait and update tests to test against both impls
* implement remove
* integrate new storage into the example
* implement iterators
* fixes and more tests
* clippy and deny cleanup

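"Introduce Store trait and update tests to test against both impls" matches a common Rust pattern: write each test body once, generic over the trait, and instantiate it per implementation. A minimal sketch of that pattern, with a deliberately tiny, hypothetical trait (the real `iroh_sync` `Store` trait has a different API) and two toy implementations standing in for the in-memory and fs-backed stores:

```rust
use std::collections::BTreeMap;

/// Deliberately tiny, hypothetical store trait (not the real `iroh_sync` API).
pub trait Store {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// Returns `true` if an entry was removed.
    fn remove(&mut self, key: &[u8]) -> bool;
    /// All entries whose key starts with `prefix`, sorted by key.
    fn get_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// Implementation 1: sorted in-memory map.
#[derive(Default)]
pub struct MemStore(BTreeMap<Vec<u8>, Vec<u8>>);

impl Store for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.0.insert(key.to_vec(), value.to_vec());
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn remove(&mut self, key: &[u8]) -> bool {
        self.0.remove(key).is_some()
    }
    fn get_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Walk keys >= prefix and stop at the first key that no longer matches.
        self.0
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Implementation 2: unsorted vector, standing in for a second backend.
#[derive(Default)]
pub struct VecStore(Vec<(Vec<u8>, Vec<u8>)>);

impl Store for VecStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        match self.0.iter_mut().find(|(k, _)| k.as_slice() == key) {
            Some((_, v)) => *v = value.to_vec(),
            None => self.0.push((key.to_vec(), value.to_vec())),
        }
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.iter().find(|(k, _)| k.as_slice() == key).map(|(_, v)| v.clone())
    }
    fn remove(&mut self, key: &[u8]) -> bool {
        let before = self.0.len();
        self.0.retain(|(k, _)| k.as_slice() != key);
        self.0.len() != before
    }
    fn get_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        let mut out: Vec<_> = self.0.iter().filter(|(k, _)| k.starts_with(prefix)).cloned().collect();
        out.sort();
        out
    }
}

/// One test body, generic over the trait, run once per implementation.
pub fn check_store<S: Store + Default>() {
    let mut s = S::default();
    s.put(b"a/1", b"x");
    s.put(b"a/2", b"y");
    s.put(b"b/1", b"z");
    s.put(b"a/1", b"x2"); // overwrite
    assert_eq!(s.get(b"a/1"), Some(b"x2".to_vec()));
    assert_eq!(
        s.get_prefix(b"a/"),
        vec![(b"a/1".to_vec(), b"x2".to_vec()), (b"a/2".to_vec(), b"y".to_vec())]
    );
    assert!(s.remove(b"a/2"));
    assert!(!s.remove(b"a/2"));
    assert_eq!(s.get(b"a/2"), None);
}
```

Calling `check_store::<MemStore>()` and `check_store::<VecStore>()` from two `#[test]` functions keeps both backends held to identical behavior.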
Frando force-pushed the `sync-gossip-bytes` branch from 594c54a to 7c6cb35 (August 7, 2023 10:59)
This was referenced Aug 7, 2023 in "[WIP] feat(iroh-sync): download from peer that informed us about a change" (34 tasks)
This PR is now outdated because it has not been rebased. Closing in favor of #1333.
This was referenced Aug 15, 2023
github-merge-queue bot pushed a commit that referenced this pull request (Aug 24, 2023):
## Description

This PR adds `iroh-sync`, a document synchronization protocol, to iroh, and integrates it with `iroh-net`, `iroh-gossip` and `iroh-bytes`.

* At the core is the `iroh-sync` crate, with a set reconciliation algorithm implemented by @dignifiedquire. See [the old iroh-sync repo](https://github.com/n0-computer/iroh-sync/) for the prehistory and #1216 for the initial PR (fully included in this PR, and by now outdated).
* Iroh sync is integrated in the iroh node, with iroh-gossip, in the RPC interface, and in the CLI.
* `LiveSync` is the handle to an actor that integrates sync with [gossip](#1149) to broadcast and receive document updates from peers. For each open document, a gossip swarm is joined with a `TopicId` derived from the doc namespace.
* mod `download` contains the new downloader. It will be improved in #1344.
* mod `client` is the new high-level RPC client. It currently only has methods for dealing with docs and sync; other things should be added once this is merged. CLI commands for sync are in `commands/sync.rs`. They will be much better with #1356.
* `examples/sync.rs` has a REPL to modify and sync docs. It does a full setup without using the iroh console. It also includes code to sync directories, and a `hammer` command for load testing.
* The PR also introduces `iroh::client::Iroh`, a wrapper around the RPC client, and `iroh::client::Doc`, a wrapper around the RPC client for a single document.

## Notes & open questions

#### Should likely happen before merge:

* [x] Make `iroh_sync::Store::list_authors` and `list_replicas` return iterators in `iroh-sync` *(fixed in #1366)*
* [ ] Add `iroh_sync::Store::close_replica`
* [x] `ContentStatus` in the `on_insert` callback is reported as `Ready` if the content is still a `baomap::PartialEntry` (in-process download) *(fixed in a8e8093)*

#### Can happen after merge, but before `0.6` release

* [ ] Implement `AuthorImport` and `AuthorShare` RPC & CLI commands
* [ ] sync store `list_namespaces` and `list_authors` internally collect; return an iterator instead
* [ ] Fix cross-compiles to arm/android. See cross-rs/cross#1311
* [ ] Ensure that fingerprint calculation is efficient and/or cached for large documents. Currently, calculating the initial fingerprint iterates once over all entries in a document.
* [ ] Make content downloads more reliable
* [ ] Add some way to download content from peers independent of the first insertion event for a remote entry. The downloader with retries is tracked in #1334 and #1344, but independent of that, we currently only ever try to queue a download when the `on_insert` callback triggers, which happens only once. There should be a way, even if manual for now, to try to download missing content in a replica from peers.
* [ ] During `iroh-sync` sync, include info on whether content is available for each entry
* [ ] Add basic peer management and persistence. Currently, live sync stops doing anything after a node restart.
* [ ] Persist the addressbook of peers for a document, to reconnect when restarting the node
* [ ] Implement `PeerAdd` and `PeerList` RPC & CLI commands. The latter needs changes in `iroh-net` to expose information about currently-connected peers and their peer info.
* [ ] Make read-only replicas possible
* [ ] Improve reliability of replica sync.
  * sync is triggered on each `NeighborUp` event from gossip. Check that we don't sync too much.
  * maybe include peer info in gossip messages, to queue syncs with those peers (but not all at once)
  * track and exchange the timestamp of the last full sync for peers, to know if you missed gossiped messages and react accordingly
  * add more tests with peers coming and leaving

#### Open questions

* [ ] `iroh_sync::EntrySignature`: should the signatures include a namespace prefix?
* [ ] Do we want the 1:1 mapping of `NamespaceId` and gossip `TopicId`, or would the topic ID as a hash be better?

#### Other TODOs collected from the code

* [ ] Port `hammer` and `fs` commands from the REPL example to the iroh CLI
* [ ] docs/naming: settle terminology about keypairs, private/secret/signing keys, public keys/identifiers and make docs and symbols consistent
* [ ] Make `bytes_get` streaming in the RPC interface
* [ ] Allow unsetting the subscription on a replica
* [ ] `iroh-sync` shouldn't depend on `iroh-bytes` only for the `Hash` type -> #1354
* [ ] Move `sync::live::PeerSource` to iroh-net or even better -> #1354
* [ ] `StoreInstance::put`: propagate error and verify timestamp is reasonable.
* [ ] `StoreInstance::get_range`: implement inverted range
* [ ] `iroh_sync`: remove some items only used in tests (marked with `#[cfg(test)]`)
* [ ] `iroh_sync` fs store: verify get method fetches all keys with this namespace
* [ ] `ranger::SimpleStore::get_range`: optimize
* [ ] `ranger::Peer`: avoid allocs?
* [ ] `fs::StoreInstance::get_fingerprint`: optimize
* [ ] `SyncEngine::doc_subscribe`: remove unwrap, handle error

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [ ] Tests if relevant.

---------

Co-authored-by: dignifiedquire
Co-authored-by: Asmir Avdicevic
Co-authored-by: Kasey
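Range-based set reconciliation commonly uses the XOR of per-entry hashes as the fingerprint of a range. Assuming a fingerprint of that shape (the exact hash function and fingerprint type used by `iroh-sync` are not stated here), this sketch shows why the fingerprint TODOs above matter: recomputing from scratch is one pass over every entry, while XOR being its own inverse lets a cached value be updated in constant time per write. `CachedDoc` and `fingerprint_full` are hypothetical names, and `DefaultHasher` is only a stand-in for a proper cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a cryptographic hash of one entry. `DefaultHasher` is NOT
/// suitable for a real reconciliation protocol; it only keeps this self-contained.
fn entry_hash(key: &str, value: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    value.hash(&mut h);
    h.finish()
}

/// Full recomputation: one pass over every entry in the document.
pub fn fingerprint_full(entries: &BTreeMap<String, String>) -> u64 {
    entries.iter().fold(0, |acc, (k, v)| acc ^ entry_hash(k, v))
}

/// Document that keeps a cached whole-document fingerprint up to date on every write.
#[derive(Default)]
pub struct CachedDoc {
    entries: BTreeMap<String, String>,
    fingerprint: u64,
}

impl CachedDoc {
    pub fn insert(&mut self, key: &str, value: &str) {
        // XOR is its own inverse: remove the old entry's contribution, then add the new one.
        if let Some(old) = self.entries.insert(key.to_string(), value.to_string()) {
            self.fingerprint ^= entry_hash(key, &old);
        }
        self.fingerprint ^= entry_hash(key, value);
    }

    pub fn fingerprint(&self) -> u64 {
        self.fingerprint
    }

    pub fn entries(&self) -> &BTreeMap<String, String> {
        &self.entries
    }
}
```

Because XOR is commutative, two replicas holding the same entries arrive at the same fingerprint regardless of insertion order. Caching sub-range fingerprints, which reconciliation also needs, would require a tree-shaped index on top of this.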
github-merge-queue bot pushed a commit that referenced this pull request (Aug 24, 2023):
## Description

This adds a REPL to the main `iroh` binary. It builds upon #1216.

* The REPL embeds all commands that operate via the RPC client, which currently is everything but `provide` (which should be called `start`), `get` and `doctor`.
* The REPL has two new, REPL-only commands: `set-doc` and `set-author`. `set-doc` changes the state of the REPL: author and document will be displayed above the input line, and the document commands (`set`, `get`, `list`, `share` etc.) will be available top-level.
* Most of the changes in `src/commands.rs` only move code and the `rpc_port` option around; the individual commands are not changed in this PR.

## Notes & open questions

* The document and author IDs in the `list` commands are currently printed in hex. This should change to base32.
* I'm not yet super sure about the `set-doc` and `set-author` commands. Another path might be to dig more into a pwd-like structure and have a `cd` command or so. This could then also move further into documents. I'm not sure how the author fits in here.
* At the level of a document, there's a conflict between the `doc list` command (which is just `list` then) and the top-level `list` command (to list blobs and collections). Not sure yet what the solution is. For now I embedded only the `sync` commands on the doc level, but I think I'd prefer the global set of commands not to change between levels. Maybe we just rename the top-level `list` command to `blobs` and group the other blob-related commands (`add`, a TBD `get` via RPC, possibly `export`) there.
* The REPL embeds all the existing RPC CLI commands. For this I changed the structure of `src/cli/commands.rs` to split between `RpcCommands` and `FullCommands`. The latter are the ones that start an actual iroh node (plus `doctor`; I wasn't sure about that for now). The former all work atop the RPC client. So that they don't create a new RPC client for each REPL command, I moved the `--rpc-port` option to the top level. This is not super correct, because it does not apply to `provide`, `get` and `doctor`. Clap does not allow scoping an argument to a set of subcommands by default, see this discussion: clap-rs/clap#5070 (comment). Still thinking about what the cleanest solution is.

## Change checklist

- [ ] Self-review.
- [ ] Documentation updates if relevant.
- [ ] Tests if relevant.

---------

Co-authored-by: dignifiedquire
Co-authored-by: Asmir Avdicevic
Co-authored-by: Kasey
Co-authored-by: Brendan O'Brien
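The commit describes `set-doc` and `set-author` as REPL-only commands that change state, show the selected document and author above the input line, and make document commands available top-level. A minimal sketch of that dispatch logic follows; `ReplState`, `Action` and `handle` are hypothetical names, and the actual RPC calls are left out.

```rust
/// Hypothetical REPL state, for illustration only.
#[derive(Default)]
pub struct ReplState {
    doc: Option<String>,
    author: Option<String>,
}

/// What the REPL should do with one parsed input line.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// A REPL-only command (`set-doc`, `set-author`) updated the state.
    StateChanged,
    /// A document command, bound to the currently selected document.
    Doc { doc: String, args: Vec<String> },
    /// Anything else is passed on unchanged to the top-level command parser.
    TopLevel(Vec<String>),
}

impl ReplState {
    /// Status line displayed above the input line.
    pub fn status_line(&self) -> String {
        format!(
            "doc: {}  author: {}",
            self.doc.as_deref().unwrap_or("<none>"),
            self.author.as_deref().unwrap_or("<none>"),
        )
    }

    pub fn handle(&mut self, args: &[&str]) -> Action {
        match args {
            ["set-doc", id] => {
                self.doc = Some(id.to_string());
                Action::StateChanged
            }
            ["set-author", id] => {
                self.author = Some(id.to_string());
                Action::StateChanged
            }
            // Once a document is selected, doc commands are available top-level.
            // This shadows the top-level `list`: the conflict raised in the notes.
            [cmd @ ("set" | "get" | "list" | "share"), rest @ ..] if self.doc.is_some() => Action::Doc {
                doc: self.doc.clone().unwrap_or_default(),
                args: std::iter::once(cmd.to_string())
                    .chain(rest.iter().map(|s| s.to_string()))
                    .collect(),
            },
            _ => Action::TopLevel(args.iter().map(|s| s.to_string()).collect()),
        }
    }
}
```

Because `list` is routed to the document once a doc is selected, the sketch reproduces the `doc list` versus top-level `list` ambiguity; renaming the top-level command (e.g. to `blobs`) would remove it.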
matheus23 pushed a commit that referenced this pull request (Nov 14, 2024)
matheus23 pushed a commit that referenced this pull request (Nov 14, 2024)
## Description

This PR adds `iroh-sync`, a document synchronization protocol, to iroh, and integrates with `iroh-net`, `iroh-gossip` and `iroh-bytes`.

At the core is the `iroh-sync` crate, with a set reconciliation algorithm implemented by @dignifiedquire. See https://github.com/n0-computer/iroh-sync/ for its prehistory. TODO: More details

The PR is structured as follows:
* The `iroh-sync` crate contains the basic data structures. It has a dependency on `iroh_bytes` at the moment, but only for its `Hash` type. It does not perform any IO; the implementation is in-memory only. Its basic data structure is a `Replica`, which is the local instance of a multi-writer key-value store.
* In `iroh::sync`, there are `connect_and_sync` and `handle_connection` functions that integrate with `iroh_net` to perform the set reconciliation operation between two peers.
* `iroh::sync::live::LiveSync` is the handle to an actor that integrates sync with gossip: peers join an iroh-gossip topic for each open document, and the `LiveSync` then watches for changes in these documents and broadcasts each new entry to other peers over `iroh-gossip`. The entries are signed, so incoming entries are validated and then inserted into the local replica.
* The `iroh::sync::content` module integrates the replicas with `iroh-bytes` by providing a `Doc` abstraction that combines a replica with a way to fetch and insert blobs into a local `iroh-bytes` database.

Finally, `iroh/examples/sync.rs` has a REPL to interact with documents. You can open documents, invite peers to join the doc, and insert / list / get entries. The example also provides basic persistence for documents by serializing them into a file on disk.

Here's a rough list of changes from the original PR in #1177:
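* removes content support from iroh-sync
* adds a quick-and-dirty writable database to iroh-bytes (will be replaced with a better generic writable database soon)
* adds a `Downloader` to queue get requests for individual hashes from individual peers
* adds a `BlobStore` that combines the writable db with the downloader
* adds a `Doc` abstraction that combines an iroh-sync `Replica` with a `BlobStore` to download content from peers on-demand
* updates the sync repl example to plug it all together
* also adds very basic persistence to `Replica` (encode to bytes) and uses this in the repl example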
(encode to bytes) and uses this in the repl exampleNotes & open questions
cargo test
doesn't compile atm. Only thesync
example works.Doc
abstraction that combines a replica with a content blob store. We will have to think if we want this and what API it should expose. It's quite nice from a user perspective I think.DownloadMode
that is eitherAlways
orManual
. IfManual
, then no content downloads are triggered. IfAuto
, it will always try to immediately download the content for newly added keys. We might also want to add aFiltered
mode or so that executes a callback to allow app level heuristics on what to downloadWritableFileDatabase
is a quick-and-dirty intermediate solution until thegeneric-db-rewrite
arrivesDownloader
is not as quick and dirty, but should be replaced with functionality from iroh-bytes or moved to there and extended to really work well with fetching stuff from multiple peers, possibly in parallel in the futureChange checklist