
feat(iroh-net): combine discovery services and add heuristics when to start discovery #2056

Merged · 17 commits into main · Mar 11, 2024

Conversation

@Frando (Member) commented Mar 4, 2024

Description

  • Changes the resolve method of the discovery trait to return a Stream of DiscoveryItems
  • Adds a CombinedDiscovery to combine multiple discovery services
  • Always uses the discovery service when connecting to a node for which we don't have any address yet
  • Starts the quinn connect as soon as the first discovery result comes in, but continues the discovery until we have a successful connection
  • Moves the discovery trait from the magicsock module to a new discovery module (a rough trait sketch follows this list)
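
A rough sketch of the resulting trait shape, approximated from this description rather than copied from the diff; `NodeId`, `AddrInfo` and `DiscoveryItem` are the iroh-net types:

```rust
use anyhow::Result;
use futures::stream::BoxStream;

/// Approximated sketch only; the merged signatures may differ.
pub trait Discovery: std::fmt::Debug + Send + Sync {
    /// Publish the given address info to the discovery mechanism.
    /// Fire-and-forget: implementations spawn and own their own tasks.
    fn publish(&self, _info: &AddrInfo) {}

    /// Resolve the address info for a node, yielding results as they arrive.
    fn resolve(&self, _node_id: NodeId) -> Option<BoxStream<'static, Result<DiscoveryItem>>> {
        None
    }
}
```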

Notes & open questions

  • Should we also attempt a discovery if we have some address already?
  • Add tests

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.

///
/// This will be called from a tokio task, so it is safe to spawn new tasks.
/// These tasks will be run on the runtime of the [`super::MagicEndpoint`].
fn publish(&self, _info: &AddrInfo) {}
Contributor:
why are these not async?

Contributor:
I can answer this at least for the original discovery trait.

They are intended to be "fire and forget". At the site where you call these, you do not want to wait for them to complete. E.g. if we publish to an HTTP PUT endpoint, we don't want to wait for a successful publish before continuing, since that would probably create havoc inside the magicsock and in any case not help anybody.

E.g. in the case of the pkarr discovery, this just puts the record somewhere and starts a loop to publish and republish to the DHT. Publishing to the DHT can take seconds; you don't want to wait for that. And what are you going to do if it fails, e.g. because you are offline?

Contributor:
Well, from a task management perspective I would have expected handling more like this:

// in the magicsock
let publish_task = tokio::task::spawn(discovery.publish());
// track task or drop depending on the management

because this way you can e.g. cancel things on shutdown

Contributor:
The task is owned by the discovery service, so it's the job of the discovery service to cancel the task on drop. I think there is not necessarily a 1:1 relation between tasks that need to be spawned and publish calls.

E.g. in the pkarr discovery there is just a single long lived republish task owned by the discovery, and calling publish just replaces the value to be published with the new one.
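
A minimal sketch of that pattern (hypothetical names, record simplified to a String, using tokio's watch channel): publish just swaps the value; one long-lived task owned by the service does the actual republishing.

```rust
use tokio::{sync::watch, task::JoinHandle};

/// Hypothetical sketch: one long-lived republish task owned by the service.
struct RepublishService {
    record: watch::Sender<Option<String>>,
    task: JoinHandle<()>,
}

impl RepublishService {
    fn new() -> Self {
        let (record, mut rx) = watch::channel::<Option<String>>(None);
        let task = tokio::spawn(async move {
            while rx.changed().await.is_ok() {
                let value = rx.borrow_and_update().clone();
                if let Some(value) = value {
                    // (Re)publish `value` to the DHT here; this may take
                    // seconds, and callers of `publish` never wait on it.
                    let _ = value;
                }
            }
        });
        Self { record, task }
    }

    /// Fire-and-forget: replace the value to be published and return immediately.
    fn publish(&self, value: String) {
        self.record.send_replace(Some(value));
    }
}

impl Drop for RepublishService {
    fn drop(&mut self) {
        // The service owns the task, so it also cancels it on drop.
        self.task.abort();
    }
}
```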

Contributor:
How would I communicate that I want the publish to end, then? There is no shutdown method or anything like that if this service is long-running.

Member Author:
Should we maybe pass CancellationTokens into both resolve and publish? This would give full flexibility to both. The downside is that implementors would likely have to use tokio::select!. Still, even for the resolve case it could make implementations simpler. Depending on how a resolver works, detecting that the returned stream was dropped may not be straightforward or instant. E.g. with channels, you might only realize once you try to send, so you might have been doing unneeded work up to that point.
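
For illustration, the token-passing variant could look roughly like this (hypothetical shape, not the merged API; result type simplified to a String):

```rust
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

// The endpoint hands a token to the discovery service, and the service
// selects on it inside its work loop instead of detecting stream drops.
async fn resolve_worker(cancel: CancellationToken, mut results: mpsc::Receiver<String>) {
    loop {
        tokio::select! {
            _ = cancel.cancelled() => break, // caller gave up: stop resolving
            item = results.recv() => match item {
                Some(item) => println!("discovered: {item}"), // forward to the stream
                None => break, // resolver finished on its own
            },
        }
    }
}
```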

Member Author:
Or we could add an optional fn shutdown() to the discovery trait, which would be called in MagicEndpoint::close and could be used by the discovery service to abort all tasks it started.

Contributor:
I like the cancellation token approach

Contributor:
When I saw the doc comment here, I was wondering why the signature does not give you a handle to the runtime to spawn on, instead of having it just in the documentation that "tokio spawn is fine".

Given the cancellation discussion, maybe it could even be a JoinSet that you're politely asked to spawn into, and then you get aborting for free when the magicsock drops the joinset.
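
A rough sketch of that idea (hypothetical signature, not what was merged):

```rust
use tokio::task::JoinSet;

// Hypothetical variant of the trait method:
// fn publish(&self, info: &AddrInfo, tasks: &mut JoinSet<()>);

fn publish_into(tasks: &mut JoinSet<()>) {
    // The service spawns its work into the set it was handed...
    tasks.spawn(async move {
        // ...long-running publish work here...
    });
}
// ...and when the owner (here: the magicsock) drops the JoinSet,
// every task spawned into it is aborted automatically.
```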

Member Author:
Leaving this for a follow-up; created an issue here: #2066

github-merge-queue bot pushed a commit that referenced this pull request Mar 6, 2024
…sock actor (#2058)

## Description

While working on #2056 I spotted that we use the actor inbox with return
channels for information that is already readily available on the shared
inner magicsock. This removes the unneeded complexity and thus simplifies
`get_mapping_addr`, `endpoint_info` and `endpoint_infos` to return their
information synchronously and infallibly. Yay!

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist

- [x] Self-review.
@Frando changed the title from "(WIP) feat: combine discovery services" to "feat: combine discovery services" on Mar 7, 2024
@Frando (Member Author) commented Mar 7, 2024

I added more docs and tests, and implemented a DiscoveryTask with the heuristics as discussed on Discord (see the sketch after this list):

  • connect is called with a NodeAddr
  • If the NodeAddr contains direct_addrs or a derp_url: add them to the magicsock
  • If no direct_addrs or derp_url were provided, and the magicsock has no info either: start discovery and wait for the first result
  • If we haven't received recently on any path from the node, start a task in the background:
    • Check: did we receive recently? If so, done.
    • If the user provided new addresses, wait for a delay of 500ms
    • Check again: did we receive recently? If so, done. If not, start discovery.
  • Start connect_quinn. This runs in parallel with the discovery, which may still be running.
  • Quinn will retry automatically for up to 10s in our config.
  • Abort the discovery task once a connection is made
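
The gist of the heuristic as a self-contained decision function (hypothetical and simplified; the real logic lives in DiscoveryTask and the connect path):

```rust
use std::time::Duration;

/// What to do before dialing, per the heuristics above.
enum DiscoveryDecision {
    /// No addressing info at all: start discovery and wait for the first result.
    StartAndWaitForFirstResult,
    /// We have info, but nothing received recently: re-check after a delay, then discover.
    StartAfterDelay(Duration),
    /// A fresh path is available: no discovery needed.
    Skip,
}

fn decide(have_addr_info: bool, received_recently: bool) -> DiscoveryDecision {
    if !have_addr_info {
        DiscoveryDecision::StartAndWaitForFirstResult
    } else if !received_recently {
        DiscoveryDecision::StartAfterDelay(Duration::from_millis(500))
    } else {
        DiscoveryDecision::Skip
    }
}
```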

This is ready to review now. I think I will also rebase the DNS discovery on top of this.

@Frando changed the title from "feat: combine discovery services" to "feat: combine discovery services and add heuristics when to start discovery" on Mar 7, 2024
@Frando changed the title from "feat: combine discovery services and add heuristics when to start discovery" to "feat(iroh-net): combine discovery services and add heuristics when to start discovery" on Mar 7, 2024
pub addr_info: AddrInfo,
}

/// A discovery service that combines multiple discovery sources.
Contributor:
can you add a comment that it resolves in parallel vs. serially?

Contributor:
maybe even call it ConcurrentDiscovery?
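
A minimal sketch of the parallel behaviour being asked about, using futures' select_all to merge the per-service streams (simplified item type; the real CombinedDiscovery in this PR may differ):

```rust
use futures::stream::{self, BoxStream, SelectAll};

/// Merge the per-service result streams so all services are polled
/// concurrently and the first result wins, rather than querying one
/// service after another.
fn resolve_all(streams: Vec<BoxStream<'static, String>>) -> SelectAll<BoxStream<'static, String>> {
    stream::select_all(streams)
}

// Usage sketch (needs futures::StreamExt for `.next()`):
// let mut merged = resolve_all(streams);
// while let Some(item) = merged.next().await { /* first result wins */ }
```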

/// Start a discovery task.
pub fn start(ep: MagicEndpoint, node_id: NodeId) -> Result<Self> {
    if ep.discovery().is_none() {
        bail!("No discovery services configured");
Contributor:
could use ensure
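
i.e. with anyhow's ensure! macro the check collapses to one line (sketch with a stand-in boolean):

```rust
use anyhow::{ensure, Result};

fn start_checked(discovery_configured: bool) -> Result<()> {
    // Equivalent to the if/bail! above, in one line:
    ensure!(discovery_configured, "No discovery services configured");
    Ok(())
}
```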

@dignifiedquire (Contributor) left a review:
some nits but overall nice improvements

@Frando mentioned this pull request Mar 7, 2024
@Frando added this pull request to the merge queue Mar 11, 2024
Merged via the queue into main with commit f4d3fab Mar 11, 2024
20 checks passed