Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local Polykey Node Discovery #584

Merged
merged 9 commits into from
Oct 25, 2023
Merged

Local Polykey Node Discovery #584

merged 9 commits into from
Oct 25, 2023

Conversation

amydevs
Copy link
Contributor

@amydevs amydevs commented Oct 16, 2023

Description

This PR integrates MDNS into the NodeConnectionManager to allow for the discovery of nodes on the local network, as well as disable holepunching for connections to local network addresses.

This PR should only focus on the Step 1 of #537 spec, as modifying the node graph to accept multiple IP addresses will require ACL changes that default disable the sharing of local addresses with seed nodes, as well as expansion to support multiple IP addresses.

This PR will encapsulate MDNS in the NodeConnectionManager, as well as make methods such as getConnection to be able to query MDNS for known hosts through the newly added findNodesAll and findNodesLocal methods. findNodesAll is a method that calls both the original findNode and findNodesLocal to aggregate a list of all NodeAddresses. This is notably only used in connection-related methods rather than those related to Kademlia (NodeGraph).

When the NodeConnectionManager is started, MDNS will be started, with a Polykey service registered, and a query for Polykey services started.

When the NodeConnectionManager is stopped, MDNS will be stopped alongside all pending queries, and deregistering all registered services.

Current scopes: external and local.

Issues Related

Tasks

  • 1. Integrate MDNS into NodeConnectionManager
  • 2. Separate MDNS and Kademlia Behaviour
  • 3. Update NodeConnectionManager Methods to Support Multiple IP Addresses
    • getConnectionWithAddress
    • getConnection
  • 4. Select a well-known port and multicast address for Polykey usage.
  • 5. Modify Network Domain to Support Link-Local Addresses with a Interface ID Suffix.
  • 6. Creating of connections should happen in stages, where local addresses should be tried for first before external addresses

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@amydevs
Copy link
Contributor Author

amydevs commented Oct 16, 2023

When a Polykey agent has it's host bound to ::0 and is trying to connect to a node at some ipv6 address, it is failing due to the formatting of my MDNS supplied ipv6 addresses.

When i encapsulate my IPv6 addresses in brackets (i.e [::1]), they are not recognized as hosts by resolveHostnames in src/network/utils.ts of (calls isHost to check if the provided address is a valid host). resolveHostnames will hence return no addresses at all, and will cause no QUICConnections to be made.

When i don't encapsulate my IPv6 addresses in brackets, they are recognized as hosts by resolveHostnames in src/network/utils.ts. However, the std::net IPv6Addr parse function fails only on link-local IPv6 addresses that aren't encapsulated in brackets (i.e fe80::260:8ff:fe52:f9d8) fail to be parsed as ipv6 addresses.

So, we're going to be ignoring ipv6 for now regarding MDNS.

See #584 (comment)

@ghost
Copy link

ghost commented Oct 16, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@CMCDragonkai
Copy link
Member

If it only fixes a portion of #537, don't say Fixes #537, say Related #537 in the OP.

@CMCDragonkai
Copy link
Member

Is the bracket encapsulated format the correct format?

If so, just align on that, and then just update isHost to make it work.

@CMCDragonkai
Copy link
Member

QUIC definitely supports ipv6. Try testing with non-link local IPv6 addresses first.

@CMCDragonkai
Copy link
Member

The square braces should only come into play with URIs, generally speaking none of internal network code should be using square braced hosts. So it should not be expected to work.

@CMCDragonkai
Copy link
Member

The only time we use square braces is on input and output, like for example the seed node configuration.

@CMCDragonkai
Copy link
Member

Can you check if maybe a scoped ipv6 address is needed, and if so, perhaps that needs to be passed all the way down. In that case the isHost or isIPv6 check needs to support scoped ipv6 address.

@amydevs
Copy link
Contributor Author

amydevs commented Oct 16, 2023

Can you check if maybe a scoped ipv6 address is needed, and if so, perhaps that needs to be passed all the way down. In that case the isHost or isIPv6 check needs to support scoped ipv6 address.

it seems that ip-num does not correctly parse nor validate ipv6 link-local addresses, this will have to be addressed

@amydevs
Copy link
Contributor Author

amydevs commented Oct 16, 2023

it turns out there were two issues with MDNS integration that showed up on attempting to connect to a node via ipv6.

  1. Link-local ip addresses will fail to be parsed by std::net Rust library if they do not contain the network interface name or scope id.
  2. Promise.all([Promise.race(connProms)]) was causing getMultiConnection to fail where there is more than one address provided. In the case of MDNS discovery, there were more than one address discovered on my machine due IPv4 + IPv6 interfaces.

@amydevs amydevs force-pushed the feature-local-discovery branch 2 times, most recently from 5c5c7ca to 364a503 Compare October 16, 2023 07:47
@CMCDragonkai
Copy link
Member

it turns out there were two issues with MDNS integration that showed up on attempting to connect to a node via ipv6.

  1. Link-local ip addresses will fail to be parsed by std::net Rust library if they do not contain the network interface name or scope id.
  2. Promise.all([Promise.race(connProms)]) was causing getMultiConnection to fail where there is more than one address provided. In the case of MDNS discovery, there were more than one address discovered on my machine due IPv4 + IPv6 interfaces.

See #585, @tegefaulkes any fixes relating to multi-connection?

So how are you working around ip-num support?

@amydevs amydevs force-pushed the feature-local-discovery branch from 364a503 to d7ec6c7 Compare October 16, 2023 07:58
@CMCDragonkai
Copy link
Member

Can you explain the result of this PR? What changes to the behaviour to agent startup? You should expand the spec, since this is a rather large major feature.

@tegefaulkes
Copy link
Contributor

I'll do a proper review tomorrow.

@amydevs
Copy link
Contributor Author

amydevs commented Oct 16, 2023

it turns out there were two issues with MDNS integration that showed up on attempting to connect to a node via ipv6.

  1. Link-local ip addresses will fail to be parsed by std::net Rust library if they do not contain the network interface name or scope id.
  2. Promise.all([Promise.race(connProms)]) was causing getMultiConnection to fail where there is more than one address provided. In the case of MDNS discovery, there were more than one address discovered on my machine due IPv4 + IPv6 interfaces.

See #585, @tegefaulkes any fixes relating to multi-connection?

So how are you working around ip-num support?

i copied over the fixes yeah

@CMCDragonkai
Copy link
Member

You can rebase now, since it has been merged to staging.

@amydevs amydevs force-pushed the feature-local-discovery branch from c0f0fa6 to 3f3e6df Compare October 17, 2023 23:52
@CMCDragonkai
Copy link
Member

Are all the tasks there still todo, and are there more tasks to do? When can @tegefaulkes review this?

@amydevs amydevs force-pushed the feature-local-discovery branch 2 times, most recently from 7128daf to ae8b5f8 Compare October 18, 2023 05:50
@amydevs amydevs changed the title Feature local discovery Local Polykey Node Discovery Oct 18, 2023
@amydevs
Copy link
Contributor Author

amydevs commented Oct 18, 2023

Are all the tasks there still todo, and are there more tasks to do? When can @tegefaulkes review this?

I think review should be ready, although I have to make the MDNS tests to be more concise, and haven't tested it in a local network over multiple devices yet.

@CMCDragonkai
Copy link
Member

Are you going to tick off the tasks then?

@CMCDragonkai
Copy link
Member

Can you elaborate on the current problem atm.

@CMCDragonkai
Copy link
Member

MatrixAI/js-mdns#28

@amydevs amydevs force-pushed the feature-local-discovery branch from 9a82784 to 51627bc Compare October 24, 2023 22:38
@amydevs
Copy link
Contributor Author

amydevs commented Oct 25, 2023

I need to have some empirical results on what you mean by "slow". @amydevs

i mean iterating over every address is a bit inefficient, especially because i have to do it twice, once when turning the queried MDNS services into NodeAddresses, and the second time here

@amydevs amydevs force-pushed the feature-local-discovery branch 2 times, most recently from d0ce5a3 to 78f01a2 Compare October 25, 2023 02:13
@amydevs
Copy link
Contributor Author

amydevs commented Oct 25, 2023

The latest commit makes it so that starting a connection (getConnectionWithAddresses) will first try the local addresses, and then the external addresses. This is currently slow and requires an O(n) operation where n is the number of addresses. A solution would be to be able to pass in multiple arrays of addresses for each address scope. if this currently isn't a concern, commit can be dropped.

const addressGroups: { local: Array<NodeAddress>, external: Array<NodeAddress> } = { local: [], external: [] };
    for (const address of addresses) {
      if (address.scopes.includes('local')) {
        addressGroups.local.push(address);
      }
      else {
        addressGroups.external.push(address);
      }
    }
    let timeoutDivisions = 0;
    for (const addressGroup of Object.values(addressGroups)) {
      if (addressGroup.length !== 0) {
        timeoutDivisions++;
      }
    }
    this.logger.debug(`Getting NodeConnection for ${targetNodeIdEncoded}`);
    return await this.connectionLocks
      .withF([targetNodeIdString, Lock, ctx], async () => {
        this.logger.debug(`acquired lock for ${targetNodeIdEncoded}`);
        // Attempting a multi-connection for the target node
        const timeout = ctx.timer.getTimeout() / timeoutDivisions;
        for (const addressGroup of Object.values(addressGroups)) {
          if (addressGroup.length === 0) {
            continue;
          }
          let results = await this.establishMultiConnection(
            [targetNodeId],
            addressGroup,
            {
              signal: ctx.signal,
              timer: timeout
            },
          );
          // Should be a single result.
          for (const [, connAndTimer] of results) {
            return connAndTimer;
          }
        }
        // Should throw before reaching here
        utils.never();
      })

I'm thinking you should do something like this?

* Try all local

* Try all global

That way local set is prioritised first.

Timeout wise, it would be shared between the 2... it might not be a problem, but maybe it needs to increase. Needs empirical test, tell me what the result will be.

let timeoutDivisions = 0;
    const addressGroups: { local: Array<NodeAddress>, external: Array<NodeAddress> } = { local: [], external: [] };
    for (const address of addresses) {
      let scope = address.scopes.includes('local') ? 'local' : 'external';
      // If this is the first time an addressGroup has had an address added, the timeout divisions must be incremented.
      if (addressGroups[scope].length === 0) {
        timeoutDivisions++;
      }
      addressGroups[scope].push(address);
    }
    this.logger.debug(`Getting NodeConnection for ${targetNodeIdEncoded}`);
    return await this.connectionLocks
      .withF([targetNodeIdString, Lock, ctx], async () => {
        this.logger.debug(`acquired lock for ${targetNodeIdEncoded}`);
        // Attempting a multi-connection for the target node using local addresses
        const timeout = ctx.timer.getTimeout() / timeoutDivisions;
        let results: Map<NodeIdString, ConnectionAndTimer> | undefined;
        if (addressGroups.local.length !== 0) {
          results = await this.establishMultiConnection(
            [targetNodeId],
            addressGroups.local,
            {
              signal: ctx.signal,
              timer: timeout
            },
          );
        }
        // If there are no results from the attempted local connections, attempt a multi-connection for the target node using external addresses
        if (results == null || results.size === 0) {
          results = await this.establishMultiConnection(
            [targetNodeId],
            addressGroups.external,
            {
              signal: ctx.signal,
              timer: timeout
            },
          );
        }
        // Should be a single result.
        for (const [, connAndTimer] of results) {
          return connAndTimer;
        }
        // Should throw before reaching here
        utils.never();
      })

i've fixed this

Copy link
Contributor

@tegefaulkes tegefaulkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small things but otherwise seems fine.

Don't forget to lint, I saw an unused import somewhere and linting should catch that.

If all the tests pass then you can merge.

src/PolykeyAgent.ts Outdated Show resolved Hide resolved
src/PolykeyAgent.ts Outdated Show resolved Hide resolved
src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved
src/nodes/NodeConnectionManager.ts Outdated Show resolved Hide resolved
@amydevs amydevs force-pushed the feature-local-discovery branch 4 times, most recently from baf982c to 5c4e258 Compare October 25, 2023 04:24
@CMCDragonkai
Copy link
Member

I need a better spec for scopes. What is meaning and consequences of it?

What is external? What is local? Why is there no "global"? What can this be expanded to? Copy content to #537.

@amydevs amydevs force-pushed the feature-local-discovery branch from 5c4e258 to c34cede Compare October 25, 2023 05:25
@amydevs
Copy link
Contributor Author

amydevs commented Oct 25, 2023

I need a better spec for scopes. What is meaning and consequences of it?

What is external? What is local? Why is there no "global"? What can this be expanded to? Copy content to #537.

I already expanded on it in the spec before:

#537

@amydevs amydevs force-pushed the feature-local-discovery branch from 7518a30 to 08c579e Compare October 25, 2023 05:33
fix: changed NodeAddress type and added ability for `NodeConnectionManager.findNodes` to use MDNS

chore: updated @matrixiai/mdns to 1.1.2

fix: temporarily removed IPv6 addresses from MDNS NodeAddress data

fix: `NodeConnectionManager.findNodes` has been separated into 3 methods to not have NodeGraph-related methods interfere with MDNS-based discovery

fix: paramaritized the enabling of the MDNS stack within
NodeConnectionManager and disabled use of MDNS in all unrelated
NodeConnectionManager tests

fix: added back IPv6 addresses in MDNS NodeAddress data

fix: NCM getMultiConnection was failing when more than one NodeAddress
was provided

fix: `NodeConnectionManager.seednodes.test.ts` now includes scopes for node addresses

fix: renamed findNodeLocal and findNodeAll to be less ambiguous

chore: inline documentation for `NodeConnectionManager.findNodeLocal`

feat: created `agentMdnsGroups` and `agneMdnsPort` MDNS configuration in `config.defaultsSystem`

chore: bump `@matrixai/mdns` to `v1.1.6`

feat: better handling of disabling iceProm for `local` `NodeAddress`s in `NodeConnectionManager.establishSingleConnection`

fix: client handler input types for `NodesFind` reverted

fix: errors are now thrown if a connection is attempted with no addresses provided

fix `networkUtils.isHost` is now able to validate link-local addresses

fix: temporarily disabled connecting to link-local addresses with `NodeConnection`

chore: bump `@matrixai/mdns` to `v1.2.0`

fix: reduce `any` typecasts in `NodeConnectionManager.findNodeLocal`

fix: broken jest tests due to mandatory `scopes` property of `NodeAddress`

[ci-skip]
…ected dep to `NodeConnectionManager`

fix: `PolyKeyAgent` now correctly injects `mdns` into the constructor of `NodeConnectionManager`

fix: await mdns.start

fix: `MDNS`ids are now determined by first 16 bits of node id

[ci-skip]
@amydevs amydevs force-pushed the feature-local-discovery branch from 08c579e to 97ea38e Compare October 25, 2023 05:36
@amydevs amydevs force-pushed the feature-local-discovery branch from 97ea38e to 66e4916 Compare October 25, 2023 05:47
…not conflict with existing PolykeyAgents on host

[ci-skip]
…efaulted` for `PolykeyAgent.start`

[ci-skip]
…NS queries, so that connectivity liveness is preserved

fix `NodeConnectionManager.findNodeAll` has been given a default time out of `NodeConnectionManager.connectionConnectTimeoutTIme`

[ci-skip]
…ated into multiple stages so that if connecting with `local` `NodeAddress`s succeeds, `external` `NodeAddress`s will be ignored

fix: `NodeConnectionManager.getConnectionWithAddresses` now attempts connections with `local` scoped addresses before `external` scoped addresses

fix: correct typing on `establishMultiConnection` for `ctx` param

[ci-skip]
[ci-skip]
@amydevs amydevs force-pushed the feature-local-discovery branch from 66e4916 to e698dfd Compare October 25, 2023 05:55
@amydevs amydevs merged commit 921624d into staging Oct 25, 2023
@amydevs amydevs mentioned this pull request Nov 10, 2023
22 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants