Decentralized Signalling #618
Conversation
I'm thinking that whatever function does the connection, let's call it If the DC succeeds, it needs to cancel the signalling operation. The background/concurrent signalling operation needs to try and find an appropriate signaller. If there is an appropriate signaller, we just start using them straight away. If there isn't an appropriate signaller, and the direct connection didn't work, we can now try to find another node that is close to the target node and try the same thing with that node as the target. This is how we walk backwards (backtrack) from the target node towards a close-enough node to connect to first, and then forward-track from there towards the target node. You'd backtrack to a node that is close to the target, but also closer to your own node. However to even call
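To make the intended control flow concrete, here is a minimal TypeScript sketch of just the race-and-cancel part described above (not the backtracking walk). The `connectDirectly` and `connectViaSignaller` parameters are hypothetical stand-ins, not Polykey's actual API.

```ts
type Connection = { close(): Promise<void> };
type ConnectFn = (signal: AbortSignal) => Promise<Connection>;

/**
 * Race a direct connection attempt against a signalled (hole-punched) attempt.
 * Whichever succeeds first aborts the other; `Promise.any` only rejects if
 * both attempts fail, which is what would then trigger the backtracking step.
 */
async function raceConnection(
  connectDirectly: ConnectFn, // hypothetical direct-connection attempt
  connectViaSignaller: ConnectFn, // hypothetical signalled-connection attempt
): Promise<Connection> {
  const abortDirect = new AbortController();
  const abortSignalled = new AbortController();
  const direct = connectDirectly(abortDirect.signal).then((conn) => {
    // The DC succeeded, so cancel the signalling operation.
    abortSignalled.abort();
    return conn;
  });
  const signalled = connectViaSignaller(abortSignalled.signal).then((conn) => {
    // The signalled connection won, so cancel the direct attempt.
    abortDirect.abort();
    return conn;
  });
  return Promise.any([direct, signalled]);
}
```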
It seems You just have to use The
I've replaced
In reviewing If we add extra data to the
There are some suggestions that can help here:
Force-pushed from 34b15c5 to 81e9034.
A quick and dirty solution that @tegefaulkes suggested is for us to only maintain connections to our closest seed node. That way, we continue to be centralized, but it will allow us to scale the seed nodes for MatrixAI/Polykey-CLI#40.
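For illustration, here is a rough sketch of what "closest seed node" selection could look like, assuming node IDs are compared with the usual Kademlia-style XOR metric; the helper names are made up and are not Polykey's actual API.

```ts
// Keep a connection only to the seed node whose NodeId is XOR-closest to our
// own. NodeIds are treated as byte arrays; assumes equal-length IDs and at
// least one seed node.
type NodeId = Uint8Array;

function xorDistance(a: NodeId, b: NodeId): bigint {
  let distance = 0n;
  for (let i = 0; i < a.length; i++) {
    distance = (distance << 8n) | BigInt(a[i] ^ b[i]);
  }
  return distance;
}

function closestSeedNode(ownNodeId: NodeId, seedNodeIds: Array<NodeId>): NodeId {
  return seedNodeIds.reduce((closest, candidate) =>
    xorDistance(ownNodeId, candidate) < xorDistance(ownNodeId, closest)
      ? candidate
      : closest,
  );
}
```

The node would then only maintain a long-lived connection to the seed node returned here, which keeps the seed-node load predictable while still allowing the seed pool itself to scale.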
I noticed that
Now if It doesn't make sense to have that setting be in
But For In It is needed here because during remapping, there could be more nodes added to a bucket than the required bucket limit allows. In that case,
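The fragment above is about buckets exceeding their limit during remapping. As a hedged illustration only, here is a minimal sketch of a bucket-limit check; the `nodeBucketLimit` and `ErrorNodeGraphBucketLimit` names come from this PR's task list, but the class shape is assumed and is not Polykey's actual `NodeGraph`.

```ts
// Minimal sketch of a bucket-limit check with an in-memory bucket.
class ErrorNodeGraphBucketLimit extends Error {}

class Bucket<NodeId, NodeData> {
  protected entries = new Map<NodeId, NodeData>();

  constructor(protected nodeBucketLimit: number = 20) {}

  public setNode(nodeId: NodeId, data: NodeData): void {
    // Updating an existing entry never grows the bucket, so only reject
    // genuinely new nodes once the limit is reached.
    if (!this.entries.has(nodeId) && this.entries.size >= this.nodeBucketLimit) {
      throw new ErrorNodeGraphBucketLimit(
        `bucket is full (limit ${this.nodeBucketLimit})`,
      );
    }
    this.entries.set(nodeId, data);
  }
}
```

During a remap the caller could instead catch this error and evict the oldest entries rather than throwing, which seems to be the trade-off the truncated comment is weighing.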
There's even an exception for this already
I'm changing it to
In terms of bucket refresh, I think: That is, buckets are refreshed if no lookups have occurred. This condition cannot be dependent on the This is because This doesn't tell us anything about whether a bucket has had a lookup occur in the last hour. I haven't got around to checking
A proper condition on this would have to be maintained by the We could even make it completely in-memory data, rather than storing it in the DB. If we persist this information, this can go into the However if we do this, then Thoughts? @tegefaulkes
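As a sketch of the in-memory option floated here: keep a per-bucket record of the last lookup time and refresh only buckets that have gone quiet. Only the "refresh a bucket after an hour without lookups" rule comes from the discussion above; the names and shape are illustrative.

```ts
// In-memory tracking of when each bucket last had a lookup.
const refreshBucketDelay = 60 * 60 * 1000; // 1 hour in milliseconds

class BucketRefreshTracker {
  // bucketIndex -> timestamp (ms) of the last lookup that touched the bucket
  protected lastLookup: Map<number, number> = new Map();

  public recordLookup(bucketIndex: number, now: number = Date.now()): void {
    this.lastLookup.set(bucketIndex, now);
  }

  public bucketsNeedingRefresh(
    bucketCount: number,
    now: number = Date.now(),
  ): Array<number> {
    const stale: Array<number> = [];
    for (let i = 0; i < bucketCount; i++) {
      const last = this.lastLookup.get(i);
      // Buckets that have never seen a lookup are also due for a refresh.
      if (last === undefined || now - last >= refreshBucketDelay) {
        stale.push(i);
      }
    }
    return stale;
  }
}
```

Since this state is rebuilt on restart, persisting it alongside the rest of the node state would only matter if refresh scheduling needs to survive restarts, which is exactly the trade-off raised above.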
Looking at
Introduction of
It's possible they were not respecting the bucket limit.
Ok I can see a sleep being used in Furthermore, the exact time we update our So
Replacing
@amydevs the introduction of
I think based on this, even unsuccessful node lookups are valid bucket operations, in which case the bucket refresh delay should be refreshed. Refreshing buckets is already occurring in memory.
Force-pushed from 8a4b405 to 62c578d.
Yes, this is meant to be done by Step 2 of #537, but #584 only completed Step 1. This was done for the sake of time, as it was deemed that expanding the NodeGraph required more consideration and many changes.
@CMCDragonkai where we're using
I squashed as much as I could. I left these as
The second WIP commit could be dropped. It adds scaffolding.
Generally my NodeGraph changes should be put together. Make sure no spurious comments are left there; I had a bunch of stuff relating to
- small name and commentary updates to `NodeGraph` [ci skip]
- This includes creating a `connectionsQueue` utility class for coordinating rate limits and shared queues between the direct connection loop and the signalled connection loop. The two loops run concurrently while sharing found data between each other. When the connection is found, any pending connections are cancelled and awaited for clean up (see the sketch after this commit list). [ci skip]
- …Node` loop. Previously it used the timeout provided for the whole `findNode` operation. This means that a failing connection would take up the whole timeout. [ci skip]
- closest nodes timeout after 2 hours, furthest after 1 min. [ci skip]
- wip: fixing delay names [ci skip]
- Also added some basic multi-connection logic to `NodeConnectionManager` [ci skip]
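The `connectionsQueue` commit above describes coordinating rate limits and a shared queue between the direct connection loop and the signalled connection loop. Below is a stripped-down sketch of that idea under assumed semantics; it is not the PR's actual `connectionsQueue` implementation.

```ts
// A shared, rate-limited work queue that two concurrent loops can feed and
// drain. Either loop can `push` newly discovered targets for the other to
// consume, while `concurrencyLimit` caps in-flight connection attempts.
class ConnectionsQueue<T> {
  protected queue: Array<T> = [];
  protected active = 0;
  protected waiters: Array<() => void> = [];

  constructor(protected concurrencyLimit: number = 3) {}

  /** Add a discovered target; wakes one waiting consumer if any. */
  public push(item: T): void {
    this.queue.push(item);
    this.waiters.shift()?.();
  }

  /** Wait until an item and a rate-limit slot are both available. */
  public async pop(): Promise<T> {
    while (this.queue.length === 0 || this.active >= this.concurrencyLimit) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.active++;
    return this.queue.shift()!;
  }

  /** Release the rate-limit slot once a connection attempt settles. */
  public done(): void {
    this.active--;
    this.waiters.shift()?.();
  }
}
```

Once the wanted connection is established, the loops simply stop popping and abort their in-flight attempts, which matches the "pending connections are cancelled and awaited for clean up" behaviour described in the commit.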
Force-pushed from 4b2e2fb to 6eb23ac.
Force-pushed from 6eb23ac to 37ef800.
Description
This introduces decentralized signalling which allows the PK nodes to:
Issues Fixed
- `MDNS` integration and `NodeGraph` structure expansion #537 - especially because the MDNS nodes aren't part of the node graph atm - so they represent a different source of nodes, so ICE connections may need to consider these separately

Tasks
- `NodeGraph.getOldestNode` is replaced with `NodeGraph.getBucket` with additional optional `limit` parameter - update all tests
- `NodeGraph.setNode` now respects the `NodeGraph.nodeBucketLimit` and will throw `ErrorNodeGraphBucketLimit` if it goes over the limit when a new node is being set.
- `NodeGraph.nodeBucketLimit` is no longer hardcoded, it can be set during creation/construction.
- `NodeGraph` as these shouldn't be needed after behaviour is verified.
- `lastUpdated` parameter to `NodeGraph.setNode`, allowing one to pass down last updated time from a higher context, and the ability to get rid of sleeps by manually setting the time. It defaults to `utils.getUnixtime()`.
- `utils.sleep` usage in `NodeGraph.test.ts` and now all tests execute much faster.
- `dataDir` from `jest.config.js` which was being used by `NodeManager.test.ts` but this is vestigial code, since we are supposed to always be creating `dataDir` for each test module.
- `NodeManager` task handler IDs do not require hardcoded names, the function names do not disappear. Also when using esbuild make sure to use the `keepNames` option as true - https://esbuild.github.io/api/#keep-names (we have done this already in PK CLI)
- `NodeManager.test.ts` and ensured that you can `NodeManager.start` and `NodeManager.stop` immediately afterwards, so this means stop always cancels all node manager tasks.
- `NodeGraph` will now support multiple addresses - this provides new API to specifically deal with "contacts"
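Regarding the `keepNames` note in the task list above, here is a small example of an esbuild build script with `keepNames` enabled so that task handler IDs derived from function names survive minification. The entry point and output path are placeholders; only the `keepNames: true` option is the point being illustrated.

```ts
// build.mts - example esbuild script (ESM, so top-level await is allowed).
import * as esbuild from 'esbuild';

await esbuild.build({
  entryPoints: ['src/index.ts'], // placeholder entry point
  bundle: true,
  minify: true,
  keepNames: true, // preserve function/class names so name-derived IDs survive
  platform: 'node',
  outfile: 'dist/index.js', // placeholder output
});
```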
Remaining tasks:

- `NodeManager` `findNode` to take a `pingTimeoutTime` for setting smaller timeouts for each connection attempt
- `sticky` connection logic. This is a combo of deciding which nodes to keep long-running connections to. And to attempt re-connecting when a long-running connection drops. I need to work out some details for this. Potentially scale timeouts by the closeness metric.
- `MDNS` functionality.
- 5. Implement periodic updating the `connectedTime` in the `NodeGraph` for active nodes - deferred till later, will create an issue

Final checklist
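Relating to the `sticky` connection task above ("potentially scale timeouts by the closeness metric") and the earlier commit noting that the closest nodes time out after 2 hours and the furthest after 1 min, here is a rough sketch of closeness-scaled connection timeouts. The linear interpolation over the bucket index and the 256-bucket count are assumptions for illustration only.

```ts
// Scale connection keep-alive timeouts by closeness, interpolating between
// the endpoints mentioned in the commit history.
const closestTimeoutMs = 2 * 60 * 60 * 1000; // 2 hours for the closest bucket
const furthestTimeoutMs = 60 * 1000; // 1 minute for the furthest bucket
const bucketCount = 256; // assumed: one bucket per bit of the NodeId space

function connectionTimeout(bucketIndex: number): number {
  // Bucket 0 holds the closest nodes, bucket `bucketCount - 1` the furthest.
  const t = bucketIndex / (bucketCount - 1);
  return Math.round(
    closestTimeoutMs + (furthestTimeoutMs - closestTimeoutMs) * t,
  );
}
```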