Decentralised NAT Punch-Through Simulation #372
Conversation
The script will output the graph to JSON; however, to visualise it better you can use https://observablehq.com/@d3/force-directed-graph#ForceGraph, which will display the JSON file as a force-directed graph.
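For reference, a minimal sketch of the JSON shape that notebook consumes (my assumption, based on its default accessors of `node.id`, `link.source` and `link.target`; the IDs here are hypothetical):

```ts
// Hypothetical graph payload for the Observable ForceGraph notebook:
// an array of nodes keyed by `id` and an array of `source`/`target` links.
const graph = {
  nodes: [{ id: '0x4d' }, { id: '0xcd' }, { id: '0x80' }],
  links: [
    { source: '0x4d', target: '0xcd' },
    { source: '0x4d', target: '0x80' },
  ],
};
console.log(JSON.stringify(graph, null, 2));
```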
I can corroborate that top-K always results in clustering. Proven with:

```ts
import type { NodeId } from './src/nodes/types';
import { IdInternal } from '@matrixai/id';
import * as utils from './src/utils';
import * as nodesUtils from './src/nodes/utils';

// [index, node ID, bucket index, distance]
type NodeGraph = Array<[number, NodeId, number, bigint]>;

// 1 byte node IDs, so at most 256 of them
function generateNodeIds(amount: number): Array<NodeId> {
  if (amount < 0 || amount > 256) throw new RangeError();
  const nodeIds: Array<NodeId> = Array.from(
    { length: amount },
    (_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 1)),
  );
  return nodeIds;
}

// Builds the complete node graph relative to `nodeId`: every other
// node's index, ID, bucket index and XOR distance
function calculateNodeGraph(nodeIds: Array<NodeId>, nodeId: NodeId): NodeGraph {
  const results: NodeGraph = [];
  for (let i = 0; i < nodeIds.length; i++) {
    if (nodeId.equals(nodeIds[i])) continue;
    const bucketIndex = nodesUtils.bucketIndex(nodeId, nodeIds[i]);
    const distance = nodesUtils.nodeDistance(nodeId, nodeIds[i]);
    results.push([i, nodeIds[i], bucketIndex, distance]);
  }
  return results;
}

// Selects the `limit` nodes closest by XOR distance
function closestNodes(nodeGraph: NodeGraph, limit: number): NodeGraph {
  const resultsSorted = [...nodeGraph].sort(
    ([, , , distance1], [, , , distance2]) => {
      if (distance1 < distance2) return -1;
      if (distance1 > distance2) return 1;
      return 0;
    },
  );
  return resultsSorted.slice(0, limit);
}

async function main() {
  const visitedNodes = new Set<number>();
  const pendingNodes: Array<[number, NodeId]> = [];
  const nodeIds = generateNodeIds(256);
  const K = 128;
  // Breadth-first traversal starting from an arbitrary node
  const nodeGraph1 = calculateNodeGraph(nodeIds, nodeIds[77]);
  const closestK1 = closestNodes(nodeGraph1, K);
  for (const [index, nodeId] of closestK1) {
    pendingNodes.push([index, nodeId]);
  }
  while (pendingNodes.length > 0) {
    const [index, nodeId] = pendingNodes.shift() as [number, NodeId];
    visitedNodes.add(index);
    const nodeGraph = calculateNodeGraph(nodeIds, nodeId);
    const closestK = closestNodes(nodeGraph, K);
    for (const [index, nodeId] of closestK) {
      if (!visitedNodes.has(index)) pendingNodes.push([index, nodeId]);
    }
  }
  console.log(visitedNodes);
  console.log(visitedNodes.size);
}

main();
```

Adjust the `K` constant to experiment. Assuming 1 byte NodeIds, which means 8 bits and 256 possible NodeIds: at K = 127 and below the traversal stays clustered in one half of the ID space, while at K = 128 it reaches all 256 nodes.
Basically this means the top-K strategy can only ensure full connectivity if you connect to at least half of all possible node IDs. This makes sense under XOR distance: for a 1 byte ID, the 127 closest nodes all share your most significant bit, so only the 128th closest connection can cross over into the other half of the ID space.
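A quick numeric check of that (plain numbers standing in for 1 byte NodeIds, with XOR as the distance metric):

```ts
// The 127 closest IDs to 77 under XOR distance are exactly the other IDs
// that share its most significant bit (all < 128), so a top-127 selection
// can never cross into the 128..255 half of the ID space.
const self = 77;
const byDistance = Array.from({ length: 256 }, (_, id) => id)
  .filter((id) => id !== self)
  .sort((a, b) => (a ^ self) - (b ^ self));
console.log(byDistance.slice(0, 127).every((id) => id < 128)); // true
console.log(byDistance[127]); // 205 = 77 ^ 128, the first cross-half ID
```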
Trying out "bottom-K" strategy. import type { NodeId } from './src/nodes/types';
import { IdInternal } from '@matrixai/id';
import * as utils from './src/utils';
import * as nodesUtils from './src/nodes/utils';
type NodeGraph = Array<[number, NodeId, number, bigint]>;
// 1 byte node ids
function generateNodeIds(amount: number) {
if (amount < 0 || amount > 256) { throw new RangeError() };
const nodeIds: Array<NodeId> = Array.from(
{ length: amount },
(_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 1))
);
return nodeIds;
}
function calculateNodeGraph(nodeIds: Array<NodeId>, nodeId: NodeId): NodeGraph {
// index, node ID, bucket index, distance
const results: Array<[
number, NodeId, number, bigint
]> = [];
for (let i = 0; i < nodeIds.length; i++) {
if (nodeId.equals(nodeIds[i])) {
continue;
}
let bucketIndex;
let distance;
bucketIndex = nodesUtils.bucketIndex(nodeId, nodeIds[i]);
distance = nodesUtils.nodeDistance(nodeId, nodeIds[i]);
results.push(
[
i,
nodeIds[i],
bucketIndex,
distance
]
);
}
return results;
}
function farthestNodes(nodeGraph: NodeGraph, limit: number): NodeGraph {
const resultsSorted = [...nodeGraph].sort(([, , , distance1], [, , , distance2]) => {
if (distance1 < distance2) return 1;
if (distance1 > distance2) return -1;
return 0;
});
const closestK = resultsSorted.slice(0, limit);
return closestK;
}
async function main () {
const visitedNodes = new Set<number>();
const pendingNodes = new Set<number>();
const nodeIds = generateNodeIds(256);
const K = 65;
const nodeGraph1 = calculateNodeGraph(nodeIds, nodeIds[77]);
const closestK1 = farthestNodes(nodeGraph1, K);
for (const [index,nodeId] of closestK1) {
pendingNodes.add(index);
}
while (pendingNodes.size > 0) {
const [index] = pendingNodes;
pendingNodes.delete(index);
visitedNodes.add(index);
const nodeGraph = calculateNodeGraph(nodeIds, nodeIds[index]);
const closestK = farthestNodes(nodeGraph, K);
for (const [index, nodeId] of closestK) {
if (!visitedNodes.has(index)) pendingNodes.add(index);
}
}
console.log(visitedNodes);
console.log(visitedNodes.size);
}
main(); Here you only need bottom-K of 65 to get full connectivity of 256. However bottom-K isn't very aligned with our kademlia system. |
Next thing to try out @emmacasolin would be a mix of top-K and bottom-K. @tegefaulkes also suggested random-K, as in just choosing a random selection of node IDs. Furthermore, this is all in the ideal case where every node has the complete node graph and all node IDs are utilised. In production, nodes do not have the complete node graph, and not all node IDs are utilised; in fact node IDs are "used" at random. So we can add these constraints on top after figuring out what maintains connectivity in the ideal situation.
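As a rough sketch of what that mix might look like (my own illustration, reusing the `NodeGraph` type and the `closestNodes` / `farthestNodes` helpers from the snippets above; the split ratio is an arbitrary assumption):

```ts
// Hypothetical hybrid selection: spend half the connection budget on the
// closest nodes (Kademlia-aligned) and half on the farthest ones
// (cross-cluster links).
function hybridNodes(nodeGraph: NodeGraph, limit: number): NodeGraph {
  const half = Math.floor(limit / 2);
  return [
    ...closestNodes(nodeGraph, half),
    ...farthestNodes(nodeGraph, limit - half),
  ];
}
```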
Trying out the random-K strategy seems to work REALLY NICELY!

```ts
import type { NodeId } from './src/nodes/types';
import { IdInternal } from '@matrixai/id';
import * as utils from './src/utils';
import * as nodesUtils from './src/nodes/utils';

// [index, node ID, bucket index, distance]
type NodeGraph = Array<[number, NodeId, number, bigint]>;

// 1 byte node IDs, so at most 256 of them
function generateNodeIds(amount: number): Array<NodeId> {
  if (amount < 0 || amount > 256) throw new RangeError();
  const nodeIds: Array<NodeId> = Array.from(
    { length: amount },
    (_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 1)),
  );
  return nodeIds;
}

function calculateNodeGraph(nodeIds: Array<NodeId>, nodeId: NodeId): NodeGraph {
  const results: NodeGraph = [];
  for (let i = 0; i < nodeIds.length; i++) {
    if (nodeId.equals(nodeIds[i])) continue;
    const bucketIndex = nodesUtils.bucketIndex(nodeId, nodeIds[i]);
    const distance = nodesUtils.nodeDistance(nodeId, nodeIds[i]);
    results.push([i, nodeIds[i], bucketIndex, distance]);
  }
  return results;
}

// Selects `limit` distinct random nodes from the node graph
function randomNodes(nodeGraph: NodeGraph, limit: number): NodeGraph {
  const results: NodeGraph = [];
  const usedJs = new Set<number>();
  for (let i = 0; i < limit; i++) {
    // Rejection-sample until we hit an index we haven't used yet
    let j: number;
    while (true) {
      j = Math.floor(Math.random() * nodeGraph.length);
      if (!usedJs.has(j)) break;
    }
    usedJs.add(j);
    results.push(nodeGraph[j]);
  }
  return results;
}

async function main() {
  const visitedNodes = new Set<number>();
  const pendingNodes = new Set<number>();
  const nodeIds = generateNodeIds(256);
  const K = 6;
  const nodeGraph1 = calculateNodeGraph(nodeIds, nodeIds[77]);
  const randomK1 = randomNodes(nodeGraph1, K);
  for (const [index] of randomK1) {
    pendingNodes.add(index);
  }
  while (pendingNodes.size > 0) {
    // Pop the first index out of the pending set
    const [index] = pendingNodes;
    pendingNodes.delete(index);
    visitedNodes.add(index);
    const nodeGraph = calculateNodeGraph(nodeIds, nodeIds[index]);
    const randomK = randomNodes(nodeGraph, K);
    for (const [index] of randomK) {
      if (!visitedNodes.has(index)) pendingNodes.add(index);
    }
  }
  console.log(visitedNodes);
  console.log(visitedNodes.size);
}

main();
```

Even with just K = 6 we get full connectivity of all 256 nodes. This must be a statistical question: if every single person knew 6 random people in society, what is the probability that everybody knows everybody transitively? Someone has probably worked out a formula for this.
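This looks a lot like the Erdős–Rényi connectivity threshold: a random graph on n nodes is almost surely connected once the average degree climbs a little above ln(n). A quick heuristic check (my addition; our graphs are random K-out graphs rather than G(n, p), which if anything connect more easily):

```ts
// Heuristic: Erdős–Rényi graphs become connected once the average degree
// exceeds roughly ln(n). Compare against the simulated K values.
for (const n of [256, 65536]) {
  console.log(`n = ${n}: ln(n) ≈ ${Math.log(n).toFixed(2)}`);
}
// n = 256:   ln(n) ≈ 5.55  (K = 6 gave full connectivity above)
// n = 65536: ln(n) ≈ 11.09 (K ≈ 14-20 works in the next comment)
```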
Increasing the number of possible node IDs requires the random-K number to be larger to reduce the probability of clustering. With 2 byte node IDs, we now have 65536 possible node IDs. Here I find that a random K of 10 is not enough to ensure full connectivity, but a random K of 20 is quite enough.

```ts
import type { NodeId } from './src/nodes/types';
import { IdInternal } from '@matrixai/id';
import * as utils from './src/utils';
import * as nodesUtils from './src/nodes/utils';

// [index, node ID, bucket index, distance]
type NodeGraph = Array<[number, NodeId, number, bigint]>;

// 2 byte node IDs, so at most 65536 of them
function generateNodeIds(amount: number): Array<NodeId> {
  if (amount < 0 || amount > 65536) throw new RangeError();
  const nodeIds: Array<NodeId> = Array.from(
    { length: amount },
    (_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 2)),
  );
  return nodeIds;
}

// Not used in `main` below: computing the full node graph for 65536 nodes
// on every visit would be expensive, so `randomNodes` samples from the
// node IDs directly instead
function calculateNodeGraph(nodeIds: Array<NodeId>, nodeId: NodeId): NodeGraph {
  const results: NodeGraph = [];
  for (let i = 0; i < nodeIds.length; i++) {
    if (nodeId.equals(nodeIds[i])) continue;
    const bucketIndex = nodesUtils.bucketIndex(nodeId, nodeIds[i]);
    const distance = nodesUtils.nodeDistance(nodeId, nodeIds[i]);
    results.push([i, nodeIds[i], bucketIndex, distance]);
  }
  return results;
}

// Selects `limit` distinct random nodes, excluding our own node ID
function randomNodes(
  nodeIds: Array<NodeId>,
  limit: number,
  ownNodeId: NodeId,
): Array<[number, NodeId]> {
  const results: Array<[number, NodeId]> = [];
  const usedJs = new Set<number>();
  for (let i = 0; i < limit; i++) {
    let j: number;
    while (true) {
      j = Math.floor(Math.random() * nodeIds.length);
      if (nodeIds[j].equals(ownNodeId)) continue;
      if (!usedJs.has(j)) break;
    }
    usedJs.add(j);
    results.push([j, nodeIds[j]]);
  }
  return results;
}

async function main() {
  const visitedNodes = new Set<number>();
  const pendingNodes = new Set<number>();
  const nodeIds = generateNodeIds(65536);
  const K = 14;
  const randomK1 = randomNodes(nodeIds, K, nodeIds[77]);
  for (const [index] of randomK1) {
    pendingNodes.add(index);
  }
  while (pendingNodes.size > 0) {
    const [index] = pendingNodes;
    pendingNodes.delete(index);
    visitedNodes.add(index);
    const randomK = randomNodes(nodeIds, K, nodeIds[index]);
    for (const [index] of randomK) {
      if (!visitedNodes.has(index)) pendingNodes.add(index);
    }
  }
  console.log(visitedNodes.size);
}

main();
```

With 32 byte (256 bit) node IDs, this becomes even more significant. At this point simulation won't help; we will need to work out the probability relationship analytically. Some resources:
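One classical result that could serve as a starting point here (my addition, not one of the thread's resources) is the sharp connectivity threshold for Erdős–Rényi random graphs: with edge probability $p = (\ln n + c)/n$,

$$\lim_{n \to \infty} \Pr\left[G(n, p) \text{ is connected}\right] = e^{-e^{-c}}$$

so connectivity appears abruptly once the average degree exceeds $\ln n$. Our setting, where each node picks exactly $K$ random peers (a random $K$-out graph), is known to connect even more easily, but the $\ln n$ scale gives the right intuition.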
Moving forward with the "Random K" approach, the first step is to run some simulations to determine how many connections we need per node for different densities of nodes. We know that in a real deployment the chance of every node ID being in use at one time is practically 0, so we need a solution that works for low densities of nodes but that can also be scaled as the Polykey network grows with more users.

Simulations

For all of these simulations, the node IDs are set to 1 byte (i.e. there are 256 possible node IDs). Each simulation was run 5 times and the results below are averages over those runs. The number of nodes are the row headings and the number of connections each node attempts to make are the column headings; the data is the average number of nodes disconnected from the main cluster. Note it's the number of attempted connections, since a node may try to connect to a node ID that has not been assigned to a node, in which case that connection won't be made. The average number of successful connections per node for each simulation is included in the full data at the bottom.

WIP of data so far
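The thread doesn't include the simulation code at this point, but a self-contained sketch of the described methodology might look like this (my reconstruction; all names are hypothetical, and I assume connections count as undirected for connectivity purposes):

```ts
// Hypothetical reconstruction of the simulation described above:
// N nodes get distinct random 1-byte IDs out of 256 possible, each node
// attempts C connections to random IDs (succeeding only if the target ID
// is in use), and we count nodes disconnected from the main cluster.
function simulate(numNodes: number, attemptedConns: number): number {
  // Assign `numNodes` distinct IDs via a Fisher-Yates shuffle of 0..255
  const allIds = Array.from({ length: 256 }, (_, i) => i);
  for (let i = allIds.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [allIds[i], allIds[j]] = [allIds[j], allIds[i]];
  }
  const usedIds = allIds.slice(0, numNodes);
  const idSet = new Set(usedIds);
  const adjacency = new Map<number, Set<number>>(
    usedIds.map((id): [number, Set<number>] => [id, new Set<number>()]),
  );
  // Each node attempts `attemptedConns` connections to random IDs
  for (const id of usedIds) {
    for (let i = 0; i < attemptedConns; i++) {
      const target = Math.floor(Math.random() * 256);
      if (target !== id && idSet.has(target)) {
        adjacency.get(id)!.add(target);
        adjacency.get(target)!.add(id); // undirected for connectivity
      }
    }
  }
  // Find the largest connected component via BFS
  const seen = new Set<number>();
  let largest = 0;
  for (const start of usedIds) {
    if (seen.has(start)) continue;
    let size = 0;
    const queue = [start];
    seen.add(start);
    while (queue.length > 0) {
      const id = queue.shift()!;
      size++;
      for (const next of adjacency.get(id)!) {
        if (!seen.has(next)) {
          seen.add(next);
          queue.push(next);
        }
      }
    }
    largest = Math.max(largest, size);
  }
  // Nodes disconnected from the main cluster
  return numNodes - largest;
}

console.log(simulate(100, 4));
```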
Full data from simulations. For each simulation, I've calculated the average number of (outgoing) connections each node holds, as well as the rate of connectedness among the nodes, i.e. (number of nodes - disconnected nodes) / number of nodes.

- 100 Nodes, 2 Conns: average conns per node = 0.8; average connectedness = 0.58
- 100 Nodes, 3 Conns: average conns per node = 1.3; average connectedness = 0.9
- 100 Nodes, 4 Conns: average conns per node = 1.5; average connectedness = 0.95
- 100 Nodes, 5 Conns: average conns per node = 1.9; average connectedness = 0.98
- 100 Nodes, 6 Conns: average conns per node = 2.4; average connectedness = 1 (0.998)
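These averages line up with a quick back-of-envelope check (my own arithmetic, not from the thread): with N of the 256 IDs in use, a random attempted connection succeeds with probability about (N - 1)/256, so C attempts yield roughly C(N - 1)/256 successful outgoing connections:

```ts
// Back-of-envelope check of the averages above: with N of 256 possible
// IDs in use, each attempt to connect to a random ID succeeds with
// probability ~(N - 1) / 256 (duplicate targets make the true value
// slightly lower).
const N = 100;
for (const C of [2, 3, 4, 5, 6]) {
  console.log(`${C} attempts -> ~${((C * (N - 1)) / 256).toFixed(2)} conns`);
}
// 2 -> ~0.77, 3 -> ~1.16, 4 -> ~1.55, 5 -> ~1.93, 6 -> ~2.32
// (observed: 0.8, 1.3, 1.5, 1.9, 2.4)
```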
Part of the reason random nodes work so well is that the majority of nodes in a complete NG would be in the farthest bucket: 50% of the node ID space lives in the farthest bucket, so most of the time you're getting farthest-bucket connections. In fact with random-K, around 50% of the selections will likely come from the farthest bucket, in this case bucket 7. Here are a few times I rolled the random K:
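That 50/50 split can be checked directly (plain numbers standing in for 1 byte NodeIds, and assuming the usual Kademlia bucket index, i.e. floor(log2(XOR distance))):

```ts
// Count how many of the 255 other IDs fall into each bucket relative to
// node 77: bucket i holds 2^i IDs, so bucket 7 holds 128 (~50%).
const self = 77;
const counts = new Array(8).fill(0);
for (let id = 0; id < 256; id++) {
  if (id === self) continue;
  const bucket = 31 - Math.clz32(id ^ self); // floor(log2(XOR distance))
  counts[bucket]++;
}
console.log(counts); // [1, 2, 4, 8, 16, 32, 64, 128]
```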
Half of all nodes in such a NG would be located in bucket index 7, at a high distance. However, most nodes, when asking for the 20 closest nodes to fill up their NG at the beginning, would mostly fill up with the nodes closest to them and make connections to those. Therefore selecting randomly here is not truly representative. It might be representative for the seed nodes, which receive connections from all possible nodes first, but ordinary nodes aren't filling up their node graph in a uniform way. So for random-K to work, would we argue that nodes shouldn't only be asking for the closest nodes, but also for random nodes to fill up their NG, and that by asking for random nodes we would necessarily end up getting farther nodes too?
Description
Decentralised NAT requires all nodes to be part of a connected network, such that each node can always be reached by following connections across the network, without the necessity of centralised seed nodes. We want to achieve this while minimising the number of active connections that need to be maintained across the network. We can prototype this through visualisations and simulations, such as `ngraph` (for creating graph structures) and `d3` (for visualisation).

Issues Fixed
Tasks
1. Prototype the network simulation using `ngraph` / `d3`, allowing for rapid prototyping