Authenticate the sender of a hole-punching signalling message #148

joshuakarp · 2021-05-26T08:27:37Z

Upon receiving a hole-punching message (agentService:sendHolePunchMessage()), there is no reasonable means of ensuring that the message has come from the node ID stated in the message. Verification of a provided signature requires the public key of the source node.

In the future, we plan to move to Ed25519 keys (once this is supported in node-forge). At that point the node ID will be the public key itself (rather than the public key fingerprint, which we need today because our RSA keys are 4096 bits). Authentication then becomes trivial: we can simply verify the signature against the node ID, since the node ID is the public key.
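To make the Ed25519 idea concrete, here is a minimal sketch using Node's built-in `crypto` module rather than node-forge. It assumes the node ID is the raw 32-byte Ed25519 public key; `nodeIdFromPublicKey` and `verifyWithNodeId` are hypothetical helper names, not part of the codebase.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify } from 'node:crypto';
import type { KeyObject } from 'node:crypto';

// Assumption: a node ID is the raw 32-byte Ed25519 public key.
type NodeId = Buffer;

// Extract the raw public key bytes from a key object via its JWK form.
function nodeIdFromPublicKey(publicKey: KeyObject): NodeId {
  const jwk = publicKey.export({ format: 'jwk' });
  return Buffer.from(jwk.x as string, 'base64url');
}

// Verify a signature using nothing but the claimed node ID.
function verifyWithNodeId(nodeId: NodeId, data: Buffer, signature: Buffer): boolean {
  const publicKey = createPublicKey({
    key: { kty: 'OKP', crv: 'Ed25519', x: nodeId.toString('base64url') },
    format: 'jwk',
  });
  // Ed25519 takes no separate digest algorithm, hence the null.
  return verify(null, data, publicKey, signature);
}

// The sender signs the message; the receiver only needs the node ID to check it.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');
const nodeId = nodeIdFromPublicKey(publicKey);
const message = Buffer.from('hole-punch request');
const signature = sign(null, message, privateKey);
const authentic = verifyWithNodeId(nodeId, message, signature);
```

No key lookup or prior connection is needed here, which is what removes the problem described above.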

joshuakarp · 2021-11-16T07:34:29Z

Currently, the only secure way I can envisage of solving this is:

1. A wants to connect to B. It sends a signed hole-punch message with B as the target node.
2. B receives the hole-punching message, seeing that it's originally from A.
3. B attempts to form its own connection to A (by sending its own hole-punching message), so that it can retrieve the public key of A.

However, this creates a chicken-and-egg situation. A would also presumably want to verify that B is the one wanting to connect.

CMCDragonkai · 2022-06-14T01:51:11Z

@tegefaulkes @emmacasolin is this issue still necessary? I'm thinking that #322 was solved simply because of the TLS used for connections. Now if we receive a hole punching message, all we have to do is check that the message's from is equal to the connection's node ID. In fact one might suggest that the message itself doesn't even need a from node id, simply because we would always use the node id where the message's connection came from.
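A minimal sketch of that check, assuming the transport exposes the authenticated peer's node ID for each connection. The `HolePunchMessage` shape and `resolveSender` are hypothetical names used only for illustration.

```typescript
// Hypothetical message shape: `from` is optional because, as suggested above,
// the connection's node ID could be used instead.
type HolePunchMessage = { from?: string; target: string };

// Returns the sender to trust, or undefined if the message contradicts the
// node ID that the (TLS-authenticated) connection belongs to.
function resolveSender(
  message: HolePunchMessage,
  connectionNodeId: string,
): string | undefined {
  if (message.from !== undefined && message.from !== connectionNodeId) {
    return undefined; // claimed sender does not match the authenticated peer
  }
  return connectionNodeId;
}
```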

tegefaulkes · 2022-06-20T00:26:43Z

I'm not sure that we provide and check the certs of the sending side. Having a quick skim of the code I can see that we verify the cert chain within src/network/ConnectionForward.ts:192 but not in src/network/ConnectionReverse.ts.

So for this to be considered done, we should be verifying the cert chain of the connecting agent within ConnectionReverse.ts.

CMCDragonkai · 2023-04-24T06:46:51Z

When we introduce js-quic into the PK codebase, and rework our hole punching system in the network domain to use QUIC, we must review this issue to see if it is still relevant.

This will work as long as the signalling node only signals between the target node and the source node, and does not rely on what the message itself says about the source node. The message should only contain the target node.

CMCDragonkai · 2023-07-10T14:54:07Z

Signalling is done via RPC, which means the request to signal is always authenticated point to point. However, when a central node does the signalling transitively between 2 nodes, the central node sends a message to the receiving node on the original node's behalf. The receiving node therefore does need to check that the message is in fact signed by the original node.

@tegefaulkes please confirm that there's a transitive authenticity check during the signalling request. If that's done, we can close this.

tegefaulkes · 2023-07-11T00:21:40Z

The node is verified when connecting to another node yes, so the relaying node can verify the details. But we don't have any kind of signature in the message AFAIK, so the 3rd party node can't verify the details currently.

All we need to do is add a signature to the message and have the end node verify the contents. Keep in mind that the requesting node and relaying node both add details to the message. So both need to add a signature to the relay message, depending on how secure we want to be about it.

At a minimum we want to prevent a node from making requests on behalf of other nodes, to prevent an ad-hoc botnet DDoS attack.

CMCDragonkai · 2023-07-11T00:50:16Z

Yea ok nest the original message with the relayed message and sign it. This should also prevent repeat relaying too. Sounds like we should constrain relaying attempts.

CMCDragonkai · 2023-08-11T06:40:56Z

I added this into the 6th testnet deployment as during NAT testing, we should be able to add this in.

CMCDragonkai · 2023-08-14T02:15:03Z

Requirements:

1. A's message to S is { target: SIGNED<{ nodeId: B, ip: IP, port: PORT }> }.
2. A's call to S is fire and forget.
3. S takes A's external IP and external port and combines it with the signed message. It sends to B { source: { nodeId: A, ip: IP, port: PORT }, target }.
4. S's call to B is fire and forget.
5. B checks the authenticity of the target. It takes source.nodeId and checks if it signed the target. It checks if source.nodeId is equal to target.nodeId.
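The requirements above can be sketched as follows. This is an illustration under several assumptions: Ed25519 signatures via Node's `crypto`, JSON as a stand-in encoding, and a hypothetical `lookupPublicKey` callback by which B resolves a node ID to a public key. The last requirement is read here as "the signed target must name B itself", which is only one possible reading.

```typescript
import { generateKeyPairSync, sign, verify } from 'node:crypto';
import type { KeyObject } from 'node:crypto';

type Address = { nodeId: string; ip: string; port: number };
type Signed<T> = { payload: T; signature: Buffer };
type RelayMessage = { source: Address; target: Signed<Address> };

// Stand-in encoding; a real wire format would need a canonical serialisation.
const encode = (value: unknown): Buffer => Buffer.from(JSON.stringify(value));

// A: sign the target description.
function signTarget(target: Address, privateKey: KeyObject): Signed<Address> {
  return { payload: target, signature: sign(null, encode(target), privateKey) };
}

// S: attach the address it observed for A and forward the signed target as-is.
function relay(observedSource: Address, target: Signed<Address>): RelayMessage {
  return { source: observedSource, target };
}

// B: accept only if the target names B and was signed by source.nodeId.
function accept(
  message: RelayMessage,
  selfNodeId: string,
  lookupPublicKey: (nodeId: string) => KeyObject | undefined,
): boolean {
  if (message.target.payload.nodeId !== selfNodeId) return false;
  const publicKey = lookupPublicKey(message.source.nodeId);
  if (publicKey === undefined) return false;
  return verify(null, encode(message.target.payload), publicKey, message.target.signature);
}

// Example run with documentation-range IP addresses.
const keysA = generateKeyPairSync('ed25519');
const knownKeys = new Map<string, KeyObject>([['A', keysA.publicKey]]);
const signedTarget = signTarget({ nodeId: 'B', ip: '198.51.100.2', port: 1314 }, keysA.privateKey);
const relayed = relay({ nodeId: 'A', ip: '203.0.113.7', port: 40000 }, signedTarget);
const accepted = accept(relayed, 'B', (id) => knownKeys.get(id));
```

Note that only the target is covered by A's signature; the source block is whatever S claims, which is the gap the following comments discuss.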

Now there are some problems here.

If A and S are collaborating, A could create a message with an IP and port that is invalid, and S could use the same invalid IP and port. S could then send this message to B. What could happen here? This can cause B to send packets to another IP and port, which wouldn't be very useful due to MTLS, but it could be used for DoS attacks.

To avoid DoS attacks we could do 2 additional things:

1. A, S and B don't put ip and port in their messages at all. Instead the only thing used is nodeId and A's signature. This ensures that B would use its own address book to find A's IP address and port. This relies on the idea that the NodeGraph of B would be up to date, or can easily ask the Kademlia structure what A's address is.
2. Rate limiting: S and B can both rate limit requests using a token bucket approach correlated to each pair of NodeIds of source and target.

Perhaps both approaches can be used?

CMCDragonkai · 2023-08-14T02:58:52Z

I think the 2 additional things will be required to reduce the impact of DOS attacks.

Rate limiting has to be applied at B so it has 4 levels of protection:

1. Max dialing timeout
2. Exponential backoff
3. Node connection coalescing
4. Token bucket rate limiting

On top of this, IP and port are not part of the signalling messages. B will look up A's address the normal way in order to hole punch to A.
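Two of the protections listed above, connection coalescing and exponential backoff, can be sketched roughly like this. `PunchCoalescer` and `backoffDelay` are hypothetical names and the delay values are placeholders, not the project's configuration.

```typescript
// Coalescing: concurrent requests to punch to the same node share one attempt.
class PunchCoalescer {
  private active = new Map<string, Promise<boolean>>();

  punch(nodeId: string, attempt: () => Promise<boolean>): Promise<boolean> {
    const existing = this.active.get(nodeId);
    if (existing !== undefined) return existing; // join the in-flight attempt
    const running = attempt().finally(() => {
      this.active.delete(nodeId); // allow a fresh attempt once this one settles
    });
    this.active.set(nodeId, running);
    return running;
  }
}

// Exponential backoff: double the delay per failed attempt, up to a cap.
function backoffDelay(attemptIndex: number, baseMs = 100, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attemptIndex);
}

// Two signalling messages for the same node trigger a single punch attempt.
let attempts = 0;
const coalescer = new PunchCoalescer();
const doPunch = async (): Promise<boolean> => {
  attempts += 1;
  return true;
};
const first = coalescer.punch('A', doPunch);
const second = coalescer.punch('A', doPunch);
```

Coalescing bounds the work B does per target node, while backoff bounds how often B retries, so a flood of identical signalling messages costs B roughly one attempt.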

We can test this on the testnet to see if this works well, and what tradeoffs there are.

@amydevs this whole system can be visualised at the node explorer level, at a lower level simcity-style.

CMCDragonkai · 2023-08-14T03:58:34Z

Notes on how to build a token bucket limiting system: https://chat.openai.com/share/31d16b49-d95b-4f20-8f34-feeeb68741fb.

It's actually quite similar to the Semaphore and subsequently LockBox. You start with a token bucket, then you make a collection of token buckets keyed by a string.

CMCDragonkai · 2023-08-14T04:06:08Z

I also asked how to do this with TTLs that expire each token bucket. Final design here:

A priority structure like a priority queue could be used, but as you've noted, re-ordering keys when their TTL gets updated could make operations slower.

If we want a highly efficient approach that avoids iterating over all buckets every time and re-ordering keys, we should indeed consider storing individual setTimeout timers for each token bucket. Though there will be a setTimeout for each bucket, modern JavaScript engines like V8 are quite efficient at handling a large number of timers, especially if those timers are not too frequent.

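Here's how this approach can be implemented:

```ts
class TokenBucket {
    // The original snippet elided the rest of this class. The two fields and
    // the consume() method below are a reconstruction so the example runs.
    protected tokens: number;
    protected lastRefillTimestamp: number;
    creationTimestamp: number;

    constructor(protected capacity: number, protected refillRatePerMillisecond: number, public ttl: number) {
        this.tokens = capacity;
        this.lastRefillTimestamp = performance.now();
        this.creationTimestamp = performance.now();
    }

    // Refill for the elapsed time, then take the tokens if enough are available.
    consume(tokensToConsume: number): boolean {
        const now = performance.now();
        const refill = (now - this.lastRefillTimestamp) * this.refillRatePerMillisecond;
        this.tokens = Math.min(this.capacity, this.tokens + refill);
        this.lastRefillTimestamp = now;
        if (this.tokens < tokensToConsume) return false;
        this.tokens -= tokensToConsume;
        return true;
    }
}

class RateLimiter {
    private tokenBuckets: Map<string, TokenBucket>;
    private expirationTimers: Map<string, ReturnType<typeof setTimeout>>;

    constructor(private defaultTTL: number = 60000) {  // default TTL 1 minute
        this.tokenBuckets = new Map();
        this.expirationTimers = new Map();
    }

    // Drop the bucket and its timer once the TTL has elapsed.
    private scheduleExpiration(key: string, ttl: number) {
        const timer = setTimeout(() => {
            this.tokenBuckets.delete(key);
            this.expirationTimers.delete(key);
        }, ttl);

        this.expirationTimers.set(key, timer);
    }

    // Fetch the bucket for a key, creating it (and its timer) on first use.
    getBucket(key: string, ttl?: number): TokenBucket {
        let bucket = this.tokenBuckets.get(key);
        if (!bucket) {
            bucket = new TokenBucket(10000, 2, ttl ?? this.defaultTTL);  // Scaled values for example
            this.tokenBuckets.set(key, bucket);
            this.scheduleExpiration(key, bucket.ttl);
        }
        return bucket;
    }

    consume(key: string, tokensToConsume: number = 10000): boolean { // Scaled default value for example
        return this.getBucket(key).consume(tokensToConsume);
    }
}
```

In this version: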

Each TokenBucket gets its expiration timer via scheduleExpiration.

There's a separate map, expirationTimers, that keeps track of each bucket's expiration timer.

When a timer fires, it will remove the bucket and its associated timer from the respective maps.

This approach should be quite efficient, even with a large number of buckets. However, if you expect to have a very large number of buckets, it might be worth testing this against the other approach to ensure it meets your specific performance and memory requirements.

CMCDragonkai · 2023-08-14T04:07:12Z

Could be done as a js-rate-limiter.

tegefaulkes · 2023-08-24T01:43:56Z

I've been thinking about this some more. If the relaying node is a bad actor then it can make anything up. So there's no point doing any authentication since the relay can just lie about it. Any signed message can be fabricated and challenge-response can be lied about as well. Basically, a relay node can pretend to be a relay for any number of made up requesting nodes. So any kind of authentication or message signing is pointless here.

Right now the request-ee node only does a single punch attempt at a time to a node. Right now it's keyed on ip:port tuple but that still allows multiple attempts to an address if you just change the port number. So I think we need to rate limit on the IP address as well.
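As a rough illustration of keying on the IP address alone, the sketch below tracks in-flight punches per host so that changing the port does not open a second attempt. `ActivePunches` is a hypothetical name, not the existing implementation.

```typescript
// Tracks in-flight punch attempts per IP address, ignoring the port.
class ActivePunches {
  private activeHosts = new Set<string>();

  // Returns false if a punch to this host (on any port) is already running.
  tryStart(host: string, _port: number): boolean {
    if (this.activeHosts.has(host)) return false;
    this.activeHosts.add(host);
    return true;
  }

  // Call when the punch attempt settles, successfully or not.
  finish(host: string): void {
    this.activeHosts.delete(host);
  }
}

const punches = new ActivePunches();
const startedFirst = punches.tryStart('203.0.113.7', 40000);
const startedSecond = punches.tryStart('203.0.113.7', 40001); // same IP, new port
```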

Any rate limiting beyond that needs to work in a distributed way, and I'm not sure there's an easy solution to that? I don't think the token bucket addresses that.

The only thing I can think of that could work is to make a single request mildly expensive to make, using proof of work. So long as it's expensive to make a lot of requests to hole punch a single target, it should be fine. So a request can include a mildly expensive-to-generate nonce with it. This may have to be generated as a challenge, so there needs to be some back and forth between the two nodes through the relay. Or, the punching node can track recently used nonces and reject any replays.
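A hashcash-style sketch of that idea, assuming SHA-256 and a difficulty expressed as leading zero bits. The function names, the request string format and the difficulty value are all hypothetical; this shows the replay-tracking variant, not the challenge-response one.

```typescript
import { createHash } from 'node:crypto';

// Count leading zero bits of a hash.
function leadingZeroBits(hash: Buffer): number {
  let bits = 0;
  for (let i = 0; i < hash.length; i++) {
    if (hash[i] === 0) {
      bits += 8;
      continue;
    }
    bits += Math.clz32(hash[i]) - 24; // clz32 works on 32 bits, a byte uses 8
    break;
  }
  return bits;
}

function workHash(request: string, nonce: number): Buffer {
  return createHash('sha256').update(`${request}:${nonce}`).digest();
}

// Requester: search for a nonce that meets the difficulty (the expensive part).
function solve(request: string, difficultyBits: number): number {
  let nonce = 0;
  while (leadingZeroBits(workHash(request, nonce)) < difficultyBits) nonce++;
  return nonce;
}

// Punching node: one hash to verify, plus a set of seen nonces to stop replays.
class NonceChecker {
  private seen = new Set<string>();
  private difficultyBits: number;

  constructor(difficultyBits: number) {
    this.difficultyBits = difficultyBits;
  }

  accept(request: string, nonce: number): boolean {
    const id = `${request}:${nonce}`;
    if (this.seen.has(id)) return false; // replayed request
    if (leadingZeroBits(workHash(request, nonce)) < this.difficultyBits) return false;
    this.seen.add(id);
    return true;
  }
}

const request = 'A->B';
const nonce = solve(request, 8); // about 256 hashes on average
const checker = new NonceChecker(8);
const firstUse = checker.accept(request, nonce);
const replay = checker.accept(request, nonce);
```

The asymmetry is the point: solving costs many hashes while checking costs one. In practice the seen set would need to be bounded, for example by expiring old entries.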

In any case, we can think on this a little more. I don't want to implement anything we're not sure on and the authentication is not critical to testing and #551.

CMCDragonkai · 2023-08-24T03:00:01Z

The signature is from the original node. The bad actor relay node cannot make up the signature. The original request must always be from the original node and signed by the original node... Of course additional information is provided by the relay node and that is signed by the relay node.

CMCDragonkai · 2023-08-24T03:52:13Z

So to prevent amplification we are going to just use a single global token bucket for all signalling requests for nodes. This avoids the complexity of deciding to rate limit per connecting node, or per receiving node. So it just prevents overall amplification of any given signalling node.

General denial of service can only be done with proof of work. We can have a separate issue for PoW later in the future.

We still need only a single TokenBucket, but we should prepare for potentially more than 1 by creating both the TokenBucket and RateLimiter classes and putting them into src/utils. They should be easy to move out to js-rate-limiter later.

tegefaulkes · 2023-08-25T02:58:43Z

Ok, I've implemented a rate limiting system for signalling requests. Currently it's a limit per requesting node, to avoid denying signalling for other nodes. All the config for rate limiting is hard coded right now. It's limited to 10 relay requests per second. This may need tuning.
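For readers following along, a per-requester limit of 10 relay requests per second could look roughly like the sketch below. It is not the implemented code: `RelayRateLimiter` is a hypothetical name, and the clock is injected only so the behaviour can be checked deterministically.

```typescript
// One token bucket per requesting node: capacity 10, refilled at 10 tokens/s.
class RelayRateLimiter {
  private buckets = new Map<string, { tokens: number; updatedAt: number }>();
  private readonly capacity: number;
  private readonly refillPerMs: number;
  private readonly now: () => number;

  constructor(requestsPerSecond = 10, now: () => number = () => Date.now()) {
    this.capacity = requestsPerSecond;
    this.refillPerMs = requestsPerSecond / 1000;
    this.now = now;
  }

  // Returns true if this requester may relay one more signalling request.
  allow(requesterNodeId: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(requesterNodeId) ?? { tokens: this.capacity, updatedAt: t };
    // Refill for the elapsed time, never exceeding the capacity.
    bucket.tokens = Math.min(this.capacity, bucket.tokens + (t - bucket.updatedAt) * this.refillPerMs);
    bucket.updatedAt = t;
    this.buckets.set(requesterNodeId, bucket); // note: entries are never expired here
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}

// A fake clock makes the example deterministic.
let clock = 0;
const limiter = new RelayRateLimiter(10, () => clock);
const burst = Array.from({ length: 11 }, () => limiter.allow('A'));
```

Because each requester has its own bucket, one node exhausting its allowance does not block signalling for others.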

tegefaulkes · 2023-08-25T02:59:50Z

Later we'll need to apply simple proof of work to really limit requests in a decentralised way. I'll make a new issue for this.

tegefaulkes · 2023-08-25T03:22:52Z

New issue at #556 for proof of work limiting.

CMCDragonkai · 2023-10-18T06:52:49Z

> The signature is from the original node. The bad actor relay node cannot make up the signature. The original request must always be from the original node and signed by the original node... Of course additional information is provided by the relay node and that is signed by the relay node.

I think this should still be done as a matter of principle. It's also good for auditing.

CMCDragonkai · 2023-10-24T19:05:58Z

Further discussion about robust rate limiting will be in #556.

joshuakarp mentioned this issue May 28, 2021

Replace node-forge RSA Keypair With ed25519/x25519 Keypair #168

Closed

CMCDragonkai assigned joshuakarp Jul 3, 2021

CMCDragonkai added this to the Polykey CLI 1.0.0 Release milestone Jul 3, 2021

CMCDragonkai added development Standard development security Security risk labels Jul 3, 2021

CMCDragonkai assigned emmacasolin and unassigned joshuakarp Mar 11, 2022

CMCDragonkai mentioned this issue Mar 11, 2022

Tests for NAT-Traversal and Hole-Punching #357

Closed

12 tasks

emmacasolin mentioned this issue Apr 19, 2022

Investigate means of authenticating a node ID when adding to NodeGraph #322

Closed

9 tasks

emmacasolin mentioned this issue Jun 9, 2022

Tests for NAT-Traversal and Hole-Punching #381

Merged

11 tasks

CMCDragonkai changed the title ~~Authenticate the sender of a hole-punching message~~ Authenticate the sender of a hole-punching signalling message Jun 14, 2022

CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Jul 24, 2022

CMCDragonkai unassigned emmacasolin Aug 5, 2022

CMCDragonkai mentioned this issue May 15, 2023

Integration Tests for testnet.polykey.com MatrixAI/Polykey-CLI#71

Closed

14 tasks

tegefaulkes mentioned this issue Feb 1, 2023

Transport Agnostic RPC implementation #498

Merged

19 tasks

CMCDragonkai assigned tegefaulkes Jul 10, 2023

CMCDragonkai mentioned this issue Jul 24, 2023

js-quic integration and Agent migration #525

Merged

26 tasks

tegefaulkes mentioned this issue Aug 21, 2023

NAT Signalling (Hole Punching) should be Fire & Forget, Coalesced, and Secured with Signatures #555

Merged

10 tasks

tegefaulkes mentioned this issue Aug 25, 2023

Proof of work rate limiting for NAT signalling and relaying #556

Open

tegefaulkes added a commit that referenced this issue Oct 24, 2023

feat: refactored hole punch signalling procedure

9cd245a

* general refactor of the signalling protocol. * Added signatures and verification to the signalling requests and relay messages. #148

tegefaulkes added a commit that referenced this issue Oct 24, 2023

feat: refactored hole punch signalling procedure

6123329

* general refactor of the signalling protocol. * Added signatures and verification to the signalling requests and relay messages. #148

tegefaulkes closed this as completed in #555 Oct 24, 2023
