
bug(webtransport): advertisement of webtransport multiaddrs seems to be broken #2568

Closed
SgtPooki opened this issue Sep 11, 2023 · 12 comments

@SgtPooki
Member

When looking at https://probelab.io/ipfsdht/#kubo-version-distribution and https://probelab.io/ipfsdht/#dht-transport-distribution, we have the following distributions:

1582 (WT) / 31392 (all) = 5% WT
(26 (Kubo23) + 1300 (Kubo22) + 2574 (Kubo21) + 489 (Kubo20) + 389 (Kubo19)) = 4778 Kubo versions > 18
(4778 + 21068 + 2468 + 1677 + 234 + 480 + 7 + 3709) = 31953 Total Kubo Nodes
4778 / 31953 = 15% Kubo versions > 18

Seems like there is a discrepancy between the share of WT transports in the network (5%) and the share of Kubo versions (15%) that should be advertising WT successfully.

That 10% gap equates to roughly 3,139 nodes that should be advertising WT but aren't.

NOTE: The total kubo nodes and total transports are mismatched, meaning some of my math might be borked.
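The arithmetic above can be sanity-checked in shell (numbers copied from the two probelab pages). Note that the per-version list actually sums to 34,421, not 31,953:

```shell
# Re-derive the shares quoted above using integer shell arithmetic.
wt=1582; all=31392
echo "WT share: $(( wt * 100 / all ))%"                        # 5%
newer=$(( 26 + 1300 + 2574 + 489 + 389 ))
echo "Kubo versions > 18: $newer"                              # 4778
total=$(( 4778 + 21068 + 2468 + 1677 + 234 + 480 + 7 + 3709 ))
echo "Sum of per-version counts: $total"                       # 34421, not 31953
```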

@aschmahmann
Collaborator

aschmahmann commented Sep 12, 2023

A couple notes here:

  1. I wouldn't count Kubo19: many of the relevant fixes (e.g. correctly handle WebTransport addresses without certhashes #2239, identify: fix normalization of interface listen addresses #2250, Infer public webtransport addrs from quic-v1 addrs. #2251) only landed after go-libp2p v0.27.1. Kubo v0.19.2 uses go-libp2p v0.26.4 (https://github.com/ipfs/kubo/blob/v0.19.2/go.mod#L70), while Kubo v0.20.0 has https://github.com/ipfs/kubo/blob/v0.20.0/go.mod#L48
  2. I would mostly be curious about peers that are already advertising public quic-v1 addresses, since those are the peers that should reasonably be advertising webtransport addresses. Within that group, I'd be curious whether the nodes not advertising public webtransport addresses are also not advertising private webtransport addresses (i.e. the transport is disabled), or whether they're advertising private addresses (i.e. they for some reason don't think they're public despite Infer public webtransport addrs from quic-v1 addrs. #2251)
    • It looks like >2/3 of nodes support QUIC, but I would check against the relevant kubo versions

My guess is this data is already being tracked by Nebula etc., which would hopefully make it relatively straightforward to investigate, although @dennis-tra would know better.

Also, @SgtPooki if you've got any specific examples of nodes with kubo >= 20 advertising public quic-v1 but only private webtransport that'd be helpful to poke at.

@dennis-tra
Contributor

Just some clarification:

1582 (WT) / (all) = 5% WT

These numbers are from a single crawl that happens ~midnight UTC each day. The number 31,392 is the total number of peers (online + offline) that we have found in the DHT. The 1,582 are the number of peers that advertise public WebTransport addresses. It gives precedence to the set of addresses as reported by the peer (if Nebula was able to connect) as opposed to the ones found in "peer records". This is because older peers just discard unknown codecs.

(26 (Kubo23) + 1300 (Kubo22) + 2574 (Kubo21) + 489 (Kubo20) + 389 (Kubo19)) = 4778 Kubo versions > 18

These numbers show unique PeerIDs for each agent version over the course of one full week (so not a single crawl). E.g., this means we have found 1,300 unique PeerIDs in the DHT running Kubo version 0.22.X within seven days of crawling. Naturally, this number will be higher than the number of unique peers with v0.22.x that we find in a single crawl.

(4778 + 21068 + 2468 + 1677 + 234 + 480 + 7 + 3709) = 31953 Total Kubo Nodes

Actually, that's 34,421 (the 2,468 were missing). So why is that number higher than 31,392 from above? Again, because these are cumulative values from a full week of crawls.


I would mostly be curious about peers that are already advertising public quic-v1 addresses as peers which should reasonably be advertising webtransport addresses. Within that group I'd be curious if the nodes not advertising public webtransport addresses are also not advertising private webtransport addresses (i.e. the transport is disabled) or if they're advertising private addresses (i.e. they for some reason don't think they're public despite #2251)

This is tracked and just requires some SQL wizardry. I'll come back here when I have the numbers 👍

@dennis-tra
Contributor

It turns out I'm not storing private multiaddresses, because I found they'd pollute the database with uninteresting records. So, unfortunately, I don't have numbers on private addresses. It's just a two-line change, though: if you think it makes sense to also track private addresses, I can change Nebula to do that.

Still, here are some numbers:

From the latest crawl:

| crawled_peers | dialable_peers | undialable_peers |
| --- | --- | --- |
| 31,655 | 28,910 | 2,745 |
  • Dialable peers advertising quic-v1 addresses: 21,536
  • Dialable peers advertising webtransport addresses: 1,584

Webtransport agents:

| agent_version | count |
| --- | --- |
| kubo/0.21.0/brave | 324 |
| kubo/0.22.0/3f884d3/docker | 163 |
| kubo/0.22.0/ | 146 |
| kubo/0.22.0/desktop | 135 |
| kubo/0.20.0/ | 122 |
| kubo/0.21.0/ | 101 |
| kubo/0.20.0/b8c4725/docker | 83 |
| kubo/0.18.1/ | 69 |
| kubo/0.21.0/294db3e/docker | 57 |
| kubo/0.21.0/desktop | 39 |
| kubo/0.19.1/ | 36 |
| kubo/0.21.0/294db3e30 | 35 |
| kubo/0.18.0/ | 31 |
| kubo/0.19.0/ | 25 |
| kubo/0.20.0/desktop | 16 |
| kubo/0.22.0/brave | 15 |
| kubo/0.22.0-dev/895963b95-dirty/docker | 14 |
| kubo/0.21.0/294db3e | 11 |
| kubo/0.19.2/ | 10 |
| kubo/0.18.1/675f8bd/docker | 10 |

142 other


@SgtPooki
Member Author

Thanks @dennis-tra

So it seems like this is still an issue because we're advertising 21k quic-v1 addresses, but only 1.5k webtransport addresses?

@SgtPooki
Member Author

@dennis-tra

I saw that I'm not storing private multiaddresses because I found that they'd pollute the database with uninteresting records. So, unfortunately, I don't have numbers on private addresses. It's just a two-line change. If you think it'll make sense also to track private addresses, I can change Nebula to also do that.

Can we track only private webtransport addresses? I think that would give us a useful figure to compare against public webtransport & quic-v1 addresses to answer Adin's question:

if you've got any specific examples of nodes with kubo >= 20 advertising public quic-v1 but only private webtransport that'd be helpful to poke at.

@SgtPooki
Member Author

I just booted up a kubo node on a digitalocean droplet and ran ipfs id and got the following output:

root@kubo-daemon:~# docker exec ipfs_host ipfs id
{
	"ID": "12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
	"PublicKey": "CAESIInsjSD2VZ/LKzhJ/b6Yg0jgaLe9dDxB1FfwpQEvUZ/D",
	"Addresses": [
		"/ip4/127.0.0.1/tcp/4001/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/127.0.0.1/udp/4001/quic-v1/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/127.0.0.1/udp/4001/quic-v1/webtransport/certhash/uEiAiPKE_44nP8AdIUlMf122E4LlL4rdHqdpf1RH3O-Qfig/certhash/uEiAzUdMEi1pvdPllW-zE34WkwmGmvXR_oLiGBVNgogReMg/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/127.0.0.1/udp/4001/quic/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/147.182.190.84/tcp/4001/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/147.182.190.84/udp/4001/quic-v1/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/147.182.190.84/udp/4001/quic-v1/webtransport/certhash/uEiAiPKE_44nP8AdIUlMf122E4LlL4rdHqdpf1RH3O-Qfig/certhash/uEiAzUdMEi1pvdPllW-zE34WkwmGmvXR_oLiGBVNgogReMg/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/147.182.190.84/udp/4001/quic/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/172.17.0.2/tcp/4001/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/172.17.0.2/udp/4001/quic-v1/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/172.17.0.2/udp/4001/quic-v1/webtransport/certhash/uEiAiPKE_44nP8AdIUlMf122E4LlL4rdHqdpf1RH3O-Qfig/certhash/uEiAzUdMEi1pvdPllW-zE34WkwmGmvXR_oLiGBVNgogReMg/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG",
		"/ip4/172.17.0.2/udp/4001/quic/p2p/12D3KooWK6mEKrQB4tXJ2wZLcLkkDtmXrsMN1vav5S7qezVM35sG"
	],
	"AgentVersion": "kubo/0.22.0/3f884d3/docker",
	"Protocols": [
		"/ipfs/bitswap",
		"/ipfs/bitswap/1.0.0",
		"/ipfs/bitswap/1.1.0",
		"/ipfs/bitswap/1.2.0",
		"/ipfs/id/1.0.0",
		"/ipfs/id/push/1.0.0",
		"/ipfs/kad/1.0.0",
		"/ipfs/lan/kad/1.0.0",
		"/ipfs/ping/1.0.0",
		"/libp2p/autonat/1.0.0",
		"/libp2p/circuit/relay/0.2.0/hop",
		"/libp2p/circuit/relay/0.2.0/stop",
		"/libp2p/dcutr",
		"/x/"
	]
}

This means WebTransport listening is working properly on Ubuntu 22.04 (LTS) x64 via Docker. I was able to dial the WebTransport address from my local laptop's Kubo node, but unable to dial it from https://helia-identify.on.fleek.co/
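A quick way to separate the interesting addresses in output like the above is to filter for WebTransport and then drop the private IP ranges. A minimal sketch; the sample `ADDRS` list stands in for the live `docker exec ipfs_host ipfs id` output:

```shell
# Filter a node's advertised multiaddrs down to WebTransport ones, then to
# public-IP WebTransport ones. Sample data mirrors the ipfs id output above.
ADDRS='/ip4/127.0.0.1/udp/4001/quic-v1/webtransport/certhash/uEiA/p2p/12D3
/ip4/147.182.190.84/udp/4001/quic-v1/webtransport/certhash/uEiA/p2p/12D3
/ip4/147.182.190.84/udp/4001/quic-v1/p2p/12D3
/ip4/172.17.0.2/tcp/4001/p2p/12D3'

# All WebTransport addrs (public and private):
echo "$ADDRS" | grep '/webtransport/'

# Only public WebTransport addrs (drop loopback and RFC 1918 ranges);
# these are the ones a browser node like Helia could actually dial:
echo "$ADDRS" | grep '/webtransport/' \
  | grep -Ev '/ip4/(127\.|10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)'
```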

@SgtPooki
Member Author

One interesting callout: I can run Kubo from the Docker container and fail to connect to the WebTransport address from helia-identify, but if I run the kubo daemon directly (kubo binary downloaded from dist.ipfs.tech), I can connect to it.

@dennis-tra
Contributor

dennis-tra commented Sep 27, 2023

@SgtPooki could it be that UDP traffic isn't forwarded to the docker container? By specifying -p 1234:1234 one would only forward TCP traffic. -p 1234:1234/udp would also forward UDP traffic.
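For reference, a hypothetical `docker run` that publishes kubo's default swarm port for both protocols (container name and image tag are illustrative):

```shell
# Plain `-p` publishes TCP only; QUIC (and with it WebTransport) needs the
# explicit /udp mapping as well.
docker run -d --name ipfs_host \
  -p 4001:4001 \
  -p 4001:4001/udp \
  ipfs/kubo:latest
```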

@sukunrt
Member

sukunrt commented Oct 2, 2023

@dennis-tra do you have the breakdown of quic peers by agent? I'm specifically interested in the number of kubo v0.22 peers advertising a public quic-v1 address.

@aschmahmann
Collaborator

aschmahmann commented Oct 3, 2023

IIUC, to understand whether there is a bug here (and what it is), it'd be great to get numbers on the kubo >= v0.20 Amino DHT server nodes that are:

  1. advertising public quic-v1 and public webtransport addresses -> good
  2. advertising public quic-v1 and non-public webtransport addresses -> implies a bug
  3. advertising public quic-v1 and no webtransport addresses -> implies webtransport turned off

@dennis-tra is this easy enough to do with nebula, or a pain to modify/plumb through?

@dennis-tra
Contributor

dennis-tra commented Oct 4, 2023

Sorry for taking so long to respond. Here are some numbers from the latest crawl:

General

  • Total Peers: 32,195
  • Reachable Peers: 29,562
  • Unreachable Peers: 2,633
  • Peers we could identify: 28,406 (the number of peers whose agent version we know)
  • Peers we could not identify: 3,789
  • Peers that advertised multiaddresses: 32,023
    (less than the total number of peers; this could be because Nebula filters out private addresses, so the missing peers may only have advertised private addresses - not sure)

@aschmahmann @SgtPooki

  • Peers with an agent version that starts with kubo/0.2*: 1,750
    Note that this graph shows far more kubo >= 0.20 peers. That's because that graph aggregates the number of distinct peers per agent version over one week; it currently shows 4,726 Kubo 0.22 peers. So we're comparing a single snapshot (1,750) against all snapshots from seven days of crawling. The same applies to this graph.
  • Peers with at least one quic-v1 address (which is not webtransport) + at least one webtransport address: 2,131
  • Peers with an agent version that starts with kubo/0.2* + at least one quic-v1 address (which is not webtransport) + at least one webtransport address: 1,608
  • Peers with an agent version that starts with kubo/0.2* + at least one quic-v1 address (which is not webtransport) + only private webtransport addresses: not possible, because Nebula does not track private multiaddresses :/
  • Peers with an agent version that starts with kubo/0.2* + at least one quic-v1 address (which is not webtransport) + no public webtransport address: 26

I could do a manual one-off run of the crawler with tracking of private addresses enabled. However, given the 1,750 peers with agent version >= 0.20 and the 1,608 of them (92%) that look like they operate correctly, I would argue this won't give us much more insight. The numbers look fine to me.
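The 92% figure checks out (integer shell arithmetic, rounded to the nearest percent):

```shell
# 1,608 correctly behaving peers out of 1,750 with kubo >= 0.20.
echo "$(( (1608 * 1000 / 1750 + 5) / 10 ))%"   # prints 92%
```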

@sukunrt

The number of peers with an agent version starting with kubo/0.22* and at least one quic-v1 address (not webtransport): 1,138

@SgtPooki
Member Author

SgtPooki commented Oct 4, 2023

[...] the number of peers with agent version >0.20 of 1,750 and the number of peers that look like they operate correctly 1,608 (92%) [...]. The numbers look fine to me.

Thanks for diving into this, Dennis :)

@SgtPooki closed this as not planned on Oct 17, 2023.