Disconnect slowest peers #908
Synthesizing some initial thoughts:

Factors to consider for dropping / ranking

…

How to measure factors

…

How to use factors for dropping

I'd propose a multi-tiered system for both latency & throughput. Let me introduce some parameters: …

The logic for dropping a peer is as follows: …

The rationale for this is that we may want to drop a bad peer even if we have < … For the sake of simplicity, though, we could start by only dropping peers when …

How to use factors for ranking

Currently, we use … For the reasons mentioned in the "How to measure factors" section, I believe we shouldn't use … However, we now have two factors in play: latency and throughput. How should we rank the peers along these two axes? This is something I would love input on, but broadly I get the sense that throughput is more important than latency. A sketch of the drop/rank logic follows below.

Other considerations

Dropped peer blacklist

We should maintain a blacklist of peers that we have previously dropped so that if we re-discover them, we don't try to connect again. We could even consider persisting this blacklist when the node shuts down. This list should be implemented as a ring buffer of fixed size so that: …

We could even play around with fancier structures like an LRU cache for this (haven't thought much about it).
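To make the tiered proposal concrete, here's a minimal Rust sketch of the drop/rank logic and the ring-buffer blacklist. All names, thresholds, and the `u64` peer-id type are placeholders for illustration, not reth's actual types; the real parameters are the ones elided above.

```rust
use std::cmp::Ordering;
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical per-peer measurements (illustrative, not reth's types).
struct PeerMetrics {
    peer_id: u64,
    avg_latency: Duration,
    /// Bytes per second over recent responses.
    throughput: f64,
}

// Placeholder tier parameters, relative to the current best peer.
const HARD_THROUGHPUT_FACTOR: f64 = 0.1; // drop regardless of peer count
const SOFT_THROUGHPUT_FACTOR: f64 = 0.5; // drop only with peers to spare
const MIN_PEERS: usize = 5;

/// Two-tier drop decision: a sufficiently bad peer is dropped
/// unconditionally; a merely below-average peer is dropped only when
/// we have more than MIN_PEERS connected.
fn should_drop(peer: &PeerMetrics, best_throughput: f64, connected: usize) -> bool {
    if peer.throughput < best_throughput * HARD_THROUGHPUT_FACTOR {
        return true;
    }
    connected > MIN_PEERS && peer.throughput < best_throughput * SOFT_THROUGHPUT_FACTOR
}

/// Rank peers: throughput first, latency as the tie-breaker.
fn rank(peers: &mut [PeerMetrics]) {
    peers.sort_by(|a, b| {
        b.throughput
            .partial_cmp(&a.throughput)
            .unwrap_or(Ordering::Equal)
            .then(a.avg_latency.cmp(&b.avg_latency))
    });
}

/// Fixed-size ring buffer of recently dropped peers: when full, the
/// oldest entry is evicted, so the blacklist never grows without bound.
struct DropBlacklist {
    buf: VecDeque<u64>,
    cap: usize,
}

impl DropBlacklist {
    fn new(cap: usize) -> Self {
        Self { buf: VecDeque::with_capacity(cap), cap }
    }

    fn insert(&mut self, peer_id: u64) {
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back(peer_id);
    }

    fn contains(&self, peer_id: &u64) -> bool {
        self.buf.contains(peer_id)
    }
}
```

The two-tier check mirrors the rationale above: a sufficiently bad peer is dropped even when we're short on peers, while below-average peers are only dropped when we have headroom.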
love this overview.
agree, we need a separate value here.
right, this should also depend on the number of connected peers.
we need to take into account that header requests are generally cheaper than block requests, since there's less data to read from the DB
we have this in …
Yeah, I think we should use the latency & throughput measurements defined above, but as far as using them to sort peers, I'm still not sure how to combine the two metrics. Very open to suggestions here.
Mm, good call - will look into the difference between "active" and "connected," but my reflex is that just using the number of connected peers is the right behavior. Since we're already connected to them, we will at some point try to communicate with them on a subprotocol, and then we'll be able to evaluate whether they're slow or not and whether they should be dropped.
Hmm, good point, but the pipeline structure may make this simpler for us. The throughput thresholds are relative to the current best peer, and at any given point, the peers we're communicating with are all handling the same sorts of requests - i.e., either we're requesting headers from all peers or we're requesting bodies from all peers, but not one from some and the other from the rest. So they'll always be handling the same sort of request, and be within the same "throughput regime," if that makes sense. Unless... this breaks down with the new downloader component. Will we be requesting headers and bodies from peers simultaneously?
Ahh ok sick, my reflex here is to ban indefinitely if it was a "high-severity" disconnect and ban for some constant amount of time if it was a "low-severity" disconnect.
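A sketch of what that severity-tiered banning could look like. The `DisconnectSeverity` enum, the cooldown constant, and `BanList` are all hypothetical names for illustration; reth's actual disconnect reasons live in the error enum referenced below.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical severity classification (illustrative, not reth's enum).
enum DisconnectSeverity {
    /// E.g. a protocol violation: ban indefinitely.
    High,
    /// E.g. merely too slow: ban for a fixed cooldown.
    Low,
}

// Placeholder cooldown for low-severity disconnects.
const LOW_SEVERITY_BAN: Duration = Duration::from_secs(30 * 60);

#[derive(Default)]
struct BanList {
    /// A `None` expiry means an indefinite ban.
    bans: HashMap<u64, Option<Instant>>,
}

impl BanList {
    fn ban(&mut self, peer_id: u64, severity: DisconnectSeverity) {
        let expiry = match severity {
            DisconnectSeverity::High => None,
            DisconnectSeverity::Low => Some(Instant::now() + LOW_SEVERITY_BAN),
        };
        self.bans.insert(peer_id, expiry);
    }

    fn is_banned(&self, peer_id: &u64) -> bool {
        match self.bans.get(peer_id) {
            Some(None) => true, // indefinite ban
            Some(Some(expiry)) => Instant::now() < *expiry,
            None => false,
        }
    }
}
```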
this is actually the same; we should unify naming here or at least document it properly. I chose …
we also have this in the form of reth/crates/net/network/src/error.rs, lines 43 to 50 (at eb11da8)
Marking as blocked by #992
Have looked into the code a bit and specced this out a bit further: Consolidation into …
Per discussion in #1051, throughput/latency should be evaluated within the context of the data requested. E.g., if some peer has been serving us full transactions, their latency will look abysmal compared to some peer that has been serving us transaction hashes. The current plan was to just compare throughput/latency to the best recorded values across all peers, but now I think it may be better to track best throughput/latency per request type. Lmk if this is overkill; it requires a lot more state per peer. Maybe the higher latency on data-heavy requests will be offset by higher throughput, and we already have all the state we need to balance good peers serving small requests against good peers serving large requests.
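A sketch of per-request-type tracking, with a hypothetical `RequestKind` enum standing in for reth's actual request types. The point is that best values are keyed by kind, so a peer serving full blocks is only compared against other peers serving full blocks.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Illustrative request kinds (not reth's actual enum).
#[derive(Hash, PartialEq, Eq, Clone, Copy)]
enum RequestKind {
    Headers,
    Bodies,
    PooledTransactions,
}

#[derive(Default, Clone, Copy)]
struct Sample {
    latency: Duration,
    /// Bytes per second.
    throughput: f64,
}

/// Best observed values per request kind, shared across all peers.
#[derive(Default)]
struct BestPerKind {
    best: HashMap<RequestKind, Sample>,
}

impl BestPerKind {
    /// Fold a new measurement into the per-kind bests.
    fn record(&mut self, kind: RequestKind, sample: Sample) {
        let entry = self.best.entry(kind).or_default();
        if sample.throughput > entry.throughput {
            entry.throughput = sample.throughput;
        }
        if entry.latency.is_zero() || sample.latency < entry.latency {
            entry.latency = sample.latency;
        }
    }

    /// Compare a peer's sample to the best for the *same* kind of request.
    fn relative_throughput(&self, kind: RequestKind, sample: &Sample) -> f64 {
        match self.best.get(&kind) {
            Some(best) if best.throughput > 0.0 => sample.throughput / best.throughput,
            _ => 1.0,
        }
    }
}
```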
100% agree on having different trackers per request type. By definition, requesting a transaction by hash is going to be faster than requesting a block body. It's how geth does it as well, in https://github.dev/ethereum/go-ethereum/blob/55f41d198c133d672cdba258cdca40a9cb23005c/p2p/msgrate/msgrate.go However, they only use this tracking system during sync, and not after. They also do not penalize peers for any timeouts, but they do keep track of which peer is best to send a request to according to the trackers. To see this in action, check out https://github.dev/ethereum/go-ethereum/blob/master/eth/downloader/fetchers_concurrent.go
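For the rate estimation itself, something like an exponential moving average per request kind would be in the spirit of geth's msgrate. This is a loose sketch under assumed names, not a port of geth's algorithm.

```rust
/// Exponential moving average of delivery rate for one request kind,
/// loosely inspired by geth's p2p/msgrate (a sketch, not a port).
struct RateTracker {
    /// Estimated items served per second.
    rate: f64,
    /// Smoothing factor in (0, 1]; higher reacts faster to new samples.
    alpha: f64,
}

impl RateTracker {
    fn new(alpha: f64) -> Self {
        Self { rate: 0.0, alpha }
    }

    /// Fold a completed request (`items` delivered in `elapsed_secs`)
    /// into the running estimate.
    fn update(&mut self, items: usize, elapsed_secs: f64) {
        if elapsed_secs <= 0.0 {
            return;
        }
        let sample = items as f64 / elapsed_secs;
        self.rate = self.alpha * sample + (1.0 - self.alpha) * self.rate;
    }

    /// Size the next request so the peer stays busy for ~`target_secs`,
    /// which is how such trackers are used to schedule sync requests.
    fn capacity(&self, target_secs: f64) -> usize {
        (self.rate * target_secs).max(1.0) as usize
    }
}
```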
Describe the feature
With #891 we aim to keep all peers busy with requests during sync.
Slow peers will be a bottleneck, and should get auto-disconnected if …
TODO
NetworkState …
Additional context
No response