Sending large quantities of transactions slows the network down #3110
Labels: A-chain (Area: Chain, client & related)

We observed on the mocknet that sending more than 500 transactions per second slows the network down: it starts processing fewer transactions than when it is not under such heavy load. Network performance must not be negatively affected by the number of transactions submitted.

Comments
nearprotocol-bulldozer bot pushed a commit that referenced this issue on Sep 16, 2020:
…received block message (#3317)

**Under heavy load (>10k tps sent by clients network-wide), the current Near node has not performed well (<0.1 bps, ~100 tps processed network-wide).** (See plot [here](https://drive.google.com/file/d/1xiaxvHSTJNKHjTYIztbwhFuOaFMbu1Mr/view?usp=sharing).)

It turns out the largest contributor to this slowness was actually the `Peer` actor. All messages received from a peer are funneled through the same actor (there is one such actor per peer), and peers forward transactions to the next chunk producer in order to minimize the time until each transaction is first included in a block. When 10k transactions per second are received across the network, all of them are pushed onto a single node and funneled through a small number of peer actors. As a result, a long queue of messages can build up waiting to be deserialized and dispatched to the appropriate handler, and long runs of transaction messages in that queue add a lot of latency to the processing of more "important" messages like blocks and their approvals.

In this PR we fix the issue by identifying which messages are transactions _before_ deserializing them, and limiting the number of such messages accepted per block message received (across all peers). Messages in excess of this limit are simply dropped: never deserialized, never dispatched.

**With this change, the network is able to maintain ~0.3 bps and ~600 tps throughput, even under heavy load.** (See the plot attached to the PR.)

Fixes #3110 (sort of -- at least the network no longer stalls).

Notes:

* This could lead to a bad UX when the network receives more transactions than it can process, because transactions will be dropped at random. One solution could be a dedicated actor for deserializing and dispatching transaction messages specifically, though it should still have a finite queue to prevent DoS attacks in which many large transactions are used to run the node out of memory.
* The current way of detecting that a message is a transaction without fully deserializing it is a little hacky and quite brittle (see the sketch after this message). It might be worth making this sort of type detection a first-class part of Borsh, so that the logic is generated automatically by the existing derive macros.
* ~~In several places throughout the code base we use `std::sync::RwLock`, whereas here I chose `parking_lot::RwLock` for its [fairness guarantee](https://docs.rs/parking_lot/0.11.0/parking_lot/type.RwLock.html#fairness): the `std` version makes no guarantee about which thread acquires the lock under contention, so the thread that most recently held it may obtain it repeatedly. That is undesirable here because each peer actor runs in a separate thread and all of them should have equal access to the lock.~~ Replaced with an atomic primitive.
* Presently I only detect and limit `ForwardTx` messages, because `PeerMessage::Transaction` is never constructed or sent as far as I can tell. Should that message type be deprecated somehow?
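For concreteness, here is a minimal sketch of the mechanism the PR describes. This is not the PR's actual code: the variant tags, the byte offset, and the per-block limit are all placeholder values, and the real `PeerMessage`/`RoutedMessageBody` definitions would determine the true ones. What makes the peek possible is that Borsh encodes an enum as a one-byte variant index followed by the variant's payload; what makes it brittle is that any change to the fields preceding the inner body tag shifts the assumed offset.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// All tags, offsets, and the limit below are illustrative assumptions,
// not the values used in the actual PR.
const PEER_MESSAGE_ROUTED_TAG: u8 = 13; // hypothetical `PeerMessage::Routed` variant index
const ROUTED_BODY_TAG_OFFSET: usize = 1 + 97; // hypothetical fixed-size prefix before the body tag
const FORWARD_TX_TAG: u8 = 1; // hypothetical `RoutedMessageBody::ForwardTx` variant index
const MAX_TXS_PER_BLOCK_MESSAGE: usize = 1_000; // hypothetical limit

/// Shared across all peer actors. An atomic counter sidesteps the
/// lock-fairness concerns discussed in the PR notes.
static TXS_SINCE_LAST_BLOCK: AtomicUsize = AtomicUsize::new(0);

/// Brittle by design (mirroring the "hacky" caveat above): peek at Borsh's
/// leading one-byte enum tags at assumed offsets instead of deserializing.
fn looks_like_forward_tx(bytes: &[u8]) -> bool {
    bytes.first() == Some(&PEER_MESSAGE_ROUTED_TAG)
        && bytes.get(ROUTED_BODY_TAG_OFFSET) == Some(&FORWARD_TX_TAG)
}

/// Decide whether to discard a raw message *before* paying the cost of
/// deserializing and dispatching it. Returns true if it should be dropped.
fn should_drop(bytes: &[u8]) -> bool {
    looks_like_forward_tx(bytes)
        // `fetch_add` returns the previous value, so the first
        // MAX_TXS_PER_BLOCK_MESSAGE transactions still get through.
        && TXS_SINCE_LAST_BLOCK.fetch_add(1, Ordering::Relaxed) >= MAX_TXS_PER_BLOCK_MESSAGE
}

/// Called whenever a block message is received, opening the window for the
/// next batch of forwarded transactions.
fn on_block_message() {
    TXS_SINCE_LAST_BLOCK.store(0, Ordering::Relaxed);
}
```

The second note above suggests making this kind of peeking a first-class Borsh feature. A derive macro could, hypothetically, generate something along these lines (this trait is not part of the `borsh` crate; it is only a shape the idea might take):

```rust
/// Hypothetical trait that a Borsh derive macro could implement for enums,
/// replacing the hand-rolled offset math above.
trait BorshPeekVariant {
    /// Return the top-level variant index encoded in `bytes`, if any,
    /// without deserializing the payload.
    fn peek_variant(bytes: &[u8]) -> Option<u8> {
        bytes.first().copied()
    }
}
```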
chefsale pushed a commit that referenced this issue on Sep 21, 2020 (same commit message as above).