I've been playing around a bit with latency injection and measuring throughput.
The setup is probably still slightly broken and needs more tuning, but it already showed some surprising results.
Here's the measured throughput in the bulk benchmark for downloading 100MB of data when a given delay is injected in both directions (total RTT is 2 times that delay):
| Delay | Windows  | Linux    |
| ----- | -------- | -------- |
| 0ms   | 117MB/s  | 454MB/s  |
| 1ms   | 4MB/s    | 55MB/s   |
| 2ms   | 3.7MB/s  | 35MB/s   |
| 10ms  | 5.8MB/s  | 30MB/s   |
| 50ms  | 3.3MB/s  | 10MB/s   |
| 200ms | 2.13MB/s | 2.62MB/s |
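For context, the delay injection boils down to holding every packet for the configured delay before forwarding it. A minimal sketch of that idea (the channel layout and types here are my assumptions, not the actual benchmark code):

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::{sleep_until, Instant};

/// One direction of the simulated link: every packet is released `delay`
/// after it entered the link. With both directions running this loop,
/// the total RTT is 2 * delay.
async fn delayed_link(
    delay: Duration,
    // Each packet is tagged with the instant it entered the link.
    mut rx: mpsc::UnboundedReceiver<(Instant, Vec<u8>)>,
    tx: mpsc::UnboundedSender<Vec<u8>>,
) {
    while let Some((entered, packet)) = rx.recv().await {
        // The accuracy of this timer is exactly what the rest of this
        // issue is about: on Windows it can fire ~15ms late.
        sleep_until(entered + delay).await;
        let _ = tx.send(packet);
    }
}
```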
The variants with extra latency are not CPU bound - the library simply doesn't want to send data faster. If I run them for longer, the average throughput actually increases, which indicates the congestion controller is still raising the congestion window. This is also confirmed by stats.
E.g. for a 10ms delay:

| Delay | Windows 100MB | Windows 200MB | Linux 100MB | Linux 200MB |
| ----- | ------------- | ------------- | ----------- | ----------- |
| 10ms  | 3.9MB/s       | 5.18MB/s      | 31MB/s      | 30MB/s      |
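The stats I'm referring to can be polled off the connection roughly like this - a sketch that assumes a quinn version where `ConnectionStats` exposes `path.rtt` and `path.cwnd`:

```rust
use std::time::Duration;
use tokio::time::interval;

// Periodically dump path stats while the benchmark runs. The field names are
// assumptions about the quinn version in use.
async fn log_path_stats(connection: quinn::Connection) {
    let mut ticker = interval(Duration::from_secs(1));
    loop {
        ticker.tick().await;
        let path = connection.stats().path;
        println!(
            "rtt={:?} cwnd={} congestion_events={}",
            path.rtt, path.cwnd, path.congestion_events
        );
    }
}
```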
Changing congestion control to BBR makes it ramp up faster and get better numbers, but it still isn't great.
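Switching to BBR is just a transport config change, roughly like this (the exact `congestion_controller_factory` signature differs between quinn versions, so treat it as a sketch):

```rust
use std::sync::Arc;
use quinn::congestion::BbrConfig;
use quinn::TransportConfig;

// Sketch: swap the default congestion controller for BBR. Depending on the
// quinn version, the factory is passed as an Arc (as here) or by value.
fn bbr_transport_config() -> TransportConfig {
    let mut transport = TransportConfig::default();
    transport.congestion_controller_factory(Arc::new(BbrConfig::default()));
    transport
}
```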
I'm not fully sure what causes the degradation on Linux, where it's also not CPU bound, but on Windows I noticed the following:
When 1ms latency (2ms RTT) is injected, the stats show a much higher RTT: besides the 2ms of latency we wanted to have, we actually get around 30ms extra. On Linux, by comparison, there's only about 2ms of extra latency.
A bit more digging showed the extra latency is introduced by tokio's timer precision (tokio-rs/tokio#5021). That causes the network simulation to forward packets later than intended - which would be a simulation-only issue. However, the library should still compensate for the higher RTT by increasing the congestion window even more. It seems like it won't do that because of pacing: with pacing, the full congestion window isn't used at once - instead packets are sent out in 2ms intervals, spaced out by timers. When the associated timer turns that 2ms into 16ms, most of the congestion window goes unused. And the window might not even be increased, because the connection is deemed app-limited (not sure).
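The timer coarseness is easy to reproduce in isolation - on Windows a nominal 2ms tokio sleep typically takes around 16ms because of the coarse default Windows timer resolution:

```rust
use std::time::Instant;
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    // Measure how long a nominal 2ms sleep actually takes. On Windows the
    // coarse default clock resolution makes it fire much later, which is
    // what tokio-rs/tokio#5021 is about.
    for _ in 0..5 {
        let start = Instant::now();
        sleep(Duration::from_millis(2)).await;
        println!("asked for 2ms, slept {:?}", start.elapsed());
    }
}
```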
I tried disabling pacing, and indeed it increases throughput:

| Delay | Windows | Linux (default socket buffers) | Linux (2MB socket buffers) |
| ----- | ------- | ------------------------------ | -------------------------- |
| 10ms  | 30MB/s  | 5.8MB/s                        | 48.5MB/s                   |
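The 2MB socket buffer variant just raises SO_SNDBUF/SO_RCVBUF on the UDP socket before handing it to the endpoint. A sketch of that using socket2 (how the socket then gets passed to the endpoint constructor depends on the quinn version):

```rust
use socket2::{Domain, Protocol, Socket, Type};
use std::net::{Ipv4Addr, SocketAddr, UdpSocket};

// Sketch: a UDP socket with 2MB send/receive buffers for the endpoint to use.
// Only the buffer setup is shown; wiring it into quinn is version-specific.
fn udp_socket_with_2mb_buffers() -> std::io::Result<UdpSocket> {
    let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;
    socket.set_send_buffer_size(2 * 1024 * 1024)?;
    socket.set_recv_buffer_size(2 * 1024 * 1024)?;
    let addr = SocketAddr::from((Ipv4Addr::UNSPECIFIED, 0));
    socket.bind(&addr.into())?;
    Ok(socket.into())
}
```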
So the lack of timer precision in combination with pacing indeed limits throughput. However, since timer precision also affects the simulation itself and thereby the results, it would be nice to verify this in a real deployment.
I assume that in a real-world deployment, where the peer paces well and acknowledges packets more often, the difference would be less pronounced, since the endpoint is also woken up by packets from the peer instead of just by timers.
I hacked up a higher precision timer for the network simulation (using a background thread and https://crates.io/crates/spin_sleep/1.1.1). This gets the Windows version from 5MB/s to 50MB/s at a 10ms delay - just like the Linux version. Both with pacing.
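The rough shape of that hack: a dedicated thread waits for each packet's deadline with spin_sleep instead of tokio's timer and then forwards the packet (the channel layout below is an assumption, not the actual patch):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

// Sketch of the higher-precision delay stage: spin_sleep combines a coarse
// native sleep with a spin phase, so the deadline is hit within microseconds
// instead of being rounded up to the next ~16ms timer tick.
fn spawn_precise_forwarder(
    rx: mpsc::Receiver<(Instant, Vec<u8>)>,          // (deadline, packet)
    tx: tokio::sync::mpsc::UnboundedSender<Vec<u8>>, // to the receiving endpoint
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        while let Ok((deadline, packet)) = rx.recv() {
            let now = Instant::now();
            if deadline > now {
                spin_sleep::sleep(deadline - now);
            }
            let _ = tx.send(packet);
        }
    })
}
```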
Unfortunately the better simulation wakes up the endpoint often enough that pacing also happens to run with higher precision. So this setup doesn't yet show what the impact of missed pacing timers is for end users. But I assume it would be around the same degradation - towards 5MB/s. And less if less data is transmitted.