
Error when using UDP Reverse Mode in localhost with high RTT netem settings #756

Closed
Pascalh2001 opened this issue Jun 18, 2018 · 5 comments

Comments

@Pascalh2001

Hi! I would like to submit the following bug:

  • Version of iperf3:
$ iperf3 --version
iperf 3.5+ (cJSON 1.5.2)
Linux lgrs34-pc-60 4.4.0-128-generic #154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018 x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing, authentication
~/iperf$ git rev-parse HEAD
1254e135fdc74cef7c11fe0bd5440a9b1046c484
  • Operating system (and distribution, if any):
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.4 LTS
Release:	16.04
Codename:	xenial

NOTE: iperf3 was installed from source (git clone, ./configure, make, make install).
I also hit the issue with the version available in xenial; I can submit that log if required.

Bug Report

The problem occurs when running iperf3 in UDP reverse mode against a server on localhost, with netem adding delay on the loopback interface to emulate a high-RTT link.

  • Expected Behavior
    The same behavior as in the low-RTT run (see below).

  • Actual Behavior
    The client freezes after the expected test duration (~15 s) while the server keeps sending.
    The server then reports an error (see below) approximately 5 s after the expected test termination time.
    After the server reports its error, the client reports an error too (see below).

  • Steps to Reproduce
    First:

$ sudo tc qdisc add dev lo root netem delay 0ms

Then, expected behavior:
Client :

$ sudo tc qdisc change dev lo root netem delay 0ms
$ iperf3 -t 15 -c localhost -p 5201 -R -u -b 0
Connecting to host localhost, port 5201
Reverse mode, remote host localhost is sending
[  5] local ::1 port 59991 connected to ::1 port 5201
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  3.30 GBytes  28.3 Gbits/sec  0.010 ms  6492/168298 (3.9%)  
[  5]   1.00-2.00   sec  3.28 GBytes  28.2 Gbits/sec  0.008 ms  1612/162479 (0.99%)  
[  5]   2.00-3.00   sec  3.28 GBytes  28.2 Gbits/sec  0.002 ms  312/161344 (0.19%)  
[  5]   3.00-4.00   sec  3.32 GBytes  28.5 Gbits/sec  0.003 ms  502/163353 (0.31%)  
[  5]   4.00-5.00   sec  3.32 GBytes  28.5 Gbits/sec  0.013 ms  920/163665 (0.56%)  
[  5]   5.00-6.00   sec  3.35 GBytes  28.8 Gbits/sec  0.002 ms  620/164848 (0.38%)  
[  5]   6.00-7.00   sec  3.33 GBytes  28.6 Gbits/sec  0.028 ms  493/163742 (0.3%)  
[  5]   7.00-8.00   sec  3.31 GBytes  28.5 Gbits/sec  0.002 ms  909/163513 (0.56%)  
[  5]   8.00-9.00   sec  3.31 GBytes  28.4 Gbits/sec  0.005 ms  232/162616 (0.14%)  
[  5]   9.00-10.00  sec  3.33 GBytes  28.6 Gbits/sec  0.002 ms  309/163829 (0.19%)  
[  5]  10.00-11.00  sec  3.33 GBytes  28.6 Gbits/sec  0.018 ms  434/163830 (0.26%)  
[  5]  11.00-12.00  sec  3.32 GBytes  28.5 Gbits/sec  0.093 ms  732/163632 (0.45%)  
[  5]  12.00-13.00  sec  3.30 GBytes  28.4 Gbits/sec  0.006 ms  1324/163412 (0.81%)  
[  5]  13.00-14.00  sec  3.31 GBytes  28.5 Gbits/sec  0.002 ms  229/162720 (0.14%)  
[  5]  14.00-15.00  sec  3.33 GBytes  28.6 Gbits/sec  0.014 ms  442/164042 (0.27%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-15.04  sec  50.1 GBytes  28.6 Gbits/sec  0.000 ms  0/2455330 (0%)  sender
[  5]   0.00-15.00  sec  49.7 GBytes  28.5 Gbits/sec  0.014 ms  15562/2455323 (0.63%)  receiver

iperf Done.

Server :

$ iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from ::1, port 54266
[  5] local ::1 port 5201 connected to ::1 port 59991
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  3.31 GBytes  28.4 Gbits/sec  162420  
[  5]   1.00-2.00   sec  3.33 GBytes  28.6 Gbits/sec  163120  
[  5]   2.00-3.00   sec  3.29 GBytes  28.2 Gbits/sec  161290  
[  5]   3.00-4.00   sec  3.32 GBytes  28.5 Gbits/sec  162760  
[  5]   4.00-5.00   sec  3.34 GBytes  28.7 Gbits/sec  163680  
[  5]   5.00-6.00   sec  3.36 GBytes  28.9 Gbits/sec  164870  
[  5]   6.00-7.00   sec  3.34 GBytes  28.7 Gbits/sec  163700  
[  5]   7.00-8.00   sec  3.35 GBytes  28.7 Gbits/sec  164160  
[  5]   8.00-9.00   sec  3.31 GBytes  28.4 Gbits/sec  162220  
[  5]   9.00-10.00  sec  3.34 GBytes  28.7 Gbits/sec  163790  
[  5]  10.00-11.00  sec  3.34 GBytes  28.7 Gbits/sec  163630  
[  5]  11.00-12.00  sec  3.34 GBytes  28.7 Gbits/sec  163970  
[  5]  12.00-13.00  sec  3.33 GBytes  28.6 Gbits/sec  163200  
[  5]  13.00-14.00  sec  3.32 GBytes  28.5 Gbits/sec  162840  
[  5]  14.00-15.00  sec  3.34 GBytes  28.7 Gbits/sec  163830  
[  5]  15.00-15.04  sec   122 MBytes  29.1 Gbits/sec  5850  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-15.04  sec  50.1 GBytes  28.6 Gbits/sec  0.000 ms  0/2455330 (0%)  sender
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

Observed behavior:
Client :

$ sudo tc qdisc change dev lo root netem delay 250ms
$ iperf3 -t 15 -c localhost -p 5201 -R -u -b 0
Connecting to host localhost, port 5201
Reverse mode, remote host localhost is sending
[  5] local ::1 port 60837 connected to ::1 port 5201
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  83.1 MBytes   697 Mbits/sec  0.003 ms  246064/250055 (98%)  
[  5]   1.00-2.00   sec  83.2 MBytes   698 Mbits/sec  0.002 ms  233014/237009 (98%)  
[  5]   2.00-3.00   sec  83.2 MBytes   698 Mbits/sec  0.002 ms  241550/245543 (98%)  
[  5]   3.00-4.00   sec  83.2 MBytes   698 Mbits/sec  0.002 ms  241263/245258 (98%)  
[  5]   4.00-5.00   sec  83.1 MBytes   697 Mbits/sec  0.003 ms  237618/241606 (98%)  
[  5]   5.00-6.00   sec  83.1 MBytes   697 Mbits/sec  0.002 ms  241153/245142 (98%)  
[  5]   6.00-7.00   sec  83.0 MBytes   696 Mbits/sec  0.003 ms  236932/240914 (98%)  
[  5]   7.00-8.00   sec  83.0 MBytes   697 Mbits/sec  0.002 ms  236338/240324 (98%)  
[  5]   8.00-9.00   sec  82.9 MBytes   696 Mbits/sec  0.002 ms  239651/243631 (98%)  
[  5]   9.00-10.00  sec  83.0 MBytes   696 Mbits/sec  0.003 ms  237388/241370 (98%)  
[  5]  10.00-11.00  sec  82.6 MBytes   693 Mbits/sec  0.002 ms  238092/242057 (98%)  
[  5]  11.00-12.00  sec  82.9 MBytes   695 Mbits/sec  0.002 ms  243856/247834 (98%)  
[  5]  12.00-13.00  sec  82.9 MBytes   696 Mbits/sec  0.002 ms  240720/244701 (98%)  
[  5]  13.00-14.00  sec  82.8 MBytes   695 Mbits/sec  0.002 ms  242165/246141 (98%)  
<NOTE: at this point the client freezes until the server reports the error (see below), and then prints:>
iperf3: error - unable to receive control message: Connection reset by peer

Server :

-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from ::1, port 54274
[  5] local ::1 port 5201 connected to ::1 port 60837
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  4.94 GBytes  42.4 Gbits/sec  242640  
[  5]   1.00-2.00   sec  4.95 GBytes  42.5 Gbits/sec  243240  
[  5]   2.00-3.00   sec  5.00 GBytes  42.9 Gbits/sec  245520  
[  5]   3.00-4.00   sec  4.99 GBytes  42.9 Gbits/sec  245290  
[  5]   4.00-5.00   sec  4.92 GBytes  42.2 Gbits/sec  241610  
[  5]   5.00-6.00   sec  4.99 GBytes  42.8 Gbits/sec  245120  
[  5]   6.00-7.00   sec  4.90 GBytes  42.1 Gbits/sec  240920  
[  5]   7.00-8.00   sec  4.89 GBytes  42.0 Gbits/sec  240220  
[  5]   8.00-9.00   sec  4.96 GBytes  42.6 Gbits/sec  243690  
[  5]   9.00-10.00  sec  4.91 GBytes  42.2 Gbits/sec  241320  
[  5]  10.00-11.00  sec  4.93 GBytes  42.3 Gbits/sec  242080  
[  5]  11.00-12.00  sec  5.04 GBytes  43.3 Gbits/sec  247780  
[  5]  12.00-13.00  sec  4.98 GBytes  42.8 Gbits/sec  244710  
[  5]  13.00-14.00  sec  5.01 GBytes  43.0 Gbits/sec  246160  
[  5]  14.00-15.00  sec  5.02 GBytes  43.1 Gbits/sec  246690  
[  5]  15.00-16.00  sec  4.97 GBytes  42.7 Gbits/sec  244220  
[  5]  16.00-17.00  sec  4.89 GBytes  42.0 Gbits/sec  240260  
[  5]  17.00-18.00  sec  4.89 GBytes  42.0 Gbits/sec  240120  
[  5]  18.00-19.00  sec  4.87 GBytes  41.9 Gbits/sec  239510  
iperf3: error - select failed: Bad file descriptor
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

Cheers,
Pascal

@esqt

esqt commented Jun 19, 2018

This is what could be happening:

Netem stores transmitted packets in a queue in order to simulate delay on the link. Because you didn't limit the throughput, packets are dropped when the queue is full (you store packets for 250 ms at a rate of roughly 40 Gb/s, which is a queue of about 10 Gb, i.e. more than 1 GB). Also, you are using the loopback interface, so the packets from the server to the client and the packets from the client to the server go through the same queue. What happens in the end is that the packets of the TCP connection used to transmit signaling information between the client and the server are dropped.

What you should do is limit the throughput of the link so that packets are not dropped from the netem queue. Another option would be to increase the netem queue size, but I don't know whether that is practical (especially for a 10 Gb queue).
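
(Illustrative sketch, not from the original comment: both options can be expressed with netem's rate and limit parameters; the 1gbit rate and 70000-packet limit below are arbitrary example values.)

# Option 1: shape the emulated link so the sender cannot overrun the netem queue
$ sudo tc qdisc change dev lo root netem delay 250ms rate 1gbit

# Option 2: enlarge the netem queue (limit is given in packets)
$ sudo tc qdisc change dev lo root netem delay 250ms limit 70000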

You could also try using two separate computers rather than the loopback interface (so you would have two separate queues). At least packets would only be dropped in one direction, and you might see a different behavior.

@Pascalh2001
Author

Thank you for your feedback! I see what you mean.
So the client doesn't tell the server the test duration before the test; instead it informs the server when it wants to stop receiving packets, and that termination message can be dropped because of netem.
The loopback setup was only used for testing: I'm going to run iperf3 on real high-BDP links, and since that is pretty expensive I needed to test locally first, but apparently this has side effects ^^
Thanks again
Pascal

@esqt

esqt commented Jun 19, 2018

I agree that this is some kind of bug, because the server should stop sending data after 15 seconds (I don't see why signaling is required to end the connection, since the duration is specified at the beginning). This is somewhat related to issue #753.
There should be no problem with high-BDP links as long as no packets are dropped due to queue overflows (and as long as the latency stays below 5 seconds, which is iperf3's timeout value).
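
(Illustrative back-of-the-envelope check, not from the original comment, using rounded numbers from the server log above.)

# The server sends roughly 245,000 datagrams/s at ~42 Gbit/s.
# To absorb 250 ms of delay, the netem queue would need about:
#   0.25 s x 245,000 pkt/s ~= 61,000 packets  (0.25 s x 42 Gbit/s ~= 10.5 Gbit ~= 1.3 GB)
# which is far above netem's usual default limit of 1000 packets, hence the heavy drops.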

@bmah888
Contributor

bmah888 commented Jun 22, 2018

@AuNomDuLys: You basically identified the problem, thanks for helping out.

@Pascalh2001: Not sure what you expect iperf3 to do in this case. If you're going to test with -b 0 you are basically telling the sender to send as fast as it can go. This will cause queueing at the bottleneck link along the path (I guess it'd be the interface queue since you're just testing to localhost). Can we take a step back and see what it is you are trying to measure in the first place?
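
(Illustration only, not a command from the original comment: replacing -b 0 with an explicit target bitrate paces the sender and avoids the queue buildup; the 500M value is an arbitrary example and just needs to stay below the emulated link's capacity.)

$ iperf3 -t 15 -c localhost -p 5201 -R -u -b 500M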

@bmah888
Contributor

bmah888 commented Jun 26, 2020

Closing this bug for now. Actually solving this problem requires a considerable amount of redesign around how --reverse works (there are other, related problems). We should probably have an issue for dealing with that.

@bmah888 bmah888 closed this as completed Jun 26, 2020