
packet drop after dscp remark #304

Open
shizhenzhao opened this issue Apr 4, 2020 · 10 comments

Comments

@shizhenzhao

shizhenzhao commented Apr 4, 2020

I am simulating the following network. The link capacities of (h1, s1) and (h2, s2) are 30 Mbps, and all the other links are 10 Mbps.
s1 lo: s1-eth1:h1-eth0 s1-eth2:s2-eth1 s1-eth3:s3-eth1 s1-eth4:s4-eth1
s2 lo: s2-eth1:s1-eth2 s2-eth2:h2-eth0 s2-eth3:s3-eth2 s2-eth4:s4-eth2
s3 lo: s3-eth1:s1-eth3 s3-eth2:s2-eth3
s4 lo: s4-eth1:s1-eth4 s4-eth2:s2-eth4
h1 h1-eth0:s1-eth1
h2 h2-eth0:s2-eth2
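
For reference, a minimal Mininet sketch of this topology (an illustrative reconstruction, not the exper3.py shipped in test.zip; link order is chosen so the port numbering matches the dump above, and the switch/controller details may differ):

```python
from mininet.topo import Topo
from mininet.net import Mininet
from mininet.node import RemoteController
from mininet.link import TCLink

class RingTopo(Topo):
    """Ring of four switches with h1/h2 attached, ports numbered as above."""
    def build(self):
        h1, h2 = self.addHost('h1'), self.addHost('h2')
        s1, s2, s3, s4 = [self.addSwitch(s) for s in ('s1', 's2', 's3', 's4')]
        self.addLink(h1, s1, bw=30)   # s1-eth1 <-> h1-eth0, 30 Mbps
        self.addLink(s1, s2, bw=10)   # s1-eth2 <-> s2-eth1, 10 Mbps
        self.addLink(h2, s2, bw=30)   # s2-eth2 <-> h2-eth0, 30 Mbps
        self.addLink(s1, s3, bw=10)   # s1-eth3 <-> s3-eth1
        self.addLink(s2, s3, bw=10)   # s2-eth3 <-> s3-eth2
        self.addLink(s1, s4, bw=10)   # s1-eth4 <-> s4-eth1
        self.addLink(s2, s4, bw=10)   # s2-eth4 <-> s4-eth2

# switch class (e.g. the CPqD user switch) is omitted here; exper3.py sets that up
net = Mininet(topo=RingTopo(), link=TCLink, controller=RemoteController,
              autoSetMacs=True)
net.start()
# static ARP entries so the ring never has to flood ARP (see the discussion below)
net['h1'].setARP('10.0.0.2', '00:00:00:00:00:02')
net['h2'].setARP('10.0.0.1', '00:00:00:00:00:01')
```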

I use iperf to send udp traffic from h1 to h2:
h2 iperf -u -s -p 5566 -i 1 > server.log &
h1 iperf -u -c 10.0.0.2 -b 26M -p 5566 -t 5 --tos 0x08 > client.log

I want 8.5 Mbps of the iperf traffic to go through the link (s1, s2), and the rest of the traffic to be split evenly between the paths (s1, s4, s2) and (s1, s3, s2). We implemented this routing with DSCP remarking. Specifically, we set up a meter rule in s1 with rate=8500 and prec_level=1, then forward the traffic with dscp=2 over the link (s1, s2) and split the traffic with dscp=4 between the paths (s1, s4, s2) and (s1, s3, s2).
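
A sketch of how such a meter and the DSCP-based forwarding rules could be installed from a Ryu OpenFlow 1.3 app (illustrative only; port and table numbers follow the description above, the actual controller code in test.zip may be organized differently, and the select group 1 that splits traffic over s3/s4 is assumed to be installed separately):

```python
from ryu.ofproto import ofproto_v1_3 as ofp
from ryu.ofproto import ofproto_v1_3_parser as parser

def install_dscp_remark(dp):
    # Meter 1 on s1: remark (raise drop precedence by 1) above 8500 kbps.
    band = parser.OFPMeterBandDscpRemark(rate=8500, burst_size=0, prec_level=1)
    dp.send_msg(parser.OFPMeterMod(dp, command=ofp.OFPMC_ADD,
                                   flags=ofp.OFPMF_KBPS,
                                   meter_id=1, bands=[band]))

    # Traffic from h1 passes through the meter first ...
    match = parser.OFPMatch(in_port=1, eth_type=0x0800,
                            eth_dst='00:00:00:00:00:02')
    inst = [parser.OFPInstructionMeter(1),
            parser.OFPInstructionGotoTable(1)]
    dp.send_msg(parser.OFPFlowMod(dp, match=match, instructions=inst))

    # ... then table 1 forwards dscp=2 over (s1, s2) and hands dscp=4
    # to a select group that splits it between (s1, s3) and (s1, s4).
    m2 = parser.OFPMatch(eth_type=0x0800, ip_dscp=2)
    i2 = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                       [parser.OFPActionOutput(2)])]
    dp.send_msg(parser.OFPFlowMod(dp, table_id=1, match=m2, instructions=i2))

    m4 = parser.OFPMatch(eth_type=0x0800, ip_dscp=4)
    i4 = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                       [parser.OFPActionGroup(1)])]
    dp.send_msg(parser.OFPFlowMod(dp, table_id=1, match=m4, instructions=i4))
```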

I was expecting no traffic loss in this case. However, the iperf results showed about 27% packet loss. I checked the packet counters of s1/s2/s3/s4 and found that the number of packets entering s2/s3/s4 is smaller than the number of packets leaving s1.

My initial guess is that the switch or port buffer size is too small. I tried the following:

  1. Setting the max_queue_size of a link to a large number: this only made things worse, because leaving max_queue_size unset already gives an effectively unlimited queue (see the sketch after this list).
  2. Increasing N_PKT_BUFFERS from 256 to 2^12 in dp_buffers.c: I thought this might increase the switch buffer size, but unfortunately it does not affect the iperf results.
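
For completeness, capping a link's queue with Mininet's TCLink looks roughly like this (the value 1000 is only an example, not the number used in the experiment):

```python
from mininet.topo import Topo
from mininet.link import TCLink

class CappedLinkTopo(Topo):
    def build(self):
        s1, s2 = self.addSwitch('s1'), self.addSwitch('s2')
        # bw in Mbit/s, max_queue_size in packets (1000 is only an example)
        self.addLink(s1, s2, cls=TCLink, bw=10, max_queue_size=1000)
```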

To reproduce this issue, download and unzip test.zip first. (I am running the test on Ubuntu 16.04.)
$ ryu-manager
$ sudo python exper3.py
$ sudo ./exper3.sh
Then, in mininet
$ h2 iperf -u -s -p 5566 -i 1 > server.log &
$ h1 iperf -u -c 10.0.0.2 -b 26M -p 5566 -t 5 --tos 0x08 > client.log
test.zip

@ederlf
Collaborator

ederlf commented Apr 4, 2020

I don't know if you noticed, but your topology is a ring. If your application is flooding ARP traffic, it is likely being replicated in a loop, causing a broadcast storm.

@shizhenzhao
Author

I know it is a ring, so we set up static ARP entries at the hosts to avoid ARP flooding.

@ederlf
Collaborator

ederlf commented Apr 4, 2020

From the first attempts:

  • I don't think it is a queue issue. If the split works, the load is less than the port capacity.
  • dp_buffers is not a port buffer; it holds packets sent to the control plane.

I tried to reproduce it, but the iperf server simply does not output any result, so I cannot tell exactly what the issue is.

I have checked the link utilization with bwm-ng, and the distribution of traffic at s2 looks pretty close to what it should be.

bwm-ng -u bits -I s2-eth1,s2-eth3,s2-eth4

However, the traffic entering s1 and leaving s2 does not match.

bwm-ng -u bits -I s1-eth1,s2-eth2

This needs further investigation, but I'd ask you to run the same test again, this time only from h1 to h2 directly through (s1, s2) without rate limiting. If the result is as expected, this might give a hint about where the problem is.

@shizhenzhao
Author

Thanks for your response. To reproduce the issue, a controller needs to be running; in my case I used Ryu. I have updated the reproduction steps.

I have also observed that the traffic leaving s1-eth2 and the traffic entering s2-eth1 do not match. I also tried sending traffic from h1 to h2 without rate limiting. If we increase the bandwidth of (s1, s2) to 30 Mbps, the traffic leaving s1-eth2 and entering s2-eth1 matches. If we keep (s1, s2) at 10 Mbps, there will certainly be packet loss, because there is not sufficient capacity.

@ederlf
Collaborator

ederlf commented Apr 4, 2020

Thanks for testing it.

It looks like the counts match for all three flows in s2. But indeed, comparing the packet counts from s1 to s2, we can see that they differ. We can also see that the load balancing done by the group is perfect.

table="1", match="oxm{in_port="1", eth_dst="00:00:00:00:00:02", eth_type="0x800", ip_dscp="2"}", dur_s="222", dur_ns="708000000", prio="32768", idle_to="0", hard_to="0", cookie="0x0", **pkt_cnt="7564",** byte_cnt="11436768", insts=[apply{acts=[out{port="2"}]}]},
[{table="0", match="oxm{in_port="3", eth_dst="00:00:00:00:00:02"}", dur_s="45", dur_ns="741000000", prio="32768", idle_to="0", hard_to="0", cookie="0x0", pkt_cnt="5424", byte_cnt="8201088", insts=[apply{acts=[out{port="2"}]}]},

{table="0", match="oxm{in_port="1", eth_dst="00:00:00:00:00:02"}", dur_s="45", dur_ns="736000000", prio="32768", idle_to="0", hard_to="0", cookie="0x0", **pkt_cnt="3657**", byte_cnt="5529384", insts=[apply{acts=[out{port="2"}]}]},

{table="0", match="oxm{in_port="4", eth_dst="00:00:00:00:00:02"}", dur_s="45", dur_ns="724000000", prio="32768", idle_to="0", hard_to="0", cookie="0x0", pkt_cnt="5425", byte_cnt="8202600", insts=[apply{acts=[out{port="2"}]}]},

So I'll look at what might be causing the reduced number of packets arriving at s2.

@shizhenzhao
Author

shizhenzhao commented Apr 4, 2020

I checked the behavior of the DSCP remark with rate 8.5 Mbps. The incoming rate is 26 Mbps. In every second, the first 8.5 Mbit of packets are forwarded over the link (s1, s2). This means that during the first 8.5/26 of each second, the rate entering (s1, s2) is actually 26 Mbps, and for the rest of the second it is 0. Note that the link rate of (s1, s2) is only 10 Mbps. Will this cause packet loss?
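
A rough back-of-the-envelope estimate, assuming 1500-byte packets, of how much the (s1, s2) queue would have to absorb if the remarking really is that bursty:

```python
# Rough estimate of the burst the (s1, s2) queue would have to absorb if the
# first 8.5 Mbit of every second goes out unmarked (1500-byte packets assumed).
line_in  = 26e6   # bps arriving at s1 from h1
line_out = 10e6   # bps capacity of the (s1, s2) link
unmarked = 8.5e6  # bits per second left unmarked by the meter

burst_time  = unmarked / line_in                  # ~0.33 s of unmarked traffic
excess_bits = (line_in - line_out) * burst_time   # ~5.2 Mbit that must be queued
excess_pkts = excess_bits / (1500 * 8)            # ~435 packets

print(f"{burst_time:.2f} s burst, {excess_bits / 1e6:.1f} Mbit, ~{excess_pkts:.0f} packets")
```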

@ederlf
Collaborator

ederlf commented Apr 4, 2020

It is a possible reason. The rate of a flow is measured using a token bucket algorithm, so it will not mark packets until the bucket runs out of tokens.

Therefore, all traffic is sent over (s1, s2) until the switch detects that the rate limit has been exceeded.

@shizhenzhao
Author

How frequently are tokens added to the bucket in ofsoftswitch? Is it every second? Can we increase the frequency? This may reduce the burstiness of the DSCP-remarked flow.

@ederlf
Collaborator

ederlf commented Apr 6, 2020

The tokens are added every 100 ms. I have experimented with a shorter interval.

Also, because marking actually happens when the bucket has no available tokens, I tried starting with a full bucket, but without success.
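
For intuition, a toy token-bucket marker along these lines (purely illustrative, not the ofsoftswitch implementation; the refill interval and initial fill are the knobs discussed above):

```python
# Toy single-band token-bucket meter with DSCP remark, just to illustrate the
# behaviour discussed above (not the ofsoftswitch code).
class ToyDscpMeter:
    def __init__(self, rate_kbps, burst_kbits, refill_interval=0.1):
        self.capacity = burst_kbits * 1000      # bucket size in bits
        self.tokens = self.capacity             # initial fill: one of the knobs tried above (full vs. empty)
        self.rate = rate_kbps * 1000            # bits per second
        self.refill_interval = refill_interval  # 0.1 s by default, matching the 100 ms above

    def refill(self):
        """Called once per refill_interval."""
        self.tokens = min(self.capacity,
                          self.tokens + self.rate * self.refill_interval)

    def classify(self, pkt_bits):
        """Forward a packet unmarked while tokens remain; otherwise remark it."""
        if self.tokens >= pkt_bits:
            self.tokens -= pkt_bits
            return "unmarked"   # forwarded over (s1, s2)
        return "remarked"       # dscp raised, load-balanced over s3/s4
```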

@shizhenzhao
Author

I conducted two more experiments:
1. I tried reducing the link bandwidths and the iperf sending rate by 10x, and the packet loss disappeared.
2. Based on the first experiment, I suspected that ofsoftswitch might not have enough CPU to sustain high bandwidth, so I ran the original experiment on a more powerful server. Unfortunately, I saw no improvement.
