[copp - NoPolicyTest] RX performance issue of ptf_nn_agent.py #308
Comments
ptf_nn_agent sends packets back from the DUT to the test server so that matched packets can be counted, and an insufficient socket buffer causes packet drops on the DUT. Fix this issue by enlarging the write socket buffer and the nanomsg socket buffer on the DUT. To enlarge the write socket buffer, add the following line to the file /etc/sysctl.conf on the DUT
To enlarge the nanomsg socket buffer, add the options below to the ptf_nn_agent command on the DUT
Thanks for the info!
Hi @maggiemsft, sorry to bother you, but we need your help here. Do you have experience with the CoPP test, and have you encountered the same packet loss issue as we did? Thanks.
Hi cytsai0409, I have some experience with the CoPP test. I didn't work with a write buffer at all, because I don't see any reason to change it: the kernel is much faster at reading packets from the Python test, so as I understand it this is not a problem. I tested the CoPP test with TD2 and Mellanox Spectrum ASICs. Please let me know if you have more questions. Thanks
Hi, Pavel: Thanks for your suggestions.
Hi, Pavel: It works when we set the read buffers with "--set-nn-rcv-buffer=109430400 --set-iface-rcv-buffer=109430400" and set net.core.rmem_max to 109430400 on our testbed. I have another question. We are using a Broadcom Tomahawk ASIC. Thanks for your help.
Hi, Sure, it's acceptable to set it to any value that helps your system pass the test. We need this parameter because of the slowness of Python.
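The relationship between the buffer options and net.core.rmem_max discussed above can be checked directly. On Linux, an SO_RCVBUF request larger than net.core.rmem_max is silently capped, so a large buffer option only takes full effect once the sysctl is raised as well. The following is a minimal sketch, assuming Linux and Python 3; request_rcvbuf is a hypothetical helper for illustration and is not part of ptf_nn_agent.py.

```python
import socket


def request_rcvbuf(requested_bytes):
    """Request a receive buffer size and return what the kernel reports back.

    On Linux the reported value is twice the granted size (the kernel adds
    bookkeeping overhead), and the granted size is capped at
    net.core.rmem_max.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()
```

Calling request_rcvbuf(109430400) before and after raising net.core.rmem_max should show whether the larger buffer is actually being granted on a given testbed.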
Hi @pavel-shirshov, about the read buffer "net.core.rmem_max": do we need to commit a code change on GitHub to increase its value to 109430400, or do we just change its value in sysctl.conf when we run the CoPP test? Thanks.
Hi Jason, It's up to you. If you can patch that value in your base image locally, without a repo change, that is the best way. If you can't do that, you may create a PR to change the repo. Thanks
Hi, Pavel: OK, we will try to patch it locally.
Closing this issue since a solution was found.
With 100K+ packets sent to the CPU, ptf_nn_agent is not able to account for all the packets. Based on this thread (sonic-net#308), I have increased both the send and receive socket buffers, on both the kernel side and the socket side. The issue is seen on the Broadcom-based Dell-6000 platform.
Hi,
I'm running the CoPP test on my box and found a large gap between the RX counter of the CPU and the RX counter of ptf_nn_agent:
I use /proc/bcm/knet/stats to read the CPU RX counter after a test (e.g., DHCPTest), as follows:
root@switch2:/home/admin# cat /proc/bcm/knet/stats | grep "Rx0 packets"
Rx0 packets 100127 --> About 100K packets received by CPU
But the RX counter of ptf_nn_agent is only around 83K:
2017-10-20 01:27:45 : DHCPTest
2017-10-20 01:28:20 :
2017-10-20 01:28:20 : Counters before the test:
2017-10-20 01:28:20 : If counter (0, n): (87, 0)
2017-10-20 01:28:20 : NN counter (0, n): (66637, 500002)
2017-10-20 01:28:20 : If counter (1, n): (2, 0)
2017-10-20 01:28:20 : NN counter (1, n): (2, 0)
2017-10-20 01:28:20 :
2017-10-20 01:28:20 : Counters after the test:
2017-10-20 01:28:20 : If counter (0, n): (87, 100000)
2017-10-20 01:28:20 : NN counter (0, n): (66637, 600002)
2017-10-20 01:28:20 : If counter (1, n): (83074, 0)
2017-10-20 01:28:20 : NN counter (1, n): (83074, 0)
2017-10-20 01:28:20 :
2017-10-20 01:28:20 : Sent through NN to local ptf_nn_agent: 100000
2017-10-20 01:28:20 : Sent through If to remote ptf_nn_agent: 100000
2017-10-20 01:28:20 : Recv from If on remote ptf_nn_agent: 83072
2017-10-20 01:28:20 : Recv from NN on from remote ptf_nn_agent: 83072
2017-10-20 01:28:20 :
2017-10-20 01:28:20 : test stats
2017-10-20 01:28:20 : Packet sent = 100000
2017-10-20 01:28:20 : Packet rcvd = 83072
2017-10-20 01:28:20 : Test time = 0:00:23.654488
2017-10-20 01:28:20 : TX PPS = 4227
2017-10-20 01:28:20 : RX PPS = 3511
2017-10-20 01:28:20 :
2017-10-20 01:28:20 : Checking constraints (NoPolicy):
2017-10-20 01:28:20 : rx_pps (3511) > NO_POLICER_LIMIT (840): True
2017-10-20 01:28:20 : total_rcv_pkt_cnt (83072) > pkt_rx_limit (90000): False
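The summary lines in this log appear to be simple differences between the "before" and "after" counter snapshots, which makes the gap easy to quantify. The sketch below (Python 3) parses the counter lines, computes the deltas, and compares the agent's count with the CPU counter quoted earlier. It assumes each tuple is (rx, tx), an interpretation that is consistent with the numbers in the log but not confirmed by it; parse_counters and delta are hypothetical helper names.

```python
import re

# Matches lines such as "If counter (1, n): (83074, 0)".
COUNTER_RE = re.compile(r"(If|NN) counter \((\d+), n\): \((\d+), (\d+)\)")


def parse_counters(lines):
    """Return {(kind, device): (first, second)} for every counter line."""
    counters = {}
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:
            kind, dev, first, second = match.groups()
            counters[(kind, int(dev))] = (int(first), int(second))
    return counters


def delta(before, after):
    """Element-wise difference between two counter snapshots."""
    return {
        key: (after[key][0] - before[key][0], after[key][1] - before[key][1])
        for key in after
    }


before = parse_counters([
    "If counter (0, n): (87, 0)",
    "NN counter (0, n): (66637, 500002)",
    "If counter (1, n): (2, 0)",
    "NN counter (1, n): (2, 0)",
])
after = parse_counters([
    "If counter (0, n): (87, 100000)",
    "NN counter (0, n): (66637, 600002)",
    "If counter (1, n): (83074, 0)",
    "NN counter (1, n): (83074, 0)",
])
diff = delta(before, after)

cpu_rx = 100127                 # "Rx0 packets" from /proc/bcm/knet/stats
agent_rx = diff[("If", 1)][0]   # packets the remote agent saw on its interface
missing = cpu_rx - agent_rx     # packets seen by the CPU but not by the agent
print("agent_rx=%d missing=%d" % (agent_rx, missing))
```

Note that the CPU counter may include packets unrelated to the test, so the difference is an upper bound on the test packets lost between the CPU and the agent.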
After doing some research, I added a 1 µs delay after each sent packet:
for i in xrange(count):
    testutils.send_packet(self, send_intf, packet)
    time.sleep(1.0 / 1000000.0)
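A fixed time.sleep() only sets a lower bound on the delay; the real pause depends on timer granularity and scheduling, so the resulting send rate is hard to predict. One alternative is to pace the loop against a target rate. The following is a minimal sketch in Python 3 (the test script above is Python 2); send_paced, its send callable, and target_pps are hypothetical names, not part of the CoPP test.

```python
import time


def send_paced(send, count, target_pps):
    """Call send() count times, spacing the calls to approximate target_pps.

    Deadlines are absolute, so a slow iteration is compensated by shorter
    sleeps afterwards instead of accumulating drift.
    """
    interval = 1.0 / target_pps
    next_deadline = time.monotonic()
    for _ in range(count):
        send()
        next_deadline += interval
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
```

In the test, send could wrap the existing call, for example lambda: testutils.send_packet(self, send_intf, packet).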
I also got a passing result when I reran the test with the above script modification:
2017-10-20 01:45:19 : DHCPTest
2017-10-20 01:46:07 :
2017-10-20 01:46:07 : Counters before the test:
2017-10-20 01:46:07 : If counter (0, n): (11, 0)
2017-10-20 01:46:07 : NN counter (0, n): (66685, 1100002)
2017-10-20 01:46:07 : If counter (1, n): (23, 0)
2017-10-20 01:46:07 : NN counter (1, n): (567847, 0)
2017-10-20 01:46:07 :
2017-10-20 01:46:07 : Counters after the test:
2017-10-20 01:46:07 : If counter (0, n): (15, 100000)
2017-10-20 01:46:07 : NN counter (0, n): (66689, 1200002)
2017-10-20 01:46:07 : If counter (1, n): (98760, 0)
2017-10-20 01:46:07 : NN counter (1, n): (666584, 0)
2017-10-20 01:46:07 :
2017-10-20 01:46:07 : Sent through NN to local ptf_nn_agent: 100000
2017-10-20 01:46:07 : Sent through If to remote ptf_nn_agent: 100000
2017-10-20 01:46:07 : Recv from If on remote ptf_nn_agent: 98737
2017-10-20 01:46:07 : Recv from NN on from remote ptf_nn_agent: 98737
2017-10-20 01:46:07 :
2017-10-20 01:46:07 : test stats
2017-10-20 01:46:07 : Packet sent = 100000
2017-10-20 01:46:07 : Packet rcvd = 98733
2017-10-20 01:46:07 : Test time = 0:00:34.352164
2017-10-20 01:46:07 : TX PPS = 2911
2017-10-20 01:46:07 : RX PPS = 2874
2017-10-20 01:46:07 :
2017-10-20 01:46:07 : Checking constraints (NoPolicy):
2017-10-20 01:46:07 : rx_pps (2874) > NO_POLICER_LIMIT (840): True
2017-10-20 01:46:07 : total_rcv_pkt_cnt (98733) > pkt_rx_limit (90000): True
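The "test stats" and "Checking constraints" lines of both runs can be reproduced from the packet counts and the test time. The sketch below (Python 3) does that arithmetic; the two limits are copied from the logs, while truncating the rates to integers is an assumption that happens to match the logged values. summarize is a hypothetical helper name.

```python
from datetime import timedelta

NO_POLICER_LIMIT = 840  # pps threshold printed in the logs
PKT_RX_LIMIT = 90000    # packet-count threshold printed in the logs


def summarize(sent, rcvd, test_time):
    """Recompute the rates and the NoPolicy constraint check for one run."""
    seconds = test_time.total_seconds()
    tx_pps = int(sent / seconds)  # truncation assumed; it matches the logs
    rx_pps = int(rcvd / seconds)
    return {
        "tx_pps": tx_pps,
        "rx_pps": rx_pps,
        "us_per_packet": seconds / sent * 1e6,
        "passed": rx_pps > NO_POLICER_LIMIT and rcvd > PKT_RX_LIMIT,
    }


# Run without the delay, then run with the delay.
run1 = summarize(100000, 83072, timedelta(seconds=23, microseconds=654488))
run2 = summarize(100000, 98733, timedelta(seconds=34, microseconds=352164))
extra_us = run2["us_per_packet"] - run1["us_per_packet"]
print(run1, run2, extra_us)
```

The per-packet send time grows by roughly 107 µs between the two runs, which suggests that the added sleep costs far more than the requested 1 µs and that the pass comes from the lower TX rate (2911 pps instead of 4227 pps).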
Please refer to the following for my testbed topology:
[testbed_server]---[fan-out switch]---[DUT]
PTF_host_node                         remote_node
172.20.200.202                        172.20.192.94
I use the CLI "python ptf_nn_agent.py --device-socket 0@tcp://172.20.192.94:10900 -i 0-3@Ethernet12&" to bring up the remote node with ptf_nn_agent.py on the DUT.
I then run the CoPP test with the CLI 'ansible-playbook test_sonic.yml -i inventory --limit DUT --become --tags copp --extra-vars "ptf_host=172.20.200.202"' on the testbed server.
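For repeated runs it can help to build the agent command line programmatically, so that the buffer options mentioned in the comments are easy to toggle. The following is a minimal sketch (Python 3) that only assembles the argument list; build_agent_cmd is a hypothetical helper, and it uses only options that appear in this thread.

```python
def build_agent_cmd(device, endpoint, port_map, rcv_buffer=None):
    """Return an argv list for ptf_nn_agent.py.

    rcv_buffer, if given, is applied to both the nanomsg and the interface
    receive buffer options quoted in this thread.
    """
    cmd = [
        "python", "ptf_nn_agent.py",
        "--device-socket", "%d@%s" % (device, endpoint),
        "-i", port_map,
    ]
    if rcv_buffer is not None:
        cmd.append("--set-nn-rcv-buffer=%d" % rcv_buffer)
        cmd.append("--set-iface-rcv-buffer=%d" % rcv_buffer)
    return cmd


cmd = build_agent_cmd(0, "tcp://172.20.192.94:10900", "0-3@Ethernet12",
                      rcv_buffer=109430400)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.Popen to start the agent in the background, which replaces the trailing "&" of the shell form.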
I'm not sure whether anyone else has hit the same or a similar situation. Please advise if you have any suggestions, thanks.
Regards,
Kenie Liu