
After using the new version, there are errors with Intel's 82599 and E810 network cards, while Mellanox network cards are functioning normally. #321

Open
wangjun0728 opened this issue Mar 4, 2024 · 79 comments

Comments

@wangjun0728

The DPDK version is 22.11. The errors appear to be caused by the new version's checksum offload support. Mellanox network cards seem to operate normally, but the E810 and 82599 network cards each report different error messages.

E810:
{bus_info="bus_name=pci, vendor_id=8086, device_id=159b", driver_name=net_ice, if_descr="DPDK 22.11.1 net_ice", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="64", max_rx_pktlen="1618", max_rx_queues="256", max_tx_queues="256", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="1", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="true", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

error:
2024-03-04T10:57:01.102Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.105Z|00019|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.113Z|00020|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.167Z|00021|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.278Z|00022|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-04T10:57:01.599Z|00023|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

82599:
{bus_info="bus_name=pci, vendor_id=8086, device_id=10fb", driver_name=net_ixgbe, if_descr="DPDK 22.11.1 net_ixgbe", if_type="6", link_speed="10Gbps", max_hash_mac_addrs="4096", max_mac_addrs="127", max_rx_pktlen="1618", max_rx_queues="128", max_tx_queues="64", max_vfs="0", max_vmdq_pools="64", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="0", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"

error:
2024-03-04T11:04:52.740Z|00384|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-04T11:04:54.449Z|00385|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-04T11:04:55.492Z|00386|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-04T11:04:55.592Z|00387|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-04T11:04:56.644Z|00388|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported

mellanox:
{bus_info="bus_name=pci, vendor_id=15b3, device_id=1017", driver_name=mlx5_pci, if_descr="DPDK 22.11.1 mlx5_pci", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="128", max_rx_pktlen="1618", max_rx_queues="1024", max_tx_queues="1024", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="32", n_rxq="2", n_txq="5", numa_id="3", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="false", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}

@igsilya
Member

igsilya commented Mar 4, 2024

Hi, @wangjun0728.

if_descr="DPDK 22.11.1 net_ice"

Please try a newer 22.11 release; there are numerous driver fixes between 22.11.1 and 22.11.4.

@wangjun0728
Author

Hi @igsilya, I updated DPDK to version 22.11.4, but the same error persists.

E810
{bus_info="bus_name=pci, vendor_id=8086, device_id=159b", driver_name=net_ice, if_descr="DPDK 22.11.4 net_ice", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="64", max_rx_pktlen="1618", max_rx_queues="256", max_tx_queues="256", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="1", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="true", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}
error:
2024-03-05T02:12:53.092Z|00050|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:12:55.112Z|00051|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:13:08.027Z|00052|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:13:14.458Z|00478|connmgr|INFO|br-int<->unix#3: 5 flow_mods 18 s ago (3 adds, 2 deletes)
2024-03-05T02:13:38.871Z|00053|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:14:39.946Z|00054|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-05T02:15:05.262Z|00055|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

82599:
{bus_info="bus_name=pci, vendor_id=8086, device_id=10fb", driver_name=net_ixgbe, if_descr="DPDK 22.11.4 net_ixgbe", if_type="6", link_speed="10Gbps", max_hash_mac_addrs="4096", max_mac_addrs="127", max_rx_pktlen="1618", max_rx_queues="128", max_tx_queues="64", max_vfs="0", max_vmdq_pools="64", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="0", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}
error:
2024-03-05T02:16:29.189Z|00414|netdev_dpdk|WARN|Dropped 1 log messages in last 29 seconds (most recently, 29 seconds ago) due to excessive rate
2024-03-05T02:16:29.189Z|00415|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/2 are valid: Operation not supported
2024-03-05T02:17:00.568Z|00023|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:00.568Z|00024|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:05.573Z|00025|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:05.573Z|00026|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:10.578Z|00027|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:20.589Z|00028|netdev_dpdk(pmd-c02/id:87)|WARN|Dropped 3 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2024-03-05T02:17:20.589Z|00029|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:35.604Z|00030|netdev_dpdk(pmd-c02/id:87)|WARN|Dropped 5 log messages in last 15 seconds (most recently, 5 seconds ago) due to excessive rate
2024-03-05T02:17:35.604Z|00031|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T02:17:44.560Z|00416|netdev_dpdk|WARN|Dropped 4 log messages in last 9 seconds (most recently, 3 seconds ago) due to excessive rate
2024-03-05T02:17:44.560Z|00417|netdev_dpdk|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported

@igsilya
Member

igsilya commented Mar 5, 2024

OK. I don't really know what could be wrong with the ice driver, and I don't have any hardware to test with. The only suggestion here would be to try updating the firmware on the card, in case you're not already on the latest version.

For the other driver we can try to debug that, but we need to know what these invalid packets look like.
I prepared a small patch that would dump the invalid packets to the OVS log here: igsilya/ovs@3c34e86
Could you try it in your setup? You'll need to enable debug logging for netdev_dpdk module in order to see the dump.
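For reference, that usually just means something like `ovs-appctl vlog/set netdev_dpdk:dbg` (assuming the default vlog configuration).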
The output should look something like this:

2024-03-05T14:18:48.161Z|00012|netdev_dpdk(pmd-c03/id:8)|DBG|ovs-p1: Invalid packet:
dump mbuf at 0x1180bce140, iova=0x2cb7ce400, buf_len=2176
  pkt_len=90, ol_flags=0x2, nb_segs=1, port=65535, ptype=0
  segment at 0x1180bce140, data=0x1180bce580, len=90, off=384, refcnt=1
  Dump data at [0x1180bce580], len=64
00000000: 33 33 00 00 00 16 AA 27 91 F9 4D 96 86 DD 60 00 | 33.....'..M...`.
00000010: 00 00 00 24 00 01 00 00 00 00 00 00 00 00 00 00 | ...$............
00000020: 00 00 00 00 00 00 FF 02 00 00 00 00 00 00 00 00 | ................
00000030: 00 00 00 00 00 16 3A 00 05 02 00 00 01 00 8F 00 | ......:.........

Also, what OVS version are you using? Maybe worth trying to update to the latest stable releases if you're not using them already.

@wangjun0728
Author

Hi @igsilya, thank you very much for your reply. The log output after applying your patch is as follows:
2024-03-05T15:42:58.817Z|00012|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192764ec0, iova=0x192765180, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192764ec0, data=0x1927651c2, len=144, off=66, refcnt=1
  Dump data at [0x1927651c2], len=64
00000000: 40 A6 B7 21 92 8C 68 91 D0 65 C6 C3 81 00 00 5C | @..!..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 07 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 36 C6 E1 17 C1 00 6A AD 0D 02 40 | &8..&6.....j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 03 0E 9C | eX..2...........
2024-03-05T15:42:58.817Z|00013|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192764ec0, iova=0x192765180, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192764ec0, data=0x1927651c2, len=144, off=66, refcnt=1
  Dump data at [0x1927651c2], len=64
00000000: 40 A6 B7 21 92 8C 68 91 D0 65 C6 C3 81 00 00 5C | @..!..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 07 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 36 C6 E1 17 C1 00 6A AD 0D 02 40 | &8..&6.....j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 03 0E 9C | eX..2...........
2024-03-05T15:43:03.823Z|00014|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192775d00, iova=0x192775fc0, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192775d00, data=0x192776002, len=144, off=66, refcnt=1
  Dump data at [0x192776002], len=64
00000000: 6C FE 54 2F 0D C0 68 91 D0 65 C6 C3 81 00 00 5C | l.T/..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 04 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 39 8A 56 17 C1 00 6A D6 C0 02 40 | &8..&9.V...j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 04 0A 35 | eX..2..........5
2024-03-05T15:43:03.823Z|00015|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192775d00, iova=0x192775fc0, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192775d00, data=0x192776002, len=144, off=66, refcnt=1
  Dump data at [0x192776002], len=64
00000000: 6C FE 54 2F 0D C0 68 91 D0 65 C6 C3 81 00 00 5C | l.T/..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 04 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 39 8A 56 17 C1 00 6A D6 C0 02 40 | &8..&9.V...j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 04 0A 35 | eX..2..........5
2024-03-05T15:43:08.828Z|00016|netdev_dpdk(pmd-c02/id:87)|WARN|Dropped 3 log messages in last 10 seconds (most recently, 5 seconds ago) due to excessive rate
2024-03-05T15:43:08.828Z|00017|netdev_dpdk(pmd-c02/id:87)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-05T15:43:08.828Z|00018|netdev_dpdk(pmd-c02/id:87)|DBG|tun_port_p1: Invalid packet:
dump mbuf at 0x192781900, iova=0x192781bc0, buf_len=2176
  pkt_len=144, ol_flags=0x800800000000182, nb_segs=1, port=65535, ptype=0
  segment at 0x192781900, data=0x192781c02, len=144, off=66, refcnt=1
  Dump data at [0x192781c02], len=64
00000000: 40 A6 B7 21 92 8C 68 91 D0 65 C6 C3 81 00 00 5C | @..!..h..e.....\
00000010: 08 00 45 00 00 7E 00 00 40 00 40 11 D8 07 0A FD | ..E..~..@.@.....
00000020: 26 38 0A FD 26 36 C6 E1 17 C1 00 6A AD 0D 02 40 | &8..&6.....j...@
00000030: 65 58 00 00 32 00 01 02 80 01 00 02 00 03 0E 9C | eX..2...........

Additionally, the OVS version I'm using is 2.17.5 LTS. I hit this issue only after backporting the checksum and TSO related changes; it was fine before that merge. The main changes I merged are listed below. However, it's not easy for me to fully upgrade OVS because I depend on a specific OVN version.
https://patchwork.ozlabs.org/project/openvswitch/list/?series=&submitter=82705&state=3&q=&archive=both&delegate=

@igsilya
Member

igsilya commented Mar 5, 2024

However, it's not easy for me to fully upgrade OVS because I rely on the version of OVN.

This should not be a problem. You should be able to upgrade OVS and OVN should still work just fine. The version of OVS you build OVN with and the one that you're using in runtime don't need to be the same. There is a build time dependency, because OVN is using some of the OVS libraries, but there is no runtime dependency because communication between OVS and OVN is happening over OpenFlow or OVSDB, which are stable protocols. Any version of OVN should be able to work with any version of OVS in runtime.

So, you can build OVN with the version of OVS shipped in a submodule and use a separate newer version of OVS deployed on a host. Assuming you're using static linking, there should be no issues. In fact, that is a recommended way of using OVS with OVN.

The checksum offloading patches had a lot of small issues, so I would not be surprised if some of the fixes got lost in backporting. I'll try to look at the dumps, but I'd still recommend just upgrading OVS on the node instead.

@igsilya
Member

igsilya commented Mar 6, 2024

ol_flags=0x800800000000182

So, these are Geneve packets and the offload is requested for the outer IPv4 checksum.

Tunnel offloads were introduced in OVS 3.3, meaning they were not tested with DPDK older than 23.11. I would not be surprised if drivers are missing some support or fixes. I don't think it makes sense to investigate this issue any further; I highly recommend upgrading OVS and using it with a supported version of DPDK.

@wangjun0728
Author

Hi @igsilya, I do understand the Geneve usage scenario. Currently, the 82599 network card does not support offloading the outer IP checksum or the outer UDP checksum. Thank you very much for your suggestion; I will try the latest OVS 3.3 as soon as possible and report back with verification results. Thank you again for your reply.

@wangjun0728
Author

Hi @igsilya, I have upgraded OVS to 3.3 and DPDK to 23.11, but the same issue still exists.

E810:

`2024-03-07T07:42:56.712Z|00341|dpdk|INFO|VHOST_CONFIG: (/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock) read message VHOST_USER_SET_VRING_ENABLE
2024-03-07T07:42:56.712Z|00342|dpdk|INFO|VHOST_CONFIG: (/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock) set queue enable: 1 to qp idx: 6
2024-03-07T07:42:56.712Z|00343|dpdk|INFO|VHOST_CONFIG: (/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock) read message VHOST_USER_SET_VRING_ENABLE
2024-03-07T07:42:56.712Z|00344|dpdk|INFO|VHOST_CONFIG: (/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock) set queue enable: 1 to qp idx: 7
2024-03-07T07:42:56.722Z|00017|netdev_dpdk(ovs_vhost2)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00018|netdev_dpdk(ovs_vhost2)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'disabled'
2024-03-07T07:42:56.722Z|00019|netdev_dpdk(ovs_vhost2)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00020|netdev_dpdk(ovs_vhost2)|INFO|State of queue 1 ( rx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00021|netdev_dpdk(ovs_vhost2)|INFO|State of queue 1 ( rx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'disabled'
2024-03-07T07:42:56.722Z|00022|netdev_dpdk(ovs_vhost2)|INFO|State of queue 1 ( rx_qid 0 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00023|netdev_dpdk(ovs_vhost2)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00024|netdev_dpdk(ovs_vhost2)|INFO|State of queue 3 ( rx_qid 1 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00025|netdev_dpdk(ovs_vhost2)|INFO|State of queue 4 ( tx_qid 2 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00026|netdev_dpdk(ovs_vhost2)|INFO|State of queue 5 ( rx_qid 2 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00027|netdev_dpdk(ovs_vhost2)|INFO|State of queue 6 ( tx_qid 3 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:56.722Z|00028|netdev_dpdk(ovs_vhost2)|INFO|State of queue 7 ( rx_qid 3 ) of vhost device '/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock' changed to 'enabled'
2024-03-07T07:42:59.383Z|00016|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:00.800Z|00017|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:00.803Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:00.810Z|00019|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:00.970Z|00020|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:01.255Z|00021|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:01.426Z|00022|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:01.682Z|00023|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:02.810Z|00024|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:03.272Z|00025|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:04.676Z|00026|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:04.810Z|00027|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:05.291Z|00028|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:07.325Z|00029|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:09.348Z|00030|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:11.351Z|00031|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:12.414Z|00032|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:13.361Z|00033|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:15.371Z|00034|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:27.544Z|00035|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event
2024-03-07T07:43:36.076Z|00504|connmgr|INFO|br-int<->unix#2: 5 flow_mods 32 s ago (2 adds, 3 deletes)
2024-03-07T07:43:57.440Z|00036|dpdk|WARN|ice_interrupt_handler(): OICR: MDD event

ovs-vsctl list open
_uuid : 85f32857-8cfb-4f91-9ffe-e28acb930545
bridges : [442c3a80-1b82-4670-aea5-e03d9d4b8b73, ffc69315-36f9-4dd3-b5f5-1dd2118aca21]
cur_cfg : 62
datapath_types : [netdev, system]
datapaths : {netdev=c2425cab-fc67-47fc-96cc-17cd7675ca91, system=45cef88b-7a8d-4f23-852a-f12131577982}
db_version : "8.5.0"
dpdk_initialized : true
dpdk_version : "DPDK 23.11.0"
external_ids : {hostname=xc03-compute2, ovn-bridge-datapath-type=netdev, ovn-encap-ip="10.253.38.55", ovn-encap-type=geneve, ovn-remote="tcp:[10.253.38.10]:6642,tcp:[10.253.38.9]:6642,tcp:[10.253.38.5]:6642", rundir="/var/run/openvswitch", system-id=xc03-compute2}
iface_types : [afxdp, afxdp-nonpmd, bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options : []
next_cfg : 62
other_config : {bundle-idle-timeout="3600", dpdk-extra=" -a 0000:af:00.1 -a 0000:af:00.0", dpdk-init="true", dpdk-socket-mem="2048", n-handler-threads="1", pmd-cpu-mask="0xf", vlan-limit="0"}
ovs_version : "3.3.1"
ssl : []
statistics : {}
system_type : cclinux
system_version : "22.09.2"

ovs-vsctl get interface tun_port_p0 status
{bus_info="bus_name=pci, vendor_id=8086, device_id=159b", driver_name=net_ice, if_descr="DPDK 23.11.0 net_ice", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0", max_mac_addrs="64", max_rx_pktlen="1618", max_rx_queues="256", max_tx_queues="256", max_vfs="0", max_vmdq_pools="0", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="1", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="true", tx_out_udp_csum_offload="true", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}
ovs-vsctl get interface vh-userclient-8d1fca5d-dc status
{features="0x000000017060a783", mode=client, n_rxq="4", n_txq="4", num_of_vrings="8", numa="0", socket="/var/run/openvswitch/vh-userclient-8d1fca5d-dc-vhostuser.sock", status=connected, tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false", vring_0_size="1024", vring_1_size="1024", vring_2_size="1024", vring_3_size="1024", vring_4_size="1024", vring_5_size="1024", vring_6_size="1024", vring_7_size="1024"}`

82599:

`2024-03-07T07:46:37.430Z|00002|netdev_dpdk(pmd-c02/id:88)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-07T07:46:45.037Z|00002|netdev_dpdk(pmd-c03/id:86)|WARN|Dropped 21 log messages in last 8 seconds (most recently, 2 seconds ago) due to excessive rate
2024-03-07T07:46:45.037Z|00003|netdev_dpdk(pmd-c03/id:86)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
2024-03-07T07:46:57.483Z|00002|netdev_dpdk(pmd-c00/id:89)|WARN|Dropped 9 log messages in last 12 seconds (most recently, 5 seconds ago) due to excessive rate
2024-03-07T07:46:57.483Z|00003|netdev_dpdk(pmd-c00/id:89)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported

ovs-vsctl list open
_uuid : 79b87ec7-4b02-4a77-a2c1-3943a68e8f79
bridges : [ab028efc-5f0a-48d4-a7aa-515681ba1c46, c2ecaf85-9a1b-4f9d-9a51-7e136737e3f7]
cur_cfg : 55
datapath_types : [netdev, system]
datapaths : {netdev=0e62f217-661e-46e3-906d-74a2eef05a3e, system=2a42c035-41fd-4727-b487-ee290a7f7f7c}
db_version : "8.5.0"
dpdk_initialized : true
dpdk_version : "DPDK 23.11.0"
external_ids : {hostname=xc03-compute3, ovn-bridge-datapath-type=netdev, ovn-encap-ip="10.253.38.56", ovn-encap-type=geneve, ovn-remote="tcp:[10.253.38.9]:6642,tcp:[10.253.38.5]:6642,tcp:[10.253.38.10]:6642", rundir="/var/run/openvswitch", system-id=xc03-compute3}
iface_types : [afxdp, afxdp-nonpmd, bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options : []
next_cfg : 55
other_config : {bundle-idle-timeout="3600", dpdk-extra=" -a 0000:18:00.1 -a 0000:18:00.0", dpdk-init="true", dpdk-socket-mem="2048", n-handler-threads="1", pmd-cpu-mask="0xf", vlan-limit="0"}
ovs_version : "3.3.1"
ssl : []
statistics : {}
system_type : cclinux
system_version : "22.09.2"

ovs-vsctl get interface tun_port_p0 status
{bus_info="bus_name=pci, vendor_id=8086, device_id=10fb", driver_name=net_ixgbe, if_descr="DPDK 23.11.0 net_ixgbe", if_type="6", link_speed="10Gbps", max_hash_mac_addrs="4096", max_mac_addrs="127", max_rx_pktlen="1618", max_rx_queues="128", max_tx_queues="64", max_vfs="0", max_vmdq_pools="64", min_rx_bufsize="1024", n_rxq="2", n_txq="5", numa_id="0", port_no="1", rx-steering=rss, rx_csum_offload="true", tx_geneve_tso_offload="false", tx_ip_csum_offload="true", tx_out_ip_csum_offload="false", tx_out_udp_csum_offload="false", tx_sctp_csum_offload="true", tx_tcp_csum_offload="true", tx_tcp_seg_offload="false", tx_udp_csum_offload="true", tx_vxlan_tso_offload="false"}`

@wangjun0728
Author

wangjun0728 commented Mar 7, 2024

Regarding the E810, I observed the abnormal messages being printed right after I created the vhost-user client port.
I suspect the 82599 issue is caused by that card not supporting tx_out_udp_csum_offload and tx_out_ip_csum_offload.

@igsilya
Member

igsilya commented Mar 7, 2024

@wangjun0728 thanks for the info!
This looks very similar to what is supposed to be fixed in https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ .
Could you confirm that you have this patch in your version of OVS?

CC: @mkp-rh

@wangjun0728
Author

@igsilya
That patch is included in my code; I've previously discussed this issue with Mike. It resolved the issue with my Mellanox network card, but with the Intel network cards (82599 and E810) there are still anomalies with the Geneve overlay.

Additionally, the latest code I'm using is this one: https://github.com/openvswitch/ovs/commits/branch-3.3/

The checksum offload capability of Intel network cards indeed differs from Mellanox network cards. I believe this might be the root cause of the issue, as it seems more like a problem with the DPDK-side driver.

@mkp-rh

mkp-rh commented Mar 7, 2024

I think there's somewhat of a hint provided here:

Output batch contains invalid packets. Only 0/1 are valid: Operation not supported

There are very few places where DPDK will return ENOTSUP. I don't have an E810 card right now, but I will try to investigate the code.

@igsilya
Member

igsilya commented Mar 7, 2024

@mkp-rh note that the "Operation not supported" error is on the 82599 card; the E810 doesn't reject packets but throws MDD events.

@mkp-rh

mkp-rh commented Mar 7, 2024

For the MDD issue, I see that the E810 errata page reports:

Some of the Tx Data checks performed as part of the Malicious Driver Detection (MDD) are reported as
anti-spoof failures in addition to the actual failures

So it could be the MDD anti-spoofing features, or a general tx data check failure.

In the ixgbe driver, ixgbe_prep_pkts only returns ENOTSUP if the ol_flags are incorrect.

From the log above I see ol_flags=0x800800000000182, which translates into the following Tx offload flags:

RTE_MBUF_F_TX_TUNNEL_GENEVE
RTE_MBUF_F_TX_OUTER_IPV4

ixgbe_rxtx.c contains the supported IXGBE_TX_OFFLOAD_MASK, which doesn't include RTE_MBUF_F_TX_TUNNEL_GENEVE. So that flag shouldn't be included when we send the frame.
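To illustrate why that shows up as "Operation not supported", here is a rough sketch of the kind of check a PMD's Tx-prepare step performs. It is not the actual ixgbe code: check_tx_offload_flags() and SUPPORTED_TX_OFFLOADS are made-up stand-ins for the real ixgbe_prep_pkts() logic and IXGBE_TX_OFFLOAD_MASK.

/* Hypothetical sketch of a PMD Tx-prepare check, loosely modeled on
 * ixgbe_prep_pkts().  SUPPORTED_TX_OFFLOADS stands in for the driver's
 * real mask and deliberately lacks any tunnel bits. */
#include <errno.h>
#include <stdint.h>
#include <rte_mbuf.h>

#define SUPPORTED_TX_OFFLOADS \
    (RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_L4_MASK | \
     RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_VLAN | \
     RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IPV6)

static int
check_tx_offload_flags(const struct rte_mbuf *m)
{
    uint64_t requested = m->ol_flags & RTE_MBUF_F_TX_OFFLOAD_MASK;

    /* Any requested bit outside the supported set -- e.g. the
     * RTE_MBUF_F_TX_TUNNEL_GENEVE bit decoded above -- fails the whole
     * mbuf, which OVS then reports as "Operation not supported". */
    if (requested & ~SUPPORTED_TX_OFFLOADS) {
        return -ENOTSUP;
    }
    return 0;
}

The exact contents of the real mask vary per driver and hardware generation; the point is only that a single unexpected flag is enough for the whole mbuf to be rejected.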

@igsilya
Member

igsilya commented Mar 7, 2024

RTE_MBUF_F_TX_TUNNEL_GENEVE
RTE_MBUF_F_TX_OUTER_IPV4

ixgbe_rxtx.c contains the supported IXGBE_TX_OFFLOAD_MASK, which doesn't include RTE_MBUF_F_TX_TUNNEL_GENEVE. So that flag shouldn't be included when we send the frame.

So, if we do not request TSO or inner checksumming we must not specify RTE_MBUF_F_TX_TUNNEL_* flags. Right?
IIUC, we need openvswitch/ovs@9b7e1a7 but for tunnels.
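To make the idea concrete, here is a rough sketch written directly against the DPDK mbuf API; fallback_to_non_tunnel_offload() is a hypothetical helper, not the patch that eventually went in. If no inner checksum or TSO is requested, drop the tunnel context and, when an outer IPv4 checksum is still wanted, re-request it as a plain non-outer offload.

/* Sketch only: strip the tunnel marking when no inner offload is requested,
 * so drivers such as ixgbe never see RTE_MBUF_F_TX_TUNNEL_GENEVE, and fall
 * back to a plain IPv4 checksum request for the outer header if needed. */
#include <stdint.h>
#include <rte_mbuf.h>

static void
fallback_to_non_tunnel_offload(struct rte_mbuf *m)
{
    const uint64_t inner = RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_L4_MASK |
                           RTE_MBUF_F_TX_TCP_SEG;

    if (m->ol_flags & inner) {
        return;  /* Inner offload requested: the tunnel context is needed. */
    }

    if (m->ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM) {
        /* The outer headers become the only headers the driver cares about. */
        m->l2_len = m->outer_l2_len;
        m->l3_len = m->outer_l3_len;
        m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM;
    }
    /* An outer UDP checksum request dropped here would need a software
     * fallback instead (or a zero UDP checksum, which VXLAN/Geneve allow
     * over IPv4). */
    m->ol_flags &= ~(RTE_MBUF_F_TX_TUNNEL_MASK |
                     RTE_MBUF_F_TX_OUTER_IP_CKSUM |
                     RTE_MBUF_F_TX_OUTER_UDP_CKSUM |
                     RTE_MBUF_F_TX_OUTER_IPV4 |
                     RTE_MBUF_F_TX_OUTER_IPV6);
    m->outer_l2_len = 0;
    m->outer_l3_len = 0;
}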

@igsilya
Member

igsilya commented Mar 7, 2024

@mkp-rh Hmm, also the RTE_MBUF_F_TX_OUTER_IPV4 is not set, while it is required for RTE_MBUF_F_TX_OUTER_IP_CKSUM according to the API. And it seems openvswitch/ovs@9b7e1a7 check is not really correct as it doesn't seem to cover all the outer/inner cases.

Edit: Nevermind, wrong flag. But the existing check might still be incomplete.

@wangjun0728
Author

Hi @igsilya @mkp-rh, if you have suggestions for modifications, I have an environment with both E810 and 82599 network cards available to verify them.

@igsilya
Member

igsilya commented Mar 8, 2024

@wangjun0728 Could you try this one: igsilya/ovs@00c0a91 ? It should fix the 82599 case at least, I think.

@wangjun0728
Author

@igsilya This looks great! Applying your modifications resolved the error with the 82599 network card, and I can now run iperf traffic without issues. I've also checked the E810 network card, and the MDD error still persists.

@wangjun0728
Author

I also noticed a modification in the DPDK community, but applying it didn't yield any results. I suspect there might be a flaw in the E810 driver's support for tunnel TSO.

https://patches.dpdk.org/project/dpdk/patch/[email protected]/

@wangjun0728
Author

After enabling DPDK's PMD logs with --log-level=pmd,debug, I captured a portion of the DPDK startup log. It's currently unclear whether it has any definite correlation with the errors.
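(For reference, this option was presumably appended to OVS's other_config:dpdk-extra setting, the same knob that carries the -a device allow-list in the ovs-vsctl output above; the EAL ARGS line below shows it being picked up.)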

2024-03-11T02:38:04.088Z|00007|dpdk|INFO|Using DPDK 23.11.0
2024-03-11T02:38:04.088Z|00008|dpdk|INFO|DPDK Enabled - initializing...
2024-03-11T02:38:04.088Z|00009|dpdk|INFO|dpdk init get port_num:2
2024-03-11T02:38:04.088Z|00010|dpdk|INFO|EAL ARGS: ovs-vswitchd -a 0000:af:00.1 -a 0000:af:00.0 --log-level=pmd,debug --socket-mem 2048 -l 0.
2024-03-11T02:38:04.091Z|00011|dpdk|INFO|EAL: Detected CPU lcores: 80
2024-03-11T02:38:04.091Z|00012|dpdk|INFO|EAL: Detected NUMA nodes: 2
2024-03-11T02:38:04.091Z|00013|dpdk|INFO|EAL: Detected static linkage of DPDK
2024-03-11T02:38:04.096Z|00014|dpdk|INFO|EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
2024-03-11T02:38:04.099Z|00015|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2024-03-11T02:38:04.100Z|00016|dpdk|WARN|EAL: No free 2048 kB hugepages reported on node 0
2024-03-11T02:38:04.100Z|00017|dpdk|WARN|EAL: No free 2048 kB hugepages reported on node 1
2024-03-11T02:38:04.101Z|00018|dpdk|INFO|EAL: VFIO support initialized
2024-03-11T02:38:04.839Z|00019|dpdk|INFO|EAL: Using IOMMU type 1 (Type 1)
2024-03-11T02:38:04.994Z|00020|dpdk|INFO|EAL: Ignore mapping IO port bar(1)
2024-03-11T02:38:04.994Z|00021|dpdk|INFO|EAL: Ignore mapping IO port bar(4)
2024-03-11T02:38:05.120Z|00022|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:af:00.0 (socket 1)
2024-03-11T02:38:05.586Z|00023|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.28.0, ICE OS Default Package (single VLAN mode)
2024-03-11T02:38:05.586Z|00024|dpdk|INFO|ice_dev_init(): FW 5.3.-1521546806 API 1.7
2024-03-11T02:38:05.608Z|00025|dpdk|INFO|ice_flow_init(): Engine 4 disabled
2024-03-11T02:38:05.608Z|00026|dpdk|INFO|ice_fdir_setup(): FDIR HW Capabilities: fd_fltr_guar = 1024, fd_fltr_best_effort = 14336.
2024-03-11T02:38:05.612Z|00027|dpdk|INFO|__vsi_queues_bind_intr(): queue 0 is binding to vect 257
2024-03-11T02:38:05.612Z|00028|dpdk|INFO|ice_fdir_setup(): FDIR setup successfully, with programming queue 0.
2024-03-11T02:38:05.736Z|00029|dpdk|INFO|EAL: Ignore mapping IO port bar(1)
2024-03-11T02:38:05.736Z|00030|dpdk|INFO|EAL: Ignore mapping IO port bar(4)
2024-03-11T02:38:05.839Z|00031|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:af:00.1 (socket 1)
2024-03-11T02:38:05.942Z|00032|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.28.0, ICE OS Default Package (single VLAN mode)
2024-03-11T02:38:05.942Z|00033|dpdk|INFO|ice_dev_init(): FW 5.3.-1521546806 API 1.7
2024-03-11T02:38:05.965Z|00034|dpdk|INFO|ice_flow_init(): Engine 4 disabled
2024-03-11T02:38:05.965Z|00035|dpdk|INFO|ice_fdir_setup(): FDIR HW Capabilities: fd_fltr_guar = 1024, fd_fltr_best_effort = 14336.
2024-03-11T02:38:05.968Z|00036|dpdk|INFO|__vsi_queues_bind_intr(): queue 0 is binding to vect 257
2024-03-11T02:38:05.968Z|00037|dpdk|INFO|ice_fdir_setup(): FDIR setup successfully, with programming queue 0.
2024-03-11T02:38:05.972Z|00038|dpdk|WARN|TELEMETRY: No legacy callbacks, legacy socket not created
2024-03-11T02:38:05.972Z|00039|dpdk|INFO|DPDK rte_pdump - initializing...
2024-03-11T02:38:05.977Z|00044|dpdk|INFO|DPDK Enabled - initialized
2024-03-11T02:38:06.223Z|00001|dpdk|INFO|ice_interrupt_handler(): OICR: link state change event
2024-03-11T02:38:06.406Z|00089|dpdk|INFO|Device with port_id=1 already stopped
2024-03-11T02:38:06.572Z|00090|dpdk|INFO|ice_set_rx_function(): Using AVX2 OFFLOAD Vector Rx (port 1).
2024-03-11T02:38:06.572Z|00091|dpdk|ERR|ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq
2024-03-11T02:38:06.572Z|00092|dpdk|INFO|__vsi_queues_bind_intr(): queue 1 is binding to vect 1
2024-03-11T02:38:06.572Z|00093|dpdk|INFO|__vsi_queues_bind_intr(): queue 2 is binding to vect 1
2024-03-11T02:38:07.555Z|00002|dpdk|INFO|ice_interrupt_handler(): OICR: link state change event
2024-03-11T02:38:07.600Z|00102|dpdk|INFO|Device with port_id=0 already stopped
2024-03-11T02:38:07.623Z|00103|dpdk|INFO|ice_set_rx_function(): Using AVX2 OFFLOAD Vector Rx (port 0).
2024-03-11T02:38:07.624Z|00104|dpdk|ERR|ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq
2024-03-11T02:38:07.624Z|00105|dpdk|INFO|__vsi_queues_bind_intr(): queue 1 is binding to vect 1
2024-03-11T02:38:07.624Z|00106|dpdk|INFO|__vsi_queues_bind_intr(): queue 2 is binding to vect 1

igsilya added a commit to igsilya/ovs that referenced this issue Mar 11, 2024
Some drivers (primarily, Intel ones) do not expect any marking flags
being set if no offloads are requested.  If these flags are present,
driver will fail Tx preparation or behave abnormally.

For example, ixgbe driver will refuse to process the packet with
only RTE_MBUF_F_TX_TUNNEL_GENEVE and RTE_MBUF_F_TX_OUTER_IPV4 set.
This pretty much breaks geneve tunnels on these cards.

Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: openvswitch/ovs-issues#321
Signed-off-by: Ilya Maximets <[email protected]>
ovsrobot pushed a commit to ovsrobot/ovs that referenced this issue Mar 11, 2024
Some drivers (primarily, Intel ones) do not expect any marking flags
being set if no offloads are requested.  If these flags are present,
driver will fail Tx preparation or behave abnormally.

For example, ixgbe driver will refuse to process the packet with
only RTE_MBUF_F_TX_TUNNEL_GENEVE and RTE_MBUF_F_TX_OUTER_IPV4 set.
This pretty much breaks Geneve tunnels on these cards.

An extra check is added to make sure we don't have any unexpected
Tx offload flags set.

Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: openvswitch/ovs-issues#321
Signed-off-by: Ilya Maximets <[email protected]>
Signed-off-by: 0-day Robot <[email protected]>
@igsilya
Member

igsilya commented Mar 11, 2024

@wangjun0728 I posted the refined version of the 82599 fix here: https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
Could you check with this version? It has some extra checks, but I don't expect it to behave much differently, i.e. it should still fix the 82599 case and should not affect the E810 problem.

@igsilya
Member

igsilya commented Mar 11, 2024

For the E810, I still don't have a lot to suggest. One thing that might help us understand the situation better is to dump some of the mbufs we're trying to send. Maybe you can capture some logs with the following change applied?

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8c52accff..331031035 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2607,6 +2607,17 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
                  (char *) dp_packet_eth(pkt);
         mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
                  (char *) dp_packet_l3(pkt);
+        VLOG_WARN_RL(&rl, "%s: Tunnel offload:"
+                     " outer_l2_len=%d"
+                     " outer_l3_len=%d"
+                     " l2_len=%d"
+                     " l3_len=%d"
+                     " l4_len=%d",
+                     netdev_get_name(&dev->up),
+                     mbuf->outer_l2_len, mbuf->outer_l3_len,
+                     mbuf->l2_len, mbuf->l3_len, mbuf->l4_len);
+        netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up),
+                              "Tunneled packet", mbuf);
     } else {
         mbuf->l2_len = (char *) dp_packet_l3(pkt) -
                (char *) dp_packet_eth(pkt);

It will spam the packets into the log, so definitely not recommended for a long-running test. But maybe it can shed some light on the problem.

@mkp-rh

mkp-rh commented Mar 11, 2024

@wangjun0728 Are you able to check if the following patch resolves your issue on E810?

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index df7bf8e6b..046acd8ba 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -597,12 +597,15 @@ dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t flags)
          * support inner checksum offload and an outer UDP checksum is
          * required, then we can't offload inner checksum either. As that would
          * invalidate the outer checksum. */
-        if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM) &&
-                dp_packet_hwol_is_outer_udp_cksum(p)) {
-            flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
-                       NETDEV_TX_OFFLOAD_UDP_CKSUM |
-                       NETDEV_TX_OFFLOAD_SCTP_CKSUM |
-                       NETDEV_TX_OFFLOAD_IPV4_CKSUM);
+        if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) {
+            if (dp_packet_hwol_is_outer_udp_cksum(p)) {
+                flags &= ~(NETDEV_TX_OFFLOAD_TCP_CKSUM |
+                           NETDEV_TX_OFFLOAD_UDP_CKSUM |
+                           NETDEV_TX_OFFLOAD_SCTP_CKSUM |
+                           NETDEV_TX_OFFLOAD_IPV4_CKSUM);
+            }
+            *dp_packet_ol_flags_ptr(p) &= ~(DP_PACKET_OL_TX_TUNNEL_GENEVE |
+                                            DP_PACKET_OL_TX_TUNNEL_VXLAN);
         }
     }
 

@wangjun0728
Author

wangjun0728 commented Apr 9, 2024

Rolling back your modification didn't resolve the issue; it seems that the 82599 network card doesn't support enabling TSO.

The same issue exists on the Mellanox CX5 network card. Even though the MTU has been adjusted to 1558, iperf cannot send a large number of TCP packets. However, unlike the 82599 network card, no similar errors have been observed with the Mellanox CX5 card.

So, does it mean that TSO cannot be enabled if the outer UDP checksum offload is not supported?

  2024-04-09T05:59:20.584Z|00003|netdev_dpdk(pmd-c02/id:88)|WARN|Dropped 302 log messages in last 67 seconds (most recently, 57 seconds ago) due to excessive rate
  2024-04-09T05:59:20.584Z|00004|netdev_dpdk(pmd-c02/id:88)|WARN|tun_port_p1: Output batch contains invalid packets. Only 0/1 are valid: Operation not supported
  2024-04-09T05:59:20.588Z|00005|netdev_dpdk(pmd-c02/id:88)|DBG|tun_port_p1: First invalid packet:
  dump mbuf at 0x1938b6e00, iova=0x18f261d80, buf_len=7496
    pkt_len=7416, ol_flags=0x2884800000000182, nb_segs=1, port=65535, ptype=0
    segment at 0x1938b6e00, data=0x18f261dc2, len=7416, off=66, refcnt=1
    Dump data at [0x18f261dc2], len=7416
  00000000: 40 A6 B7 21 92 8C 68 91 D0 65 C6 C3 81 00 00 5C | @..!..h..e.....\
  00000010: 08 00 45 00 1C E6 00 00 40 00 40 11 BB 9F 0A FD | ..E.....@.@.....
  00000020: 26 38 0A FD 26 36 DF 9B 17 C1 1C D2 AF DD 02 40 | &8..&6.........@
  00000030: 65 58 00 00 31 00 01 02 80 01 00 04 00 06 0A C8 | eX..1...........
  00000040: E1 5C 84 0E 06 AF A9 F4 AA D6 08 00 45 00 1C AC | .\..........E...
  00000050: A8 3A 00 00 40 06 A2 04 0A 00 00 09 0A 00 00 05 | .:..@...........
  00000060: 9C 44 14 51 06 83 A9 67 58 87 A7 38 50 18 00 7E | .D.Q...gX..8P..~
  00000070: AD E8 00 00 9D 23 90 5F 6F 71 50 F8 48 2D BD 13 | .....#._oqP.H-..

david-marchand added a commit to david-marchand/ovs that referenced this issue Apr 19, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
david-marchand added a commit to david-marchand/ovs that referenced this issue Apr 19, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
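
A minimal sketch of what such a hint could look like at the mbuf level; cap_tso_segsz() and mtu_based_segsz are hypothetical names for illustration, not the actual patch.

/* Sketch: cap the TSO segment size at the guest-provided value so that the
 * segments produced by the NIC still fit the receiving vhost-user port's MTU
 * after decapsulation.  "mtu_based_segsz" is the value OVS would otherwise
 * derive from the physical port MTU minus the tunnel header size. */
#include <stdint.h>
#include <rte_mbuf.h>

static void
cap_tso_segsz(struct rte_mbuf *m, uint16_t mtu_based_segsz)
{
    if (!(m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
        return;                           /* Not a TSO packet. */
    }
    if (m->tso_segsz == 0 || m->tso_segsz > mtu_based_segsz) {
        m->tso_segsz = mtu_based_segsz;   /* MTU stays the hard upper bound. */
    }
    /* Otherwise keep the smaller, guest-provided segment size (e.g. 1448
     * bytes in the example above) even though the wire MTU would allow
     * 1498-byte segments. */
}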
ovsrobot pushed a commit to ovsrobot/ovs that referenced this issue Apr 19, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Signed-off-by: 0-day Robot <[email protected]>
ovsrobot pushed a commit to ovsrobot/ovs that referenced this issue Apr 19, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
Signed-off-by: 0-day Robot <[email protected]>
kevintraynor pushed a commit to kevintraynor/ovs that referenced this issue May 10, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
kevintraynor pushed a commit to kevintraynor/ovs that referenced this issue May 10, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
david-marchand added a commit to david-marchand/ovs that referenced this issue May 17, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
david-marchand added a commit to david-marchand/ovs that referenced this issue May 17, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
ovsrobot pushed a commit to ovsrobot/ovs that referenced this issue May 30, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: 0-day Robot <[email protected]>
ovsrobot pushed a commit to ovsrobot/ovs that referenced this issue May 30, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
Signed-off-by: 0-day Robot <[email protected]>
kevintraynor pushed a commit to kevintraynor/ovs that referenced this issue May 31, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
kevintraynor pushed a commit to kevintraynor/ovs that referenced this issue May 31, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
igsilya pushed a commit to igsilya/ovs that referenced this issue Jun 5, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
igsilya pushed a commit to igsilya/ovs that referenced this issue Jun 5, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation into 1448-byte segments.
OVS A, on the other hand, will request 1498-byte segments from the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the mtu of the vhost-user port serving guest B.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
kevintraynor pushed a commit to kevintraynor/ovs that referenced this issue Jun 6, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating an ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fall back to a "normal"
(non-outer) offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
kevintraynor pushed a commit to kevintraynor/ovs that referenced this issue Jun 6, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation into 1448-byte segments.
OVS A, on the other hand, will request 1498-byte segments from the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the mtu of the vhost-user port serving guest B.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
kevintraynor pushed a commit to openvswitch/ovs that referenced this issue Jun 6, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating an ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fall back to a "normal"
(non-outer) offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: Kevin Traynor <[email protected]>
kevintraynor pushed a commit to openvswitch/ovs that referenced this issue Jun 6, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation into 1448-byte segments.
OVS A, on the other hand, will request 1498-byte segments from the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the mtu of the vhost-user port serving guest B.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: Kevin Traynor <[email protected]>
roseoriorden pushed a commit to roseoriorden/ovs that referenced this issue Jul 1, 2024
Some drivers (primarily Intel ones) do not expect any marking flags
being set if no offloads are requested.  If these flags are present,
the driver will fail Tx preparation or behave abnormally.

For example, ixgbe driver will refuse to process the packet with
only RTE_MBUF_F_TX_TUNNEL_GENEVE and RTE_MBUF_F_TX_OUTER_IPV4 set.
This pretty much breaks Geneve tunnels on these cards.

An extra check is added to make sure we don't have any unexpected
Tx offload flags set.

Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: openvswitch/ovs-issues#321
Acked-by: Mike Pattrick <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
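
As an illustration of the kind of check this commit adds (the function name is hypothetical; this is not the exact OVS code), a packet that requests no checksum or TSO offload should not carry leftover marking flags such as RTE_MBUF_F_TX_TUNNEL_GENEVE or RTE_MBUF_F_TX_OUTER_IPV4:

#include <stdbool.h>
#include <rte_mbuf.h>

/* Return true if the Tx flags on this mbuf are consistent: either a real
 * offload is requested, or no marking flags are left behind that drivers
 * like net/ixgbe would reject in Tx preparation. */
static inline bool
tx_flags_are_consistent(const struct rte_mbuf *m)
{
    const uint64_t offload_reqs = RTE_MBUF_F_TX_IP_CKSUM |
                                  RTE_MBUF_F_TX_L4_MASK |
                                  RTE_MBUF_F_TX_TCP_SEG |
                                  RTE_MBUF_F_TX_OUTER_IP_CKSUM |
                                  RTE_MBUF_F_TX_OUTER_UDP_CKSUM;

    if (m->ol_flags & offload_reqs) {
        return true;
    }
    return !(m->ol_flags & (RTE_MBUF_F_TX_TUNNEL_MASK |
                            RTE_MBUF_F_TX_OUTER_IPV4 |
                            RTE_MBUF_F_TX_OUTER_IPV6));
}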
roseoriorden pushed a commit to roseoriorden/ovs that referenced this issue Jul 1, 2024
Fix the issue of incorrect outer UDP checksums in packets sent by
E810 or X710 NICs.  We disable RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM, but
also disable all the dependent offloads like
RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO and
RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO.

Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: openvswitch/ovs-issues#321
Signed-off-by: Jun Wang <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
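
A rough sketch of what disabling these offloads looks like at DPDK port-configuration time (the function and variable names are illustrative; the actual change presumably lives in OVS's netdev-dpdk port setup):

#include <rte_ethdev.h>

/* Clear outer UDP checksum offload together with the tunnel TSO offloads
 * that depend on it, so the advertised Tx feature set stays consistent. */
static void
disable_outer_udp_csum_offloads(struct rte_eth_conf *conf)
{
    conf->txmode.offloads &= ~(RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM |
                               RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO |
                               RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO);
}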
roseoriorden pushed a commit to roseoriorden/ovs that referenced this issue Jul 1, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating an ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fall back to a "normal"
(non-outer) offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: Kevin Traynor <[email protected]>
roseoriorden pushed a commit to roseoriorden/ovs that referenced this issue Jul 1, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation into 1448-byte segments.
OVS A, on the other hand, will request 1498-byte segments from the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the mtu of the vhost-user port serving guest B.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: Kevin Traynor <[email protected]>
roidayan pushed a commit to roidayan/ovs that referenced this issue Aug 18, 2024
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating an ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
	MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fall back to a "normal"
(non-outer) offloading request.

Reported-at: openvswitch/ovs-issues#321
Fixes: 084c808 ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782 ("netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.")
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: Kevin Traynor <[email protected]>
(cherry picked from commit 2e03f55)
Signed-off-by: Roi Dayan <[email protected]>
Change-Id: Ibc94d237c35d785aed8921e9e5c6cac29dbd7ea7
roidayan pushed a commit to roidayan/ovs that referenced this issue Aug 18, 2024
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation into 1448-byte segments.
OVS A, on the other hand, will request 1498-byte segments from the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the mtu of the vhost-user port serving guest B.

2024-04-17T14:13:01.239Z|00002|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
	Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: openvswitch/ovs-issues#321
Signed-off-by: David Marchand <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Signed-off-by: Kevin Traynor <[email protected]>
(cherry picked from commit a924887)
Signed-off-by: Roi Dayan <[email protected]>
Change-Id: Iaeb59ea36b3fd22b8befc753abf03f5a0f5fee42
@wangjun0728
Author

I used this patch series from Mike, and it solves the problem of my OVS-DPDK not being able to enable TSO very well. I have verified that CX6, X710, E810, etc. all work normally now. Performance between virtual machines on the same node is very high, and cross-node performance is also good.
Only the E810 supports this, and the MTU configuration restrictions that previously applied when enabling TSO on the E810 network card no longer exist.
I still want to thank @mkp-rh, @david-marchand and @igsilya.

https://patchwork.ozlabs.org/project/openvswitch/list/?series=417313
