1st fragment dropped on VxLAN connection #1148
Comments
I confirm this issue happens in both directions (sending packets to the NSE from the NSC, and sending packets to the NSC from the NSE).
I also tried with the client from https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/features/webhook#client-requests-for-postgresql-service
@NikitaSkrynnik Some things to look at here as you debug: You can kubectl exec into the forwarder pod and use vppctl to trace the packets through VPP. If I understand @LionelJouin correctly, the ping from either the NSC or NSE is going something like this:
If you run ping in the NSC, you would run the trace in the VPP in forwarder-1. I suspect the things you want to trace are:
Set these before you send the ping from the NSC. That should give you a bit more information about what may be happening inside VPP. @LionelJouin Feel free to try that too and report your findings back here :)
@LionelJouin Do you happen to know if vethpairs or tapv2 is being used in the setup where you see the issue? Ed
@LionelJouin If you are using vethpairs... I have a potential root cause identified. With a vethpair, you have two sides of the pair; for the sake of conversation let's call them (nsc-1, vpp-1). Currently it appears that nsc-1 is having its MTU set correctly, but vpp-1 is not. Further, the host-vpp-1 interface in VPP has its MTU set to match nsc-1. (Note: this is all the same on the NSE side as well.) The result is that if I send a packet into nsc-1 that is fine for the MTU on nsc-1 but too large for the MTU on vpp-1, it's dropped by the kernel. If a packet arrives and is sent out host-vpp-1, and it's smaller than the MTU on host-vpp-1 but larger than the MTU on vpp-1, the kernel drops it. My (tentative) theory as to why you are seeing the second fragment and not the first is that you are properly getting fragmentation of the original packet at its origin, but along the way we are hitting this issue with the first fragment (which is larger), causing it to get dropped. The second, smaller fragment is sufficiently small that it doesn't hit this issue. Could you check to see if that's what you are seeing? Steps would include:
Note: to get mtu info in vpp:
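The veth-pair theory above can be sketched as a small model: a packet that is larger than the MTU of any hop it crosses is dropped at that hop, with no re-fragmentation along the way. The MTU values below are hypothetical (the thread does not show the real ones), and `first_dropping_hop` is an illustrative helper, not part of NSM or VPP. The packet sizes 1500 and 548 are what a 2000-byte ICMP payload yields when fragmented at an assumed origin MTU of 1500.

```python
from typing import Optional


def first_dropping_hop(packet_size: int, hops: list[tuple[str, int]]) -> Optional[str]:
    """Return the name of the first hop whose MTU is smaller than the packet.

    Returns None when the packet fits through every hop. This models a path
    where oversized packets are dropped rather than fragmented again.
    """
    for name, mtu in hops:
        if packet_size > mtu:
            return name
    return None


# Hypothetical MTUs: nsc-1 is configured, vpp-1 was left with a smaller value.
hops = [("nsc-1", 1500), ("vpp-1", 1400)]

# Assumed IPv4 fragment sizes (first fragment, second fragment).
fragments = [1500, 548]

for size in fragments:
    hop = first_dropping_hop(size, hops)
    status = f"dropped at {hop}" if hop else "delivered"
    print(f"{size}-byte fragment: {status}")
```

Under these assumed values the first fragment is dropped at vpp-1 while the second gets through, which matches the symptom described in the issue. The real cause turned out to be different, as later comments explain.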
This *probably* fixes: networkservicemesh/sdk#1148 The underlying issue was that the end of the vethpair to which VPP was attaching with afpacket was not getting its MTU set correctly. As a result, if an oversized packet was sent over it, it would be fragmented by the kernel to a size that matches the MTU on the end of the veth pair that was in the NSC network namespace. The resulting packet would *still* be too large for the MTU of the end of the vethpair attached to the VPP instance, and would be dropped there. The second fragment, being smaller, would be smaller than the MTU of the end of the vethpair to which VPP was attached with af-packet, and so get through. Signed-off-by: Ed Warnicke <[email protected]>
I tried the trace commands, but I don't think I found anything relevant. I use VxLAN, and I have these MTUs:
…k-vpp@main PR link: networkservicemesh/sdk-vpp#452 Commit: 7958db6 Author: Ed Warnicke Date: 2021-11-22 09:09:20 -0600 Message: - Fix for mtu issues with kernelvethpair (#452) This *probably* fixes: networkservicemesh/sdk#1148 The underlying issue was that the end of the vethpair to which VPP was attaching with afpacket was not getting its MTU set correctly. As a result, if an oversized packet was sent over it, it would be fragmented by the kernel to a size that matches the MTU on the end of the veth pair that was in the NSC network namespace. The resulting packet would *still* be too large for the MTU of the end of the vethpair attached to the VPP instance, and would be dropped there. The second fragment, being smaller, would be smaller than the MTU of the end of the vethpair to which VPP was attached with af-packet, and so get through. Signed-off-by: NSMBot <[email protected]>
af-packet may incorrectly mark a packet as being a GSO packet due to a slight miscomputation around the MTU. This should fix that. https://gerrit.fd.io/r/c/vpp/+/34585 Fixes networkservicemesh/sdk#1148 Signed-off-by: Ed Warnicke <[email protected]>
@LionelJouin - we appear to have a fix. I was able to reproduce your issue locally. Tracked it back to: https://gerrit.fd.io/r/c/vpp/+/34585 Tested locally that it resolves your issue (it does!). Incorporated in: edwarnicke/govpp#44

The net net issue was this:

    if (tph->tp_snaplen > apif->host_mtu)
      fill_gso_buffer_flags (first_b0, apif->host_mtu, l4_hdr_sz);

in the af-packet-input node. It was mistakenly comparing the snaplen (which includes the ethernet header) to the host_mtu (which does not include the ethernet header) to determine whether the incoming packet was a GSO packet or not.

The vxlan packet was coming in via af-packet. The size of its 'payload' (where payload here is relative to the outer ethernet header) was 1494. Add the ethernet header (14 bytes) and the snaplen was 1508. 1508 was larger than the host-mtu (1500) and so it was being marked as GSO. When it got l2xc cross connected to the tap interface, which did not have GSO enabled... it was being dropped. The fix correctly compensates for the outer ethernet header.

Some comments lest there be confusion:
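The arithmetic in the comment above can be checked with a short sketch. It models only the size comparison, not the VPP node itself. The function names are hypothetical, and adding the Ethernet header length to the MTU is just one way to "compensate for the outer ethernet header"; the actual patch may express it differently.

```python
ETHERNET_HEADER_LEN = 14  # bytes in the outer Ethernet header


def buggy_is_gso(snaplen: int, host_mtu: int) -> bool:
    """Original comparison: snaplen includes the Ethernet header, host_mtu does not."""
    return snaplen > host_mtu


def compensated_is_gso(snaplen: int, host_mtu: int) -> bool:
    """Assumed correction: compare like with like by adding the header to the MTU."""
    return snaplen > host_mtu + ETHERNET_HEADER_LEN


# Numbers quoted in the comment: 1494-byte payload, 1500-byte host MTU.
payload_len = 1494
host_mtu = 1500
snaplen = payload_len + ETHERNET_HEADER_LEN  # 1508

print("buggy check marks packet as GSO:", buggy_is_gso(snaplen, host_mtu))
print("compensated check marks packet as GSO:", compensated_is_gso(snaplen, host_mtu))
```

With these numbers the original comparison flags the 1508-byte frame as GSO even though its payload (1494) fits within the 1500-byte MTU, while the compensated comparison does not.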
edwarnicke/govpp#44 af-packet may incorrectly mark a packet as being a GSO packet due to a slight miscomputation around the MTU. This should fix that. https://gerrit.fd.io/r/c/vpp/+/34585 Fixes networkservicemesh/sdk#1148 Signed-off-by: Ed Warnicke <[email protected]>
@LionelJouin Once these have been merged this should be ready for you to retest:
…d-forwarder-vpp@main PR link: networkservicemesh/cmd-forwarder-vpp#411 Commit: 4775484 Author: Ed Warnicke Date: 2021-11-26 04:39:57 -0600 Message: - Incorporate VPP GSO fix (#411) edwarnicke/govpp#44 af-packet may incorrectly mark a packet as being a GSO packet due to a slight miscomputation around the MTU. This should fix that. https://gerrit.fd.io/r/c/vpp/+/34585 Fixes networkservicemesh/sdk#1148 Signed-off-by: NSMBot <[email protected]>
…d-nsc-vpp@main PR link: networkservicemesh/cmd-nsc-vpp#323 Commit: b831d0f Author: Ed Warnicke Date: 2021-11-26 04:41:26 -0600 Message: - Incorporate VPP GSO fix (#323) edwarnicke/govpp#44 af-packet may incorrectly mark a packet as being a GSO packet due to a slight miscomputation around the MTU. This should fix that. https://gerrit.fd.io/r/c/vpp/+/34585 Fixes networkservicemesh/sdk#1148 Signed-off-by: NSMBot <[email protected]>
Current Behavior
When pinging or sending UDP packets from the NSE to the NSC through an NSM VxLAN link (created by the vpp-forwarder) with a packet size larger than the MTU, the NSC does not receive the first fragment.
Failure Information (for bugs)
We checked the vxlan packets on both vpp forwarders (so on the 2 nodes), and we could see the 2 fragments are correctly sent on the NSE node and correctly received on the NSC node.
In the vpp-forwarder that runs on the same node as the NSC, we could see the drop counter of the tap interface corresponding to the NSC increase each time we sent fragmented traffic.
Steps to Reproduce
tcpdump -nnvvXSu -i any icmp
ping -s 2000 169.254.0.3
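To make the reproduction concrete, here is a rough sketch of how the `ping -s 2000` request would be split into IPv4 fragments. The MTU of 1500 is an assumption (the interface MTUs from the original report are not shown here), and `ipv4_fragment_sizes` is an illustrative helper rather than anything from NSM or VPP. It assumes a 20-byte IPv4 header with no options.

```python
IPV4_HEADER_LEN = 20  # bytes, assuming no IP options
ICMP_HEADER_LEN = 8   # bytes in the ICMP echo header


def ipv4_fragment_sizes(ip_payload_len: int, mtu: int) -> list[int]:
    """Return the total size (header + data) of each IPv4 fragment.

    Every fragment except the last must carry a multiple of 8 data bytes,
    because the fragment offset field counts in 8-byte units.
    """
    if ip_payload_len + IPV4_HEADER_LEN <= mtu:
        return [ip_payload_len + IPV4_HEADER_LEN]  # fits, no fragmentation

    max_data = (mtu - IPV4_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any fragment data")

    sizes = []
    remaining = ip_payload_len
    while remaining > 0:
        chunk = min(remaining, max_data)
        sizes.append(IPV4_HEADER_LEN + chunk)
        remaining -= chunk
    return sizes


# ping -s 2000 sends 2000 bytes of ICMP data plus the 8-byte ICMP header.
icmp_message_len = 2000 + ICMP_HEADER_LEN
print(ipv4_fragment_sizes(icmp_message_len, mtu=1500))
```

Under the assumed MTU this yields a large first fragment and a much smaller second one, which is why only the first fragment is sensitive to size-related checks along the path.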
Context
Versions:
I tried with v1.0.0 images and could not see this issue; it was working fine.