worker node interface MTU arbitrarily increased #557

Closed
zolug opened this issue Apr 6, 2022 · 15 comments
Labels
bug Something isn't working

Comments

@zolug
Contributor

zolug commented Apr 6, 2022

The MTU of the node interface hosting NSM_TUNNEL_IP is raised from its initial 1500 bytes when using vpp-forwarder v1.3.0-rc.1 (AF_PACKET mode).
This causes traffic disturbances if the MTU of the "base" interface (e.g. tap) on the host is 1500 bytes.

The new value is usually 1544 bytes (but on one occasion it turned out to be 1588 bytes).
Also, if there are multiple network interfaces on the VM, their MTU might get changed as well, even though TUNNEL_IP is not hosted by them. (I guess it's because I have a DeviceSelectorFile configured as well, although occasionally not all the devices in the file are affected by the MTU change.)

Interface MTUs are left intact when running with vpp-forwarder v1.2.0.
Also, when rebuilt with the VPP version used in v1.2.0, the problem does not appear.

To reproduce:

  • example 'basic' or 'memory' should do (there's no need to deploy any NSC or NSE)
  • Kind can be used as well

IPv4 traffic impact can be verified simply with netcat (assuming the new MTU is 1544 bytes):

  • Create a TCP server on one of the worker nodes listening on an address that belongs to the affected interface (pick a server port of your liking)
  • Create a TCP client connecting to the listener set up in the previous step, and try to send e.g. a 1480-byte payload. The data will never reach the server. Running tcpdump on the client side reveals the retransmits.
    (Note: If TCP TX segmentation offload is enabled, then TCP payload sizes exceeding (NEW_MTU - (IPv4_header_size + TCP_overhead_including_options)) will succeed.)
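
The size arithmetic behind the netcat test can be sketched as follows. This is a minimal illustration rather than part of the original report: it assumes a 20-byte IPv4 header, a 20-byte TCP header, and 12 bytes of TCP options (timestamps), and the helper names are hypothetical. It does not model segmentation offload.

```python
# Assumed header sizes (not taken from the report): IPv4 without options,
# TCP base header, plus 12 bytes of TCP options (timestamps).
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_OPTIONS = 12


def ip_packet_size(payload: int, tcp_options: int = TCP_OPTIONS) -> int:
    """Size of the IP packet carrying one TCP segment with this payload."""
    return IPV4_HEADER + TCP_HEADER + tcp_options + payload


def max_payload(mtu: int, tcp_options: int = TCP_OPTIONS) -> int:
    """Largest TCP payload that fits into a single packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - tcp_options


def blackholed(payload: int, interface_mtu: int = 1544, base_mtu: int = 1500) -> bool:
    """True if the packet fits the raised interface MTU but not the base MTU."""
    size = ip_packet_size(payload)
    return base_mtu < size <= interface_mtu


if __name__ == "__main__":
    print(ip_packet_size(1480))  # packet size for the 1480-byte test payload
    print(max_payload(1500))     # largest payload the 1500-byte base interface can carry
    print(blackholed(1480))      # does the test payload fall into the gap?
```

Under these assumptions a 1480-byte payload yields a 1532-byte packet, which the node sends as a single segment because it fits the 1544-byte interface MTU, but which exceeds the 1500-byte MTU of the base interface.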

k8s impact:
If kube-apiserver is running on a VM that also hosts a vpp-forwarder, then kubernetes API server requests from other nodes might randomly fail resulting in the following error:

vm-002 ~ # kubectl get pods -n nsm
E0406 17:32:54.015653 15189 request.go:1085] Unexpected error when reading response body: http2: client connection lost
unexpected error when reading response body. Please retry. Original error: http2: client connection lost

mtu.txt

@zolug zolug added the bug Something isn't working label Apr 6, 2022
@zolug
Contributor Author

zolug commented Apr 8, 2022

Got a hint from @ljkiraly that the SwInterfaceSetMtu VPP API call might be more suitable than HwInterfaceSetMtu.
I gave it a try: interface MTUs were left intact, and things seemed to work fine.
diff.txt

Edit:
In the meantime @ljkiraly informed me that the mtu chain component could still interfere (and is probably the reason why in some cases I ended up with a 1588-byte MTU): https://github.com/networkservicemesh/sdk-vpp/blob/5cb7919d7814d5079e68b6d62c39b00dd10d6d89/pkg/networkservice/connectioncontext/mtu/common.go#L37

@edwarnicke
Member

@zolug OK... so it looks like the root issue here is that the VPP MTU is resulting in setting the interface MTU (and incorrectly). Is that right?

@edwarnicke
Member

@ljkiraly Could you say more about the cases in which the mtu chain element could interfere?

@zolug
Contributor Author

zolug commented Apr 11, 2022

@zolug OK... so it looks like the root issue here is that the VPP MTU is resulting in setting the interface MTU (and incorrectly). Is that right?

Yes, that's correct.

@ljkiraly
Contributor

@edwarnicke
First of all, the main issue is that the HwInterfaceSetMtu binapi RPC changed its behavior. I do not know the reason behind this, which is why we have not tried to fix it by changing to SwInterfaceSetMtu.

At forwarder startup, the initialization function sets the MTU first, which adds 44 bytes to the MTU of the given interface (e.g. the interface holding the IP_TUNNEL address, with a default MTU of 1500, will be set to 1544). Then the ConnectionContext tries to adjust the MTU and calls HwInterfaceSetMtu again, increasing the MTU by another 44 bytes (resulting in 1588). I suppose it will keep increasing each time the forwarder restarts.
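
The drift described above can be modelled with a few lines of arithmetic. This is only a sketch of the behavior observed in this issue, not of how VPP works internally: the 44-byte increment is taken from the reported values, and the function names are hypothetical.

```python
# Increment observed in this issue per HwInterfaceSetMtu call.
# This is an observation, not a documented VPP constant.
OBSERVED_INCREMENT = 44


def observed_hw_set_mtu(host_mtu: int) -> int:
    """Model of the observed side effect: the host MTU ends up 44 bytes larger."""
    return host_mtu + OBSERVED_INCREMENT


def host_mtu_after(calls: int, initial: int = 1500) -> int:
    """Host interface MTU after the given number of calls, starting from `initial`."""
    mtu = initial
    for _ in range(calls):
        mtu = observed_hw_set_mtu(mtu)
    return mtu


if __name__ == "__main__":
    for calls in range(3):
        print(calls, host_mtu_after(calls))
```

One call reproduces the 1544-byte value seen after forwarder initialization, and a second call reproduces the 1588-byte value attributed to the ConnectionContext. Whether the value keeps growing across restarts is a supposition in the comment above and is not verified here.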

@edwarnicke
Member

@ljkiraly Got it, so we simply need to switch from HwInterfaceSetMtu to SwInterfaceSetMtu wherever it is used.

@ljkiraly
Contributor

@edwarnicke I'm not sure about the ConnectionContext, since there are two functions: setVPPL2MTU and setVPPL3MTU.

  • layer2 calls HwInterfaceSetMtu
  • layer3 calls SwInterfaceSetMtu.

I don't know the original intention behind this division.

@edwarnicke
Member

@zolug @ljkiraly I think we can get what we need with setVPPL3MTU using SwInterfaceSetMtu, but the outstanding question is: do we anticipate needing to pass 802.1q tagged Ethernet frames over VXLAN?

Currently we reduce the MTU by 50 bytes for VXLAN:

  • 20 bytes - outer IP header
  • 8 bytes - outer UDP header
  • 8 bytes - VXLAN header
  • 14 bytes - inner Ethernet header

If we wanted to support 802.1q VLAN tags using strictly IP MTUs (i.e. SwInterfaceSetMtu), we would need to increase that overhead to 54 bytes (4 more bytes for 802.1q).

Thoughts?
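
The overhead figures in this comment can be checked with a short calculation. It is a sketch using only the byte counts listed above; the 1500-byte link MTU and the function name are assumptions chosen for illustration.

```python
# Per-header byte counts as listed in the comment above.
VXLAN_OVERHEAD = {
    "outer IP header": 20,
    "outer UDP header": 8,
    "VXLAN header": 8,
    "inner Ethernet header": 14,
}
DOT1Q_TAG = 4  # extra bytes for an 802.1q VLAN tag on the inner frame


def vxlan_inner_mtu(link_mtu: int, dot1q: bool = False) -> int:
    """Inner IP MTU left after subtracting the VXLAN encapsulation overhead."""
    overhead = sum(VXLAN_OVERHEAD.values())
    if dot1q:
        overhead += DOT1Q_TAG
    return link_mtu - overhead


if __name__ == "__main__":
    print(sum(VXLAN_OVERHEAD.values()))         # current overhead in bytes
    print(vxlan_inner_mtu(1500))                # inner MTU without VLAN tags
    print(vxlan_inner_mtu(1500, dot1q=True))    # inner MTU with an 802.1q tag
```

With an assumed 1500-byte link MTU, the current 50-byte overhead leaves 1450 bytes for the inner IP packet, and reserving room for an 802.1q tag would leave 1446.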

@zolug
Contributor Author

zolug commented Apr 11, 2022

@edwarnicke

do we anticipate needing to pass 802.1q tagged ethernet frames over VXLAN?

No, we don't. (At least we don't have a use case like that.)

@edwarnicke
Member

@ljkiraly @denis-tingaikin @zolug Could you have a look at these fixes:

#569

networkservicemesh/sdk-vpp#554

@edwarnicke
Member

@zolug
Contributor Author

zolug commented Apr 11, 2022

@edwarnicke Thanks! The fixes work without any problem.

The following log msg printed on forwarder start is a bit misleading though now:

[MTU:[1500 1500 1500 1500]] [cmd:/bin/forwarder] [duration:155.192µs] [swIfIndex:1] [vppapi:HwInterfaceSetMtu]

@edwarnicke
Member

@edwarnicke Thanks! The fixes work without any problem.

The following log msg printed on forwarder start is a bit misleading though now:

[MTU:[1500 1500 1500 1500]] [cmd:/bin/forwarder] [duration:155.192µs] [swIfIndex:1] [vppapi:HwInterfaceSetMtu]

@zolug This should be fixed in #570 - thank you for catching it :)

@ljkiraly
Contributor

@edwarnicke I propose simply getting the L3 MTU, since it is set based on the L2 link MTU and should not be modified elsewhere. See networkservicemesh/sdk-vpp#555

@edwarnicke
Member

@ljkiraly Have you confirmed that the L3 MTU is set for the interfaces you care about (in the absence of setting it yourself)? I ask because for software interfaces (like af packet) it's not.

@zolug zolug closed this as completed Jun 2, 2022
@LionelJouin LionelJouin moved this to ✅ Done in Meridio Aug 23, 2022