ChecksumOffloadBroken autodetection doesn't necessarily detect all cases #4727

Closed
janeczku opened this issue Jul 7, 2021 · 17 comments · Fixed by #6842

@janeczku

janeczku commented Jul 7, 2021

Expected Behavior

Pod-pod and pod-service communication across nodes should work.

Current Behavior

All traffic between pods across nodes is dropped (with the exception of ICMP).

Possible Solution

VMware recommends either:

  • Changing the VXLAN port to 8472 (when NSX is not used) or 4789 (when NSX is used)
  • Disabling the VXLAN hardware offload feature on the VMXNET3 NIC (which recent Linux driver versions enable by default)

Since a port change is not feasible for Calico for Windows (which requires 4789), disabling the hardware offload feature is the only feasible solution. Because earlier Linux versions did not even support this feature on that NIC, there is no performance impact from disabling it.

Given that NIC firmware configuration is not something most users are used to managing, I suggest implementing a transparent solution in Calico that disables the offload feature when Calico configures VXLAN on host interfaces backed by a VMXNET3 device.
To that effect: it looks like Calico already configures NIC driver settings: https://github.com/projectcalico/felix/blob/master/ethtool/ethtool.go
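
A rough sketch of what such a transparent fix could look like, not Felix's actual code: read the kernel driver name from sysfs and, only if it is vmxnet3, build the ethtool command that turns off the UDP tunnel offloads. The helper names and the `sysfs_root` parameter are hypothetical; `tx-udp_tnl-segmentation` and `tx-udp_tnl-csum-segmentation` are the feature names ethtool uses for UDP tunnel offload on Linux.

```python
import os

# ethtool feature names for UDP tunnel (VXLAN) segmentation offload.
VXLAN_OFFLOAD_FEATURES = (
    "tx-udp_tnl-segmentation",
    "tx-udp_tnl-csum-segmentation",
)


def nic_driver(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver bound to iface, or None if it cannot be read.

    Assumes the usual sysfs layout, where <iface>/device/driver is a symlink
    to the driver directory (purely virtual interfaces have no such link).
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def offload_disable_command(iface, sysfs_root="/sys/class/net"):
    """Build the ethtool argv that disables VXLAN offload, for vmxnet3 only."""
    if nic_driver(iface, sysfs_root) != "vmxnet3":
        return None
    cmd = ["ethtool", "-K", iface]
    for feature in VXLAN_OFFLOAD_FEATURES:
        cmd += [feature, "off"]
    return cmd
```

The returned list could be passed to `subprocess.run(cmd, check=True)` as root. ethtool settings do not persist across reboots, so the command would have to be reapplied at boot.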

Steps to Reproduce (for bugs)

  1. Provision VMs on vSphere version 6.7u2 or later using one of the following operating systems: CentOS/RHEL/Oracle 8.3, SLES 15 SP2/SP3
  2. Install Kubernetes cluster on the nodes
  3. Install Calico with VXLAN overlay following official docs, e.g.:

Context

VXLAN packets are dropped in the Linux network stack due to incorrect checksums on inner packets. These incorrect checksums occur when VXLAN hardware offload is enabled on the VMXNET3 interface (which recent Linux versions do by default) and a VXLAN overlay network is created in the guest OS on a port other than 8472 (when NSX is not used) or 4789 (when NSX is used).

References:

Your Environment

  • Calico version 3.19.1
  • Orchestrator version: Kubernetes 1.19.12 (RKE)
  • Operating System and version: CentOS/RHEL 8.3, SLES 15 SP2
@champtar

champtar commented Jul 7, 2021

VXLAN offload works with many 10G NICs, so disabling it by default will hurt performance for those. Each card can also have different offload toggles: for the qede driver with IPIP, for example, you need to disable all offloads, not just tx-udp_tnl-csum-segmentation.

@janeczku
Author

janeczku commented Jul 7, 2021

Good point, but the issue at hand is completely limited to vSphere infrastructure, so the fix would/should also only apply to the specific type of NIC used there (VMXNET3). The goal is not to solve all known issues relating to Calico IPIP or VXLAN, but to restore compatibility with what is undoubtedly a very mainstream and widespread infrastructure.

@lmm
Contributor

lmm commented Aug 10, 2021

Thanks @janeczku. So IIUC there is a workaround to disable hardware offloading on those specific NICs that can be done prior to installing Calico for Windows.
Perhaps another way is to document this issue and workaround for Calico vSphere users on https://docs.projectcalico.org

cc @song-jiang

@fasaxc
Member

fasaxc commented Aug 20, 2021

Is there a good way to detect these NICs? If so, we could arrange for ChecksumOffloadBroken to be set in that case: https://github.com/projectcalico/felix/blob/master/iptables/feature_detect.go#L116

Note: Calico feature detection can be overridden by setting an override in the FelixConfiguration resource:

featureDetectOverride: "ChecksumOffloadBroken=true"

@janeczku
Author

janeczku commented Aug 20, 2021

It should either be documented or the workaround should be applied automatically in Felix using the approach described by @fasaxc above.

@janeczku
Author

Yes, they can be detected by determining the NIC model and hardware revision via ethtool syscalls.

@janeczku
Author

janeczku commented Aug 20, 2021

The bug is actually in the new Linux driver for vmxnet3. So instead of detecting the specific hardware revision (which I am not sure is exposed via ethtool), it would probably be enough to detect that the buggy driver version is in use.

@champtar

Sometimes the bug is in the driver + firmware combination; it's endless.
The best thing would be to have Calico send packets using raw sockets, receive them on another node, and check whether the checksums are correct, i.e. really test that it's working.
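
The receiving side of such a probe comes down to recomputing the checksum over the bytes that arrived. Below is a minimal sketch of the 16-bit one's-complement Internet checksum (RFC 1071). It is illustrative only and not taken from Calico; the function names are hypothetical, and a real probe for TCP or UDP would also have to include the pseudo-header in the sum, which is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def checksum_is_valid(data: bytes) -> bool:
    """True if data, with its checksum field filled in, sums to zero."""
    return internet_checksum(data) == 0
```

A sender computes `internet_checksum` with the checksum field zeroed and writes the result into that field; a receiver runs `checksum_is_valid` over the same bytes as received.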

@robodude666

@fasaxc, et al.,

I have an issue where pods can't communicate with one another across nodes. I've concluded that it's related to this issue.

I was able to verify that on a brand new k3s cluster install adding featureDetectOverride: "ChecksumOffloadBroken=true" to the FelixConfiguration fixes the issue, but I'm unable to get an existing install fixed by applying the change. What needs to be done for the change to take effect?

I have calico installed via the tigera operator v1.23.1 (calico v3.21.0) on k3s v1.21.5+k3s2. OS is Ubuntu 20.04.

-robodude666

@caseydavenport changed the title from "Calico VXLAN network broken on VMware vSphere with recent Linux versions" to "ChecksumOffloadBroken autodetection doesn't necessarily detect all cases" on Jan 10, 2022
@CecileRobertMichon
Contributor

I'm hitting this issue on Azure (requires VXLAN) Linux version 5.15.0-1014-azure, using Helm to install Calico in VXLAN mode via operator. Unfortunately, the autodetect doesn't work because my kernel version is > 5.7 (even though Ubuntu 20.04 doesn't appear to have the fix).

However, Calico does not allow configuring Felix directly when using the operator: https://projectcalico.docs.tigera.io/reference/felix/configuration

It would be great if we could either:

  1. Improve the ChecksumOffloadBroken detection so that it doesn't rely on a simple kernel version check (since not all distributions have the fix backported) - this would be my preferred solution
  2. Allow configuring Felix via operator / Helm chart values
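
To illustrate why a plain version comparison falls short, here is a sketch of that kind of heuristic. It is not Felix's implementation: the function names are hypothetical, and the 5.7 threshold is simply the figure quoted in this comment. The release string says nothing about which patches a distribution has applied or left out, so a kernel like the Azure one above is classified as fixed regardless.

```python
import re

# Assumption: kernels at or above this version are treated as fixed.
FIXED_IN = (5, 7)


def parse_kernel_release(release: str):
    """Extract (major, minor) from a uname-style release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def offload_assumed_broken(release: str) -> bool:
    """Version-only heuristic: older than FIXED_IN means 'broken'."""
    return parse_kernel_release(release) < FIXED_IN


# The kernel from this comment is classified as fixed, although the
# reporter still sees the problem on it.
print(offload_assumed_broken("5.15.0-1014-azure"))
```

On Linux, `platform.release()` returns the same string as `uname -r` and could be passed in directly.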

@caseydavenport
Member

Hm, that's a bummer that the auto-detection isn't working on newer kernels.

If you have installed Calico using the operator, you cannot modify the environment provided to felix directly. To configure felix, see the FelixConfiguration resource instead.

If you're using the operator, you should look at https://projectcalico.docs.tigera.io/reference/resources/felixconfig to use REST API-based configuration instead of environment variables.

You should be able to modify the default FelixConfiguration resource to set:

spec.featureDetectOverride: "ChecksumOffloadBroken=true"
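
On an existing cluster, one way to apply this is a merge patch against the `default` FelixConfiguration. The sketch below only builds the patch body. The helper name is hypothetical, and the comma-separated `key=value` format for multiple overrides is assumed from the Felix configuration reference, so check it against the docs for your version.

```python
import json


def feature_override_patch(overrides: dict) -> str:
    """Return a JSON merge patch that sets spec.featureDetectOverride."""
    value = ",".join(
        f"{name}={'true' if enabled else 'false'}"
        for name, enabled in overrides.items()
    )
    return json.dumps({"spec": {"featureDetectOverride": value}})


patch = feature_override_patch({"ChecksumOffloadBroken": True})
print(patch)
# prints {"spec": {"featureDetectOverride": "ChecksumOffloadBroken=true"}}
```

The printed string can then be supplied to `kubectl patch felixconfiguration default --type=merge -p '<patch>'`.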

@CecileRobertMichon
Contributor

CecileRobertMichon commented Aug 26, 2022

You should be able to modify the default FelixConfiguration resource

@caseydavenport that's what I'm doing for now and it seems to make the tests happy: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/1a1fa22e8947ba7805e029a279c85af325c2e32b/templates/addons/calico/felix-override.yaml

Do you know if there is a way to do this directly via the Helm chart though? It'd be easier if I could set the featureDetectOverride in values.yaml instead of having to modify the default FelixConfigurations resource via kubectl apply after the helm install. Maybe I'm missing something?

After doing some research across many GitHub issues on this kernel bug, I found https://github.com/rancher/rke2-charts/blob/main-source/packages/rke2-calico/generated-changes/overlay/templates/felixconfig.yaml. It seems the Rancher folks are using some sort of overlay to extend the upstream Calico template so that Felix can be configured in values.yaml. Would it be valuable to add something like it directly in the official Calico Helm chart?

Thanks so much for the answer and for all your work on the project btw, I've gone through a lot of Calico issues the past few days and your comments were very helpful!

@caseydavenport
Member

Thanks for the pointer to that overlay file! I didn't realize that.

However, this line . . . Looks like #6412 strikes again!

Would it be valuable to add something like it directly in the official Calico Helm chart?

It definitely would, and were it not for the problems discussed in the above issue I'd probably just do that right now. To be honest I'm tempted to do it anyway since the default FelixConfiguration is a singleton and this would be a nice UX improvement and would actually be abstracted behind helm's values.yaml "API" anyway... I will mull on that :)

Thanks so much...

You're very welcome! and I really appreciate the kind words 😸

@CecileRobertMichon
Contributor

Hey @caseydavenport, have you given this any more thought? Judging from the issue mentions, it looks like others are running into this as well.

@caseydavenport
Member

@fasaxc has a PR which will always disable the offload here: #6842

That's probably the best way for now.

@fredkan

fredkan commented Sep 21, 2023

Is there a good way to detect these NICs? If so, we could arrange for ChecksumOffloadBroken to be set in that case: https://github.com/projectcalico/felix/blob/master/iptables/feature_detect.go#L116

Note: Calico feature detection can be overridden by setting an override in the FelixConfiguration resource:

featureDetectOverride: "ChecksumOffloadBroken=true"

This only works for VXLAN, not for IPIP.

@fasaxc
Member

fasaxc commented Sep 25, 2023

@fredkan see above, we decided to disable it by default in more recent versions.
