Pod creation fails when requesting vfio-pci bound resource via SRIOV CNI, as DANM unable to setup dummy kernel interface for the device #231

Closed
superfix906 opened this issue Aug 19, 2020 · 7 comments · Fixed by #234
Labels
bug Something isn't working

Comments

@superfix906

Is this a BUG REPORT or FEATURE REQUEST?:

bug

What happened:

The CNI network could not be set up for SRIOV when the allocated resource is a vfio-pci bound device. It fails while creating the dummy interface, with the error: "cannot create dummy interface for DPDK because:cannot assign requested address"

What you expected to happen:

The CNI network should have been set up, and the pod requesting a DPDK (vfio-pci) interface should have started with a dummy kernel interface in the pod's network namespace.

How to reproduce it:

Install DANM in lightweight mode using the installer job. Once all services are running, create the DanmNet and a pod that requests a vfio-pci bound interface via the SRIOV CNI.

Anything else we need to know?:

I am using flannel for IPv4-based cluster networking. DANM is installed in lightweight mode as per the installer job document, and all DANM services are up and running. I am able to create a pod with SRIOV as the CNI when the resource is bound to a kernel driver (netdevice), and IPAM is able to allocate an IP for it. This does not work when the resource is bound to the vfio-pci driver: the CNI setup fails to create the dummy kernel interface, with the following error message:

Events:
Type Reason Age From Message
Normal Scheduled default-scheduler Successfully assigned example/app to test
Warning FailedCreatePodSandBox 2s kubelet, test Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ee0d160e99d6bb410d8b75d2fef6f0f546811537598ab1282c8b2cd29e8cf925" network for pod "app": networkPlugin cni failed to set up pod "app_example" network: CNI network could not be set up: CNI operation for network:sriov-vfio failed with:Post-processing failed for interface:eth1 because:failed to create dummy kernel interface for eth1 because:cannot create dummy interface for DPDK because:cannot assign requested address
Normal SandboxChanged 2s kubelet, test Pod sandbox changed, it will be killed and re-created.

POD yaml

apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: example
  labels:
    env: test
  annotations:
    danm.k8s.io/interfaces: |
      [
        {"network":"management", "ip":"dynamic"},
        {"network":"sriov-vfio", "ip":"dynamic"}
      ]
spec:
  containers:
  - name: sriov-pod
    image: centos:latest
    args:
    - sleep
    - "10000"
    resources:
      requests:
        intel.com/sriov_vfio_vf: '1'
      limits:
        intel.com/sriov_vfio_vf: '1'

DanmNet Yaml

apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: management
  namespace: example
spec:
  NetworkID: 10-flannel
  NetworkType: flannel
---
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: sriov-vfio
  namespace: example
spec:
  NetworkID: sriov-vfio
  NetworkType: sriov
  Options:
    device_pool: "intel.com/sriov_vfio_vf"
    cidr: 10.1.20.0/24

SRIOV resources

{
  "cpu": "48",
  "ephemeral-storage": "280411618864",
  "hugepages-1Gi": "17Gi",
  "intel.com/sriov_dpdk_vf": "0",
  "intel.com/sriov_fec_vf": "1",
  "intel.com/sriov_netdevice_vf": "15",
  "intel.com/sriov_vfio_vf": "1",
  "memory": "79530372Ki",
  "pods": "110"
}

Environment:

  • DANM version (use danm -version): v4.2.0, commit: c0a4c15

  • Kubernetes version (use kubectl version): v1.18.6

  • DANM configuration (K8s manifests, kubeconfig files, CNI config file):

  • /etc/cni/net.d/00-danm.conf

{
  "cniVersion": "0.3.1",
  "name": "danm_meta_cni",
  "type": "danm",
  "kubeconfig": "/etc/cni/net.d/danm-kubeconfig",
  "cniDir": "/etc/cni/net.d",
  "namingScheme": ""
}

  • /etc/cni/net.d/10-flannel.conf

{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "type": "flannel",
  "delegate": {
    "hairpinMode": true,
    "isDefaultGateway": true
  }
}

  • kubeadm config view

apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.18.6
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/24
  serviceSubnet: 10.96.0.0/16
scheduler: {}

  • /var/lib/kubelet/config.yaml

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

  • OS (e.g. from /etc/os-release): CentOS Linux release 7.7.1908 (Core)
  • Kernel (e.g. uname -a): 3.10.0-1062.18.1.rt56.1044.el7.x86_64
  • Others:
@superfix906
Author

To add some more information: I tried the same with DPDK's 'igb_uio' driver and was able to make it work. The dummy interface creation was successful, unlike in the 'vfio-pci' case. So this issue is specific to devices bound to the 'vfio-pci' driver. Any help on this will be appreciated, as vfio-pci is the way we want to move ahead. Thanks in advance!

@Levovar
Collaborator

Levovar commented Aug 24, 2020

Interesting issue, because I explicitly tested this scenario and it was working for me :)
The error you see is actually coming from here:

return errors.New("cannot create dummy interface for DPDK because:" + err.Error())

At this point the IP address has not yet been added to the link; we have only set its MAC address! So the error (1) is coming from the kernel, and (2) must be MAC clash related, not IP related.

I can only think of two reasons why this can happen:

  • You are using the stock CentOS 3.10 kernel, which is as old as time itself. It may be buggy and unable to differentiate between MAC addresses in different netnses, or it may not have recognized that a VF was rebound to vfio-pci and still thinks that MAC is in use.
    Try updating to 4.19.
  • How sure are you that the resources belonging to the "intel.com/sriov_vfio_vf" pool are really already bound to and managed by VFIO? Can you double check and provide printouts?
    MAC address clashing can happen if the VF is managed by the kernel and was also physically moved into the container's netns.

but TBH my money is on the old kernel

@Levovar
Collaborator

Levovar commented Aug 29, 2020

@superfix906 I managed to retest this recently with 82599 NICs (which model are you using, BTW?) on CentOS 7.8 with a 4.18 kernel.
It works fine in all scenarios, with or without a VLAN tag in the network. One thing I noticed, though: when a VLAN is also used in the network, we add the VF MAC address to both the dummy interface and the VLAN interface on top of it.
My kernel could tolerate that, but maybe older ones cannot?
I made this change to address it: #234. But as you did not use a VLAN tag in your network, this is probably not the root cause.

In any case, we did encounter the error you describe in our environment, but it only happened when DANM was asked to work with improperly set up VFs (binding to VFIO was not properly done before the Pod was created).
Considering the feature can be reliably used in our environment, I strongly think the root cause is environment specific, and possibly related either to your kernel or to improper device management in the host layer.

@Levovar
Collaborator

Levovar commented Sep 2, 2020

I debugged the problem further. The error possibly appears when the MAC address of the VF is all zeros: the kernel refuses to set it on the dummy interface.
This can happen with some Intel drivers. The referenced PR now adds a check for a zero MAC, and only tries to set the MAC on the dummy if it is a valid one, which should solve the problem.
It is currently unclear whether the Intel drivers zero out both the admin and effective MACs, or whether in some cases the SR-IOV CNI fails to properly reset the VF after use, because I observed that VFIO-bound VFs sometimes have MAC addresses and sometimes do not.
So it is still kind of a mystery, but whatever happens on the host level, DANM will now behave more resiliently :)

@superfix906
Author

@Levovar Thanks a lot for the detailed research and inputs. Appreciate that !

Unfortunately, we have moved away from this for the moment. I shall update once we are back on it. Thanks again.

@Levovar
Collaborator

Levovar commented Sep 11, 2020

@superfix906 no problem :)
meanwhile we have tested the change in our own environment, and it solves the reported problem so I will close the ticket

thanks again for reporting the case!

@Levovar Levovar closed this as completed Sep 11, 2020
@Levovar Levovar added the bug Something isn't working label Sep 11, 2020
@krsna1729

We found that setting the MAC address a priori, as part of node/device setup, is better than leaving it as a zero MAC. This prevents the creation of a random MAC when DPDK enumerates the VFs on some models.

https://github.com/clearlinux/cloud-native-setup/blob/e74b3ca892ea04ec293d37e35f9815505141792e/clr-k8s-examples/9-multi-network/systemd/sriov.sh#L47
