This repository has been archived by the owner on Sep 8, 2022. It is now read-only.

Installation steps using helm #14

Open
rverma-dev opened this issue Jan 10, 2022 · 9 comments
Labels
bug Something isn't working

Comments

@rverma-dev

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

  1. On DigitalOcean, tried to install using

cilium install --version -service-mesh:v1.11.0-beta.1 --config enable-envoy-config=true --kube-proxy-replacement=probe 

but got errors like the following (see the diagnostic sketch after this list):

controller endpoint-769-regeneration-recovery is failing since 37s (24x): regeneration recovery failed

  2. Also tried cilium uninstall followed by a plain cilium install --kube-proxy-replacement=probe, but that gave the same error.
  3. Then tried simply

helm install cilium cilium/cilium \
    --version 1.11.0 \
    --namespace kube-system

and this went fine.
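To see why that controller keeps failing, one option is to ask the agent itself. A minimal diagnostic sketch, assuming the default kube-system namespace and the cilium DaemonSet name:

kubectl -n kube-system exec ds/cilium -- cilium status --all-controllers

This prints every controller with its failure count and last error message, which is usually more specific than the summary line above.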

Cilium Version

1.11.0

Kernel Version

NA

Kubernetes Version

1.21.5

Sysdump

Uploading cilium-sysdump-20220110-102642.zip…

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@pchaigno added the bug label on Jan 10, 2022
@pchaigno
Member

Thanks for the report!

Could you try to upload the Cilium sysdump again? It seems you submitted the issue before uploading was finished.

@rverma-dev
Author

Trial 2: EKS

Tried installing using Helm:

helm upgrade --install cilium cilium/cilium --version=1.11.0 \
             --namespace kube-system --set eni.enabled=true \
             --set ipam.mode=eni --set egressMasqueradeInterfaces=eth0 \
             --set loadBalancer.algorithm=maglev --set hubble.enabled=true  \
             --set hubble.relay.enabled=true --set hubble.ui.enabled=false \
             --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
             --set kubeProxyReplacement="strict" \
             --set k8sServiceHost=$API_SERVER_IP --set k8sServicePort=443 \
             --set-string extraConfig.enable-envoy-config="true" \
             --set image.repository=quay.io/cilium/cilium-service-mesh \
             --set image.tag=v1.11.0-beta.1 \
             --set image.useDigest=false \
             --set operator.image.suffix=-service-mesh \
             --set operator.image.useDigest=false \
             --set operator.replicas=1 \
             --set operator.image.tag=v1.11.0-beta.1

Got the errors below.
The Helm chart seems to need an RBAC update, and there appear to be other BPF issues too.
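A quick way to confirm the RBAC gap (a sketch; the service-account name matches the "forbidden" errors in the log below):

kubectl auth can-i list ciliumclusterwidenetworkpolicies.cilium.io --as=system:serviceaccount:kube-system:cilium
kubectl auth can-i list endpointslices.discovery.k8s.io --as=system:serviceaccount:kube-system:cilium

Both should print "yes" on a correctly installed chart.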

level=error msg="Command execution failed" cmd="[tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj 1979_next/bpf_host.o sec to-host]" error="exit status 1" subsys=datapath-loader
level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/cilium_calls_hostns_01979': parameter mismatch" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': error reusing pinned map" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': failed to create: Invalid argument(-22)" subsys=datapath-loader
level=warning msg="libbpf: failed to load object '1979_next/bpf_host.o'" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program for host endpoint (to-host)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" file-path=1979_next/bpf_host.o identity=1 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=cilium_host
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 file-path=1979_next_fail identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=40.842791ms bpfWaitForELF="3.806µs" bpfWriteELF="697.761µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ mapSync="2.285µs" policyCalculation="3.206µs" prepareBuild="623.979µs" proxyConfiguration="7.414µs" proxyPolicyCalculation="2.816µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=43.733597ms waitingForCTClean=201ns waitingForLock=773ns
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="github.com/cilium/cilium/pkg/k8s/watchers/cilium_clusterwide_network_policy.go:93: failed to list *v2.CiliumClusterwideNetworkPolicy: ciliumclusterwidenetworkpolicies.cilium.io is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"ciliumclusterwidenetworkpolicies\" in API group \"cilium.io\" at the cluster scope" subsys=klog
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/watchers/cilium_clusterwide_network_policy.go:93: Failed to watch *v2.CiliumClusterwideNetworkPolicy: failed to list *v2.CiliumClusterwideNetworkPolicy: ciliumclusterwidenetworkpolicies.cilium.io is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"ciliumclusterwidenetworkpolicies\" in API group \"cilium.io\" at the cluster scope" subsys=k8s
level=warning msg="Unable to update CiliumNode custom resource" error="ciliumnodes.cilium.io \"ip-192-168-113-75.ec2.internal\" is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot update resource \"ciliumnodes/status\" in API group \"cilium.io\" at the cluster scope" subsys=ipam
level=warning msg="github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:143: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"endpointslices\" in API group \"discovery.k8s.io\" at the cluster scope" subsys=klog
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:143: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"endpointslices\" in API group \"discovery.k8s.io\" at the cluster scope" subsys=k8s
level=error msg="Command execution failed" cmd="[tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj 1979_next/bpf_host.o sec to-host]" error="exit status 1" subsys=datapath-loader
level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/cilium_calls_hostns_01979': parameter mismatch" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': error reusing pinned map" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': failed to create: Invalid argument(-22)" subsys=datapath-loader
level=warning msg="libbpf: failed to load object '1979_next/bpf_host.o'" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program for host endpoint (to-host)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" file-path=1979_next/bpf_host.o identity=1 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=cilium_host
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 file-path=1979_next_fail identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=55.841988ms bpfWaitForELF="4.595µs" bpfWriteELF="745.463µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ mapSync="2.409µs" policyCalculation="6.682µs" prepareBuild="598.265µs" proxyConfiguration="7.357µs" proxyPolicyCalculation="2.824µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=60.28447ms waitingForCTClean=197ns waitingForLock=836ns
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=error msg="Command execution failed" cmd="[tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj 1979_next/bpf_host.o sec to-host]" error="exit status 1" subsys=datapath-loader
level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/cilium_calls_hostns_01979': parameter mismatch" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': error reusing pinned map" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': failed to create: Invalid argument(-22)" subsys=datapath-loader
level=warning msg="libbpf: failed to load object '1979_next/bpf_host.o'" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program for host endpoint (to-host)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" file-path=1979_next/bpf_host.o identity=1 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=cilium_host
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 file-path=1979_next_fail identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=62.441743ms bpfWaitForELF="4.154µs" bpfWriteELF="840.205µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ mapSync="2.732µs" policyCalculation="3.232µs" prepareBuild="795.148µs" proxyConfiguration="7.916µs" proxyPolicyCalculation="3.243µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=66.318589ms waitingForCTClean=208ns waitingForLock="1.053µs"
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint

@sealneaward

I received the same errors with both installation methods on a 1.21 EKS cluster.

@ghouscht

Same issue here on a bare-metal cluster. @pchaigno did you get a sysdump? If not, I can share one privately with you.

@pchaigno
Member

@ghouscht I haven't received a sysdump yet. If you could share one, that would help: it would let us confirm this is a complexity issue caused by the lack of kernel support for kube-proxy replacement (KPR). I'm pchaigno on Slack as well.
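For anyone else hitting this: a sysdump can be collected with the Cilium CLI and attached here:

cilium sysdump

It writes a cilium-sysdump-<timestamp>.zip into the current directory.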

@gkjsa

gkjsa commented Jan 27, 2022

Same for me on Azure:

cilium install \
    --context xxxxxx \
    --cluster-name xxxxxx \
    --cluster-id 1 \
    --azure-resource-group xxxxxx \
    --azure-subscription-id xxxxx \
    --azure-client-id xxxxx \
    --azure-client-secret xxxxxx \
    --azure-tenant-id xxxxxx \
    --version -service-mesh:v1.11.0-beta.1 \
    --config enable-envoy-config=true \
    --kube-proxy-replacement=probe

results in

cilium-qj48r cilium-agent level=error msg="Command execution failed" cmd="[tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj 3826_next/bpf_host.o sec to-host]" error="exit status 1" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/cilium_calls_hostns_03826': parameter mismatch" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="libbpf: map 'cilium_calls_hostns_03826': error reusing pinned map" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="libbpf: map 'cilium_calls_hostns_03826': failed to create: Invalid argument(-22)" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="libbpf: failed to load object '3826_next/bpf_host.o'" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="Unable to load program" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="JoinEP: Failed to load program for host endpoint (to-host)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 error="Failed to load prog with tc: exit status 1" file-path=3826_next/bpf_host.o identity=1 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=cilium_host
cilium-qj48r cilium-agent level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
cilium-qj48r cilium-agent level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 file-path=3826_next_fail identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
cilium-qj48r cilium-agent level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=35.925896ms bpfWaitForELF="3.8µs" bpfWriteELF="769.615µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ mapSync="2.4µs" policyCalculation="3.8µs" prepareBuild="563.111µs" proxyConfiguration="9.301µs" proxyPolicyCalculation="4µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=38.809751ms waitingForCTClean=300ns waitingForLock=900ns
cilium-qj48r cilium-agent level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint

A previous installation with Cilium 1.11.1 went fine on the same cluster (AKS 1.21.7).

@pchaigno
Member

cc @jrajahalme

@gkjsa

gkjsa commented Feb 1, 2022

I assume this is because these AKS clusters were previously used in clustermesh mode.
Clustermesh was disabled beforehand, but some settings probably still exist.
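One way to check for leftovers, as a sketch (resource names assume a default install): inspect the agent ConfigMap and the clustermesh secret:

kubectl -n kube-system get configmap cilium-config -o yaml | grep -i cluster
kubectl -n kube-system get secret cilium-clustermesh

If the secret still exists even though clustermesh was disabled, that would support this theory.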

@Jiang1155

I had a similar issue and got the following errors:

2022-06-25T00:32:12.685918484Z level=error msg="Command execution failed" cmd="[ip -force link set dev eth0 xdpgeneric obj /var/run/cilium/state/bpf_xdp.o sec from-netdev]" error="exit status 255" subsys=datapath-loader
2022-06-25T00:32:12.685965305Z level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/xdp//globals/cilium_calls_xdp': parameter mismatch" subsys=datapath-loader
2022-06-25T00:32:12.685972334Z level=warning msg="libbpf: map 'cilium_calls_xdp': error reusing pinned map" subsys=datapath-loader
2022-06-25T00:32:12.685977404Z level=warning msg="libbpf: map 'cilium_calls_xdp': failed to create: Invalid argument(-22)" subsys=datapath-loader
2022-06-25T00:32:12.685981737Z level=warning msg="libbpf: failed to load object '/var/run/cilium/state/bpf_xdp.o'" subsys=datapath-loader
2022-06-25T00:32:12.694436571Z level=fatal msg="Failed to compile XDP program" error="Failed to load prog with ip: exit status 255" subsys=datapath-loader
2022-06-25T00:32:14.062967388Z level=info msg="regenerating all endpoints" reason="kube-apiserver identity updated" subsys=endpoint-manager

This happened when I downgraded Cilium from a newer version to the older v1.11.1, and only when XDP was enabled via
bpf-lb-acceleration: testing-only

I have two nodes. I reloaded one node and it recovered from the error. I tried to get a sysdump (I guess it's called debuginfo now?), but I could only get it from the recovered Cilium pod; for the crashing one I can't, since it keeps crashing. I uploaded the file here anyway.

para-mismatch.log
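In case it helps others: since the mismatched map here is the pinned XDP call map, a possible per-node cleanup sketch, assuming shell access on the node (interface name and map path are taken from the log above):

# detach the generic-mode XDP program left behind by the newer agent
ip link set dev eth0 xdpgeneric off
# remove the stale pinned XDP map so the downgraded agent can recreate it
rm -f /sys/fs/bpf/xdp/globals/cilium_calls_xdp

followed by restarting the Cilium pod on that node.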
