Failed to "KillPodSandbox" due to calico connection is unauthorized #7220
Comments
@sysnet4admin could you please fill in the template with more information about your issue?
@coutinhop Thank you for letting me know about the empty issue that I uploaded.
Duplicated [pod for all namespaces]
[Describe for terminating pod]
[calico-#7220-cluster-info.dump]
Plus, the workaround is effective.
Same behaviour. Cluster info attached.
k8s v1.26.1 + calico_v3.17.1 = NOT duplicated (i.e. the issue does not reproduce with this version)
Could there be something else in your cluster invalidating the tokens somehow?
Facing the same issue. We refrain from using the workaround, so are there any updates on how to get rid of this issue? I'm using -
I have the same problem. If you think it's fixed on master, maybe the problem can still show up on another node or worker.
@caseydavenport
I am seeing the same issue, after storage was extended on the device.
Also facing this issue in Canal. It's causing a lot of headaches in my production cluster; any ideas on how to fix this? EDIT: I did a reboot, as suggested before, and will see if the issue persists after a few days.
Was this missing from the manifest in our docs, or just the manifest in your cluster? Make sure when upgrading that you are pulling the latest manifest from our release.
Sorry for not being clearer. I use Rancher RKE to bootstrap my cluster, and it seems they didn't have the latest manifests.
Facing the same issue using k8s v1.27.0 & calico v3.25.1. Installed calico using the calico manifest.
I found one ambiguity that can make this confusing: TokenWatcher ignores the …
I rebuilt calico 3.25.1 with a very low token TTL to understand what the flow is, and I think I see how this works at the moment. The way …
Also noticed a "hard" failure in install-cni if … I think there are some strange interactions between manually setting this variable and how #7106 would work in a future release. Can someone with more context comment on what the expected flow is here?
This workaround didn't help me, but deleting the files from the /etc/cni/net.d/ folder worked for me.
For all who are still struggling with this issue: take a look at the logs of your calico-node pod. I had the same problem and found out that the ServiceAccount "calico-node" was not permitted to create a "serviceaccounts/token" resource because it was restricted to the resource name "calico-cni-plugin". I removed the restriction to "calico-cni-plugin" and it works now.
Would you care to explain this and the steps please? |
As of Calico v3.26, the …
A little bit of background information: at my company we are using IBM Cloud and their Kubernetes cluster, which we updated 13 days ago. Yesterday we noticed that CRUD operations on any pod are failing. Regarding cluster roles, only one is relevant, in fact:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: calico-cni-plugin
uid: bf8ebe3c-f491-4e63-bdb9-1801826917e5
resourceVersion: '109270438'
creationTimestamp: '2023-06-28T22:52:52Z'
annotations:
kubectl.kubernetes.io/last-applied-configuration: >
{"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"name":"calico-cni-plugin"},"rules":[{"apiGroups":[""],"resources":["pods","nodes","namespaces"],"verbs":["get"]},{"apiGroups":[""],"resources":["pods/status"],"verbs":["patch"]},{"apiGroups":["crd.projectcalico.org"],"resources":["blockaffinities","ipamblocks","ipamhandles","clusterinformations","ippools","ipreservations","ipamconfigs"],"verbs":["get","list","create","update","delete"]}]}
managedFields:
- manager: kubectl-client-side-apply
operation: Update
apiVersion: rbac.authorization.k8s.io/v1
time: '2023-06-28T22:52:52Z'
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:kubectl.kubernetes.io/last-applied-configuration: {}
- manager: dashboard
operation: Update
apiVersion: rbac.authorization.k8s.io/v1
time: '2023-07-12T17:12:23Z'
fieldsType: FieldsV1
fieldsV1:
f:rules: {}
rules:
- verbs:
- get
apiGroups:
- ''
resources:
- pods
- nodes
- namespaces
- verbs:
- patch
apiGroups:
- ''
resources:
- pods/status
- verbs:
- get
- list
- create
- update
- delete
apiGroups:
- crd.projectcalico.org
resources:
- blockaffinities
- ipamblocks
- ipamhandles
- clusterinformations
- ippools
- ipreservations
- ipamconfigs

Version:
Is the role calico-cni-plugin supposed to be allowed to create serviceaccount tokens? What specific log do you want to take a look at? Best regards and thank you for your help! EDIT: unfortunately the logs of calico-node have been overwritten, but I can remember that it showed something like "service account 'calico-node:kube-system' has no permission to obtain a token". EDIT2: just for a test I added the resourceName 'calico-cni-plugin' again to the service account token creation rule for the 'calico-node' cluster role, and it seems not to work. The …
Nope, the calico-cni-plugin serviceaccount should not be able to make tokens. However, …
This is interesting - it sounds like you're running with the code from Calico v3.25, but the RBAC from Calico v3.26, which would result in the problems you're seeing. The v3.25 code expects to have this RBAC:
Whereas v3.26 expects this:
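(The two RBAC snippets from the original comment did not survive here. As a hedged sketch of the difference, based on the resourceNames restriction discussed earlier in this thread - the authoritative rules are the ones in each release's own manifest - the relevant rule changes roughly like this:)

```yaml
# v3.25-style rule (sketch): calico-node mints tokens for its own
# service account, so the rule is scoped to the name "calico-node".
- apiGroups: [""]
  resources: ["serviceaccounts/token"]
  resourceNames: ["calico-node"]
  verbs: ["create"]

# v3.26-style rule (sketch): a separate calico-cni-plugin service
# account exists, and token creation is scoped to it instead.
- apiGroups: [""]
  resources: ["serviceaccounts/token"]
  resourceNames: ["calico-cni-plugin"]
  verbs: ["create"]
```

Mixing the code of one version with the RBAC of the other would produce exactly the "unauthorized" symptoms described above.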
We were able to fix this problem now. Our master nodes had an incorrect version, which ruined everything. An update of our master nodes was fortunately the solution, without any hacky workarounds. But thanks for the help - I appreciate it!
This works for me, but I wonder why this happens. I've had to restart calico at least a few times in the past months; I wonder if I have anything misconfigured.
After some period, pods cannot be created or deleted, failing with this message. It seems to be related to the service account policy change in Kubernetes v1.26.0:
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#manual-secret-management-for-serviceaccounts
Here is the workaround:
re-read the calico-node information by restarting or deleting it.
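As a sketch, the restart/delete workaround can be done with standard kubectl commands; this assumes the default manifest install, where calico-node runs as a DaemonSet in kube-system labeled k8s-app=calico-node:

```shell
# Restart the whole calico-node DaemonSet so every pod picks up a fresh token.
kubectl -n kube-system rollout restart daemonset/calico-node

# Or delete only the calico-node pod on one node and let the DaemonSet
# recreate it (<node-name> is a placeholder for the affected node).
kubectl -n kube-system delete pod -l k8s-app=calico-node \
  --field-selector spec.nodeName=<node-name>
```

Note this only refreshes the token; it does not address the underlying expiry behaviour.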
Expected Behavior
kubectl create or kubectl delete is working fine.

Current Behavior

It won't work properly.
Possible Solution
The workaround is to restart the daemonset or delete the pod.
OR
A possible solution is to create a long-lived secret token for the service account instead,
and use this secret with the service account for calico-node (this is related to #5712 and #6421).
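The Kubernetes documentation linked above describes creating a long-lived token by annotating a Secret with the service account name. A sketch of what that might look like for calico-node (the Secret name here is illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: calico-node-long-lived-token   # illustrative name
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: calico-node
type: kubernetes.io/service-account-token
```

Kubernetes populates such a Secret with a non-expiring token for the named service account, which trades the expiry problem discussed in this issue for the security drawbacks of a long-lived credential.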
Steps to Reproduce (for bugs)
Context
The code from #6218 (node/pkg/cni/token_watch.go) is already applied.
So I decoded the JWT applied on the calico-node; it confirmed a 1-year (365d) lifetime properly.
JWT
Decoded JWT's Payload
Thus this issue is about slightly different logic for verifying the authorization with Kubernetes.
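The JWT-decoding step described above can be sketched in Python. The token and its claims here are made up for illustration, but the payload layout (iat/exp as Unix timestamps) matches Kubernetes service-account tokens:

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (second segment) of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # Restore the base64url padding that JWT encoding strips off.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# A hypothetical service-account token payload with a 365-day lifetime,
# matching what was observed on calico-node in this issue.
iat = 1_700_000_000
claims = {"iss": "kubernetes/serviceaccount", "iat": iat, "exp": iat + 365 * 24 * 3600}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
token = f"header.{payload}.signature"

decoded = decode_jwt_payload(token)
ttl_days = (decoded["exp"] - decoded["iat"]) / 86400
print(ttl_days)  # 365.0
```

This only inspects the claims; it says nothing about whether the API server still accepts the token, which is the distinction this issue is about.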
/var/log/messages from all nodes looked like the below when it happened.
[control-plane node]
[worker node]
Your Environment