workflows: update AKS workflows with new taints #584
Conversation
Force-pushed from 8b5825f to ca35089.
Link to workflow run testing PR changes: https://github.com/cilium/cilium-cli/actions/runs/1346273735
@tklauser Weird issue while trying to run
This does not ring a bell for me: why would there already be a
Not sure why that would be, to be honest. It also looks like the CA already exists, which is somewhat unexpected in a fresh cluster:
Is it possible that we somehow ended up (re-)using an existing cluster where Cilium was previously installed and not properly uninstalled?
Not possible, as the clusters are unique per workflow run, so they are completely clean when created. I'll try to reproduce manually on Monday...
From the sysdump it looks like Cilium is already fully deployed before the installation of the agent DaemonSet and operator Deployment from the
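As a hedged illustration of that kind of check (assuming the default `kube-system` namespace and the `cilium-ca` secret name created by cilium-cli, which may not match this setup exactly), one could look for leftover Cilium state in a supposedly fresh cluster like this:

```sh
# Hypothetical check: does the "fresh" cluster already carry Cilium state?
kubectl -n kube-system get secret cilium-ca --ignore-not-found
kubectl -n kube-system get daemonset cilium --ignore-not-found
kubectl -n kube-system get deployment cilium-operator --ignore-not-found
```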
Force-pushed from ca35089 to fdc073c.
I found the issue. We use an in-cluster script to run
Force-pushed from fdc073c to c6ede4e.
New link to workflow run testing workflow changes: https://github.com/cilium/cilium-cli/actions/runs/1355353366
Force-pushed from c6ede4e to 2eac3f9.
Immediate solution (implemented in next push): wait at least for the initial system nodepool to be deleted before installing our Helm chart.
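For illustration only, a minimal sketch of that ordering, with placeholder resource group, cluster, and pool names rather than the actual workflow values, and `cilium install` standing in for whatever install step the workflow runs:

```sh
# Placeholder names; the real workflow derives these from the CI run context.
RESOURCE_GROUP=cilium-ci
CLUSTER_NAME=cilium-ci-cluster

# Delete the initial (untaintable) system node pool and block until it is gone.
# Deliberately no --no-wait here, unlike other steps in the workflow.
az aks nodepool delete \
  --resource-group "$RESOURCE_GROUP" \
  --cluster-name "$CLUSTER_NAME" \
  --name nodepool1

# Only once the untainted pool is gone do we install Cilium, so application
# pods cannot end up on nodes managed by the default AKS CNI plugin.
cilium install
```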
Should we push this change to cilium/cilium as well for consistency?
Probably worth a comment in the code, since reading the rest, one would expect a `--no-wait` in the delete command as well.
I would personally prefer not to. Arguments:
Can do :)
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent application pods from being managed by the default AKS CNI plugin. To this end, the proposed workflow users should follow when installing Cilium into AKS was to replace the initial AKS node pool with a new tainted system node pool, as it is not possible to taint the initial AKS node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting custom taints on system node pools, cf. Azure/AKS#2578. It is therefore no longer possible for us to recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow instead:

- Replace the initial node pool with a system node pool tainted with `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent application pods from being scheduled on the user node pool until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
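As a rough sketch of the node pool layout described above (hypothetical resource group, cluster, and pool names, not the exact flags used by the workflow):

```sh
RESOURCE_GROUP=cilium-ci
CLUSTER_NAME=cilium-ci-cluster

# System node pool: CriticalAddonsOnly keeps application pods off it and is the
# taint AKS still permits on system pools.
az aks nodepool add \
  --resource-group "$RESOURCE_GROUP" \
  --cluster-name "$CLUSTER_NAME" \
  --name systempool \
  --mode System \
  --node-taints "CriticalAddonsOnly=true:NoSchedule"

# User node pool: application pods stay unscheduled here until Cilium is ready
# to manage them and the agent-not-ready taint is removed.
az aks nodepool add \
  --resource-group "$RESOURCE_GROUP" \
  --cluster-name "$CLUSTER_NAME" \
  --name userpool \
  --mode User \
  --node-taints "node.cilium.io/agent-not-ready=true:NoSchedule"

# The initial, untaintable node pool is then deleted (and, per the discussion
# above, the install step only runs once this deletion has completed).
az aks nodepool delete \
  --resource-group "$RESOURCE_GROUP" \
  --cluster-name "$CLUSTER_NAME" \
  --name nodepool1
```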
Force-pushed from 2eac3f9 to 5853ed3.
https://github.com/cilium/cilium-cli/actions/workflows/aks.yaml can be re-enabled once this is merged :)