Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assigning node taints to system nodepool not working anymore, failing with SystemPoolHasRestrictedTaint #2578

Closed
nbusseneau opened this issue Oct 1, 2021 · 5 comments
Labels
Feedback General feedback question resolution/answer-provided Provided answer to issue, question or feedback.

Comments

@nbusseneau
Copy link

nbusseneau commented Oct 1, 2021

Hello 😄

What happened: Creating system nodepools with node taints is not working since 2021-09-28, failing with the following error:

(SystemPoolHasRestrictedTaint) Operation failed due to insufficient nodes for system pod scheduling. Placing custom taints on system pool is not supported(except 'CriticalAddonsOnly' taint or taint effect is 'PreferNoSchedule'). Please refer to https://aka.ms/aks/system-taints for detail

What you expected to happen: System nodepool is created with the node taint applied.

How to reproduce it (as minimally and precisely as possible): Using the CLI:

az group create --name foo --location westeurope
az aks create --resource-group foo --name bar
az aks nodepool add --name nodepool2 --resource-group foo --cluster-name bar \
  --mode system --node-taints foo=bar:NoSchedule

Anything else we need to know?:

We recommend users to apply taint node.cilium.io/agent-not-ready=true:NoSchedule to nodepools when using cilium/cilium (CNI plugin) to prevent application pods from being managed by the default AKS CNI plugin.

We test this behaviour in our AKS CI, via an automated GitHub Actions workflow using the Azure CLI (see source command lines here).

Thanks!

@ghost ghost added the triage label Oct 1, 2021
@ghost
Copy link

ghost commented Oct 1, 2021

Hi nbusseneau, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure your subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

@nbusseneau nbusseneau changed the title Assigning node taints to system nodepool not working anymore, failing with SystemPoolHasRestrictedTaint aks: assigning node taints to system nodepool not working anymore, failing with SystemPoolHasRestrictedTaint Oct 1, 2021
@nbusseneau nbusseneau changed the title aks: assigning node taints to system nodepool not working anymore, failing with SystemPoolHasRestrictedTaint Assigning node taints to system nodepool not working anymore, failing with SystemPoolHasRestrictedTaint Oct 1, 2021
@palma21
Copy link
Member

palma21 commented Oct 1, 2021

AKS System pools only allow 1 taint CriticalAddonsOnly=true:NoSchedule or soft taints, so that behavior is by design per the documentation.
https://docs.microsoft.com/en-us/azure/aks/use-system-pools#system-and-user-node-pools

AKS User pools support any taint.

You can still any any taint manually via kubectl in a per node / temporary basis, but not to the node pool model/permanently.

caveat: it is not possible to apply node taints to the initial nodepool of AKS clusters at creation time

This is possible now, that linked issue is not just waiting for update taints to be supported.

@palma21 palma21 added Feedback General feedback question resolution/answer-provided Provided answer to issue, question or feedback. labels Oct 1, 2021
@ghost ghost removed the triage label Oct 1, 2021
@nbusseneau
Copy link
Author

@palma21 Thanks for your answer.

AKS System pools only allow 1 taint CriticalAddonsOnly=true:NoSchedule or soft taints

Does that mean that when it was working (before 2021-09-28), this was actually not intended and should never have been working in the first place?

If that is indeed the case, do you have any recommendation on pod management by third-party CNI plugins?

so that behavior is by design per the documentation.

Nit: I might have missed where this is specified, but I don't see where in the documentation it is specified that system nodepools can only use CriticalAddonsOnly=true:NoSchedule or soft taints => I would suggest clarifying this in the list of system nodepools restrictions.

caveat: it is not possible to apply node taints to the initial nodepool of AKS clusters at creation time

This is possible now, that linked issue is not just waiting for update taints to be supported.

Oh, is it? I don't see --node-taints in the az aks create documentation, and it does not seem to work with CLI version 2.28.0. Am I missing something?

@ghost
Copy link

ghost commented Oct 4, 2021

Thanks for reaching out. I'm closing this issue as it was marked with "Answer Provided" and it hasn't had activity for 2 days.

@ghost ghost closed this as completed Oct 4, 2021
@nbusseneau
Copy link
Author

nbusseneau commented Oct 4, 2021

I still had questions though... :/

nbusseneau added a commit to cilium/cilium that referenced this issue Oct 4, 2021
Context: we recommend recommend users taint all nodepools with
`node.cilium.io/agent-not-ready=true:NoSchedule` to prevent application
pods from being managed by the default AKS CNI plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on the system node pool.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
nbusseneau added a commit to cilium/cilium that referenced this issue Oct 12, 2021
Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
nbusseneau added a commit to cilium/cilium that referenced this issue Oct 13, 2021
Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
aanm pushed a commit to cilium/cilium that referenced this issue Oct 13, 2021
Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
nbusseneau added a commit to cilium/cilium-cli that referenced this issue Oct 15, 2021
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
joamaki pushed a commit to joamaki/cilium that referenced this issue Oct 18, 2021
[ upstream commit 2cb55ca ]

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
Signed-off-by: Jussi Maki <[email protected]>
nbusseneau added a commit to cilium/cilium-cli that referenced this issue Oct 18, 2021
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
joamaki pushed a commit to cilium/cilium that referenced this issue Oct 19, 2021
[ upstream commit 2cb55ca ]

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
Signed-off-by: Jussi Maki <[email protected]>
nbusseneau added a commit to cilium/cilium-cli that referenced this issue Oct 19, 2021
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
aanm pushed a commit to cilium/cilium-cli that referenced this issue Oct 20, 2021
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Feedback General feedback question resolution/answer-provided Provided answer to issue, question or feedback.
Projects
None yet
Development

No branches or pull requests

2 participants