Windows Nodes Don't Currently Support Out of Memory Eviction (OOMKILL) #2820

Closed
ojfw20 opened this issue Mar 2, 2022 · 10 comments

@ojfw20

ojfw20 commented Mar 2, 2022

What happened: Pods fail to start on a Windows nodepool after resource demands increase past node capacity. Pods on a Windows node also start paging to disk when the node runs out of memory.

What you expected to happen: OOMKill feature triggers scheduling of pods on a node with free memory, allowing pods to start as expected.

How to reproduce it (as minimally and precisely as possible): Overallocate a Windows node with pods, trigger pods to request more memory than the node can provide.
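To make the reproduction step concrete, here is a minimal Python sketch (not part of the original report) that checks whether the memory limits of the pods on a node add up to more than the node can provide. The pod names, the sizes, and the helpers `parse_quantity` and `overcommit_bytes` are hypothetical, and only binary suffixes such as `Mi` and `Gi` are handled.

```python
# Hypothetical helper: convert a Kubernetes-style memory quantity to bytes.
# Only the binary suffixes below are supported in this sketch.
_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_quantity(quantity: str) -> int:
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat the value as plain bytes


def overcommit_bytes(allocatable: str, pod_limits: dict) -> int:
    """Return by how many bytes the summed limits exceed allocatable (0 if none)."""
    total = sum(parse_quantity(limit) for limit in pod_limits.values())
    return max(0, total - parse_quantity(allocatable))


# Made-up example data: three pods on a node with 14Gi allocatable memory.
pods = {"web-1": "6Gi", "web-2": "6Gi", "batch-1": "4Gi"}
excess = overcommit_bytes("14Gi", pods)
print(excess // 1024**2, "MiB over node allocatable")
```

A non-zero result means the node is overallocated in the sense described above.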

Anything else we need to know?: We have been informed that OOMKill is not supported on Windows nodes. This seems to be a gaping hole in the feasibility of using Windows nodepools for any sort of elastic scalability. We would like to see OOMKill supported on Windows nodepools.

https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/#kubelet-compatibility agrees, and states that:

  • The (Windows) kubelet does not take OOM eviction actions

  • Eviction by using --enforce-node-allocable is not implemented

  • Eviction by using --eviction-hard and --eviction-soft are not implemented
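For readers unfamiliar with what a hard eviction threshold does, the following Python sketch (added for illustration, not the kubelet's actual implementation) evaluates a threshold written in the `memory.available<VALUE` form accepted by `--eviction-hard`. The function names `threshold_bytes` and `should_evict` and the sample numbers are assumptions; the real kubelet supports more signals and quantity formats than this.

```python
import re

_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def threshold_bytes(spec: str, capacity: int) -> int:
    """Parse 'memory.available<VALUE' and return the threshold in bytes."""
    match = re.fullmatch(r"memory\.available<(\d+)(Ki|Mi|Gi|%)", spec)
    if match is None:
        raise ValueError(f"unsupported threshold: {spec!r}")
    number, unit = int(match.group(1)), match.group(2)
    if unit == "%":
        # Percentage thresholds are relative to the node's memory capacity.
        return capacity * number // 100
    return number * _UNITS[unit]


def should_evict(spec: str, available: int, capacity: int) -> bool:
    # The eviction signal fires once available memory drops below the threshold.
    return available < threshold_bytes(spec, capacity)


capacity = 16 * 1024**3  # assumed 16 GiB node
print(should_evict("memory.available<500Mi", 300 * 1024**2, capacity))  # True
print(should_evict("memory.available<10%", 4 * 1024**3, capacity))      # False
```

On a Windows node of the version reported here, no such check leads to an eviction, which is why the node starts paging instead.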

Environment:

  • Kubernetes version (use kubectl version): 1.21.7
  • Size of cluster (how many worker nodes are in the cluster?): 4 Windows Nodes, 2 Linux Nodes
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.):
@ghost ghost added the triage label Mar 2, 2022
@ghost

ghost commented Mar 2, 2022

Hi ojfw20, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

@ghost ghost added the action-required label Mar 4, 2022
@ghost

ghost commented Mar 4, 2022

Triage required from @Azure/aks-pm

@ojfw20
Author

ojfw20 commented Mar 9, 2022

/sig windows

@phealy phealy added the feature-request and windows labels Mar 9, 2022
@ghost

ghost commented Mar 9, 2022

@immuzz, @justindavies would you be able to assist?

Author: ojfw20
Assignees: -
Labels:

feature-request, triage, windows, action-required

Milestone: -

@ojfw20
Author

ojfw20 commented Apr 27, 2022

Hi! Any update on this?

@AbelHu
Member

AbelHu commented Sep 2, 2022

I think that this needs upstream support. cc @allyford

@allyford allyford self-assigned this Sep 5, 2022
@ojfw20
Author

ojfw20 commented Feb 8, 2023

bump @allyford

@AbelHu
Member

AbelHu commented Jul 10, 2023

Reference kubernetes/kubernetes#119184

@allyford
Contributor

Reference kubernetes/kubernetes#119184

Based on the update here, creating a separate feature request specifically for adding the new kubelet parameters from upstream into AKS. See #4068

@allyford
Contributor

Closing this issue. Upstream investigations of node conditions that lead to evictions can be found here: kubernetes/kubernetes#119184

Now that upstream supports memory-based eviction for Windows, #4068 is being used for tracking.

4 participants