Disable disk based evictions for Kubernetes 1.19 #9296

Merged: 2 commits into kubernetes:master on Jun 8, 2020

hakman
Member

hakman commented Jun 8, 2020

This disables disk-based evictions to unblock the e2e pre-submits and periodic e2e jobs for 1.19. The logs indicate that the failures are related to the eviction manager, which cannot correctly retrieve the disk stats:
https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/e2e-kops-aws-k8s-latest/1269848278476787712/

W0608 04:40:31.222453    4996 eviction_manager.go:339] eviction manager: attempting to reclaim ephemeral-storage
I0608 04:40:31.222488    4996 container_gc.go:85] attempting to delete unused containers
I0608 04:40:31.225274    4996 image_gc_manager.go:322] attempting to delete unused images
I0608 04:40:31.231317    4996 eviction_manager.go:350] eviction manager: must evict pod(s) to reclaim ephemeral-storage
I0608 04:40:31.231341    4996 eviction_manager.go:368] eviction manager: pods ranked for eviction: kube-proxy-ip-172-20-42-239.ap-northeast-2.compute.internal_kube-system(98b80b03aaa21bd38ed134d9616a3730)
E0608 04:40:31.231362    4996 eviction_manager.go:569] eviction manager: cannot evict a critical pod kube-proxy-ip-172-20-42-239.ap-northeast-2.compute.internal_kube-system(98b80b03aaa21bd38ed134d9616a3730)
I0608 04:40:31.231370    4996 eviction_manager.go:391] eviction manager: unable to evict any pods from the node
I0608 04:40:40.627941    4996 image_gc_manager.go:305] [imageGCManager]: Disk usage on image filesystem is at 100% which is over the high threshold (85%). Trying to free 3774873 bytes down to the low threshold (80%).
E0608 04:40:40.631282    4996 kubelet.go:1231] Image garbage collection failed multiple times in a row: failed to garbage collect required amount of images. Wanted to free 3774873 bytes, but freed 0 bytes
I0608 04:40:40.631730    4996 kubelet_getters.go:173] "Pod status updated" pod="kube-system/kube-proxy-ip-172-20-42-239.ap-northeast-2.compute.internal" status="Running"

I tried to understand what exactly happens, but I am stuck on finding the root cause. For now, I know that:

  • Kubernetes 1.18 works OK
  • using containerd 1.3.4 as container runtime works OK
  • restarting kubelet.service fixes the issue
  • using Docker 19.03.x + Kubernetes 1.19 doesn't work
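
For readers of the log above, here is a minimal sketch of the threshold arithmetic that the image GC message describes: usage is compared against a high threshold (85%), and the amount to free is whatever brings usage down to the low threshold (80%). The function name `plan_image_gc` and its parameters are hypothetical and exist only for this illustration. The formula is an assumption modeled on the wording of the log line, not a copy of the kubelet source.

```python
def plan_image_gc(capacity, available, high_pct=85, low_pct=80):
    """Return (usage_percent, bytes_to_free) for an image filesystem.

    Hypothetical helper for illustration only; the thresholds default to
    the 85% / 80% values reported in the log message.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")

    # Usage as an integer percentage of capacity.
    usage_pct = 100 - (available * 100) // capacity

    # Below the high threshold, nothing needs to be reclaimed.
    if usage_pct < high_pct:
        return usage_pct, 0

    # Free enough so that (100 - low_pct)% of capacity is available again.
    bytes_to_free = capacity * (100 - low_pct) // 100 - available
    return usage_pct, max(bytes_to_free, 0)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the failing node.
    print(plan_image_gc(capacity=1000, available=0))
```

If this arithmetic is close to what the kubelet does, a reading of 100% usage combined with a target of only 3774873 bytes would correspond to a filesystem of roughly 19 MB, which fits the suspicion that the reported disk stats are wrong rather than the disk actually being full.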

@k8s-ci-robot added the cncf-cla: yes and size/S labels Jun 8, 2020
@k8s-ci-robot requested review from mikesplain and zetaab June 8, 2020 08:12
@olemarkus
Member

/lgtm

Looks like the cluster passed the validation step \o/

@k8s-ci-robot added the lgtm label Jun 8, 2020
@hakman
Member Author

hakman commented Jun 8, 2020

@olemarkus Yes, I think I tracked the failing e2e in 1.19 down to this. Once it passes, I will update the description.

@rifelpet
Member

rifelpet commented Jun 8, 2020

Can we open a kops GitHub issue that tracks this? And label it blocks-next so that we don’t forget :)
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman, rifelpet

@k8s-ci-robot added the approved label Jun 8, 2020
@k8s-ci-robot merged commit 0cfcd5f into kubernetes:master Jun 8, 2020
@k8s-ci-robot added this to the v1.19 milestone Jun 8, 2020
@hakman deleted the disable-disk-evictions branch July 1, 2020 03:12