
Fix issues with older versions of k8s for basic clusters #8248

Merged 1 commit into kubernetes:master on Jan 19, 2020

Conversation

hakman (Member) commented Jan 2, 2020

Currently, Kops is not able to start a basic cluster with Kubernetes < 1.12 for a few reasons:

  • the PodPriority feature gate must be enabled for various manifests with Kubernetes < 1.11
  • the kube-dns manifest cannot be applied because of an empty resource section that renders as {}
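To illustrate the second point: a section that remaps to nothing ends up serialized as a bare {}, which kubectl rejects. A minimal Go sketch of dropping such empty sections during remapping (a hypothetical pruneEmpty helper for illustration; not the actual kops code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pruneEmpty recursively removes empty maps and slices from a decoded
// manifest, so sections that remap to nothing are dropped instead of
// being emitted as a bare "{}". Illustrative sketch only.
func pruneEmpty(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, val := range t {
			if p := pruneEmpty(val); p == nil {
				delete(t, k)
			} else {
				t[k] = p
			}
		}
		if len(t) == 0 {
			return nil
		}
		return t
	case []interface{}:
		out := make([]interface{}, 0, len(t))
		for _, val := range t {
			if p := pruneEmpty(val); p != nil {
				out = append(out, p)
			}
		}
		if len(out) == 0 {
			return nil
		}
		return out
	default:
		return v
	}
}

func main() {
	var doc map[string]interface{}
	json.Unmarshal([]byte(`{"spec":{"containers":[{"name":"kubedns","resources":{}}]}}`), &doc)
	b, _ := json.Marshal(pruneEmpty(doc))
	fmt.Println(string(b)) // prints {"spec":{"containers":[{"name":"kubedns"}]}}
}
```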

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 2, 2020
k8s-ci-robot (Contributor)

Hi @hakman. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 2, 2020
hakman (Member, Author) commented Jan 2, 2020

/assign @rifelpet
/cc @johngmyers

k8s-ci-robot (Contributor)

@hakman: GitHub didn't allow me to request PR reviews from the following users: johngmyers.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @rifelpet
/cc @johngmyers


tanjunchen (Member)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 2, 2020
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 2, 2020
johngmyers (Member)

I was able to create a validated cluster with Kubernetes 1.10.13; 1.9.11 did not come up.

If it's been broken with nobody complaining, I'm not sure of the value of fixing it. Is it broken in kops 1.15?

hakman (Member, Author) commented Jan 2, 2020

@johngmyers anything 1.12+ should be OK. The value is only for the minimum supported version: if we want to say 1.9+, these should be fixed; otherwise, the minimum supported version should be 1.12.

johngmyers (Member)

Appears to have been broken in kops 1.15.0 per this commit

johngmyers (Member)

The change is minimal enough. No objections.

rifelpet (Member) commented Jan 3, 2020

I'd like to discuss this at office hours today. It seems that our two options are to either fix the broken support for these older k8s versions or update our deprecation policy to drop those versions as well.

hakman (Member, Author) commented Jan 3, 2020

Agreed @rifelpet. In any case, it's best to know what the blockers are first.
The broken template also affects 1.11. Without the fix, support should be 1.12+.

hakman (Member, Author) commented Jan 5, 2020

With @rifelpet's help, e2e tests for older versions are up and running. Tweaks are still needed, but we can at least see which versions come up in the e2e-kops-aws-k8s-1.x jobs:
https://testgrid.k8s.io/sig-cluster-lifecycle-kops#Summary

In some 1.11 jobs, the clusters are up enough to run some tests, but kube-dns cannot be installed by Protokube:

NAME			CURRENT	UPDATE
kube-dns.addons.k8s.io	-	1.14.13-kops.1
I0105 08:53:37.834943   28975 addon.go:139] Applying update from "s3://k8s-kops-prow/e2e-kops-aws-k8s-1.11.test-cncf-aws.k8s.io/addons/kube-dns.addons.k8s.io/k8s-1.6.yaml"
I0105 08:53:37.834984   28975 s3fs.go:219] Reading file "s3://k8s-kops-prow/e2e-kops-aws-k8s-1.11.test-cncf-aws.k8s.io/addons/kube-dns.addons.k8s.io/k8s-1.6.yaml"
I0105 08:53:37.981011   28975 apply.go:67] Running command: kubectl apply -f /tmp/channel407381304/manifest.yaml
I0105 08:53:38.126740   28975 apply.go:70] error running kubectl apply -f /tmp/channel407381304/manifest.yaml
I0105 08:53:38.126766   28975 apply.go:71] error: error validating "/tmp/channel407381304/manifest.yaml": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false
Error: error updating "kube-dns.addons.k8s.io": error applying update from "s3://k8s-kops-prow/e2e-kops-aws-k8s-1.11.test-cncf-aws.k8s.io/addons/kube-dns.addons.k8s.io/k8s-1.6.yaml": error running kubectl

For some reason the tests don't validate the cluster before proceeding; maybe we should add this instead of the simple Node Ready check.
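The kubectl failure in the log ("apiVersion not set, kind not set") is triggered by a document in the manifest that serializes as a bare {}. A minimal sketch of the kind of pre-apply check that would catch such a document, using a hypothetical hasTypeMeta helper (not the actual Protokube code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasTypeMeta reports whether a decoded manifest document carries the
// apiVersion and kind fields kubectl requires. A document that remapped
// to an empty object fails this check and could be skipped before
// running kubectl apply. Hypothetical helper for illustration.
func hasTypeMeta(doc map[string]interface{}) bool {
	av, okAV := doc["apiVersion"].(string)
	k, okK := doc["kind"].(string)
	return okAV && okK && av != "" && k != ""
}

func main() {
	var good, empty map[string]interface{}
	json.Unmarshal([]byte(`{"apiVersion":"v1","kind":"ConfigMap"}`), &good)
	json.Unmarshal([]byte(`{}`), &empty)
	fmt.Println(hasTypeMeta(good), hasTypeMeta(empty)) // prints true false
}
```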

hakman (Member, Author) commented Jan 11, 2020

Based on discussions we had at last office hours, assigning to Justin to check why PodPriority is needed.
/assign @justinsb

johngmyers (Member) commented Jan 11, 2020

We need priorityClassName because support for the scheduler.alpha.kubernetes.io/critical-pod annotation was removed in Kubernetes 1.16. It wouldn't be good to have things like kubelet or kube-proxy evicted in order to make room for other pods.
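A rough sketch of what a helper like kops' kubemanifest.MarkPodAsClusterCritical does, using a simplified stand-in Pod type rather than the real Kubernetes API objects (the annotation was honored before 1.16; priorityClassName is required from 1.16 on):

```go
package main

import "fmt"

// Pod is a minimal stand-in for the Kubernetes pod spec fields used here.
type Pod struct {
	Annotations       map[string]string
	PriorityClassName string
}

// markPodAsClusterCritical sketches marking a pod cluster-critical: it
// sets the legacy critical-pod annotation (removed in Kubernetes 1.16)
// and priorityClassName (the replacement mechanism). Simplified types;
// not the actual kops implementation.
func markPodAsClusterCritical(pod *Pod) {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations["scheduler.alpha.kubernetes.io/critical-pod"] = ""
	pod.PriorityClassName = "system-cluster-critical"
}

func main() {
	p := &Pod{}
	markPodAsClusterCritical(p)
	fmt.Println(p.PriorityClassName) // prints system-cluster-critical
}
```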

johngmyers (Member) commented Jan 11, 2020

The PR for the aforementioned commit was #6897. Looks like it got cherry-picked back to kops 1.13.

justinsb (Member)

So I dug into this. As @hakman pointed out there are two things happening:

  • The empty blocks mapping to {} is a bug, though one that only affects manifests that have empty objects. So I like the idea of fixing it specifically for kube-dns here, and I'm trying to fix it in general in Don't output empty sections in the manifests #8317 so that we'll just skip over them. kubectl normally tolerates an empty section, but the {} is an artifact of our remapping process and kubectl doesn't like it.

  • The pod priority class issue was indeed introduced through a series of commits, and at least in 1.9 we get the error spec.priorityClassName: Forbidden: Pod priority is disabled by feature-gate - normally kubernetes ignores unknown fields, so this is a little inconsistent, but I think @johngmyers has identified the sequence of PRs that led to this being broken.

I think we now have more testing for older versions (thanks @rifelpet and @hakman), and we understand how this happened.

The other option, instead of the fix in this PR, is to not set the PodPriorityClass for the older versions of k8s (before 1.11), i.e. either skip MarkPodAsClusterCritical or pass in the cluster and have it be a no-op for older cluster versions. As we're likely dropping support for < 1.11 very soon, this would only matter for a cherry-pick of this PR; arguably it is more consistent with our policy of not changing existing versions. But OTOH it's already broken...
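The version-gated option described here could be sketched roughly as follows (hypothetical markCriticalForVersion helper with simplified types and version handling; not the actual kops signature):

```go
package main

import "fmt"

// Pod is a minimal stand-in for the Kubernetes pod spec.
type Pod struct {
	PriorityClassName string
}

// markCriticalForVersion sketches the alternative of passing the
// cluster version in and making the helper a no-op before Kubernetes
// 1.11, where the PodPriority feature gate is off by default and the
// API server rejects priorityClassName. Hypothetical helper.
func markCriticalForVersion(pod *Pod, major, minor int) {
	if major < 1 || (major == 1 && minor < 11) {
		return // skip: older API servers reject priorityClassName unless the gate is enabled
	}
	pod.PriorityClassName = "system-cluster-critical"
}

func main() {
	oldPod, newPod := &Pod{}, &Pod{}
	markCriticalForVersion(oldPod, 1, 9)
	markCriticalForVersion(newPod, 1, 12)
	fmt.Println(oldPod.PriorityClassName == "", newPod.PriorityClassName) // prints true system-cluster-critical
}
```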

johngmyers (Member)

I don't recall our deprecation policy as intending to take on a new requirement to fix preexisting brokenness in handling ancient versions. I don't object to taking a fix as minimal as this PR, but I would not want to spend the effort fixing, say, kops 1.16's handling of k8s 1.6. I think adding logic to make the setting of podPriorityClass conditional on version (and presumably whether the feature gate is enabled) would be a bit much, especially since we haven't yet gotten an actual user complaint.

hakman (Member, Author) commented Jan 12, 2020

In theory, checking the k8s version before using MarkPodAsClusterCritical() should be pretty simple.

In practice, for etcd it is not so trivial. I can't find a simple way to check the Kubernetes version here:

kubemanifest.MarkPodAsClusterCritical(pod)

Unless there is something I'm missing, the feature gate is the simplest way to get things working again for 1.9 and 1.10.
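For reference, enabling the gate through the cluster spec would look roughly like this (a sketch, assuming the featureGates map fields of the kops cluster API on both the kubelet and kube-apiserver):

```yaml
spec:
  kubeAPIServer:
    featureGates:
      PodPriority: "true"
  kubelet:
    featureGates:
      PodPriority: "true"
```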

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 15, 2020
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 17, 2020
hakman (Member, Author) commented Jan 17, 2020

@justinsb I removed the PodPriority part from this; will create a separate PR explaining that 1.9 requires the PodPriority feature gate.

johngmyers (Member)

/retest

johngmyers (Member)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 17, 2020
johngmyers (Member)

/retest

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 19, 2020
hakman (Member, Author) commented Jan 19, 2020

Release notes added for the PodPriority feature gate requirement for Kubernetes 1.9. I think this is now in line with what we discussed in office hours.

johngmyers (Member)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 19, 2020
rifelpet (Member)

/approve

k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman, rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 19, 2020
@k8s-ci-robot k8s-ci-robot merged commit b926413 into kubernetes:master Jan 19, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 19, 2020
k8s-ci-robot added a commit that referenced this pull request Jan 19, 2020
…-origin-release-1.17

Automated cherry pick of #8248: Fix issues with older versions of k8s for basic clusters
@hakman hakman deleted the fix-old-k8s branch January 19, 2020 19:20
k8s-ci-robot added a commit that referenced this pull request Jan 20, 2020
…-origin-release-1.16

Automated cherry pick of #8248: Fix issues with older versions of k8s for basic clusters
k8s-ci-robot added a commit that referenced this pull request Jan 23, 2020
…-origin-release-1.15

Automated cherry pick of #8248: Fix issues with older versions of k8s for basic clusters