Bump CoreDNS version to 1.6.5 and update manifest #85108
Conversation
/priority important-soon
@rajansandeep this is very close to the release, but i'm going to try to review it later.
we should push the image first.
image is pushing #84993 (comment)
@@ -223,7 +223,10 @@ metadata:
  labels:
    k8s-app: kube-dns
spec:
  replicas: 2
  # replicas: not specified here:
  # 1. In order to make Addon Manager do not reconcile this replicas parameter.
does kubeadm have an "addon manager" ? @neolit123
no, it does not have one, per se.
it has "phases" that manage addons.
the comment can be:
# Default replica count is 1
right, that's what I thought re: phases.
I think the rest of the details in this comment make more sense for kube-up and less sense for kubeadm (presuming this is referring to the "Addon manager" in cluster/)
/test pull-kubernetes-e2e-kind
@@ -313,7 +325,9 @@ data:
  Corefile: |
    .:53 {
        errors
        health
        health {
            lameduck 12s
is 12s the timeout for the health check in this case?
the timeout for CP components is 15s, so we may want to match that.
Actually - as I was describing the reasoning behind this, I realized that a timeout of 5 seconds should be all that is necessary. When picking 12s, I was conflating the issue with the readiness/health check periods, which I don't think actually come into play. The function of lameduck is to finish processing in-flight queries before shutting down. A lameduck longer than 5s would typically be pointless, since most clients have a default timeout of 5 seconds (and thus would have stopped listening for a response by then).
yep, 5 seems good if that is sufficient.
@rajansandeep do you agree with the change to 5 seconds?
Yes, I agree. I'll push a commit to reflect those changes.
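For reference, a minimal sketch of how the agreed-upon health block would look in the Corefile (illustrative only; the rest of the server block is elided):

.:53 {
    errors
    health {
        lameduck 5s
    }
    # ... remaining plugins unchanged (ready, kubernetes, forward, cache, etc.)
}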
/retest
/hold cancel
- key: k8s-app
  operator: In
  values: ["kube-dns"]
topologyKey: kubernetes.io/hostname
@rajansandeep could you please explain the motivation?
my understanding is the following:
- we reduce the replica count to 1.
- coredns will deploy on the primary CP node (where kubeadm init is called).
- the anti-affinity rule makes sure that the Pod will not schedule on a Node that already has it.
if i'm not mistaken, this will not improve much over what we have right now.
a problem we have currently is that both replicas land on the same primary CP Node.
ideally what we want is a coredns instance to be deployed on all CP Nodes.
one way of doing that is with static-pods, but given we treat coredns as an addon we should use a DaemonSet with a NodeSelector that matches the kubeadm "master" node-role.
i'm going to experiment with that in a bit.
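As a rough illustration of that DaemonSet idea (not part of this PR; the names, node-role selector, and image tag below are assumptions based on kubeadm conventions of that era):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      # restrict scheduling to control-plane ("master") nodes
      nodeSelector:
        node-role.kubernetes.io/master: ""
      # tolerate the control-plane taint so the pods can land there
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: coredns
        image: k8s.gcr.io/coredns:1.6.5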
sadly, by changing the coredns object type we are going to break a lot of users that have automation around kubectl patch deployment coredns ..., so such a change is not a great idea without a grace period.
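For example, automation along these lines (hypothetical; the patch payload is only illustrative) would start failing if coredns were no longer a Deployment:

kubectl -n kube-system patch deployment coredns -p '{"spec":{"replicas":3}}'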
@neolit123
With pod anti-affinity enabled and 2 coredns replicas:
- If a user has only a master node installed via kubeadm init, there will be one coredns pod in running state and one in pending state.
- The other coredns pod will remain in pending state and waits for scheduling until another worker node is created via kubeadm join.
a problem we have currently, is that both replicas land on the same primary CP Node.
Pod anti-affinity solves this problem.
With pod anti-affinity enabled and 2 coredns replicas:
- If a user has only a master node installed via kubeadm init, there will be one coredns pod in running state and one in pending state.
- The other coredns pod will remain in pending state and waits for scheduling until another worker node is created via kubeadm join.
this is true for 2 replicas and anti-affinity; we don't want Pending pods because it will break e2e tests using our test suite, where all pods are expected to be Ready.
a problem we have currently, is that both replicas land on the same primary CP Node.
Pod anti-affinity solves this problem.
yes. but we reduce the replicas to 1, so if the primary CP node becomes NotReady (e.g. shutdown) the coredns service will still go down and the pod will not reschedule on a Ready node. (same happens for 2 replicas, without anti-affinity).
i guess i'm trying to see how 1 replica with anti-affinity is an improvement over 2 replicas without it.
like i've mentioned earlier, ideally we want a coredns DS for all CP nodes.
if continuing to use a Deployment we might want to add these tolerations: #55713 (comment)
^ this issue, BTW, is one where users are quite confused by some scheduling aspects of k8s.
@chrisohaver PTAL too.
so basically i'm proposing that we keep the replica count at 2 and introduce the following:
spec:
...
tolerations:
- key: "node.kubernetes.io/unreachable"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 15
- key: "node.kubernetes.io/not-ready"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 15
this will improve the current deployment by rescheduling the coredns Pods 15 seconds after a Node becomes NotReady.
i don't think the anti-affinity rule is needed here:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values: ["kube-dns"]
topologyKey: kubernetes.io/hostname
because with the current setup the deployment already does that.
Let's start with the more trivial questions here. Is this required for the CoreDNS version bump?
If so, why is this a patch release and not a minor version bump? If it's not required, can we split it and move it into a separate PR?
Yes, I've removed the pod anti-affinity changes from this PR and moved them to another PR.
/test pull-kubernetes-e2e-kind-ipv6
Thanks @rajansandeep !
- key: k8s-app
  operator: In
  values: ["kube-dns"]
topologyKey: kubernetes.io/hostname
Let's start with the more trivial questions here. Is this required for the CoreDNS version bump?
If so, why is this a patch release and not a minor version bump? If it's not required, can we split it and move it into a separate PR?
No it's not.
Yes - makes sense.
/approve
Force-pushed from 460dd60 to 2544a76 (…n of coredns up to version 1.6.5).
/lgtm
/test pull-kubernetes-e2e-kind-ipv6
@neolit123 Does this need the milestone tag?
i don't think it does yet.
/retest
/approve
dep updates
/assign @BenTheElder
/assign @liggitt
@@ -632,7 +632,9 @@ func TestCreateCoreDNSConfigMap(t *testing.T) {
    }`,
    expectedCorefileData: `.:53 {
        errors
        health
        health {
            lameduck 5s
is this a required change? will users with a custom dns config be broken if they don't make this change as well?
It's not required. It's just an improvement that reduces query failures during rolling upgrades: the setting allows CoreDNS to complete in-flight DNS queries before exiting.
Without the setting, CoreDNS will not be broken.
/approve
/hold on the config compatibility question
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt, neolit123, rajansandeep, soltysh. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
looks like I got scooped @neolit123 ... :prow_fire: 😞
np
canceling the hold as per @chrisohaver's explanation here: thanks
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This PR is dependent on the CoreDNS 1.6.5 image being pushed to gcr.io, for which #84993 has been opened.
/hold
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: