
Coredns Deployment Deleted on each kubespray cluster.yml run #6387

Closed
dlouks opened this issue Jul 10, 2020 · 8 comments · Fixed by #7211
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@dlouks
Contributor

dlouks commented Jul 10, 2020

Environment:

  • Cloud provider or hardware configuration: vsphere

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): CentOS 7 Linux 3.10.0-1062.12.1.el7.x86_64

  • Version of Ansible (ansible --version): 2.8.8

  • Version of Python (python --version): 2.7.5

Kubespray version (commit) (git rev-parse --short HEAD): 8cb644f

Network plugin used: calico

Configuration
dns_mode: coredns

With each run of kubespray using cluster.yml, the coredns deployment is deleted, and there is a period of time during which no coredns pods are running until the deployment is recreated by a later task. The task doing the deletion is https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes-apps/ansible/tasks/cleanup_dns.yml#L2.

I confirmed that this task, which deletes the coredns deployment, does not run when using scale.yml.
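
For reference, the cleanup task in question boils down to something like this (a paraphrased sketch using a plain kubectl call and an assumed kubespray-style variable bin_dir; the actual task in cleanup_dns.yml may differ):

# Paraphrased sketch of the cleanup behaviour, not the exact kubespray task.
# It runs on every cluster.yml run, which is what causes the gap with no coredns pods.
- name: Kubernetes Apps | Delete old CoreDNS deployment (sketch)
  command: >
    {{ bin_dir }}/kubectl --namespace kube-system
    delete deployment coredns --ignore-not-found=true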

@dlouks dlouks added the kind/bug Categorizes issue or PR as related to a bug. label Jul 10, 2020
@dlouks dlouks changed the title Coredns Deployment Deleted on each kubespray run Coredns Deployment Deleted on each kubespray cluster.yml run Jul 10, 2020
@floryut
Member

floryut commented Jul 13, 2020

Well, the delete task is called seconds before the one that applies the deployment:

- name: Kubernetes Apps | Cleanup DNS
  import_tasks: tasks/cleanup_dns.yml
  when:
    - inventory_hostname == groups['kube-master'][0]
  tags:
    - upgrade
    - coredns
    - nodelocaldns

- name: Kubernetes Apps | CoreDNS
  import_tasks: "tasks/coredns.yml"
  when:
    - dns_mode in ['coredns', 'coredns_dual']
    - inventory_hostname == groups['kube-master'][0]
  tags:
    - coredns

@dlouks
Contributor Author

dlouks commented Jul 13, 2020

I would think that as long as you have nodelocaldns enabled, deleting and re-adding the coredns deployment shouldn't create a DNS outage, so maybe this is a non-issue. It just struck me as wrong that every kubespray run with cluster.yml deletes and re-adds a deployment that is already correctly configured.

I haven't looked closely enough, but is there something unique about the coredns deployment created by kubeadm that could be used to determine whether it should be removed?

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 11, 2020
@dlouks
Contributor Author

dlouks commented Oct 15, 2020

/remove-lifecycle stale

I haven't had a chance to look into this more, but I'd like to understand why we are removing and re-adding DNS, and whether there is a check we can perform to determine if it's necessary.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2020
@andybrook

What about a handler triggered when this changes (https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes-apps/ansible/tasks/coredns.yml#L10) as an indication that coredns needs redeploying?

@dlouks
Contributor Author

dlouks commented Jan 25, 2021

@andybrook, I'm not sure how that would work exactly since the coredns service would need to be deleted prior to knowing whether there is a change.

I don't really understand the need for or history behind deleting coredns. A deletion would be required if you wanted to change the service name or IP, but I don't think that's why it gets deleted on every run.

Looking at the commit history, it looks more like coredns started getting deleted because kube-dns used to be deleted, and in change e40368a the task was updated to delete both coredns and kube-dns.

From the testing I've done on new and existing clusters, it doesn't seem to matter whether the coredns deployment is deleted or this task is skipped. Maybe a kubespray member can help shed some light on the reason behind the deletion, but to me it seems like the deletion of the coredns deployment can simply be removed.

@andybrook

andybrook commented Jan 26, 2021

Hi @dlouks, sorry, I probably wasn't clear enough. My suggestion was that if the template "coredns-deployment.yml" is marked as changed by the Ansible run, then the deployment could be deleted.

Currently main.yml runs the show and first calls cleanup_dns.yml to delete the deployment, then coredns.yml to create the template, then deploys said template.

Instead, my suggestion was a bit of rejigging to remove cleanup_dns.yml: coredns.yml would first create the template, and the "notify" syntax on the first item in coredns.yml would call out to a handler within the role that essentially performs what currently happens in cleanup_dns.yml. Then main.yml would go on to deploy the template as it does now (see the sketch below).

I'm only moderately experienced with Ansible, but handlers are great for running a task only when something has triggered a change. So normally, during a run on an established cluster, the file coredns-deployment.yml wouldn't be changed, the handler wouldn't be triggered, and the deployment wouldn't be deleted; the deploy step later in main.yml (line 39) would sail through since there are no changes to the template.

Anyway, this all got me thinking about why you might want to destroy a DNS pod before its recreation, and the only reason I could think of is not a change in coredns-deployment.yml but in coredns-config.yml: the pod would need respawning to pick up the new configuration, and that wouldn't happen with just a change to a ConfigMap.

That said, if it's easier to just remove the logic that deletes the deployment entirely and not worry about the ConfigMap, that's probably simpler than the above idea. HTH.
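
A rough sketch of the notify/handler rejigging described above; task names, file paths, and variables (kube_config_dir, bin_dir) are illustrative assumptions, not verified against the kubespray role:

# tasks/coredns.yml (sketch): template the manifest and notify a handler only on change
- name: Kubernetes Apps | Lay down coredns-deployment.yml template
  template:
    src: coredns-deployment.yml.j2
    dest: "{{ kube_config_dir }}/coredns-deployment.yml"
  notify: Delete old CoreDNS deployment

# Handlers normally run at the end of the play, so they would need to be
# flushed before the apply step for the delete-then-recreate order to hold.
- name: Kubernetes Apps | Flush handlers before applying CoreDNS
  meta: flush_handlers

# handlers/main.yml (sketch): only runs when the template task reports "changed"
- name: Delete old CoreDNS deployment
  command: >
    {{ bin_dir }}/kubectl --namespace kube-system
    delete deployment coredns --ignore-not-found=true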

@dlouks
Contributor Author

dlouks commented Jan 28, 2021

@andybrook, thanks for the clarification. The reason for the deletion is that kubeadm creates a coredns deployment and a kube-dns service. As suggested in #7211, I put an annotation on the coredns deployment, and it is only deleted if that annotation isn't there.
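
For illustration, a minimal sketch of that kind of annotation-based guard; the annotation key createdby, the task names, and the bin_dir variable are illustrative assumptions, and #7211 has the actual implementation:

# Sketch: only delete the CoreDNS deployment when it lacks a kubespray marker annotation,
# i.e. when it is the one created by kubeadm rather than by kubespray.
- name: Kubernetes Apps | Check CoreDNS deployment for kubespray annotation
  command: >
    {{ bin_dir }}/kubectl --namespace kube-system get deployment coredns
    --ignore-not-found=true -o jsonpath='{.metadata.annotations.createdby}'
  register: coredns_annotation
  changed_when: false

- name: Kubernetes Apps | Delete kubeadm-created CoreDNS deployment
  command: >
    {{ bin_dir }}/kubectl --namespace kube-system
    delete deployment coredns --ignore-not-found=true
  when: coredns_annotation.stdout != 'kubespray'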
