
Coredns Deployment Deleted on each kubespray cluster.yml run #6387

Closed
dlouks opened this issue Jul 10, 2020 · 8 comments · Fixed by #7211
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@dlouks
Contributor

dlouks commented Jul 10, 2020

Environment:

  • Cloud provider or hardware configuration: vsphere

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): CentOS 7 Linux 3.10.0-1062.12.1.el7.x86_64

  • Version of Ansible (ansible --version): 2.8.8

  • Version of Python (python --version): 2.7.5

Kubespray version (commit) (git rev-parse --short HEAD): 8cb644f

Network plugin used: calico

Configuration
dns_mode: coredns

With each run of kubespray using cluster.yml, the coredns deployment is deleted, and there is a period of time during which no coredns pods are running until the deployment is recreated by a later task. The task doing the deletion is https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes-apps/ansible/tasks/cleanup_dns.yml#L2.

I confirmed that this task, which deletes the coredns deployment, does not run when using scale.yml.
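
For reference, the cleanup task in question boils down to something like this (a paraphrased sketch using a plain kubectl call and an assumed kubespray-style variable bin_dir; the actual task in cleanup_dns.yml may differ):

# Paraphrased sketch of the cleanup behaviour, not the exact kubespray task.
# It runs on every cluster.yml run, which is what causes the gap with no coredns pods.
- name: Kubernetes Apps | Delete old CoreDNS deployment (sketch)
  command: >
    {{ bin_dir }}/kubectl --namespace kube-system
    delete deployment coredns --ignore-not-found=true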

@dlouks dlouks added the kind/bug Categorizes issue or PR as related to a bug. label Jul 10, 2020
@dlouks dlouks changed the title Coredns Deployment Deleted on each kubespray run Coredns Deployment Deleted on each kubespray cluster.yml run Jul 10, 2020
@floryut
Member

floryut commented Jul 13, 2020

Well, the delete task is called seconds before the one that applies the deployment:

- name: Kubernetes Apps | Cleanup DNS
  import_tasks: tasks/cleanup_dns.yml
  when:
    - inventory_hostname == groups['kube-master'][0]
  tags:
    - upgrade
    - coredns
    - nodelocaldns

- name: Kubernetes Apps | CoreDNS
  import_tasks: "tasks/coredns.yml"
  when:
    - dns_mode in ['coredns', 'coredns_dual']
    - inventory_hostname == groups['kube-master'][0]
  tags:
    - coredns

@dlouks
Contributor Author

dlouks commented Jul 13, 2020

I would think that as long as you have nodelocaldns enabled, deleting and re-adding the coredns deployment shouldn't create a DNS outage, so maybe this is a non-issue. It just struck me as wrong that every kubespray run with cluster.yml deletes and re-adds a deployment that is already correctly configured.

I haven't looked closely enough, but is there something unique about the coredns deployment created by kubeadm that could be used to determine whether it should be removed?

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 11, 2020
@dlouks
Contributor Author

dlouks commented Oct 15, 2020

/remove-lifecycle stale

I haven't had a chance to look into this more, but I'd like to understand why we are removing and re-adding DNS, and whether there is a check we can perform to determine if it's necessary.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2020
@andybrook

What about a handler triggered when this changes (https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes-apps/ansible/tasks/coredns.yml#L10) as an indication that coredns needs redeploying?

@dlouks
Contributor Author

dlouks commented Jan 25, 2021

@andybrook, I'm not sure how that would work exactly since the coredns service would need to be deleted prior to knowing whether there is a change.

I don't really understand the need for or history behind deleting coredns. A deletion would be required if you wanted to change the service name or IP, but I don't think that's why it gets deleted on every run.

Looking at the commit history, it looks more like coredns started getting deleted because kube-dns used to be deleted, and in change e40368a the task was updated to delete both coredns and kube-dns.

From the testing I've done on new and existing clusters, it doesn't seem to matter whether the coredns deployment is deleted or this task is skipped. Maybe a kubespray member can help shed some light on the reason behind the deletion, but to me it seems like the deletion of the coredns deployment can simply be removed.

@andybrook

andybrook commented Jan 26, 2021

Hi @dlouks, sorry, I probably wasn't clear enough. My suggestion was that if the template "coredns-deployment.yml" is marked as changed by the Ansible run, then the deployment could be deleted.

Currently main.yml runs the show and first calls cleanup_dns.yml to delete the deployment, then coredns.yml to create the template, then deploys said template.

Instead, my suggestion was a bit of rejigging to remove cleanup_dns.yml: coredns.yml would first create the template, and the "notify" syntax on the first item in coredns.yml would call out to a handler within the role that essentially performs what currently happens in cleanup_dns.yml. Then main.yml would go on to deploy the template as it does now (see the sketch below).

I'm only moderately experienced with Ansible, but handlers are great for running a task only when something has triggered a change. So normally, during a run on an established cluster, the file coredns-deployment.yml wouldn't be changed, the handler wouldn't be triggered, and the deployment wouldn't be deleted; the deploy step later in main.yml (line 39) would sail through since there are no changes to the template.

Anyway, this all got me thinking about why you might want to destroy a DNS pod before its recreation, and the only reason I could think of is not a change in coredns-deployment.yml but in coredns-config.yml: the pod would need respawning to pick up the new configuration, and that wouldn't happen with just a change to a ConfigMap.

That said, if it's easier to just remove the logic that deletes the deployment entirely and not worry about the ConfigMap, that's probably simpler than the above idea. HTH.
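
A rough sketch of the notify/handler rejigging described above; task names, file paths, and variables (kube_config_dir, bin_dir) are illustrative assumptions, not verified against the kubespray role:

# tasks/coredns.yml (sketch): template the manifest and notify a handler only on change
- name: Kubernetes Apps | Lay down coredns-deployment.yml template
  template:
    src: coredns-deployment.yml.j2
    dest: "{{ kube_config_dir }}/coredns-deployment.yml"
  notify: Delete old CoreDNS deployment

# Handlers normally run at the end of the play, so they would need to be
# flushed before the apply step for the delete-then-recreate order to hold.
- name: Kubernetes Apps | Flush handlers before applying CoreDNS
  meta: flush_handlers

# handlers/main.yml (sketch): only runs when the template task reports "changed"
- name: Delete old CoreDNS deployment
  command: >
    {{ bin_dir }}/kubectl --namespace kube-system
    delete deployment coredns --ignore-not-found=true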

@dlouks
Contributor Author

dlouks commented Jan 28, 2021

@andybrook, thanks for the clarification. The reason for the deletion is that kubeadm creates a coredns deployment and a kube-dns service. As suggested in #7211, I put an annotation on the coredns deployment, and it is only deleted if that annotation isn't there.
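
For illustration, a minimal sketch of that kind of annotation-based guard; the annotation key createdby, the task names, and the bin_dir variable are illustrative assumptions, and #7211 has the actual implementation:

# Sketch: only delete the CoreDNS deployment when it lacks a kubespray marker annotation,
# i.e. when it is the one created by kubeadm rather than by kubespray.
- name: Kubernetes Apps | Check CoreDNS deployment for kubespray annotation
  command: >
    {{ bin_dir }}/kubectl --namespace kube-system get deployment coredns
    --ignore-not-found=true -o jsonpath='{.metadata.annotations.createdby}'
  register: coredns_annotation
  changed_when: false

- name: Kubernetes Apps | Delete kubeadm-created CoreDNS deployment
  command: >
    {{ bin_dir }}/kubectl --namespace kube-system
    delete deployment coredns --ignore-not-found=true
  when: coredns_annotation.stdout != 'kubespray'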
