Remove ignore_errors from drain tasks and enable retries #7151

Merged 2 commits on Jan 15, 2021
2 changes: 2 additions & 0 deletions roles/remove-node/pre-remove/defaults/main.yml
@@ -2,3 +2,5 @@
 allow_ungraceful_removal: false
 drain_grace_period: 300
 drain_timeout: 360s
+drain_retries: 3
+drain_retry_delay_seconds: 10
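
These two defaults feed the `retries` and `delay` keywords on the drain task, so they can be tuned per cluster like any other role default. A minimal sketch of an override, assuming it lives in a group_vars file of your inventory; the file location and the values are illustrative, not recommendations:

```yaml
# Hypothetical inventory override (for example, a group_vars file).
# Values are placeholders chosen for illustration only.
drain_retries: 5               # passed to `retries` on the drain task
drain_retry_delay_seconds: 30  # passed to `delay`, in seconds between attempts
```

The same variables could also be supplied as extra vars on the command line when running the removal playbook.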
35 changes: 24 additions & 11 deletions roles/remove-node/pre-remove/tasks/main.yml
@@ -1,14 +1,26 @@
 ---
-- name: cordon-node | Mark all nodes as unschedulable before drain # noqa 301
-  command: >-
-    {{ bin_dir }}/kubectl cordon {{ hostvars[item]['kube_override_hostname']|default(item) }}
-  with_items:
-    - "{{ node.split(',') | default(groups['kube-node']) }}"
-  register: result
-  failed_when: result.rc != 0 and not allow_ungraceful_removal
+- name: remove-node | Set `nodes_to_drain` as empty list
+  set_fact:
+    nodes_to_drain: []
+
+- name: remove-node | Identify nodes to drain, ignore non-cluster nodes
+  shell: |
+    set -o pipefail
+    {{ bin_dir }}/kubectl get nodes -o json \
+    | jq .items[].metadata.name \
+    | jq "select(. | test(\"^{{ hostvars[item]['kube_override_hostname']|default(item) }}$\"))"

@dlouks jq is not guaranteed to be present. Is it possible to use kubectl only?

I just stumbled over this problem while using remove-node.yml, which failed because "jq" was missing on my master node.

Can you please change this, either by making sure jq is installed on the nodes that need it for this task or by changing the command so that it no longer needs jq?
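
One possible jq-free variant, offered as a minimal sketch and not as the change merged in this PR: ask kubectl for each node by name with `--ignore-not-found`, so that hosts that are not cluster members yield empty output instead of an error. It assumes `nodes_to_drain` has already been initialized to an empty list, as this PR does, and relies on `-o name` printing `node/<name>`. Task names and variables are reused from the diff for illustration only.

```yaml
# Sketch only: a kubectl-only replacement for the jq pipeline.
- name: remove-node | Identify nodes to drain, ignore non-cluster nodes
  command: >-
    {{ bin_dir }}/kubectl get node
    {{ hostvars[item]['kube_override_hostname'] | default(item) }}
    -o name --ignore-not-found
  loop: "{{ node.split(',') | default(groups['kube-node']) }}"
  register: nodes
  delegate_to: "{{ groups['kube-master'] | first }}"
  changed_when: false
  run_once: true

- name: remove-node | Generate list of nodes to drain
  set_fact:
    # `-o name` prints "node/<name>", so strip the resource prefix.
    nodes_to_drain: "{{ nodes_to_drain + [item.stdout | regex_replace('^node/', '')] }}"
  loop: "{{ nodes.results }}"
  # Unknown hosts produce empty stdout and are skipped here.
  when: item.stdout | length != 0
  run_once: true
```

With `--ignore-not-found`, a missing node should return exit code 0 and no output, so the `when: item.stdout | length != 0` condition keeps filtering out non-cluster hosts without any dependency on jq.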

+  loop: "{{ node.split(',') | default(groups['kube-node']) }}"
+  register: nodes
   delegate_to: "{{ groups['kube-master']|first }}"
+  changed_when: false
+  run_once: true
+
+- name: remove-node | Generate list of nodes to drain
+  set_fact:
+    nodes_to_drain: "{{ nodes_to_drain }} + [ '{{ item.stdout | regex_replace('\"', '') }}' ]"
+  loop: "{{ nodes.results }}"
+  when: item.stdout | length != 0
   run_once: true
-  ignore_errors: yes

 - name: remove-node | Drain node except daemonsets resource # noqa 301
   command: >-
@@ -18,10 +30,11 @@
     --grace-period {{ drain_grace_period }}
     --timeout {{ drain_timeout }}
     --delete-local-data {{ hostvars[item]['kube_override_hostname']|default(item) }}
-  with_items:
-    - "{{ node.split(',') | default(groups['kube-node']) }}"
+  loop: "{{ nodes_to_drain }}"
   register: result
   failed_when: result.rc != 0 and not allow_ungraceful_removal
   delegate_to: "{{ groups['kube-master']|first }}"
   run_once: true
-  ignore_errors: yes
+  until: result.rc == 0 or allow_ungraceful_removal
+  retries: "{{ drain_retries }}"
+  delay: "{{ drain_retry_delay_seconds }}"