Browse Source

fix(remove-node): Ensure safety and validation for node removal process (#12085)

This commit enhances the node removal playbook's reliability and safety by implementing the following changes:

1. **Node Validation**: Added a validation step using assert to ensure the `node` variable is defined and contains nodes. If the list is empty or undefined, the playbook fails early, preventing accidental operations on the entire cluster.

2. **Removed Defaulting for Hosts**: Updated tasks to enforce explicit `node` variable input without defaulting to critical groups (e.g., `etcd:k8s_cluster:calico_rr`). By validating `node` beforehand, tasks now solely rely on user-provided input and safely avoid unintended targeting.

3. **Explicit User Confirmation**: Enhanced the confirmation prompt to clarify the scope of the operation. The admin is now required to explicitly confirm node state deletion, ensuring a deliberate decision before proceeding.

These improvements strengthen the reliability and safety of the `remove-node.yml` playbook by eliminating ambiguous behavior, preventing misconfigurations, and ensuring clear interaction during node removal tasks.
pull/12088/head
Farshad Asadpour 1 month ago
committed by GitHub
parent
commit
1513254622
No known key found for this signature in database GPG Key ID: B5690EEEBB952194
2 changed files with 15 additions and 3 deletions
  1. 2
      docs/getting_started/getting-started.md
  2. 16
      playbooks/remove_node.yml

2
docs/getting_started/getting-started.md

@ -59,6 +59,8 @@ ansible-playbook -i inventory/mycluster/hosts.yml remove-node.yml -b -v \
--extra-vars "node=nodename,nodename2" --extra-vars "node=nodename,nodename2"
``` ```
> Note: The playbook does not currently support the removal of the first control plane or etcd node. These nodes are essential for maintaining cluster operations and must remain intact.
If a node is completely unreachable by ssh, add `--extra-vars reset_nodes=false` If a node is completely unreachable by ssh, add `--extra-vars reset_nodes=false`
to skip the node reset step. If one node is unavailable, but others you wish to skip the node reset step. If one node is unavailable, but others you wish
to remove are able to connect via SSH, you could set `reset_nodes=false` as a host to remove are able to connect via SSH, you could set `reset_nodes=false` as a host

16
playbooks/remove_node.yml

@ -1,9 +1,19 @@
--- ---
- name: Validate nodes for removal
hosts: localhost
tasks:
- name: Assert that nodes are specified for removal
assert:
that:
- node is defined
- node | length > 0
msg: "No nodes specified for removal. The `node` variable must be set explicitly."
- name: Common tasks for every playbooks - name: Common tasks for every playbooks
import_playbook: boilerplate.yml import_playbook: boilerplate.yml
- name: Confirm node removal - name: Confirm node removal
hosts: "{{ node | default('etcd:k8s_cluster:calico_rr') }}"
hosts: "{{ node | default('this_is_unreachable') }}"
gather_facts: false gather_facts: false
tasks: tasks:
- name: Confirm Execution - name: Confirm Execution
@ -24,7 +34,7 @@
when: reset_nodes | default(True) | bool when: reset_nodes | default(True) | bool
- name: Reset node - name: Reset node
hosts: "{{ node | default('kube_node') }}"
hosts: "{{ node | default('this_is_unreachable') }}"
gather_facts: false gather_facts: false
environment: "{{ proxy_disable_env }}" environment: "{{ proxy_disable_env }}"
pre_tasks: pre_tasks:
@ -40,7 +50,7 @@
# Currently cannot remove first control plane node or first etcd node # Currently cannot remove first control plane node or first etcd node
- name: Post node removal - name: Post node removal
hosts: "{{ node | default('kube_control_plane[1:]:etcd[1:]') }}"
hosts: "{{ node | default('this_is_unreachable') }}"
gather_facts: false gather_facts: false
environment: "{{ proxy_disable_env }}" environment: "{{ proxy_disable_env }}"
roles: roles:

Loading…
Cancel
Save