# Adding/replacing a node

Modified from [comments in #3471](https://github.com/kubernetes-sigs/kubespray/issues/3471#issuecomment-530036084).

## Adding/replacing a worker node

This is the most straightforward case.
### 1) Add the new node to the inventory

### 2) Run `scale.yml`

You can use `--limit=node1` to restrict Kubespray to the new node and avoid disturbing other nodes in the cluster.

Before using `--limit`, run the `facts.yml` playbook without the limit to refresh the facts cache for all nodes.
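For example (the inventory path is illustrative; adjust it, and add flags such as `--become` as your setup requires):

```sh
# refresh the facts cache for all nodes first
ansible-playbook -i inventory/mycluster/hosts.yml facts.yml
# then scale the cluster, limited to the new node
ansible-playbook -i inventory/mycluster/hosts.yml scale.yml --limit=node1
```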
### 3) Drain the node that will be removed

```sh
kubectl drain NODE_NAME
# if pods block the drain, you may need flags such as --ignore-daemonsets
```
### 4) Run the `remove-node.yml` playbook

With the old node still in the inventory, run `remove-node.yml`. Pass `-e node=NODE_NAME` to limit execution to the node being removed.

### 5) Remove the node from the inventory

That's it.
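The removal steps above can be sketched end to end as follows (the inventory path and node name are placeholders):

```sh
# step 3: drain the node
kubectl drain NODE_NAME
# step 4: run the removal playbook with the node still in the inventory
ansible-playbook -i inventory/mycluster/hosts.yml remove-node.yml -e node=NODE_NAME
# step 5: finally, delete NODE_NAME from the inventory file by hand
```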
## Adding/replacing a master node

### 1) Recreate apiserver certs manually to include the new master node in the cert SAN field

For some reason, Kubespray will not update the apiserver certificate, so this must be done manually.

Edit `/etc/kubernetes/kubeadm-config.yaml` and add the new host to the `certSANs` list.

Use kubeadm to recreate the certs:
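The edited fragment of `kubeadm-config.yaml` might look like this (host names are illustrative; depending on your kubeadm config version, the key may instead be a top-level `apiServerCertSANs`):

```yaml
apiServer:
  certSANs:
  - "master1"
  - "master2"
  - "new-master"   # the node being added
```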
```sh
cd /etc/kubernetes/ssl
# keep the old cert and key around as a backup
mv apiserver.crt apiserver.crt.old
mv apiserver.key apiserver.key.old
cd /etc/kubernetes
kubeadm init phase certs apiserver --config kubeadm-config.yaml
```
Check the certificate; the new host needs to be listed in the SANs:

```sh
openssl x509 -text -noout -in /etc/kubernetes/ssl/apiserver.crt
```
### 2) Run `cluster.yml`

Add the new host to the inventory and run `cluster.yml`.
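For example (the inventory path is illustrative; add flags such as `--become` as your setup requires):

```sh
ansible-playbook -i inventory/mycluster/hosts.yml cluster.yml
```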
### 3) Restart kube-system/nginx-proxy

On all hosts, restart the nginx-proxy pod. This pod is a local proxy for the apiserver. Kubespray updates its static config, but the pod needs to be restarted to reload it.

```sh
# run on every host
docker ps | grep k8s_nginx-proxy_nginx-proxy | awk '{print $1}' | xargs docker restart
```
### 4) Remove old master nodes

If you are replacing a node, remove the old one from the inventory and from the cluster runtime:

```sh
kubectl drain NODE_NAME
kubectl delete node NODE_NAME
```

After that, the old node can be safely shut down. Also make sure to restart nginx-proxy on all remaining nodes (see step 3).
From any active master that remains in the cluster, re-upload `kubeadm-config.yaml`:

```sh
kubeadm config upload from-file --config /etc/kubernetes/kubeadm-config.yaml
```
## Adding/replacing an etcd node

You need to make sure there is always an odd number of etcd nodes in the cluster, so this is always either a replace or a scale-up operation: either add two new nodes, or add one new node and remove an old one.
### 1) Add the new node running `cluster.yml`

Update the inventory and run `cluster.yml`, passing `--limit=etcd,kube-master -e ignore_assert_errors=yes`.

Then run `upgrade-cluster.yml`, also passing `--limit=etcd,kube-master -e ignore_assert_errors=yes`. This is necessary to update the etcd configuration throughout the cluster.

At this point you will have an even number of etcd nodes. Everything should still work, and you should only run into problems if the cluster decides to elect a new etcd leader before you remove a node. Even then, running applications should remain available.
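The two playbook runs might look like this (the inventory path is illustrative):

```sh
ansible-playbook -i inventory/mycluster/hosts.yml cluster.yml --limit=etcd,kube-master -e ignore_assert_errors=yes
ansible-playbook -i inventory/mycluster/hosts.yml upgrade-cluster.yml --limit=etcd,kube-master -e ignore_assert_errors=yes
```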
### 2) Remove an old etcd node

With the node still in the inventory, run `remove-node.yml`, passing `-e node=NODE_NAME` with the name of the node to be removed.
### 3) Make sure the remaining etcd members have their config updated

On each etcd host that remains in the cluster:

```sh
grep ETCD_INITIAL_CLUSTER /etc/etcd.env
```

Only active etcd members should appear in that list.
### 4) Remove old etcd members from the cluster runtime

Acquire a shell prompt in one of the etcd containers and use `etcdctl` to remove the old member:

```sh
# list all members
etcdctl member list
# remove the old member -- be careful: removing the wrong member will get you in trouble
etcdctl member remove MEMBER_ID
# note: the real command lines are considerably longer, since you need to pass all certificates to etcdctl
```
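A fuller sketch of such a command, assuming the certificate layout Kubespray usually deploys (verify the paths on your hosts before running anything):

```sh
# certificate paths below are typical for Kubespray deployments -- verify them first
export ETCDCTL_API=3
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
  --key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
  member list
```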
### 5) Make sure the apiserver config is correctly updated

On every master node, edit `/etc/kubernetes/manifests/kube-apiserver.yaml` and make sure only active etcd nodes are listed in the apiserver command-line parameter `--etcd-servers=...`.
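A quick way to check (the manifest path is standard for kubeadm-based setups):

```sh
grep etcd-servers /etc/kubernetes/manifests/kube-apiserver.yaml
```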
### 6) Shut down the old instance