# Setting up your first cluster with Kubespray

This tutorial walks you through the detailed steps for setting up Kubernetes
with [Kubespray](https://kubespray.io/).

The guide is inspired by the tutorial [Kubernetes The Hard Way](https://github.com/kelseyhightower/kubernetes-the-hard-way), with the
difference that here we want to showcase how to spin up a Kubernetes cluster
in a more managed fashion with Kubespray.

## Target Audience

The target audience for this tutorial is someone looking for a
hands-on guide to get started with Kubespray.

## Cluster Details

* [kubespray](https://github.com/kubernetes-sigs/kubespray) v2.13.x
* [kubernetes](https://github.com/kubernetes/kubernetes) v1.17.9

## Prerequisites

* Google Cloud Platform: This tutorial leverages the [Google Cloud Platform](https://cloud.google.com/) to streamline provisioning of the compute infrastructure required to bootstrap a Kubernetes cluster from the ground up. [Sign up](https://cloud.google.com/free/) for $300 in free credits.
* Google Cloud Platform SDK: Follow the Google Cloud SDK [documentation](https://cloud.google.com/sdk/) to install and configure the `gcloud` command
  line utility. Make sure to set a default compute region and compute zone.
* The [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) command line utility is used to interact with the Kubernetes
  API Server.
* Linux or Mac environment with Python 3

## Provisioning Compute Resources

Kubernetes requires a set of machines to host the Kubernetes control plane and the worker nodes where containers are ultimately run. In this lab you will provision the compute resources required for running a secure and highly available Kubernetes cluster across a single [compute zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones).

### Networking

The Kubernetes [networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model) assumes a flat network in which containers and nodes can communicate with each other. In cases where this is not desired [network policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) can limit how groups of containers are allowed to communicate with each other and external network endpoints.

> Setting up network policies is out of scope for this tutorial.

#### Virtual Private Cloud Network

In this section a dedicated [Virtual Private Cloud](https://cloud.google.com/compute/docs/networks-and-firewalls#networks) (VPC) network will be set up to host the Kubernetes cluster.

Create the `kubernetes-the-kubespray-way` custom VPC network:

```ShellSession
gcloud compute networks create kubernetes-the-kubespray-way --subnet-mode custom
```

A [subnet](https://cloud.google.com/compute/docs/vpc/#vpc_networks_and_subnets) must be provisioned with an IP address range large enough to assign a private IP address to each node in the Kubernetes cluster.

Create the `kubernetes` subnet in the `kubernetes-the-kubespray-way` VPC network:

```ShellSession
gcloud compute networks subnets create kubernetes \
  --network kubernetes-the-kubespray-way \
  --range 10.240.0.0/24
```

> The `10.240.0.0/24` IP address range can host up to 254 compute instances.

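
The figure of 254 follows from the usual subnet arithmetic: a `/24` prefix leaves 8 host bits, and two addresses (network and broadcast) are conventionally excluded. A minimal sketch of that calculation follows; the `usable_hosts` helper is hypothetical and only illustrates the arithmetic:

```shell
# Hypothetical helper: conventional usable-address count for an IPv4 prefix length.
usable_hosts() {
  prefix="$1"
  # 2^(32 - prefix) addresses in the range, minus network and broadcast.
  echo $(( (1 << (32 - prefix)) - 2 ))
}

usable_hosts 24
```

For a `/24` this prints `254`. Cloud providers may reserve additional addresses in each subnet, so the number of instances you can actually create may be slightly lower.
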
#### Firewall Rules

Create a firewall rule that allows internal communication across all protocols.
Note that the ipip protocol has to be allowed for the calico networking
plugin (introduced later in this tutorial) to work.

```ShellSession
gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-internal \
  --allow tcp,udp,icmp,ipip \
  --network kubernetes-the-kubespray-way \
  --source-ranges 10.240.0.0/24
```

Create a firewall rule that allows external SSH, ICMP, HTTP, and HTTPS traffic, as well as access to the Kubernetes API server on port 6443:

```ShellSession
gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-external \
  --allow tcp:80,tcp:6443,tcp:443,tcp:22,icmp \
  --network kubernetes-the-kubespray-way \
  --source-ranges 0.0.0.0/0
```

Restricting this rule to the single IP address from which you access the
cluster is not feasible, because the nodes also communicate with each other
over the public internet and would be blocked by the firewall. Technically,
you could limit the rule to the (fixed) external IP addresses of the cluster
nodes plus the remote IP addresses used to access the cluster.

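
If you do want to restrict the external rule in that way, the list of allowed sources has to contain every node's external IP address plus your own. The sketch below shows one way to assemble such a list. The `build_source_ranges` helper is hypothetical, and the example addresses are placeholders from the documentation ranges, not real node addresses:

```shell
# Hypothetical helper: turn a list of IPv4 addresses into a comma-separated
# list of /32 CIDR ranges, the format accepted by --source-ranges.
build_source_ranges() {
  ranges=""
  for ip in "$@"; do
    ranges="${ranges:+${ranges},}${ip}/32"
  done
  printf '%s\n' "$ranges"
}

# Example with placeholder addresses (two node IPs followed by a workstation IP).
build_source_ranges 203.0.113.10 203.0.113.11 198.51.100.7

# The result could then be applied to the existing rule, for example:
#   gcloud compute firewall-rules update kubernetes-the-kubespray-way-allow-external \
#     --source-ranges "$(build_source_ranges <node and workstation IPs>)"
```

This only makes sense once the nodes have fixed external IP addresses, since ephemeral addresses change when the machines are restarted.
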
### Compute Instances

The compute instances in this lab will be provisioned using [Ubuntu Server](https://www.ubuntu.com/server) 18.04.
Each compute instance will be provisioned with a fixed private IP address and
a public IP address (that can be fixed - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address)).
Using fixed public IP addresses has the advantage that our cluster node
configuration does not need to be updated with new public IP addresses every
time the machines are shut down and later restarted.

Create three compute instances which will host the Kubernetes control plane:

```ShellSession
for i in 0 1 2; do
  gcloud compute instances create controller-${i} \
    --async \
    --boot-disk-size 200GB \
    --can-ip-forward \
    --image-family ubuntu-1804-lts \
    --image-project ubuntu-os-cloud \
    --machine-type e2-standard-2 \
    --private-network-ip 10.240.0.1${i} \
    --scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
    --subnet kubernetes \
    --tags kubernetes-the-kubespray-way,controller
done
```

> Do not forget to make the external IP addresses static if you plan on re-using the cluster
after temporarily shutting down the VMs - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address)

Create three compute instances which will host the Kubernetes worker nodes:

```ShellSession
for i in 0 1 2; do
  gcloud compute instances create worker-${i} \
    --async \
    --boot-disk-size 200GB \
    --can-ip-forward \
    --image-family ubuntu-1804-lts \
    --image-project ubuntu-os-cloud \
    --machine-type e2-standard-2 \
    --private-network-ip 10.240.0.2${i} \
    --scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
    --subnet kubernetes \
    --tags kubernetes-the-kubespray-way,worker
done
```

> Do not forget to make the external IP addresses static if you plan on re-using the cluster
after temporarily shutting down the VMs - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address)

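
As a rough illustration of what the linked guide describes, an instance's current external IP can be reserved under a name of your choice. The sketch below only prints the command instead of running it. The `print_reserve_cmd` helper, the placeholder address, and the `us-west1` region are assumptions, and the flags should be checked against the guide before use:

```shell
# Hypothetical helper: print (not run) a command that would reserve an
# instance's current external IP under the instance's own name.
# Arguments: address name, external IP, region.
print_reserve_cmd() {
  printf 'gcloud compute addresses create %s --addresses %s --region %s\n' "$1" "$2" "$3"
}

# Placeholder values; substitute the real external IP and your region.
print_reserve_cmd controller-0 203.0.113.10 us-west1
```

Naming each reserved address after its VM matches the assumption made in the cleanup section at the end of this tutorial.
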
List the compute instances in your default compute zone:

```ShellSession
gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way"
```

> Output

```ShellSession
NAME          ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
controller-0  us-west1-c  e2-standard-2               10.240.0.10  XX.XX.XX.XXX   RUNNING
controller-1  us-west1-c  e2-standard-2               10.240.0.11  XX.XXX.XXX.XX  RUNNING
controller-2  us-west1-c  e2-standard-2               10.240.0.12  XX.XXX.XX.XXX  RUNNING
worker-0      us-west1-c  e2-standard-2               10.240.0.20  XX.XX.XXX.XXX  RUNNING
worker-1      us-west1-c  e2-standard-2               10.240.0.21  XX.XX.XX.XXX   RUNNING
worker-2      us-west1-c  e2-standard-2               10.240.0.22  XX.XXX.XX.XX   RUNNING
```

### Configuring SSH Access

Kubespray relies on SSH to configure the controller and worker instances.

Test SSH access to the `controller-0` compute instance:

```ShellSession
IP_CONTROLLER_0=$(gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way AND name:controller-0" --format="value(EXTERNAL_IP)")
USERNAME=$(whoami)
ssh $USERNAME@$IP_CONTROLLER_0
```

If this is your first time connecting to a compute instance, SSH keys will be
generated for you. In this case you will need to enter a passphrase at the
prompt to continue.

> If you get a 'Remote host identification changed!' warning, you probably
already connected to that IP address in the past with another host key. You
can remove the old host key by running `ssh-keygen -R $IP_CONTROLLER_0`

Please repeat this procedure for all the controller and worker nodes to
ensure that SSH access works for every node.

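
Once each host key has been accepted, the check can be repeated for all nodes in a loop. The following is a minimal sketch rather than part of the original procedure: `check_ssh` and the `SSH_BIN` override are hypothetical names, while `BatchMode` and `ConnectTimeout` are standard OpenSSH options.

```shell
# Hypothetical helper: try a non-interactive SSH login on each address and
# report the result. SSH_BIN can be overridden, e.g. for a dry run.
check_ssh() {
  user="$1"
  shift
  failed=0
  for ip in "$@"; do
    # BatchMode makes ssh fail instead of prompting for input.
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "${user}@${ip}" true; then
      echo "${ip}: ok"
    else
      echo "${ip}: FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Usage, assuming the instances from the previous section exist:
#   check_ssh "$(whoami)" $(gcloud compute instances list \
#     --filter="tags.items=kubernetes-the-kubespray-way" --format="value(EXTERNAL_IP)")
```

Because `BatchMode` disables prompts, hosts whose key has not been accepted yet, or keys protected by a passphrase that is not loaded into an SSH agent, are reported as failed.
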
## Set-up Kubespray

The following set of instructions is based on the [Quick Start](https://github.com/kubernetes-sigs/kubespray) but slightly altered for our
set-up.

As Ansible is a Python application, we will create a fresh virtual
environment to install the dependencies for the Kubespray playbook:

```ShellSession
python3 -m venv venv
source venv/bin/activate
```

Next, we will git clone the Kubespray code into our working directory:

```ShellSession
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
git checkout release-2.13
```

Now we need to install the dependencies for Ansible to run the Kubespray
playbook:

```ShellSession
pip install -r requirements.txt
```

Copy ``inventory/sample`` as ``inventory/mycluster``:

```ShellSession
cp -rfp inventory/sample inventory/mycluster
```

Update the Ansible inventory file with the inventory builder:

```ShellSession
declare -a IPS=($(gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way" --format="value(EXTERNAL_IP)" | tr '\n' ' '))
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
```

Open the generated `inventory/mycluster/hosts.yaml` file and adjust it so
that controller-0, controller-1 and controller-2 are control plane nodes and
worker-0, worker-1 and worker-2 are worker nodes. Also update the `ip` of each
node to its private VPC IP address and remove the `access_ip` entries.

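
To make the target layout concrete, the sketch below prints an inventory shaped as described above. It is an illustration under assumptions, not the builder's exact output: the group names (`kube-master`, `kube-node`, `etcd`, `k8s-cluster`) are assumed to match what the inventory builder generates on the `release-2.13` branch, placing etcd on the controllers is an assumption, the external IPs are placeholders, and `print_inventory` is a hypothetical helper.

```shell
# Hypothetical helper: print a hosts.yaml laid out as described above.
# Arguments: six external IPs (controller-0..2 first, then worker-0..2).
print_inventory() {
  echo "all:"
  echo "  hosts:"
  i=0
  for ext_ip in "$@"; do
    if [ "$i" -lt 3 ]; then
      name="controller-${i}"
      ip="10.240.0.1${i}"
    else
      name="worker-$((i - 3))"
      ip="10.240.0.2$((i - 3))"
    fi
    echo "    ${name}:"
    echo "      ansible_host: ${ext_ip}"   # external IP, used by Ansible over SSH
    echo "      ip: ${ip}"                 # private VPC IP, no access_ip entry
    i=$((i + 1))
  done
  # Group names below are assumed (see the note above the code block).
  cat <<'EOF'
  children:
    kube-master:
      hosts:
        controller-0:
        controller-1:
        controller-2:
    kube-node:
      hosts:
        worker-0:
        worker-1:
        worker-2:
    etcd:
      hosts:
        controller-0:
        controller-1:
        controller-2:
    k8s-cluster:
      children:
        kube-master:
        kube-node:
EOF
}

# Placeholder external IPs; substitute the values from `gcloud compute instances list`.
print_inventory 203.0.113.10 203.0.113.11 203.0.113.12 203.0.113.20 203.0.113.21 203.0.113.22
```

It is safer to compare this output with the generated file and edit that file by hand than to overwrite it, since the builder may have generated additional groups that are not shown here.
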
The main configuration for the cluster is stored in
`inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml`. In this file we
will update the `supplementary_addresses_in_ssl_keys` with a list of the
external IP addresses of the controller nodes. That way we can access the
Kubernetes API server as an administrator from outside the VPC network. You
can also see that the `kube_network_plugin` is set to 'calico' by default.
Setting it to 'cloud' did not work on GCP at the time of testing.

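
The setting takes a YAML list. As a small sketch, the line can be generated from the controller IPs as shown below. `ssl_keys_setting` is a hypothetical helper and the addresses are placeholders for the controllers' external IPs:

```shell
# Hypothetical helper: format IP addresses as the YAML list value for
# supplementary_addresses_in_ssl_keys.
ssl_keys_setting() {
  list=""
  for ip in "$@"; do
    list="${list:+${list}, }${ip}"
  done
  printf 'supplementary_addresses_in_ssl_keys: [%s]\n' "$list"
}

# Placeholder addresses; use the external IPs of controller-0..2.
ssl_keys_setting 203.0.113.10 203.0.113.11 203.0.113.12
# prints: supplementary_addresses_in_ssl_keys: [203.0.113.10, 203.0.113.11, 203.0.113.12]
```

The printed line can then be pasted into `k8s-cluster.yml` in place of the existing `supplementary_addresses_in_ssl_keys` entry.
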
Kubespray also makes it easy to enable popular Kubernetes add-ons. You can
modify the list of add-ons in
`inventory/mycluster/group_vars/k8s-cluster/addons.yml`.
Let's enable the metrics server, as it is a crucial monitoring element for
the Kubernetes cluster: change `metrics_server_enabled` from 'false' to
'true'.

Now we will deploy the configuration:

```ShellSession
ansible-playbook -i inventory/mycluster/hosts.yaml -u $USERNAME -b -v --private-key=~/.ssh/id_rsa cluster.yml
```

Ansible will now execute the playbook. This can take up to 20 minutes.

## Access the Kubernetes cluster

We will leverage a kubeconfig file from one of the controller nodes to access
the cluster as administrator from our local workstation.

> In this simplified set-up, we did not include a load balancer, which usually
sits in front of the three controller nodes to provide a highly available API
server endpoint. Instead, we connect directly to one of the three
controllers.

First, we need to edit the permission of the kubeconfig file on one of the
controller nodes:

```ShellSession
ssh $USERNAME@$IP_CONTROLLER_0
# the following commands run on controller-0
USERNAME=$(whoami)
sudo chown -R $USERNAME:$USERNAME /etc/kubernetes/admin.conf
exit
```

Now we will copy over the kubeconfig file:

```ShellSession
scp $USERNAME@$IP_CONTROLLER_0:/etc/kubernetes/admin.conf kubespray-do.conf
```

This kubeconfig file uses the internal IP address of the controller node to
access the API server, so it will not work from outside the VPC network. We
need to change the API server IP address to the controller node's external IP
address. The external IP address will be accepted in the TLS negotiation
because we added the controllers' external IP addresses to the SSL
certificate configuration.

Open the file and change the server IP address from the local IP to the
external IP address of controller-0, as stored in $IP_CONTROLLER_0.

> Example

```ShellSession
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: XXX
    server: https://35.205.205.80:6443
  name: cluster.local
...
```

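
The manual edit can also be scripted. The sketch below rewrites only the host part of the `server:` entry and keeps the scheme and port. `set_api_server` is a hypothetical helper, and it assumes the entry has the form `server: https://<host>:<port>` as in the example above:

```shell
# Hypothetical helper: point the "server:" entry of a kubeconfig at another
# host, keeping the scheme and port. A backup is written to <file>.bak.
set_api_server() {
  kubeconfig="$1"
  host="$2"
  sed -i.bak -E "s#(server: https://)[^:/]+(:[0-9]+)#\1${host}\2#" "$kubeconfig"
}

# Usage, assuming the file and variable from the steps above:
#   set_api_server kubespray-do.conf "$IP_CONTROLLER_0"
```

The `-i.bak` form is used because it is accepted by both GNU and BSD `sed`, so the same command should work on Linux and macOS.
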
Now, we load the configuration for `kubectl`:

```ShellSession
export KUBECONFIG=$PWD/kubespray-do.conf
```

We should be all set to communicate with our cluster from our local workstation:

```ShellSession
kubectl get nodes
```

> Output

```ShellSession
NAME           STATUS   ROLES    AGE   VERSION
controller-0   Ready    master   47m   v1.17.9
controller-1   Ready    master   46m   v1.17.9
controller-2   Ready    master   46m   v1.17.9
worker-0       Ready    <none>   45m   v1.17.9
worker-1       Ready    <none>   45m   v1.17.9
worker-2       Ready    <none>   45m   v1.17.9
```

## Smoke tests

### Metrics

Verify that the metrics server add-on was installed correctly and works:

```ShellSession
kubectl top nodes
```

> Output

```ShellSession
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
controller-0   191m         10%    1956Mi          26%
controller-1   190m         10%    1828Mi          24%
controller-2   182m         10%    1839Mi          24%
worker-0       87m          4%     1265Mi          16%
worker-1       102m         5%     1268Mi          16%
worker-2       108m         5%     1299Mi          17%
```

Please note that metrics might not be available at first; it can take a
couple of minutes before you can retrieve them.

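
Rather than re-running the command by hand, you could poll until it succeeds. The following is a generic sketch: `retry` is a hypothetical helper, and the attempt count and delay in the usage example are arbitrary choices.

```shell
# Hypothetical helper: run a command until it succeeds, up to a maximum
# number of attempts, sleeping between tries.
# Arguments: attempts, delay in seconds, then the command to run.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    # Do not sleep after the final failed attempt.
    [ "$n" -lt "$attempts" ] && sleep "$delay"
    n=$((n + 1))
  done
  return 1
}

# Usage: poll every 10 seconds, for up to 30 attempts.
#   retry 30 10 kubectl top nodes
```
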
### Network

Let's verify that the network layer is functioning properly and pods can reach
each other:

```ShellSession
kubectl run myshell1 -it --rm --image busybox -- sh
hostname -i
# launch myshell2 in a separate terminal (see next code block) and ping the address that `hostname -i` prints there
ping <hostname myshell2>
```

```ShellSession
kubectl run myshell2 -it --rm --image busybox -- sh
hostname -i
ping <hostname myshell1>
```

> Output

```ShellSession
PING 10.233.108.2 (10.233.108.2): 56 data bytes
64 bytes from 10.233.108.2: seq=0 ttl=62 time=2.876 ms
64 bytes from 10.233.108.2: seq=1 ttl=62 time=0.398 ms
64 bytes from 10.233.108.2: seq=2 ttl=62 time=0.378 ms
^C
--- 10.233.108.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.378/1.217/2.876 ms
```

### Deployments

In this section you will verify the ability to create and manage [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/).

Create a deployment for the [nginx](https://nginx.org/en/) web server:

```ShellSession
kubectl create deployment nginx --image=nginx
```

List the pod created by the `nginx` deployment:

```ShellSession
kubectl get pods -l app=nginx
```

> Output

```ShellSession
NAME                     READY   STATUS    RESTARTS   AGE
nginx-86c57db685-bmtt8   1/1     Running   0          18s
```

#### Port Forwarding

In this section you will verify the ability to access applications remotely using [port forwarding](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/).

Retrieve the full name of the `nginx` pod:

```ShellSession
POD_NAME=$(kubectl get pods -l app=nginx -o jsonpath="{.items[0].metadata.name}")
```

Forward port `8080` on your local machine to port `80` of the `nginx` pod:

```ShellSession
kubectl port-forward $POD_NAME 8080:80
```

> Output

```ShellSession
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
```

In a new terminal make an HTTP request using the forwarding address:

```ShellSession
curl --head http://127.0.0.1:8080
```

> Output

```ShellSession
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Thu, 13 Aug 2020 11:12:04 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes
```

Switch back to the previous terminal and stop the port forwarding to the `nginx` pod:

```ShellSession
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080
^C
```

#### Logs

In this section you will verify the ability to [retrieve container logs](https://kubernetes.io/docs/concepts/cluster-administration/logging/).

Print the `nginx` pod logs:

```ShellSession
kubectl logs $POD_NAME
```

> Output

```ShellSession
...
127.0.0.1 - - [13/Aug/2020:11:12:04 +0000] "HEAD / HTTP/1.1" 200 0 "-" "curl/7.64.1" "-"
```

#### Exec

In this section you will verify the ability to [execute commands in a container](https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/#running-individual-commands-in-a-container).

Print the nginx version by executing the `nginx -v` command in the `nginx` container:

```ShellSession
kubectl exec -ti $POD_NAME -- nginx -v
```

> Output

```ShellSession
nginx version: nginx/1.19.1
```

### Kubernetes services

#### Expose outside of the cluster

In this section you will verify the ability to expose applications using a [Service](https://kubernetes.io/docs/concepts/services-networking/service/).

Expose the `nginx` deployment using a [NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport) service:

```ShellSession
kubectl expose deployment nginx --port 80 --type NodePort
```

> The LoadBalancer service type cannot be used because your cluster is not configured with [cloud provider integration](https://kubernetes.io/docs/getting-started-guides/scratch/#cloud-provider). Setting up cloud provider integration is out of scope for this tutorial.

Retrieve the node port assigned to the `nginx` service:

```ShellSession
NODE_PORT=$(kubectl get svc nginx \
  --output=jsonpath='{range .spec.ports[0]}{.nodePort}')
```

Create a firewall rule that allows remote access to the `nginx` node port:

```ShellSession
gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-nginx-service \
  --allow=tcp:${NODE_PORT} \
  --network kubernetes-the-kubespray-way
```

Retrieve the external IP address of a worker instance:

```ShellSession
EXTERNAL_IP=$(gcloud compute instances describe worker-0 \
  --format 'value(networkInterfaces[0].accessConfigs[0].natIP)')
```

Make an HTTP request using the external IP address and the `nginx` node port:

```ShellSession
curl -I http://${EXTERNAL_IP}:${NODE_PORT}
```

> Output

```ShellSession
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Thu, 13 Aug 2020 11:15:02 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes
```

#### Local DNS

We will now also verify that the Kubernetes built-in DNS works across namespaces.

Create a namespace:

```ShellSession
kubectl create namespace dev
```

Create an nginx deployment and expose it within the cluster:

```ShellSession
kubectl create deployment nginx --image=nginx -n dev
kubectl expose deployment nginx --port 80 --type ClusterIP -n dev
```

Run a temporary container to see if we can reach the service from the default
namespace:

```ShellSession
kubectl run curly -it --rm --image curlimages/curl:7.70.0 -- /bin/sh
curl --head http://nginx.dev:80
```

> Output

```ShellSession
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Thu, 13 Aug 2020 11:15:59 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes
```

Type `exit` to leave the shell.

## Cleaning Up

### Kubernetes resources

Delete the dev namespace, the nginx deployment and service:

```ShellSession
kubectl delete namespace dev
kubectl delete deployment nginx
kubectl delete svc/nginx
```

### Kubernetes state

Note: you can skip this step if you want to entirely remove the machines.

If you want to keep the VMs and just remove the cluster state, you can simply
run another Ansible playbook:

```ShellSession
ansible-playbook -i inventory/mycluster/hosts.yaml -u $USERNAME -b -v --private-key=~/.ssh/id_rsa reset.yml
```

Resetting the VMs to their original state usually takes a couple of minutes.

### Compute instances

Delete the controller and worker compute instances:

```ShellSession
gcloud -q compute instances delete \
  controller-0 controller-1 controller-2 \
  worker-0 worker-1 worker-2 \
  --zone $(gcloud config get-value compute/zone)
```

<!-- markdownlint-disable no-duplicate-heading -->
### Network
<!-- markdownlint-enable no-duplicate-heading -->

Delete the fixed IP addresses, if any (assuming you gave them the same names
as the VMs):

```ShellSession
gcloud -q compute addresses delete controller-0 controller-1 controller-2 \
  worker-0 worker-1 worker-2
```

Delete the `kubernetes-the-kubespray-way` firewall rules:

```ShellSession
gcloud -q compute firewall-rules delete \
  kubernetes-the-kubespray-way-allow-nginx-service \
  kubernetes-the-kubespray-way-allow-internal \
  kubernetes-the-kubespray-way-allow-external
```

Delete the `kubernetes` subnet and the `kubernetes-the-kubespray-way` VPC network:

```ShellSession
gcloud -q compute networks subnets delete kubernetes
gcloud -q compute networks delete kubernetes-the-kubespray-way
```