# Setting up your first cluster with Kubespray

This tutorial walks you through the detailed steps for setting up Kubernetes
with [Kubespray](https://kubespray.io/).

The guide is inspired by the tutorial [Kubernetes The Hard Way](https://github.com/kelseyhightower/kubernetes-the-hard-way), with the
difference that here we want to showcase how to spin up a Kubernetes cluster
in a more managed fashion with Kubespray.

## Target Audience

The target audience for this tutorial is someone looking for a
hands-on guide to get started with Kubespray.

## Cluster Details

* [kubespray](https://github.com/kubernetes-sigs/kubespray) v2.17.x
* [kubernetes](https://github.com/kubernetes/kubernetes) v1.17.9

## Prerequisites

* Google Cloud Platform: This tutorial leverages the [Google Cloud Platform](https://cloud.google.com/) to streamline provisioning of the compute infrastructure required to bootstrap a Kubernetes cluster from the ground up. [Sign up](https://cloud.google.com/free/) for $300 in free credits.
* Google Cloud Platform SDK: Follow the Google Cloud SDK [documentation](https://cloud.google.com/sdk/) to install and configure the `gcloud` command
  line utility. Make sure to set a default compute region and compute zone.
* The [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) command line utility is used to interact with the Kubernetes
  API Server.
* A Linux or Mac environment with Python 3
## Provisioning Compute Resources

Kubernetes requires a set of machines to host the Kubernetes control plane and the worker nodes where containers are ultimately run. In this lab you will provision the compute resources required for running a secure and highly available Kubernetes cluster across a single [compute zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones).

### Networking

The Kubernetes [networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model) assumes a flat network in which containers and nodes can communicate with each other. In cases where this is not desired, [network policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) can limit how groups of containers are allowed to communicate with each other and with external network endpoints.

> Setting up network policies is out of scope for this tutorial.

#### Virtual Private Cloud Network

In this section a dedicated [Virtual Private Cloud](https://cloud.google.com/compute/docs/networks-and-firewalls#networks) (VPC) network will be set up to host the Kubernetes cluster.

Create the `kubernetes-the-kubespray-way` custom VPC network:

```ShellSession
gcloud compute networks create kubernetes-the-kubespray-way --subnet-mode custom
```

A [subnet](https://cloud.google.com/compute/docs/vpc/#vpc_networks_and_subnets) must be provisioned with an IP address range large enough to assign a private IP address to each node in the Kubernetes cluster.

Create the `kubernetes` subnet in the `kubernetes-the-kubespray-way` VPC network:

```ShellSession
gcloud compute networks subnets create kubernetes \
  --network kubernetes-the-kubespray-way \
  --range 10.240.0.0/24
```

> The `10.240.0.0/24` IP address range can host up to 254 compute instances.
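The 254 figure follows from the prefix length: a /24 leaves 8 host bits, and two addresses (network and broadcast) are reserved. A quick sanity check in the shell:

```shell
# Usable host addresses in a /24: 2^(32-24) total addresses,
# minus the network and broadcast addresses.
PREFIX=24
echo $(( (1 << (32 - PREFIX)) - 2 ))   # prints 254
```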
#### Firewall Rules

Create a firewall rule that allows internal communication across all protocols.
It is important that the vxlan protocol is allowed, as the calico networking
plugin (see later) depends on it:

```ShellSession
gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-internal \
  --allow tcp,udp,icmp,vxlan \
  --network kubernetes-the-kubespray-way \
  --source-ranges 10.240.0.0/24
```

Create a firewall rule that allows external SSH, ICMP, and HTTPS:

```ShellSession
gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-external \
  --allow tcp:80,tcp:6443,tcp:443,tcp:22,icmp \
  --network kubernetes-the-kubespray-way \
  --source-ranges 0.0.0.0/0
```

It is not feasible to restrict this rule to only the IP address from which you
access the cluster, because the nodes also communicate with each other over
their public IP addresses and would otherwise be blocked by this firewall.
Technically you could limit the rule to the (fixed) IP addresses of the
cluster nodes plus the remote IP addresses used to access the cluster.
### Compute Instances

The compute instances in this lab will be provisioned using [Ubuntu Server](https://www.ubuntu.com/server) 18.04.
Each compute instance will be provisioned with a fixed private IP address and
a public IP address (that can be fixed - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address)).
Using fixed public IP addresses has the advantage that our cluster node
configuration does not need to be updated with new public IP addresses every
time the machines are shut down and later on restarted.

Create three compute instances which will host the Kubernetes control plane:

```ShellSession
for i in 0 1 2; do
  gcloud compute instances create controller-${i} \
    --async \
    --boot-disk-size 200GB \
    --can-ip-forward \
    --image-family ubuntu-1804-lts \
    --image-project ubuntu-os-cloud \
    --machine-type e2-standard-2 \
    --private-network-ip 10.240.0.1${i} \
    --scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
    --subnet kubernetes \
    --tags kubernetes-the-kubespray-way,controller
done
```

> Do not forget to fix the IP addresses if you plan on re-using the cluster
> after temporarily shutting down the VMs - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address)

Create three compute instances which will host the Kubernetes worker nodes:

```ShellSession
for i in 0 1 2; do
  gcloud compute instances create worker-${i} \
    --async \
    --boot-disk-size 200GB \
    --can-ip-forward \
    --image-family ubuntu-1804-lts \
    --image-project ubuntu-os-cloud \
    --machine-type e2-standard-2 \
    --private-network-ip 10.240.0.2${i} \
    --scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
    --subnet kubernetes \
    --tags kubernetes-the-kubespray-way,worker
done
```

> Do not forget to fix the IP addresses if you plan on re-using the cluster
> after temporarily shutting down the VMs - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address)

List the compute instances in your default compute zone:

```ShellSession
gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way"
```
> Output

```ShellSession
NAME          ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
controller-0  us-west1-c  e2-standard-2               10.240.0.10  XX.XX.XX.XXX   RUNNING
controller-1  us-west1-c  e2-standard-2               10.240.0.11  XX.XXX.XXX.XX  RUNNING
controller-2  us-west1-c  e2-standard-2               10.240.0.12  XX.XXX.XX.XXX  RUNNING
worker-0      us-west1-c  e2-standard-2               10.240.0.20  XX.XX.XXX.XXX  RUNNING
worker-1      us-west1-c  e2-standard-2               10.240.0.21  XX.XX.XX.XXX   RUNNING
worker-2      us-west1-c  e2-standard-2               10.240.0.22  XX.XXX.XX.XX   RUNNING
```
### Configuring SSH Access

Kubespray relies on SSH to configure the controller and worker instances.

Test SSH access to the `controller-0` compute instance:

```ShellSession
IP_CONTROLLER_0=$(gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way AND name:controller-0" --format="value(EXTERNAL_IP)")
USERNAME=$(whoami)
ssh $USERNAME@$IP_CONTROLLER_0
```

If this is your first time connecting to a compute instance, SSH keys will be
generated for you. In this case you will need to enter a passphrase at the
prompt to continue.

> If you get a 'Remote host identification changed!' warning, you probably
> already connected to that IP address in the past with another host key. You
> can remove the old host key by running `ssh-keygen -R $IP_CONTROLLER_0`.

Please repeat this procedure for all the controller and worker nodes to
ensure that SSH access works for every node.
## Set-up Kubespray

The following instructions are based on the [Quick Start](https://github.com/kubernetes-sigs/kubespray) but slightly altered for our
set-up.

As Ansible is a Python application, we will create a fresh virtual
environment to install the dependencies for the Kubespray playbook:

```ShellSession
python3 -m venv venv
source venv/bin/activate
```

Next, we will git clone the Kubespray code into our working directory:

```ShellSession
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
git checkout release-2.17
```

Now we need to install the dependencies for Ansible to run the Kubespray
playbook:

```ShellSession
pip install -r requirements.txt
```

Copy ``inventory/sample`` as ``inventory/mycluster``:

```ShellSession
cp -rfp inventory/sample inventory/mycluster
```

Update the Ansible inventory file with the inventory builder:

```ShellSession
declare -a IPS=($(gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way" --format="value(EXTERNAL_IP)" | tr '\n' ' '))
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
```
Open the generated `inventory/mycluster/hosts.yaml` file and adjust it so
that controller-0, controller-1 and controller-2 are control plane nodes and
worker-0, worker-1 and worker-2 are worker nodes. Also update the `ip` of
each node to its private VPC IP address and remove the `access_ip` entries.
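An adjusted inventory might look roughly like the sketch below (group names as used by the release-2.17 sample inventory; the `ansible_host` values are placeholders for your external IPs):

```yaml
all:
  hosts:
    controller-0:
      ansible_host: XX.XX.XX.XXX   # external IP (placeholder)
      ip: 10.240.0.10              # private VPC IP
    controller-1:
      ansible_host: XX.XXX.XXX.XX
      ip: 10.240.0.11
    controller-2:
      ansible_host: XX.XXX.XX.XXX
      ip: 10.240.0.12
    worker-0:
      ansible_host: XX.XX.XXX.XXX
      ip: 10.240.0.20
    worker-1:
      ansible_host: XX.XX.XX.XXX
      ip: 10.240.0.21
    worker-2:
      ansible_host: XX.XXX.XX.XX
      ip: 10.240.0.22
  children:
    kube_control_plane:
      hosts:
        controller-0:
        controller-1:
        controller-2:
    kube_node:
      hosts:
        worker-0:
        worker-1:
        worker-2:
    etcd:
      hosts:
        controller-0:
        controller-1:
        controller-2:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}
```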
The main configuration for the cluster is stored in
`inventory/mycluster/group_vars/k8s_cluster/k8s_cluster.yml`. In this file we
will set `supplementary_addresses_in_ssl_keys` to a list of the external IP
addresses of the controller nodes. That way we can access the
Kubernetes API server as an administrator from outside the VPC network. You
can also see that the `kube_network_plugin` is set to 'calico' by default.
(Setting it to 'cloud' did not work on GCP at the time of testing.)
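The relevant excerpt of the file might then look like this (the external IPs shown are placeholders for your controllers' addresses):

```yaml
# inventory/mycluster/group_vars/k8s_cluster/k8s_cluster.yml (excerpt)
kube_network_plugin: calico

# External controller IPs so the API server certificate is accepted
# when connecting from outside the VPC (placeholders shown):
supplementary_addresses_in_ssl_keys:
  - XX.XX.XX.XXX    # controller-0
  - XX.XXX.XXX.XX   # controller-1
  - XX.XXX.XX.XXX   # controller-2
```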
Kubespray also makes it easy to enable popular Kubernetes add-ons. You can
modify the list of add-ons in `inventory/mycluster/group_vars/k8s_cluster/addons.yml`.
Let's enable the metrics server, as this is a crucial monitoring element for
the Kubernetes cluster: just change 'false' to 'true' for
`metrics_server_enabled`.
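After the change, the line in question reads:

```yaml
# inventory/mycluster/group_vars/k8s_cluster/addons.yml (excerpt)
metrics_server_enabled: true
```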
Now we will deploy the configuration:

```ShellSession
ansible-playbook -i inventory/mycluster/hosts.yaml -u $USERNAME -b -v --private-key=~/.ssh/id_rsa cluster.yml
```

Ansible will now execute the playbook; this can take up to 20 minutes.
## Access the kubernetes cluster

We will leverage a kubeconfig file from one of the controller nodes to access
the cluster as administrator from our local workstation.

> In this simplified set-up, we did not include a load balancer, which would
> usually sit in front of the three controller nodes to provide a highly
> available API server endpoint. Instead, we connect directly to one of the
> three controllers.

First, we need to fix the permissions of the kubeconfig file on one of the
controller nodes:

```ShellSession
ssh $USERNAME@$IP_CONTROLLER_0
USERNAME=$(whoami)
sudo chown -R $USERNAME:$USERNAME /etc/kubernetes/admin.conf
exit
```
Now we will copy over the kubeconfig file:

```ShellSession
scp $USERNAME@$IP_CONTROLLER_0:/etc/kubernetes/admin.conf kubespray-do.conf
```

This kubeconfig file uses the internal IP address of the controller node to
access the API server, so it will not work from outside of the VPC network.
We need to change the API server IP address to the controller node's
external IP address. The external IP address will be accepted in the TLS
negotiation because we added the controllers' external IP addresses to the
SSL certificate configuration.

Open the file and change the server IP address from the local IP to the
external IP address of controller-0, as stored in `$IP_CONTROLLER_0`.
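Instead of editing by hand, a `sed` substitution along these lines can do it. The sketch below demonstrates the pattern on a sample line (the example IP is hypothetical); to edit `kubespray-do.conf` in place, run the same `sed` with `-i` on the file:

```shell
IP_CONTROLLER_0=${IP_CONTROLLER_0:-35.205.205.80}  # hypothetical external IP
# Rewrite the API server address from the internal to the external IP.
echo '    server: https://10.240.0.10:6443' \
  | sed "s#server: https://[0-9.]*:6443#server: https://${IP_CONTROLLER_0}:6443#"
# prints:     server: https://35.205.205.80:6443
```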
> Example

```ShellSession
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: XXX
    server: https://35.205.205.80:6443
  name: cluster.local
...
```

Now, we load the configuration for `kubectl`:

```ShellSession
export KUBECONFIG=$PWD/kubespray-do.conf
```

We should be all set to communicate with our cluster from our local workstation:

```ShellSession
kubectl get nodes
```
> Output

```ShellSession
NAME           STATUS   ROLES    AGE   VERSION
controller-0   Ready    master   47m   v1.17.9
controller-1   Ready    master   46m   v1.17.9
controller-2   Ready    master   46m   v1.17.9
worker-0       Ready    <none>   45m   v1.17.9
worker-1       Ready    <none>   45m   v1.17.9
worker-2       Ready    <none>   45m   v1.17.9
```
## Smoke tests

### Metrics

Verify that the metrics server add-on was correctly installed and works:

```ShellSession
kubectl top nodes
```

> Output

```ShellSession
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
controller-0   191m         10%    1956Mi          26%
controller-1   190m         10%    1828Mi          24%
controller-2   182m         10%    1839Mi          24%
worker-0       87m          4%     1265Mi          16%
worker-1       102m         5%     1268Mi          16%
worker-2       108m         5%     1299Mi          17%
```

Please note that metrics might not be available right away; it can take a
couple of minutes before you can actually retrieve them.
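If you want to script the wait instead of re-running the command by hand, a small retry helper (a sketch, not part of the tutorial's tooling) can poll until a command succeeds:

```shell
# retry: run a command until it succeeds, up to a maximum number of attempts,
# sleeping 5 seconds between attempts.
retry() {
  max=$1; shift
  i=1
  while :; do
    "$@" && return 0
    [ "$i" -ge "$max" ] && return 1
    i=$((i + 1))
    sleep 5
  done
}

# e.g. poll up to 5 minutes for node metrics to become available:
# retry 60 kubectl top nodes
```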
### Network

Let's verify that the network layer is properly functioning and pods can reach
each other:

```ShellSession
kubectl run myshell1 -it --rm --image busybox -- sh
hostname -i
# launch myshell2 in a separate terminal (see next code block)
# and ping the IP address printed by its `hostname -i`
ping <ip-of-myshell2>
```

```ShellSession
kubectl run myshell2 -it --rm --image busybox -- sh
hostname -i
ping <ip-of-myshell1>
```

> Output

```ShellSession
PING 10.233.108.2 (10.233.108.2): 56 data bytes
64 bytes from 10.233.108.2: seq=0 ttl=62 time=2.876 ms
64 bytes from 10.233.108.2: seq=1 ttl=62 time=0.398 ms
64 bytes from 10.233.108.2: seq=2 ttl=62 time=0.378 ms
^C
--- 10.233.108.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.378/1.217/2.876 ms
```
### Deployments

In this section you will verify the ability to create and manage [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/).

Create a deployment for the [nginx](https://nginx.org/en/) web server:

```ShellSession
kubectl create deployment nginx --image=nginx
```

List the pod created by the `nginx` deployment:

```ShellSession
kubectl get pods -l app=nginx
```

> Output

```ShellSession
NAME                     READY   STATUS    RESTARTS   AGE
nginx-86c57db685-bmtt8   1/1     Running   0          18s
```
#### Port Forwarding

In this section you will verify the ability to access applications remotely using [port forwarding](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/).

Retrieve the full name of the `nginx` pod:

```ShellSession
POD_NAME=$(kubectl get pods -l app=nginx -o jsonpath="{.items[0].metadata.name}")
```

Forward port `8080` on your local machine to port `80` of the `nginx` pod:

```ShellSession
kubectl port-forward $POD_NAME 8080:80
```

> Output

```ShellSession
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
```

In a new terminal make an HTTP request using the forwarding address:

```ShellSession
curl --head http://127.0.0.1:8080
```

> Output

```ShellSession
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Thu, 13 Aug 2020 11:12:04 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes
```

Switch back to the previous terminal and stop the port forwarding to the `nginx` pod:

```ShellSession
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080
^C
```
#### Logs

In this section you will verify the ability to [retrieve container logs](https://kubernetes.io/docs/concepts/cluster-administration/logging/).

Print the `nginx` pod logs:

```ShellSession
kubectl logs $POD_NAME
```

> Output

```ShellSession
...
127.0.0.1 - - [13/Aug/2020:11:12:04 +0000] "HEAD / HTTP/1.1" 200 0 "-" "curl/7.64.1" "-"
```

#### Exec

In this section you will verify the ability to [execute commands in a container](https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/#running-individual-commands-in-a-container).

Print the nginx version by executing the `nginx -v` command in the `nginx` container:

```ShellSession
kubectl exec -ti $POD_NAME -- nginx -v
```

> Output

```ShellSession
nginx version: nginx/1.19.1
```
### Kubernetes services

#### Expose outside of the cluster

In this section you will verify the ability to expose applications using a [Service](https://kubernetes.io/docs/concepts/services-networking/service/).

Expose the `nginx` deployment using a [NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport) service:

```ShellSession
kubectl expose deployment nginx --port 80 --type NodePort
```

> The LoadBalancer service type cannot be used because your cluster is not configured with [cloud provider integration](https://kubernetes.io/docs/getting-started-guides/scratch/#cloud-provider). Setting up cloud provider integration is out of scope for this tutorial.

Retrieve the node port assigned to the `nginx` service:

```ShellSession
NODE_PORT=$(kubectl get svc nginx \
  --output=jsonpath='{.spec.ports[0].nodePort}')
```
Create a firewall rule that allows remote access to the `nginx` node port:

```ShellSession
gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-nginx-service \
  --allow=tcp:${NODE_PORT} \
  --network kubernetes-the-kubespray-way
```

Retrieve the external IP address of a worker instance:

```ShellSession
EXTERNAL_IP=$(gcloud compute instances describe worker-0 \
  --format 'value(networkInterfaces[0].accessConfigs[0].natIP)')
```

Make an HTTP request using the external IP address and the `nginx` node port:

```ShellSession
curl -I http://${EXTERNAL_IP}:${NODE_PORT}
```

> Output

```ShellSession
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Thu, 13 Aug 2020 11:15:02 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes
```
#### Local DNS

We will now also verify that the Kubernetes built-in DNS works across namespaces.

Create a namespace:

```ShellSession
kubectl create namespace dev
```

Create an nginx deployment and expose it within the cluster:

```ShellSession
kubectl create deployment nginx --image=nginx -n dev
kubectl expose deployment nginx --port 80 --type ClusterIP -n dev
```

Run a temporary container to see if we can reach the service from the default
namespace:

```ShellSession
kubectl run curly -it --rm --image curlimages/curl:7.70.0 -- /bin/sh
curl --head http://nginx.dev:80
```

> Output

```ShellSession
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Thu, 13 Aug 2020 11:15:59 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes
```

Type `exit` to leave the shell.
## Cleaning Up

### Kubernetes resources

Delete the dev namespace and the nginx deployment and service:

```ShellSession
kubectl delete namespace dev
kubectl delete deployment nginx
kubectl delete svc/nginx
```

### Kubernetes state

Note: you can skip this step if you want to entirely remove the machines.

If you want to keep the VMs and just remove the cluster state, you can simply
run another Ansible playbook:

```ShellSession
ansible-playbook -i inventory/mycluster/hosts.yaml -u $USERNAME -b -v --private-key=~/.ssh/id_rsa reset.yml
```

Resetting the cluster to the VMs' original state usually takes a couple of
minutes.
### Compute instances

Delete the controller and worker compute instances:

```ShellSession
gcloud -q compute instances delete \
  controller-0 controller-1 controller-2 \
  worker-0 worker-1 worker-2 \
  --zone $(gcloud config get-value compute/zone)
```

<!-- markdownlint-disable no-duplicate-heading -->

### Network

<!-- markdownlint-enable no-duplicate-heading -->

Delete the fixed IP addresses, if any (assuming you named them after the VMs):

```ShellSession
gcloud -q compute addresses delete controller-0 controller-1 controller-2 \
  worker-0 worker-1 worker-2
```

Delete the `kubernetes-the-kubespray-way` firewall rules:

```ShellSession
gcloud -q compute firewall-rules delete \
  kubernetes-the-kubespray-way-allow-nginx-service \
  kubernetes-the-kubespray-way-allow-internal \
  kubernetes-the-kubespray-way-allow-external
```

Delete the `kubernetes-the-kubespray-way` VPC network:

```ShellSession
gcloud -q compute networks subnets delete kubernetes
gcloud -q compute networks delete kubernetes-the-kubespray-way
```