# Calico

Check if the calico-node container is running:

```ShellSession
docker ps | grep calico
```

The **calicoctl.sh** script is a wrapper around the `calicoctl` command with preconfigured access credentials. Use it to check the status of the network workloads.

* Check the status of Calico nodes

```ShellSession
calicoctl.sh node status
```

* Show the configured network subnet for containers

```ShellSession
calicoctl.sh get ippool -o wide
```

* Show the workloads (IP addresses of containers and their location)

```ShellSession
calicoctl.sh get workloadEndpoint -o wide
```

and

```ShellSession
calicoctl.sh get hostEndpoint -o wide
```

## Configuration

### Optional : Define datastore type

The default datastore, the Kubernetes API datastore, is recommended for on-premises deployments and supports only Kubernetes workloads; etcd is the best datastore for hybrid deployments.

Allowed values are `kdd` (default) and `etcd`.

Note: when using `kdd` with more than 50 nodes, consider using the `typha` daemon to provide scaling.

To re-define the datastore, edit the inventory and add a group variable `calico_datastore`:

```yml
calico_datastore: kdd
```

### Optional : Define network backend

In some cases you may want to define the Calico network backend. Allowed values are `bird`, `vxlan` or `none`; `vxlan` is the default.

To re-define it, edit the inventory and add a group variable `calico_network_backend`:

```yml
calico_network_backend: none
```

### Optional : Define the default pool CIDRs

By default, `kube_pods_subnet` is used as the IP range CIDR for the default IP Pool, and `kube_pods_subnet_ipv6` for IPv6.

In some cases you may want to add several pools and not have them considered by Kubernetes as external, which means they must be within or equal to the ranges defined in `kube_pods_subnet` and `kube_pods_subnet_ipv6`. This starts with the default IP Pools, whose IP range CIDRs can be defined in group_vars (k8s_cluster/k8s-net-calico.yml):

```yml
calico_pool_cidr: 10.233.64.0/20
calico_pool_cidr_ipv6: fd85:ee78:d8a6:8607::1:0000/112
```
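
The containment rule above can be checked before running the playbooks. Below is a minimal bash sketch for IPv4 only, assuming `10.233.64.0/18` as the `kube_pods_subnet` value (a common Kubespray default; check your own inventory). The helpers `ip_to_int` and `cidr_within` are hypothetical names, not part of Kubespray.

```shell
#!/usr/bin/env bash
# Sketch (IPv4 only, hypothetical helpers): check that a pool CIDR such as
# calico_pool_cidr lies within or equals another CIDR such as kube_pods_subnet.

# Convert a dotted-quad address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_within INNER OUTER: exit status 0 if INNER is inside (or equal to) OUTER.
# Both arguments are expected in address/prefix notation.
cidr_within() {
  local inner_ip=${1%/*} inner_len=${1#*/}
  local outer_ip=${2%/*} outer_len=${2#*/}
  # A shorter prefix (a bigger network) cannot fit inside the outer one.
  (( inner_len >= outer_len )) || return 1
  # Netmask of the outer network, kept to 32 bits.
  local mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  # Inside if both addresses share the outer network prefix.
  (( ($(ip_to_int "$inner_ip") & mask) == ($(ip_to_int "$outer_ip") & mask) ))
}
```

For example, `cidr_within 10.233.64.0/20 10.233.64.0/18 && echo inside` prints `inside`, while a pool such as `10.233.128.0/20` fails the check. IPv6 pools (`calico_pool_cidr_ipv6`) are not handled by this sketch.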

### Optional : BGP Peering with border routers

In some cases you may want to route the pods subnet so that NAT is not needed on the nodes.
This is useful, for instance, if your cluster is spread across different locations and you want your pods to talk to each other no matter where they are located.
The following variables need to be set as follows:

```yml
peer_with_router: true  # enable the peering with the datacenter's border router (default value: false).
nat_outgoing: false  # (optional) NAT outgoing (default value: true).
```

You also need to edit the inventory and add a `local_as` hostvar for each node:

```ini
node1 ansible_ssh_host=95.54.0.12 local_as=xxxxxx
```

### Optional : Defining BGP peers

Peers can be defined using the `peers` variable (see docs/calico_peer_example examples).
To define global peers, define the `peers` variable in group_vars with the "scope" attribute of each global peer set to "global".
To define peers on a per-node basis, define the `peers` variable in hostvars or group_vars with the "scope" attribute unset or set to "node".

NB: Ansible's `hash_behaviour` is set to "replace" by default, so defining both global and per-node peers results in only the per-node peers being kept. If you need both global and per-node peers, the global peers have to be defined in hostvars for each host (alongside the per-node peers).

NB²: Peer definitions at node scope can be customized with the additional fields `filters`, `sourceAddress` and `numAllowedLocalASNumbers` (see <https://docs.tigera.io/calico/latest/reference/resources/bgppeer> for details).

Since Calico 3.4, Calico supports advertising Kubernetes service cluster IPs over BGP, just as it advertises pod IPs.
This can be enabled by setting the following variable in group_vars (k8s_cluster/k8s-net-calico.yml):

```yml
calico_advertise_cluster_ips: true
```

Since Calico 3.10, Calico also supports advertising Kubernetes service ExternalIPs over BGP, in addition to cluster IPs.
This can be enabled by setting the following variable in group_vars (k8s_cluster/k8s-net-calico.yml):

```yml
calico_advertise_service_external_ips:
- x.x.x.x/24
- y.y.y.y/32
```

### Optional : Define global AS number

The optional parameter `global_as_num` defines the Calico global AS number (`/calico/bgp/v1/global/as_num` etcd key).
It defaults to "64512".
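
The default `64512` is the first value of the 16-bit private-use AS range (64512 to 65534, RFC 6996); the 32-bit private-use range is 4200000000 to 4294967294. If you choose your own `global_as_num` or per-node `local_as`, a minimal bash sketch like the following can flag values outside those ranges. `is_private_asn` is a hypothetical helper; a public ASN is not necessarily wrong, it just has to be one you actually hold.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): exit 0 if the AS number is in an RFC 6996
# private-use range, 1 if it is a valid number outside them, 2 if not numeric.
is_private_asn() {
  # Only plain decimal numbers are accepted.
  [[ $1 =~ ^[0-9]+$ ]] || return 2
  # Force base 10 so a leading zero is not read as octal.
  local asn=$((10#$1))
  (( asn >= 64512 && asn <= 65534 )) || (( asn >= 4200000000 && asn <= 4294967294 ))
}
```

For example, `is_private_asn 65400` (the `global_as_num` used in the route reflector topology below) succeeds, while `is_private_asn 65535` does not.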

### Optional : BGP Peering with route reflectors

At large scale you may want to disable the full node-to-node mesh in order to
optimize your BGP topology and improve `calico-node` containers' start times.

To do so, you can deploy BGP route reflectors and peer `calico-node` with them as
recommended here:

* <https://hub.docker.com/r/calico/routereflector/>
* <https://docs.projectcalico.org/v3.1/reference/private-cloud/l3-interconnect-fabric>

You need to edit your inventory and add:

* a `calico_rr` group with nodes in it. `calico_rr` can be combined with
  `kube_node` and/or `kube_control_plane`.
* a `cluster_id` per route reflector node/group (see details [here](https://hub.docker.com/r/calico/routereflector/))

Here's an example of a Kubespray inventory with standalone route reflectors:

```ini
[all]
rr0 ansible_ssh_host=10.210.1.10 ip=10.210.1.10
rr1 ansible_ssh_host=10.210.1.11 ip=10.210.1.11
node2 ansible_ssh_host=10.210.1.12 ip=10.210.1.12
node3 ansible_ssh_host=10.210.1.13 ip=10.210.1.13
node4 ansible_ssh_host=10.210.1.14 ip=10.210.1.14
node5 ansible_ssh_host=10.210.1.15 ip=10.210.1.15

[kube_control_plane]
node2
node3

[etcd]
node2
node3
node4

[kube_node]
node2
node3
node4
node5

[calico_rr]
rr0
rr1

[rack0]
rr0
rr1
node2
node3
node4
node5

[rack0:vars]
cluster_id="1.0.0.1"
calico_rr_id=rr1
calico_group_id=rr1
```

The inventory above will deploy the following topology, assuming that Calico's
`global_as_num` is set to `65400`:

![Kubespray Calico route reflector topology](figures/kubespray-calico-rr.png?raw=true)

### Optional : Define default endpoint to host action

By default, Calico blocks traffic from endpoints to the host itself by using an iptables DROP action. When using Calico with Kubernetes, the action has to be changed to RETURN (the default in Kubespray) or ACCEPT (see <https://docs.tigera.io/calico/latest/network-policy/hosts/protect-hosts#control-default-behavior-of-workload-endpoint-to-host-traffic>). Otherwise, all network packets from pods (with hostNetwork=False) to service endpoints (with hostNetwork=True) on the same node are dropped.

To re-define the default action, set the following variable in your inventory:

```yml
calico_endpoint_to_host_action: "ACCEPT"
```

### Optional : Define address on which Felix will respond to health requests

Since Calico 3.2.0, the HealthCheck default behavior changed from listening on all interfaces to listening only on localhost.

To re-define the health host, set the following variable in your inventory:

```yml
calico_healthhost: "0.0.0.0"
```

### Optional : Configure VXLAN hardware Offload

VXLAN offload is disabled by default. It can be enabled like this:

```yml
calico_feature_detect_override: "ChecksumOffloadBroken=false" # Enables VXLAN offload (may cause problems with buggy NIC drivers)
```

### Optional : Configure Calico Node probe timeouts

Under certain conditions a deployer may need to tune the timeouts of the Calico liveness and readiness probes. These can be configured like this:

```yml
calico_node_livenessprobe_timeout: 10
calico_node_readinessprobe_timeout: 10
```

### Optional : Enable NAT with IPv6

To allow outgoing IPv6 traffic from pods to the Internet, enable the following:

```yml
nat_outgoing_ipv6: true # NAT outgoing ipv6 (default value: false).
```

## Config encapsulation for cross server traffic

Calico supports two types of encapsulation: [VXLAN and IP in IP](https://docs.projectcalico.org/v3.11/networking/vxlan-ipip). VXLAN is the more mature implementation and is enabled by default; check whether your environment requires *IP in IP* encapsulation instead.

*IP in IP* and *VXLAN* are mutually exclusive modes.

Kubespray defaults changed after version 2.18 from auto-enabling `ipip` mode to auto-enabling `vxlan`. This was done to facilitate wider deployment scenarios, including those where VXLAN acceleration is provided by the underlying network devices.

If you are running your cluster with the default Calico settings and are upgrading to a release after 2.18.x (i.e. 2.19 and later, or the `master` branch), you have two options:

* perform a manual migration to VXLAN before upgrading Kubespray (see migrating from IP in IP to VXLAN below)
* pin the pre-2.19 settings in your Ansible inventory (see IP in IP mode settings below)

**Note:** VXLAN over IPv6 is only supported with kernel >= 3.12, so if your kernel version is < 3.12, do not set `calico_vxlan_mode_ipv6: Always`. For more details see [#Issue 6877](https://github.com/projectcalico/calico/issues/6877).
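
To apply the kernel requirement in the note above on a given node, a minimal bash sketch can compare the running kernel release with 3.12 using GNU `sort -V`. `kernel_supports_vxlan_ipv6` is a hypothetical helper; it assumes the release string starts with a dotted version followed by an optional `-suffix`, as in `5.15.0-91-generic`.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): exit 0 if the kernel release (argument, or
# `uname -r` by default) is >= 3.12, the minimum for VXLAN over IPv6.
kernel_supports_vxlan_ipv6() {
  local release=${1:-$(uname -r)}
  # Keep only the leading version, e.g. "5.15.0-91-generic" -> "5.15.0".
  local version=${release%%-*}
  local minimum=3.12
  # sort -V lists the smaller version first; if that is the minimum,
  # the kernel version is greater than or equal to it.
  [ "$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n1)" = "$minimum" ]
}
```

On a node, `kernel_supports_vxlan_ipv6 || echo "do not set calico_vxlan_mode_ipv6: Always"` prints the warning only for older kernels such as `3.10.0-1160.el7.x86_64`.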

### IP in IP mode

To configure IP in IP mode, you need to use the bird network backend.

```yml
calico_ipip_mode: 'Always' # Possible values are `Always`, `CrossSubnet`, `Never`
calico_vxlan_mode: 'Never'
calico_network_backend: 'bird'
```

### BGP mode

To enable BGP no-encapsulation mode:

```yml
calico_ipip_mode: 'Never'
calico_vxlan_mode: 'Never'
calico_network_backend: 'bird'
```

### Migrating from IP in IP to VXLAN

If you would like to migrate from the old default (IP in IP with the `bird` network backend) to the new VXLAN-based encapsulation, you need to perform this change before running an upgrade of your cluster; the `cluster.yml` and `upgrade-cluster.yml` playbooks will refuse to continue if they detect incompatible settings.

Execute the following steps on one of the control plane nodes, and ensure the cluster is healthy before proceeding.

```shell
calicoctl.sh patch felixconfig default -p '{"spec":{"vxlanEnabled":true}}'
calicoctl.sh patch ippool default-pool -p '{"spec":{"ipipMode":"Never", "vxlanMode":"Always"}}'
```

**Note:** if you created multiple ippools, you will need to patch each of them individually to change their encapsulation. The Kubespray playbooks only handle the default ippool created by Kubespray.

Wait for the `vxlan.calico` interfaces to be created on all cluster nodes and for traffic to be routed through them, then disable `ipip`:

```shell
calicoctl.sh patch felixconfig default -p '{"spec":{"ipipEnabled":false}}'
```
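
Because every additional pool has to be patched separately, the first migration step can be wrapped in a small bash function that takes each pool name as an argument. This is a sketch: `migrate_pools_to_vxlan` and the `CALICOCTL` override variable are hypothetical names, and the pool names are assumed to come from `calicoctl.sh get ippool -o wide`. It deliberately leaves `ipip` enabled; disable it only once the `vxlan.calico` interfaces are up, as described above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): enable VXLAN in Felix, then switch every listed
# IP pool from IP in IP to VXLAN. IP in IP is NOT disabled here.
migrate_pools_to_vxlan() {
  # CALICOCTL lets you point at another wrapper path; it is left unquoted on
  # purpose so it may contain a command plus arguments.
  local ctl=${CALICOCTL:-calicoctl.sh}
  if [ "$#" -eq 0 ]; then
    echo "usage: migrate_pools_to_vxlan POOL [POOL...]" >&2
    return 2
  fi
  # Step 1: enable VXLAN in Felix while IP in IP is still active.
  $ctl patch felixconfig default -p '{"spec":{"vxlanEnabled":true}}' || return 1
  # Step 2: change the encapsulation of each pool, stopping on the first failure.
  local pool
  for pool in "$@"; do
    $ctl patch ippool "$pool" -p '{"spec":{"ipipMode":"Never", "vxlanMode":"Always"}}' || return 1
  done
}
```

For example, `migrate_pools_to_vxlan default-pool my-extra-pool` (where `my-extra-pool` stands for any pool you created yourself). Running it once with `CALICOCTL=echo` prints the commands without applying them.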

## Configuring interface MTU

This is an advanced topic and should usually not be modified unless you know exactly what you are doing. Calico is smart enough to deal with the defaults and calculate the proper MTU. If you do need to set up a custom MTU, you can change `calico_veth_mtu` as follows:

* If Wireguard is enabled, subtract 60 from your network MTU (e.g. 1500-60=1440)
* If using VXLAN or BPF mode, subtract 50 from your network MTU (e.g. 1500-50=1450)
* If using IPIP, subtract 20 from your network MTU (e.g. 1500-20=1480)
* If not using any encapsulation, set it to your network MTU (e.g. 1500 or 9000)

```yaml
calico_veth_mtu: 1440
```
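
The overhead rules above reduce to a single subtraction. A minimal bash sketch, where `calico_veth_mtu_for` and its mode keywords (`wireguard`, `vxlan`, `bpf`, `ipip`, `none`) are hypothetical names chosen for this example:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print a calico_veth_mtu value for a given
# network MTU and encapsulation mode, using the overheads listed above.
calico_veth_mtu_for() {
  local network_mtu=$1 mode=$2 overhead
  case "$mode" in
    wireguard) overhead=60 ;;
    vxlan|bpf) overhead=50 ;;
    ipip)      overhead=20 ;;
    none)      overhead=0 ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  echo $(( network_mtu - overhead ))
}
```

For example, `calico_veth_mtu_for 9000 vxlan` prints `8950` for a jumbo-frame network using VXLAN.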

## Cloud providers configuration

Please refer to the official documentation; for example, [GCE configuration](http://docs.projectcalico.org/v1.5/getting-started/docker/installation/gce) requires a security rule for Calico IP in IP tunnels. Note that Calico is always configured with ``calico_ipip_mode: Always`` if a cloud provider is defined.

Note that in OpenStack you must allow `ipip` traffic in your security groups,
otherwise you will experience timeouts.
To do this you must add a rule which allows it, for example:

```ShellSession
neutron security-group-rule-create --protocol 4 --direction egress k8s-a0tp4t
neutron security-group-rule-create --protocol 4 --direction ingress k8s-a0tp4t
```

### Optional : Ignore kernel's RPF check setting

By default the Felix agent (calico-node) will abort if the kernel RPF setting is not 'strict'. If you want Calico to ignore the kernel setting:

```yml
calico_node_ignorelooserpf: true
```

### Optional : Felix configuration via extraenvs of calico node

Environment variable parameters for [configuring Felix](https://docs.projectcalico.org/reference/felix/configuration) can be passed through `calico_node_extra_envs`, for example:

```yml
calico_node_extra_envs:
  FELIX_DEVICEROUTESOURCEADDRESS: 172.17.0.1
```

### Optional : Use Calico CNI host-local IPAM plugin

Calico currently supports two types of CNI IPAM plugins: `host-local` and `calico-ipam` (default).

To allow Calico to determine the subnet to use from the Kubernetes API based on the `Node.podCIDR` field, enable the following setting:

```yml
calico_ipam_host_local: true
```

Refer to the Project Calico section [Using host-local IPAM](https://docs.projectcalico.org/reference/cni-plugin/configuration#using-host-local-ipam) for further information.

### Optional : Disable CNI logging to disk

The Calico CNI plugin logs to /var/log/calico/cni/cni.log and to stderr.
The stderr output of CNI plugins can be found in the logs of the container runtime.

You can disable Calico CNI logging to disk by setting `calico_cni_log_file_path: false`.

## eBPF Support

Calico supports eBPF for its data plane; see [an introduction to the Calico eBPF Dataplane](https://www.projectcalico.org/introducing-the-calico-ebpf-dataplane/) for further information.

Note that it is advisable to always use the latest version of Calico when using the eBPF dataplane.

### Enabling eBPF support

To enable eBPF dataplane support, add the following to your inventory. Note that `kube-proxy` is incompatible with running Calico in eBPF mode, so kube-proxy should be removed from the system.

```yaml
calico_bpf_enabled: true
```

**NOTE:** there is a known incompatibility when using the `kernel-kvm` kernel package on Ubuntu, because it lacks support for `CONFIG_NET_SCHED`, which Calico eBPF requires. When using Calico eBPF on Ubuntu, ensure you run the `-generic` kernel.
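
Before enabling eBPF, you can check whether a kernel was built with `CONFIG_NET_SCHED`. A minimal bash sketch, assuming the kernel build config is available at `/boot/config-$(uname -r)` (common, but not guaranteed on every distribution); `kernel_has_net_sched` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): exit 0 if the kernel config file enables
# CONFIG_NET_SCHED, 1 if it does not, 2 if the file cannot be read.
# The default path /boot/config-$(uname -r) is an assumption.
kernel_has_net_sched() {
  local config=${1:-/boot/config-$(uname -r)}
  [ -r "$config" ] || { echo "cannot read $config" >&2; return 2; }
  # CONFIG_NET_SCHED is a boolean option, so only "=y" counts as enabled.
  grep -q '^CONFIG_NET_SCHED=y$' "$config"
}
```

Run without arguments on the node itself, or pass the path of another config file to check.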

### Cleaning up after kube-proxy

Calico node cannot clean up after kube-proxy has run in ipvs mode. If you are converting an existing cluster to eBPF, you need to ensure that the `kube-proxy` DaemonSet is deleted and that the ipvs rules are cleaned.

To check whether kube-proxy was running in ipvs mode:

```ShellSession
# ipvsadm -l
```

To clean up any ipvs leftovers:

```ShellSession
# ipvsadm -C
```
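
To script the check before and after `ipvsadm -C`, the number of leftover virtual services can be counted from the listing. A minimal bash sketch, assuming the usual `ipvsadm -Ln` output format in which each virtual service line starts with its protocol (`TCP`, `UDP`, `SCTP`) or `FWM`, and real-server lines start with `->`; `count_ipvs_services` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `ipvsadm -Ln` output on stdin and print
# the number of IPVS virtual services it lists (0 when none remain).
count_ipvs_services() {
  # grep -c prints 0 and exits 1 on no match; "|| true" keeps the status 0.
  grep -cE '^(TCP|UDP|SCTP|FWM) ' || true
}
```

On a node, `ipvsadm -Ln | count_ipvs_services` should print `0` once the leftovers are cleaned.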

### Calico access to the kube-api

Calico node, typha and kube-controllers need to be able to talk to the Kubernetes API. Please refer to the [Enabling eBPF Calico Docs](https://docs.projectcalico.org/maintenance/ebpf/enabling-bpf) for guidelines on how to do this.

Kubespray sets up the `kubernetes-services-endpoint` configmap based on the contents of the `loadbalancer_apiserver` inventory variable documented in [HA Mode](/docs/operations/ha-mode.md).

If no external load balancer is used, Calico eBPF can also use the localhost load balancer option. This works only if you use the same port for the localhost apiserver load balancer and the kube-apiserver. In this case, Calico Automatic Host Endpoints need to be enabled to allow services like `coredns` and `metrics-server` to communicate with the Kubernetes host endpoint. See [this blog post](https://www.projectcalico.org/securing-kubernetes-nodes-with-calico-automatic-host-endpoints/) on enabling automatic host endpoints.

### Tunneled versus Direct Server Return

By default Calico uses Tunneled service mode, but it can use direct server return (DSR) to optimize the return path for a service.

To configure DSR:

```yaml
calico_bpf_service_mode: "DSR"
```

### eBPF Logging and Troubleshooting

To enable Calico eBPF mode logging:

```yaml
calico_bpf_log_level: "Debug"
```

To view the logs, use the `tc` command to read the kernel trace buffer:

```ShellSession
tc exec bpf debug
```

For more details, see the [Calico eBPF troubleshooting guide](https://docs.projectcalico.org/maintenance/troubleshoot/troubleshoot-ebpf#ebpf-program-debug-logs).

## Wireguard Encryption

Calico supports using Wireguard for encryption. Please see the docs on [encrypting cluster pod traffic](https://docs.projectcalico.org/security/encrypt-cluster-pod-traffic).

To enable Wireguard support:

```yaml
calico_wireguard_enabled: true
```

The following OSes require enabling the EPEL repo in order to bring in the Wireguard tools:

* CentOS 8
* AlmaLinux 8
* Rocky Linux 8
* Amazon Linux 2

```yaml
epel_enabled: true
```
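
To decide per host whether `epel_enabled: true` is needed, a minimal bash sketch can read `/etc/os-release`. It assumes the usual os-release `ID` values for the OSes listed above (`centos`, `almalinux`, `rocky`, `amzn`) and compares only the major part of `VERSION_ID`; `needs_epel_for_wireguard` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): exit 0 if the os-release file describes one of
# the OSes above that need EPEL for the Wireguard tools, 1 otherwise.
needs_epel_for_wireguard() {
  local file=${1:-/etc/os-release}
  local id version
  # Source the file in subshells so its variables do not leak into this shell.
  id=$(. "$file" && echo "$ID")
  version=$(. "$file" && echo "$VERSION_ID")
  # Compare only the major version, e.g. "8.9" -> "8".
  case "$id:${version%%.*}" in
    centos:8|almalinux:8|rocky:8|amzn:2) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `needs_epel_for_wireguard && echo "set epel_enabled: true"` prints the reminder on a Rocky Linux 8.9 host but not on Rocky Linux 9.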