How to create a cross-cloud, self-managed Kubernetes cluster


While it’s fairly “trivial” to install a stacked Kubernetes cluster with kubeadm on any cloud provider or managed bare metal (where you have enough control over the networking to use BGP, for example), it’s not so trivial when your nodes sit in different network segments (clouds) and/or behind NAT.

With this guide I will try to ease the pain of this kind of setup.

Premise

In this scenario, we have 3 control plane nodes situated in different cloud providers, and the following applies:

  • No node is aware of its public IP (each has a private IP address and 1:1 NAT to an external IP it does not know).
  • The private IPs are NOT routable between nodes and may overlap (e.g. several clouds use the 10.0.0.0/24 range).
  • We have access to a load balancer (I used HAProxy), which will front the control plane port. If you don’t have one, you can use round-robin DNS, but it’s not advised.
  • I am assuming Ubuntu as the underlying OS. If you are using CentOS or similar, you will have to adapt some sections.

We will also have some worker nodes distributed across our clouds; everything above applies to them as well.

Preparations

On every node, perform the following actions (hopefully you are deploying your infra with Terraform and can use a user-data script):

#!/bin/bash

# Find suitable version for kubeadm from here: https://packages.cloud.google.com/apt/dists/kubernetes-xenial/main/binary-amd64/Packages
export KUBE_VERSION=1.22.5-00
export DEBIAN_FRONTEND=noninteractive

# some random dns resolution issue
until curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
do
  echo "retrying download of k8s key"
  sleep 5
done
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list
set -e

echo "Backing up and deleting ip tables"
mkdir -p /root/iptables-backup/
mv /etc/iptables/rules.* /root/iptables-backup/
iptables-save > /root/iptables-rules
iptables --flush

echo "Creating netplan"
export PUBLIC_IP=$(curl -s checkip.amazonaws.com)
cat << EOF > /etc/netplan/60-floating-ip.yaml
network:
  version: 2
  renderer: networkd
  bridges:
    dummy0:
      dhcp4: no
      dhcp6: no
      accept-ra: no
      interfaces: [ ]
      addresses:
        - ${PUBLIC_IP}/32
EOF

echo "Installing Kubeadm"
cat <<EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sysctl --system

echo "Running apt update"
until apt-get update
do
  echo "error during apt-get update"
done
apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
apt-get install -y kubelet=${KUBE_VERSION} kubeadm=${KUBE_VERSION} kubectl=${KUBE_VERSION}
apt-mark hold kubelet kubeadm kubectl

echo "Installing Docker"
mkdir -p /etc/docker/
cat <<EOF | tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF

echo "Installing docker"
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io
apt-get dist-upgrade -y

netplan apply

I will explain the script below.

Initial setup

# Find suitable version for kubeadm from here: https://packages.cloud.google.com/apt/dists/kubernetes-xenial/main/binary-amd64/Packages
export KUBE_VERSION=1.22.5-00
export DEBIAN_FRONTEND=noninteractive

This defines which version of Kubernetes should be installed; DEBIAN_FRONTEND=noninteractive ensures that tasks like apt-get upgrade are not blocked waiting for human input.

Install kubernetes repo key

# some random dns resolution issue
until curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
do
  echo "retrying download of k8s key"
  sleep 5
done
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list
set -e

This block has 2 purposes:

  • Ensure that there is internet connectivity (I had some issues where the user-data script fired before the network interface was up).
  • Install the Kubernetes repository and its GPG key (we will use them a bit later).
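
The until loop above retries forever, which can leave a user-data script hanging if the network never comes up. A bounded variant could look like the sketch below. The retry helper and its arguments are my own illustration, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): run a command until it
# succeeds, giving up after a fixed number of attempts.
# usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (commented out so the sketch has no side effects):
# retry 30 5 curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg \
#   https://packages.cloud.google.com/apt/doc/apt-key.gpg
```

With 30 attempts and a 5 second delay the script waits at most about two and a half minutes before failing loudly instead of looping silently.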

Clean up existing iptables

echo "Backing up and deleting ip tables"
mkdir -p /root/iptables-backup/
mv /etc/iptables/rules.* /root/iptables-backup/
iptables-save > /root/iptables-rules
iptables --flush

If you are on Oracle Cloud, you will need to remove the preinstalled iptables rules, since they could interfere with the rules created by Kubernetes (to be fair, I still don’t understand why they are included, since you own the underlying VPC and networking rules).

The rule files are moved to /root/iptables-backup/, and a dump of the active rules is saved to /root/iptables-rules.
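
One caveat: on images that ship no /etc/iptables/rules.* files, the mv command fails, and because set -e is already active at that point the whole script stops. A guarded version might look like this sketch. The backup_rule_files helper is hypothetical and takes the directories as arguments so it can be tried out safely.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): move rules.* files from
# one directory to a backup directory, doing nothing if there are none.
# usage: backup_rule_files <source_dir> <backup_dir>
backup_rule_files() {
  local src_dir=$1 dest_dir=$2 f
  mkdir -p "$dest_dir"
  for f in "$src_dir"/rules.*; do
    # When the glob matches nothing, bash leaves the pattern as-is; skip it.
    [ -e "$f" ] || continue
    mv "$f" "$dest_dir"/
  done
}

# Example (commented out, needs root):
# backup_rule_files /etc/iptables /root/iptables-backup
# iptables-save > /root/iptables-rules
# iptables --flush
```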

Public ip

echo "Creating netplan"
export PUBLIC_IP=$(curl -s checkip.amazonaws.com)
cat << EOF > /etc/netplan/60-floating-ip.yaml
network:
  version: 2
  renderer: networkd
  bridges:
    dummy0:
      dhcp4: no
      dhcp6: no
      accept-ra: no
      interfaces: [ ]
      addresses:
        - ${PUBLIC_IP}/32
EOF
netplan apply

As I said before, our nodes are not aware of their public IP. As a consequence, Kubernetes will by default use their internal IP for the kubelet, which prevents nodes in other network segments from reaching each other.

Kubernetes nodes can also have an external IP field (the EXTERNAL-IP column in kubectl get nodes), but unfortunately the only way to set it is through a cloud provider. I thought about writing a “fake” cloud provider for this purpose, but (at least for now) I’ve abandoned that idea.

So, instead, we are going to find our public IP (remember, in our scenario we have 1:1 NAT) with PUBLIC_IP=$(curl -s checkip.amazonaws.com), and then create a dummy interface with the address PUBLIC_IP/32.

Later on, we will use this dummy interface to make the kubelet treat the public IP as the node’s internal IP.
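
Since the netplan file is generated from whatever curl returns, it may be worth checking that the value really is an IPv4 address before writing it. The following is a minimal sketch; the is_ipv4 helper is my own addition, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): succeed only if the
# argument is a dotted-quad IPv4 address with every octet in 0-255.
is_ipv4() {
  local octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  local re="^(${octet}\.){3}${octet}\$"
  [[ $1 =~ $re ]]
}

# Example (commented out, needs network access):
# PUBLIC_IP=$(curl -fsS checkip.amazonaws.com)
# is_ipv4 "$PUBLIC_IP" || { echo "could not determine public IP" >&2; exit 1; }
```

An empty reply or an HTML error page then stops the script instead of producing a broken /etc/netplan/60-floating-ip.yaml.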

Kubelet network requirements

Straight from the official documentation

cat <<EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sysctl --system

Install kubeadm

Install and hold kubeadm and related components

until apt-get update
do
  echo "error during apt-get update"
done
apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
apt-get install -y kubelet=${KUBE_VERSION} kubeadm=${KUBE_VERSION} kubectl=${KUBE_VERSION}
apt-mark hold kubelet kubeadm kubectl

Docker options

mkdir -p /etc/docker/
cat <<EOF | tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF

We are going to use Docker as the container runtime. For that to work, Docker’s cgroup driver must be the same as the kubelet’s (systemd).

While we are at it, let’s also cap the size of the container logs so they don’t fill up the disk.
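
Once Docker is installed and kubeadm has written the kubelet configuration, the two drivers can be compared. This is only a sketch: the cgroup_drivers_match helper is hypothetical, and it assumes the kubelet config contains an unquoted top-level cgroupDriver line, as in /var/lib/kubelet/config.yaml written by kubeadm.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): compare a Docker cgroup
# driver name with the cgroupDriver field of a kubelet configuration file.
# usage: cgroup_drivers_match <docker_driver> <kubelet_config_file>
cgroup_drivers_match() {
  local docker_driver=$1 kubelet_config=$2 kubelet_driver
  # Assumes a line of the form "cgroupDriver: systemd" (unquoted value).
  kubelet_driver=$(sed -n 's/^cgroupDriver:[[:space:]]*//p' "$kubelet_config")
  echo "docker=${docker_driver} kubelet=${kubelet_driver}"
  [ -n "$kubelet_driver" ] && [ "$docker_driver" = "$kubelet_driver" ]
}

# Example (commented out, needs Docker and an initialized node):
# cgroup_drivers_match "$(docker info --format '{{.CgroupDriver}}')" \
#   /var/lib/kubelet/config.yaml
```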

Install Docker

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io

Final touches

apt-get dist-upgrade -y

Upgrade all packages (headers included), then reboot the machine.

Init first node

Cloud preparations

Control plane endpoint

We need a common endpoint for our HA setup to work. The best way to achieve this is a load balancer listening on tcp/6443. If your load balancer supports HTTPS probes, point them at /livez; otherwise a simple TCP probe will do.

If you don’t have a load balancer available, you can simply point a DNS A record at your first node for now.
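
For HAProxy, the load balancer configuration could be generated along these lines. This is a rough sketch, not the configuration I actually run: the render_haproxy_cfg helper, the frontend and backend names, the timeouts and the example addresses are all assumptions.

```shell
#!/bin/bash
# Hypothetical helper: print an HAProxy config that forwards tcp/6443 to the
# control plane nodes given as arguments and health-checks /livez over TLS.
# usage: render_haproxy_cfg <node_public_ip>...
render_haproxy_cfg() {
  cat <<'EOF'
defaults
    mode tcp
    timeout connect 5s
    timeout client  1h
    timeout server  1h

frontend k8s-api
    bind *:6443
    default_backend k8s-control-plane

backend k8s-control-plane
    balance roundrobin
    option httpchk GET /livez
    http-check expect status 200
EOF
  local i=1 ip
  for ip in "$@"; do
    # check-ssl: probe over TLS; verify none: do not validate the API server cert.
    printf '    server cp%d %s:6443 check check-ssl verify none\n' "$i" "$ip"
    i=$((i + 1))
  done
}

# Example (addresses are placeholders from the documentation ranges):
# render_haproxy_cfg 203.0.113.10 198.51.100.20 192.0.2.30 > /etc/haproxy/haproxy.cfg
```

Traffic is passed through in TCP mode, so TLS still terminates on the API servers and the kubeadm-generated certificates stay valid.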

Ports

For our setup we will need the following ports:

  • tcp/6443: Kubernetes API server (control plane)
  • tcp/2379-2380: etcd
  • tcp/10250: kubelet
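
To check from another node that the firewall or security group rules are in place, something like the sketch below can help. Both helpers are hypothetical. They rely on bash's /dev/tcp pseudo-device and the coreutils timeout command, and a port only shows as reachable once a service is actually listening on it.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: report the state of the ports listed above for one
# host; returns non-zero if any of them is unreachable.
check_control_plane_ports() {
  local host=$1 port failed=0
  for port in 6443 2379 2380 10250; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} reachable"
    else
      echo "${host}:${port} NOT reachable"
      failed=1
    fi
  done
  return "$failed"
}

# Example (the address is a placeholder):
# check_control_plane_ports 203.0.113.10
```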

Starting first control plane node

kubeadm init

Normally, if you had both a private and a public interface on your box, you could use the --apiserver-advertise-address flag and everything would work as it should.

Etcd uses the value of that flag as its listening address. Sadly, with providers such as Oracle Cloud, Scaleway, Online.net, etc., even though your box is reachable on PUBLIC_IP, a service that binds to PUBLIC_IP:PORT is not reachable from outside the node, because the provider internally routes traffic to your private IP.

As a consequence, we need to pass a configuration file to kubeadm that forces etcd to bind on the proper interface while still advertising the public one.

export ENDPOINT=k8s.endpoint.dev:6443
export PUBLIC_IP=$(curl -s checkip.amazonaws.com)
export KUBECONFIG=/etc/kubernetes/admin.conf

cat <<EOF > /etc/default/kubelet
KUBELET_EXTRA_ARGS="--node-ip=${PUBLIC_IP}"
EOF

cat <<EOF > /root/kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: ${PUBLIC_IP}
  bindPort: 6443
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: ClusterConfiguration
networking:
  podSubnet: "10.244.0.0/16"
controlPlaneEndpoint: ${ENDPOINT}
etcd:
  local:
    extraArgs:
      listen-peer-urls: https://0.0.0.0:2380
      listen-client-urls: https://0.0.0.0:2379
      listen-metrics-urls: http://0.0.0.0:2381
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
EOF

kubeadm init --config=/root/kubeadm-config.yaml --upload-certs --v=5

If all went well, you should receive a message similar to this:

Your Kubernetes control-plane has initialized successfully!
...
  kubeadm join k8s.endpoint.dev:6443 --token hlpq3v.82***abrnw0z  \
	--discovery-token-ca-cert-hash  sha256:4a0ff99bb3059bc6c38f5b1c227805aa344e1ec7c424f870c9d175b50801b1c9 \
	--control-plane --certificate-key 48e47af7fa0e22b7f1e8d53***0af08e50f613734be7f4cba731733f3b83
...

Keep a note of the token, CA certificate hash and certificate key above; we will need them shortly.
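
If you saved the join command to a file, the three values can be pulled out of it with a small filter like the sketch below. The parse_join_command helper is my own illustration; the variable names match the ones used in the join scripts later on.

```shell
#!/bin/bash
# Hypothetical helper: read a kubeadm join command on stdin and print
# export lines for TOKEN, CA_CERT_HASH and CERT_KEY.
parse_join_command() {
  # Put every word on its own line (spaces, tabs and the trailing
  # backslashes all become newlines), then print the word after each flag.
  tr -s ' \t\\' '\n\n\n' | awk '
    prev == "--token"                        { print "export TOKEN=" $0 }
    prev == "--discovery-token-ca-cert-hash" { print "export CA_CERT_HASH=" $0 }
    prev == "--certificate-key"              { print "export CERT_KEY=" $0 }
    { prev = $0 }
  '
}

# Example (join-command.txt is a placeholder file name):
# parse_join_command < join-command.txt
#
# If the values are lost or expired (by default the token lasts 24 hours and
# the uploaded certificates 2 hours), they can be regenerated on the first node:
# kubeadm token create --print-join-command
# kubeadm init phase upload-certs --upload-certs
```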

Verify that everything is working (the coredns pods will be Pending because our node is not Ready yet; we will get there):

# kubectl get nodes -o wide
NAME            STATUS     ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
k8s-master-01   NotReady   control-plane,master   66s   v1.22.2   172.24.246.28   <none>        Ubuntu 20.04.3 LTS   5.11.0-1019-oracle   docker://20.10.8

CNI

At this point, if you view your node, it will be marked as NotReady because no CNI plugin is installed yet. The command below installs Weave Net:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

If everything went well, you should see the following output:

# kubectl get nodes -o wide
NAME            STATUS   ROLES                  AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
k8s-master-01   Ready    control-plane,master   2m5s   v1.22.2   172.24.246.28   <none>        Ubuntu 20.04.3 LTS   5.11.0-1019-oracle   docker://20.10.8

Other control planes

export ENDPOINT=k8s.endpoint.dev:6443
export PUBLIC_IP=$(curl -s checkip.amazonaws.com)
export TOKEN=hlpq3v.82***abrnw0z
export CA_CERT_HASH=sha256:4a0ff99bb3059bc6c38f5b1c227805aa344e1ec7c424f870c9d175b50801b1c9
export CERT_KEY=48e47af7fa0e22b7f1e8d53***0af08e50f613734be7f4cba731733f3b83
export KUBECONFIG=/etc/kubernetes/admin.conf

cat <<EOF > /etc/default/kubelet
KUBELET_EXTRA_ARGS="--node-ip=${PUBLIC_IP}"
EOF

cat <<EOF > /root/kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: ${ENDPOINT}
    token: ${TOKEN}
    caCertHashes:
      - ${CA_CERT_HASH}
    unsafeSkipCAVerification: false
controlPlane:
  certificateKey: ${CERT_KEY}
  localAPIEndpoint:
    advertiseAddress: ${PUBLIC_IP}
    bindPort: 6443
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: ClusterConfiguration
controlPlaneEndpoint: ${ENDPOINT}
etcd:
  local:
    extraArgs:
      listen-peer-urls: https://0.0.0.0:2380
      listen-client-urls: https://0.0.0.0:2379
      listen-metrics-urls: http://0.0.0.0:2381
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true
EOF

kubeadm join --config=/root/kubeadm-config.yaml --v=5

Worker nodes

export PUBLIC_IP=$(curl -s checkip.amazonaws.com)

cat <<EOF > /etc/default/kubelet
KUBELET_EXTRA_ARGS="--node-ip=${PUBLIC_IP}"
EOF

kubeadm join k8s.endpoint.dev:6443 --token hlpq3v.82***abrnw0z --discovery-token-ca-cert-hash  sha256:4a0ff99bb3059bc6c38f5b1c227805aa344e1ec7c424f870c9d175b50801b1c9

Remove taint

This step is optional. In my setup I have a limited number of nodes, so I do want to schedule workloads on my control plane nodes.

You can skip this; just remember to add tolerations to anything you deploy from this step onwards in the guide.
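
For reference, a toleration for the control plane taint used by this Kubernetes version looks roughly like the sketch below. The toleration_example helper, the pod name and the image are placeholders of mine, not something this guide deploys.

```shell
#!/bin/bash
# Hypothetical example: print a Pod manifest that tolerates the
# node-role.kubernetes.io/master:NoSchedule taint on control plane nodes.
toleration_example() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
EOF
}

# Example (commented out, needs a running cluster):
# toleration_example | kubectl apply -f -
```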

# kubectl taint node --all node-role.kubernetes.io/master:NoSchedule-
node/k8s-master-01 untainted
node/k8s-master-02 untainted
node/k8s-master-03 untainted

Taint workers with small RAM

kubectl get nodes -l "kubernetes.io/arch=amd64" -o name | xargs -I "{}" kubectl taint nodes "{}" lowmemory=1:NoSchedule
