RKE Installation Or: Becoming A Rancher

The mousepad of my notebook breaking created the perfect opportunity to finally put into practice a long-held plan: to install my very own, bare-metal Kubernetes cluster. The Rancher Kubernetes Engine turned out to be a great fit for that because, since it’s all Docker-based, its installation and operation are comparatively simple. In this blog post, you’ll be introduced to the installation process as well as the preparation work preceding it.

Recently, I did a little experiment: How useful is a fake Chinese MacBook Air called an AirBook? Not very, as it turns out, because shortly after I had started using it, its mouse pad stopped working. So what could you do with a laptop that has a dysfunctional mouse pad? Obviously, you could install an operating system on it that does not require a mouse pad, such as a Linux server operating system. And then you could go one step further and add two more machines to create a three-node, RKE-based Kubernetes cluster, in order to then write a blog post on the installation of RKE plus the necessary preparation steps…

What Is RKE?

The Rancher Kubernetes Engine, or RKE for short, is a Kubernetes distribution by Rancher that aims to reduce the complexity of the Kubernetes installation process by running all components – both the pieces used during installation and all components of the resulting cluster itself – in Docker containers. Another advantage on top of the simplified installation process is that, being Docker-based, the installation and operation of RKE are independent of the underlying operating system – as long as it can run Docker (or at least a reasonably recent version of it), it can run RKE.

Before We Get Started…

You might be wondering why I’m writing a blog post on the installation of RKE, given that Rancher’s RKE website already provides requirements and installation instructions. Indeed, both topics are well-documented there. Still, when following the given steps, I quickly found myself looking for information in different sources and troubleshooting a couple of things that weren’t going quite so smoothly. So I’ve decided to provide a short summary of all preparation work and the installation steps, hoping that, thanks to it, you’ll spend less time on the RKE installation than I did.

My RKE cluster is based on Fedora Server 33, and thus, in the following sections, all commands, outputs, file locations etc. refer to Fedora Server as a representative of a Red Hat-based Linux distribution.

Preparation Work

Before the installation can begin, all machines supposed to join the RKE cluster have to be set up correctly, and the following sections will guide you through the steps necessary to achieve the correct setup. My assumption is that you already have at least one machine at your disposal that runs a Linux server operating system, such as CentOS or Fedora Server.

Detour: Fedora Server Specifics

In case you’ve just installed Fedora Server as the base for running RKE, the following two sections might be helpful for you.

Extend Root Partition

After I had freshly installed Fedora Server 33, I noticed the installer – which was configured for automatic storage partitioning – had allocated only a small fraction of the available disk space to the root logical volume in the fedora_fedora volume group, namely, 15 GB. According to a comment on the accepted answer here, automatically partitioned Fedora Server systems will receive only 15 GB for their root logical volume by default. Thus, your very first step after a fresh install might be to expand this volume.

# Check volume groups and their sizes
# (My output was taken after the resize; on a fresh install, 'VFree' will show most of the disk as still unallocated)
$ vgs 
VG            #PV #LV #SN Attr   VSize    VFree
fedora_fedora   1   1   0 wz--n- <930.51g 520.00m

# Check name of file system partition mounted to root 
# (Your numbers will be different on a fresh install -- the important part here is '/dev/mapper/fedora_fedora-root')
$ df -h /
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/fedora_fedora-root  930G   14G  917G   2% /

# Resize root logical volume by whatever size you wish to assign, for example:
$ lvresize -L +915G --resizefs /dev/mapper/fedora_fedora-root

After having performed those three steps, you can use the df -h / command again to view the size of the fedora_fedora-root file system partition, and the size should now equal 15 GB plus whatever amount of GB you have specified in the expansion command.
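
If you would rather hand all remaining free space to the root volume than work out a number, lvresize also accepts a relative extent count. The following is a minimal sketch, assuming your root logical volume carries the same name as mine; the ROOT_LV and RESIZE_CMD variable names are made up for illustration. It only prints the command so you can review it before running it as root.

```shell
# Assumption: the root logical volume is named as in this post; adjust if yours differs
ROOT_LV=/dev/mapper/fedora_fedora-root

# '-l +100%FREE' grows the volume by all unallocated extents in the volume group,
# '--resizefs' grows the file system along with it
RESIZE_CMD="lvresize -l +100%FREE --resizefs $ROOT_LV"

# Print the command for review instead of executing it right away
echo "$RESIZE_CMD"
```

Once you’re happy with the printed command, run it as root and check the result with df -h / as described above.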

On WiFi-Only Systems: Connect To WiFi

The aforementioned fake MacBook Air, which is supposed to be one node in the RKE cluster, has a very slim housing – too slim for an Ethernet port to make an appearance. After the installation of Fedora Server, this turned out to be a problem – I had configured the installer to use my WiFi, but after the installation was finished, the system wouldn’t receive an IP address for its wlp2s0 device (your WiFi device might well carry a different name, of course). The logs of the NetworkManager service were full of the following error messages:

<warn>  [1644588109.4048] device (wlp2s0): re-acquiring supplicant interface (#1).
<error> [1644588112.4078] device (wlp2s0): Couldn't initialize supplicant interface: Failed to D-Bus activate wpa_supplicant service

And, finally:

<info>  [1644588164.4017] device (wlp2s0): supplicant interface keeps failing, giving up

According to this bug report, Fedora Server versions 31 through 34 beta do not include the wpa_supplicant service by default, which NetworkManager is configured out of the box to rely on as a backend for wireless connections. Fortunately, the operating system does include iwd. It only has to be enabled, and NetworkManager has to be configured to use it as its backend:

# Start 'iwd' and make sure it auto-starts after a reboot
$ systemctl start iwd
$ systemctl enable iwd

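# Configure NetworkManager to use 'iwd' as backend by writing
# the following two lines to /etc/NetworkManager/conf.d/nm.conf
$ printf '[device]\nwifi.backend=iwd\n' > /etc/NetworkManager/conf.d/nm.conf

# Verify the file's contents
$ cat /etc/NetworkManager/conf.d/nm.conf
[device]
wifi.backend=iwd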

# Restart NetworkManager
$ systemctl restart NetworkManager

After NetworkManager has been configured and restarted, its CLI can be used to join a WiFi access point:

# View list of available WiFi access points 
# (Take note of the SSID of the access point you would like to connect to)
$ nmcli dev wifi list

# Connect to WiFi using SSID from output of previous command
# ('--ask' will ask you for the password on STDIN so it won't appear in the session history)
$ nmcli dev wifi connect <SSID> --ask

# Get name of connection NetworkManager has created
$ nmcli con show

# Use connection name to enable auto-connect on machine startup
# (If connection name contains a blank, encapsulate it in single quotes in this command)
$ nmcli con modify <connection name> connection.autoconnect yes

Installing Docker

The installation of Docker is rather straightforward. The steps you find below are a slightly condensed version of what you can find in Docker’s official installation instructions:

# (Prefix all following commands with 'sudo' in case you're not root)
# Configure repository
$ dnf -y install dnf-plugins-core
$ dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo

# Install Docker engine
$ dnf install docker-ce docker-ce-cli containerd.io

# Start and enable Docker
$ systemctl start docker
$ systemctl enable docker

# Verify Docker works correctly
$ docker run hello-world

In case this very last command shows Hello from Docker! somewhere in its output, it’s a pretty solid sign your Docker installation works as intended.

Adding A New User

It’s not desirable to use the root user to run RKE, so we’ll add a new, dedicated user and add it to two groups: docker, so it can talk to the Docker daemon, and wheel, so it can use sudo:

# Create new user, aptly called 'rke', including the creation of a home directory, and set password
$ useradd -m rke
$ passwd rke

# Add new user to groups 'docker' and 'wheel' 
$ usermod -aG wheel rke
$ usermod -aG docker rke

Keep in mind the RKE installer you’re going to run from your local machine will have to SSH into the rke account on the server, so you’ll want to make sure password-less SSH is possible.
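
One way to set up key-based login from your local machine is sketched below. It’s a hedged example, not part of the official instructions: the helper names run and setup_rke_ssh are made up, and the key path simply mirrors the default the installer suggests later on. With DRY_RUN=1, the commands are only printed, so you can check them before anything touches your server.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1, runs it otherwise
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical helper: setup_rke_ssh <host> [<private key path>]
setup_rke_ssh() {
  host=$1
  key=${2:-$HOME/.ssh/id_rsa}
  # Create an RSA key pair only if none exists at the given path
  # (an empty passphrase keeps things simple for the installer)
  [ -f "$key" ] || run ssh-keygen -t rsa -b 4096 -f "$key"
  # Install the public key in the rke user's authorized_keys on the server
  run ssh-copy-id -i "$key.pub" "rke@$host"
  # Confirm key-based login works and the rke user may talk to Docker
  run ssh -i "$key" -o BatchMode=yes "rke@$host" docker version
}
```

To preview the commands for my first node, I’d run DRY_RUN=1 setup_rke_ssh 192.168.8.119, and then the same without DRY_RUN=1 to apply them. BatchMode=yes makes the final check fail instead of falling back to a password prompt, which is exactly the situation the installer would run into.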

Configuring Firewall And SELinux

Port 6443 is the Kube API port, which has to be accessible on the server machine for the RKE installer to finish successfully. Another port that should be opened is 10250, which is the default port for the Kubernetes kubelet. While not necessary for the installation to succeed, you might want to interact with your freshly installed Kubernetes cluster later, and some of those interactions (viewing a Pod’s logs, for example) require the kubelet port to be accessible. So while we’re at it, let’s open both ports:

# Configure and reload firewall
$ firewall-cmd --add-port=6443/tcp --add-port=10250/tcp --permanent
$ firewall-cmd --reload

# Configure SELinux to allow communication to pass on those ports
$ semanage port -a -t http_port_t -p tcp 6443
$ semanage port -a -t http_port_t -p tcp 10250

A short note on SELinux: From the experience of running and maintaining a little web server exposed to the public Internet, I know SELinux can be a bit of a pain in the bottom sometimes, and you may have had similar experiences. Nonetheless, please do not take the easy path of simply disabling SELinux – it’s probably one of the worst things you can do to a Linux server. Even if the server only sits unexposed in your home network, at least set SELinux to Permissive mode to observe and learn from its logs (which you can find in /var/log/audit/audit.log on Fedora and other Red Hat-based distributions).

… And Three Minor Steps

The three last steps of our preparation work are to apply a sysctl setting and to make two configuration changes to sshd.

Set bridge-nf-call-iptables to 1. This is necessary so the system’s iptables rules can see, and thus act on, Kubernetes’ bridged network traffic. You can enable this behavior by creating a vendor- or subject-related file in /usr/lib/sysctl.d, for example:

$ echo "net.bridge.bridge-nf-call-iptables=1" > /usr/lib/sysctl.d/51-rke-net.conf
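
Creating the file alone does not change the running system. If you’d like to apply and check the setting right away rather than wait for the reboot, something like the following sketch should do. It assumes the key is provided by the br_netfilter kernel module, which has to be loaded for the key to exist at all; the function name apply_bridge_sysctl is made up for illustration.

```shell
# Hypothetical helper; run as root on each cluster node
apply_bridge_sysctl() {
  # The net.bridge.* keys only exist while the br_netfilter module is loaded
  modprobe br_netfilter || return 1
  # Re-read all sysctl configuration files, including those in /usr/lib/sysctl.d
  sysctl --system >/dev/null || return 1
  # Print the effective value of the setting
  sysctl net.bridge.bridge-nf-call-iptables
}
```

Calling apply_bridge_sysctl as root should then report a value of 1 for the setting.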

Enable TCP forwarding for sshd. The configuration option in question, AllowTcpForwarding, controls SSH port forwarding, which is a mechanism SSH uses to tunnel application ports from the client machine to the server machine or vice versa. In our case, this option has to be enabled on the server side by specifying AllowTcpForwarding yes in the /etc/ssh/sshd_config file.
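
If you prefer scripting this change over opening an editor, the following sketch shows one possible way. The helper set_sshd_option is made up for this post, and it assumes GNU sed (which Fedora ships) and a config file in which the option appears at most once, either active or commented out. The example runs against a scratch file so you can see the effect before pointing it at the real config.

```shell
# Hypothetical helper: set_sshd_option <file> <key> <value>
set_sshd_option() {
  if grep -qE "^#?[[:space:]]*$2[[:space:]]" "$1"; then
    # Key is present (possibly commented out): rewrite that line in place
    sed -i -E "s|^#?[[:space:]]*$2[[:space:]].*|$2 $3|" "$1"
  else
    # Key is absent: append it to the end of the file
    printf '%s %s\n' "$2" "$3" >> "$1"
  fi
}

# Try it on a scratch file first
demo=$(mktemp)
printf 'Port 22\n#AllowTcpForwarding no\n' > "$demo"
set_sshd_option "$demo" AllowTcpForwarding yes
cat "$demo"
```

On the server, the equivalent call would be set_sshd_option /etc/ssh/sshd_config AllowTcpForwarding yes, run as root. Running sshd -t afterwards checks the file for syntax errors. Note that if your config contains Match blocks, an appended line would end up inside the last one, so it’s worth looking at the result.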

Add ssh-rsa to accepted SSH public key types. The ssh-rsa key type has to be added to the SSH daemon’s config on the server side for the RKE installer to be able to establish a connection. Without that additional accepted key type, you’ll see an error message akin to the following on your local machine running the installer:

handshake failed: ssh: unable to authenticate, attempted methods [publickey], no supported methods remain

On the server side, this will manifest in the form of the following error message in the sshd logs:

userauth_pubkey: key type ssh-rsa not in PubkeyAcceptedKeyTypes [preauth]

To solve this problem, add ssh-rsa to the PubkeyAcceptedKeyTypes field in the /etc/crypto-policies/back-ends/opensshserver.config file.

Finally, let’s do a quick reboot to make sure all changes get applied. Once the server has come back up, we can turn our attention to the installation process itself.

RKE Installation Steps

In the following, we’ll walk through the steps described on the Rancher website, which I’m going to extend with some additional explanations and a sample installer configuration.

Installation And Installer Execution Mode

First, you’ll want to download the rke executable to your local machine. All releases can be found on the project’s GitHub releases page, from where you can download the file you need, rename it to rke, and put it somewhere in your PATH. Alternatively, if you’re on macOS, rke can be installed via Homebrew by running brew install rke.

Fortunately for us, the Rancher team have made the RKE installation by means of this installer very simple (given that the machines supposed to run RKE have been configured correctly). The idea is to provide the installer with a configuration file called cluster.yml, which can be either hand-crafted or created by the installer itself from your answers to a series of questions it asks when you run the rke config command. Based on that information, the installer will connect to all remote machines and perform its installation steps. If you’ve worked with Ansible before, this might sound familiar: a locally running “orchestrator” is given a set of tasks to perform on remote machines plus the configuration required to do so. The difference is that, with the rke executable, the tasks are baked in, and we only provide the configuration.

Creating The Installer Config File

We’ll use the very convenient “questionnaire” provided by the installer in the form of the rke config command. The following shows the complete list of questions the installer asks as well as the value I’ve provided in cases that deviate from the default. I’ve also added in some comments where appropriate – the installer obviously won’t show those.

$ rke config --name cluster.yml
# In cases where you see a blank after the colon, I've gone for the default value suggested by the installer
[+] Cluster Level SSH Private Key Path [~/.ssh/id_rsa]:
# If > 1, the installer will ask a subset of the ensuing questions for each host, 
# but for brevity, I've only included the questions for one host
[+] Number of Hosts [1]: 3
# Host-specific questions start here -- installer will iterate through these for each host
[+] SSH Address of host (1) [none]: 192.168.8.119
[+] SSH Port of host (1) [22]:
[+] SSH Private Key Path of host (192.168.8.119) [none]:
[-] You have entered empty SSH key path, trying fetch from SSH key parameter
[+] SSH Private Key of host (192.168.8.119) [none]:
[-] You have entered empty SSH key, defaulting to cluster level SSH key: ~/.ssh/id_rsa
# Remember the dedicated 'rke' user we created during prep work? This is where it's used
[+] SSH User of host (192.168.8.119) [ubuntu]: rke
# Might want to set this to 'n' for other hosts
[+] Is host (192.168.8.119) a Control Plane host (y/n)? [y]: y
# In case of a large cluster, it may be reasonable to have dedicated control plane and etcd hosts that are not worker nodes
[+] Is host (192.168.8.119) a Worker host (y/n)? [n]: y
# Might want to set this to 'n' for other hosts
[+] Is host (192.168.8.119) an etcd host (y/n)? [n]: y
[+] Override Hostname of host (192.168.8.119) [none]: kube-1
[+] Internal IP of host (192.168.8.119) [none]:
[+] Docker socket path on host (192.168.8.119) [/var/run/docker.sock]:
# Host-specific questions end here
# [Iterations through host-specific questions if n(hosts)>1]
# Cluster-global questions start here
[+] Network Plugin Type (flannel, calico, weave, canal, aci) [canal]: flannel
[+] Authentication Strategy [x509]:
[+] Authorization Mode (rbac, none) [rbac]:
[+] Kubernetes Docker image [rancher/hyperkube:v1.22.6-rancher1]:
[+] Cluster domain [cluster.local]:
[+] Service Cluster IP Range [10.43.0.0/16]:
[+] Enable PodSecurityPolicy [n]:
[+] Cluster Network CIDR [10.42.0.0/16]:
[+] Cluster DNS Service IP [10.43.0.10]:
[+] Add addon manifest URLs or YAML files [no]:

After having walked through all questions, the installer will create the cluster.yml file in your current working directory. As soon as the config file is ready, the actual installation can start.
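
For reference, here’s roughly what a hand-crafted equivalent of the answers above could look like. This is a deliberately minimal sketch, not the file rke config generates (that one is considerably longer and spells out many defaults). It covers only the first host and the network plugin; kube-2 and kube-3 would need node entries of their own. The file name cluster-example.yml is made up so a real cluster.yml doesn’t get overwritten.

```shell
# Write a minimal, hand-crafted config to a separate file (hypothetical name)
cat > cluster-example.yml <<'EOF'
# Cluster-level default key, used for hosts without a key of their own
ssh_key_path: ~/.ssh/id_rsa
nodes:
  - address: 192.168.8.119
    port: "22"
    user: rke
    hostname_override: kube-1
    role:
      - controlplane
      - etcd
      - worker
network:
  plugin: flannel
EOF
```

To use a config file with a non-default name, pass it to the installer explicitly, as in rke up --config cluster-example.yml.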

Running The Installer

With the cluster.yml file in place, let’s run the installer (it will assume the file to be present in its current working directory by default):

$ rke up

This will connect to all of your machines and execute – roughly – the following steps:

  1. Establish SSH tunnels to all configured servers
  2. Create and distribute certificates for Kubernetes to cluster nodes
  3. Distribute audit policy file to all control plane nodes
  4. Set up Kubernetes control plane & perform health checks (cluster up from here on)
  5. Create a couple of RBAC-related objects (ServiceAccount, ClusterRole, ClusterRoleBinding)
  6. Set up Kubernetes worker plane
  7. Install and set up selected network plugin (flannel, in this case)
  8. Deploy CoreDNS
  9. Set up metrics server
  10. Deploy nginx Ingress Controller

Note that the installation happens by means of containers (the distribution of certificates in step 2, for example, is performed by a container called cert-deployer), which requires their images to be downloaded. If the installer is run for the first time, these downloads will take some time.

Upon completion, the installer will print the following line:

INFO[0020] Finished building Kubernetes cluster successfully

This means we can now take a look at our freshly installed, RKE-based Kubernetes cluster – hooray!

Reaping The Fruits: RKE Up And Running

Quite conveniently, after a successful install, the RKE installer downloads the config file kubectl needs to connect to the cluster and places it in the directory it was invoked in on your local machine. If you check that directory, you’ll find a new file there called kube_config_cluster.yml. You can either simply use it as your new ~/.kube/config file or point the KUBECONFIG environment variable to it.
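
The environment variable route could look like the following minimal sketch, which assumes you run it from the directory the installer was invoked in. The guard around kubectl is my own addition so the snippet stays harmless on a machine where kubectl or the file is missing.

```shell
# Point kubectl at the file the installer left in the current directory
export KUBECONFIG="$PWD/kube_config_cluster.yml"

# Only query the cluster if kubectl is installed and the file is actually there
if command -v kubectl >/dev/null 2>&1 && [ -f "$KUBECONFIG" ]; then
  kubectl cluster-info
fi
```

Unlike replacing ~/.kube/config, this leaves any existing kubectl configuration untouched, but it only lasts for the current shell session unless you add the export line to your shell profile.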

Checking Nodes And Pods

With kubectl configured, let’s check for available nodes:

# As always...
$ alias k=kubectl
$ k get node
NAME     STATUS   ROLES                      AGE     VERSION
kube-1   Ready    controlplane,etcd,worker   11d     v1.22.6
kube-2   Ready    worker                     11d     v1.22.6
kube-3   Ready    worker                     5d20h   v1.22.6

Your output will likely look a bit different – in my case there are three nodes, one of which is a control plane node. After having confirmed all expected nodes have joined the cluster and achieved readiness, we can go ahead and verify all Pods are up and running, too (output shortened):

$ k get po -A
NAMESPACE       NAME                                         READY   STATUS      RESTARTS         AGE
ingress-nginx   nginx-ingress-controller-2dbw4               1/1     Running     4 (3d20h ago)    11d
[...]
kube-system     coredns-8578b6dbdd-8wdhq                     1/1     Running     4 (3d20h ago)    11d
[...]
kube-system     coredns-autoscaler-f7b68ccb7-tchrn           1/1     Running     4 (3d20h ago)    11d
kube-system     kube-flannel-bl8w4                           2/2     Running     8 (3d20h ago)    11d
[...]
kube-system     metrics-server-6bc7854fb5-267pj              1/1     Running     5 (3d20h ago)    11d
[...]

In my case, as you can see, the Pods had a couple of restarts, which is because I shut down the cluster whenever it’s not in use. In your setup, on a fresh install, those Pods should not have any restarts, and after a short while, they should all become ready.

Deploying A Sample Workload

Let’s finally deploy a sample workload to the freshly installed cluster. In the context of a previous blog post, I’ve created a small manifest file containing a single Service plus a Deployment backing it, and it’s a perfect candidate to create some load on the worker nodes. You can apply it using the following command:

$ k apply -f https://raw.githubusercontent.com/AntsInMyEy3sJohnson/blog-examples/master/kubernetes/workload-reachability/simple-deployment-with-service.yaml

This will create a bunch of Pods in the workload-reachability-example namespace:

$ k -n workload-reachability-example get po
NAME                         READY   STATUS    RESTARTS   AGE
hello-app-7d84f56664-fhpdb   1/1     Running   0          68s
hello-app-7d84f56664-rbj2b   1/1     Running   0          68s
hello-app-7d84f56664-wlczm   1/1     Running   0          68s

But, of course, three Pods of a little sample workload are by no means a match for your new cluster! So let’s give it something more to chew on. Maybe 30 replicas will do?

$ k -n workload-reachability-example scale deployment hello-app --replicas=30
deployment.apps/hello-app scaled

We can now see the Pods getting spawned and scheduled to different worker nodes:

$ watch kubectl -n workload-reachability-example get pod -o wide
Every 2.0s: kubectl -n workload-reachability-example get pod -o wide
NAME                         READY   STATUS    RESTARTS   AGE     IP           NODE     NOMINATED NODE   READINESS GATES
hello-app-7d84f56664-4fxkk   1/1     Running   0          4m26s   10.42.0.73   kube-1   <none>           <none>
hello-app-7d84f56664-4pl6l   1/1     Running   0          4m26s   10.42.1.85   kube-2   <none>           <none>
hello-app-7d84f56664-4r4fr   1/1     Running   0          8m47s   10.42.2.10   kube-3   <none>           <none>
hello-app-7d84f56664-55cdq   1/1     Running   0          4m26s   10.42.1.82   kube-2   <none>           <none>
[...]

The PodSpec behind those Pods, defined in the Deployment object given in the manifest file, does not define resource requests, but only limits. In such cases, Kubernetes will automatically assign requests equal to the given limits. Thus, even without explicit requests, you can use the limits to calculate how many replicas created from the given PodSpec will fit in your cluster. In my case, with 10 CPU cores available and CPU being the limiting factor with the given resource settings, the cluster will run 50 Pod replicas, minus a few due to the CPU requests of other Pods running in the cluster (such as the Nginx Ingress Controller Pods).
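
The back-of-the-envelope calculation behind that number can be written down as follows. Note that the CPU limit of 200m per Pod is an assumption on my part here: it is simply the value that makes 10 cores come out at 50 replicas, so check the manifest for the actual figure before reusing this for your own cluster. The variable names are made up for illustration.

```shell
# Total CPU cores across all worker nodes (figure from my cluster)
TOTAL_CORES=10
# Assumed CPU limit per Pod in millicores; inferred, not taken from the manifest
CPU_LIMIT_MILLICORES=200

# Requests default to the limits, so each replica reserves CPU_LIMIT_MILLICORES
MAX_REPLICAS=$(( TOTAL_CORES * 1000 / CPU_LIMIT_MILLICORES ))
echo "$MAX_REPLICAS"   # prints 50
```

This is an upper bound only. To see how much CPU is already requested on a node by system Pods, you can look at the allocated resources section in the output of kubectl describe node.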

Cleaning Up

You can delete the resources created in the previous section by running the following command:

$ k delete -f https://raw.githubusercontent.com/AntsInMyEy3sJohnson/blog-examples/master/kubernetes/workload-reachability/simple-deployment-with-service.yaml

Summary

The preceding sections gave you an overview of the RKE installation steps, as well as all configuration work necessary to prepare the cluster’s machines.

The configuration work mainly involved the installation of Docker along with the creation of a dedicated user, the configuration of the firewall and SELinux on all machines, and three minor steps concerning a system setting and the configuration of sshd. In addition to that, we’ve looked at two more steps that may be necessary for users of Fedora Server with regard to resizing the root file system partition and – for all WiFi-only devices such as very slim notebooks – connecting to a WiFi endpoint.

The RKE installer works in a mode comparable to Ansible – as a kind of “orchestrator” operating from a machine outside the soon-to-be cluster (such as your local workstation), it collects the desired target state of the cluster to be installed by means of a very handy questionnaire, and then connects to all remote machines to perform the necessary installation steps in order to create, if possible, that target state. Since RKE is based purely on Docker, the installation might take a while when run for the first time because the installer will have to download a whole range of images. After a successful install, the RKE installer will download the config file necessary for kubectl to interact with the new cluster.