In this post I go over how I set up my Kubernetes cluster across four Raspberry Pi 4 nodes using k3s, configured persistent storage using NFS, and then installed a simple “todo” API into the cluster using Helm.
Kubernetes on Raspberry Pi with k3s
The installation and configuration of a Raspberry Pi cluster running Kubernetes has been blogged about many times, and has become easier and simpler to do with newer tools. Much respect and recognition must go to Alex Ellis, who has been creating great content on this topic for a few years now. It is impossible to build a Pi Cluster and not have at least one browser tab open on an Alex Ellis blog post. Alex is responsible for the OpenFaaS project as well as great tools for working with Kubernetes and k3s: k3sup, inlets, arkade to name a few.
While it is possible to set up a Kubernetes Raspberry Pi cluster with the kubeadm tool included as part of the official Kubernetes distribution, there is an alternative: k3s.
k3s is a lightweight, certified Kubernetes distribution by Rancher Labs targeting Internet of Things (IoT) and edge computing. It provides all the core features and functionality required by Kubernetes, but strips out alpha features and certain storage drivers, and swaps out the cluster etcd key-value store for a SQLite database. This results in the entire distribution being available as a single 40 MB download (although some container images will be pulled during set-up), requiring just 512 MB of RAM to run.
Installing k3s
For the next steps I needed to know the IP addresses of each Pi in my cluster. Previously I used my router to set static IP addresses for each Pi:
- 192.168.1.10 - whitepi
- 192.168.1.20 - greenpi
- 192.168.1.30 - redpi
- 192.168.1.40 - bluepi
I will be using whitepi as the k3s server node. If you are following these steps, remember that you will also need to make sure that your SSH key has been copied to each of the nodes. I did this as part of my previous post.
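If your key isn’t on the nodes yet, a quick way to get it there is with ssh-copy-id. This is a minimal sketch using my node IPs and the default pi user - adjust for your own set-up, and it assumes a key pair already exists from ssh-keygen:

# Copy the current user's public SSH key to each Pi in the cluster
for NODE_IP in 192.168.1.10 192.168.1.20 192.168.1.30 192.168.1.40; do
  ssh-copy-id pi@$NODE_IP
done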
For many of the following sections I use the kubectl tool to interact with the cluster once it has been set up.
Install k3sup
I’m using Alex Ellis’ brilliant k3sup tool to set up k3s on my Pi cluster. I downloaded and installed the k3sup tool on to my laptop with:
$ curl -sLS https://get.k3sup.dev | sh
$ sudo install k3sup /usr/local/bin/
To check if it installed correctly you can run:
$ k3sup version
_ _____
| | _|___ / ___ _ _ _ __
| |/ / |_ \/ __| | | | '_ \
| < ___) \__ \ |_| | |_) |
|_|\_\____/|___/\__,_| .__/
|_|
Version: 0.9.2
Git Commit: 5a636dba10e1f8e6bb4bb5982c6e04fc21c34534
Set up the server node
To configure whitepi (192.168.1.10) as the k3s server node using k3sup, I ran:
export SERVER_IP=192.168.1.10
k3sup install --ip $SERVER_IP --user pi --no-extras
Note: I specified the --no-extras flag because by default k3s will install its own load balancer, and I will be using the MetalLB load balancer instead.
At the end of this operation k3sup prints out some useful information to test access to the cluster:
# Test your cluster with:
export KUBECONFIG=/home/dinofizz/kubeconfig
kubectl get node -o wide
Following these instructions shows me that my master node is running as expected (I omitted the wide option so it would fit better in this post):
$ export KUBECONFIG=/home/dinofizz/kubeconfig
$ kubectl get node
NAME STATUS ROLES AGE VERSION
whitepi Ready master 4m12s v1.17.2+k3s1
Join the agent nodes
Adding nodes to the cluster requires another command…
k3sup join --ip $AGENT_IP --server-ip $SERVER_IP --user pi
…where AGENT_IP is the IP address of each agent to be added to the cluster. A simple bash script to join all my nodes (which took me more time to figure out than it would have taken to type the same command 3 times) can be found below.
#!/bin/bash
SERVER_IP=192.168.1.10
declare -a agents=("192.168.1.20" "192.168.1.30" "192.168.1.40")
for AGENT_IP in "${agents[@]}"; do
k3sup join --ip $AGENT_IP --server-ip $SERVER_IP --user pi
done
Once the script completed I ran the following command to confirm that all the nodes are present:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
whitepi Ready master 18m v1.17.2+k3s1
greenpi Ready <none> 12m v1.17.2+k3s1
redpi Ready <none> 12m v1.17.2+k3s1
bluepi Ready <none> 12m v1.17.2+k3s1
Using my Grafana dashboard I noticed that with k3s running my master node experiences about 10% higher CPU usage, and the agent nodes roughly a 1% increase. The spikes in the graph occurred while the set-up commands were running.
Storage
The example “ToDo” application described below requires persistent storage for its database. In a production environment this would be accomplished using a driver designed and built for distributed storage. In my cluster I am going to keep it simple, using a portable SSD connected via USB3 and exposed as a Network File System (NFS) share hosted on a single node - the master node. Yes, of course this presents a single point of failure - I lose access to storage if I lose connectivity to the master node (or if the master node fails/hangs/crashes). I’m also not sure how this will hold up once I start stressing the system with concurrent reads and writes. For now though, I’m happy to accept this risk for this phase of my learning :)
I wanted something a step up from just a USB flash drive, and so opted for a Western Digital Green M.2 240GB SSD mounted in an Eluteng M.2 external USB3 case.
Instead of listing out all the steps I did, I will provide a link to a post by Grégoire Jeanmart. The relevant section is titled “Configure the SSD disk share”.
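To give an idea of what that section covers, the server side of the share boils down to something like the sketch below, run on the master node. This is my rough summary rather than Grégoire’s exact commands, and it assumes the SSD is already formatted and mounted at /mnt/ssd:

# Install the NFS server and export the SSD mount point to the LAN
sudo apt install -y nfs-kernel-server
echo '/mnt/ssd 192.168.1.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra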
I followed his steps exactly, but did make one change to the fstab entries by adding a nofail option after the rw option. This prevents the Pi from failing to boot if it does not detect the NFS share. This is what my fstab entry looks like on all my agent nodes (note that 192.168.1.10 is the IP address of my master node, which hosts the USB SSD):
proc /proc proc defaults 0 0
PARTUUID=738a4d67-01 /boot vfat defaults 0 2
PARTUUID=738a4d67-02 / ext4 defaults,noatime 0 1
192.168.1.10:/mnt/ssd /mnt/ssd nfs rw,nofail 0 0
I will admit here that I tried to be very clever and attempted to provision the NFS share onto the worker nodes with an Ansible playbook. Unfortunately I did not append to the fstab entry, and so overwrote it with just the NFS share line. Strangely (I know little about Linux file systems) the Pi still booted up and allowed me to SSH in, but in a “read-only” mode. I popped out the SD card, put it in my laptop and manually edited the /etc/fstab to include the previous partitions by comparing with the entries from another working Pi and the blkid output.
MetalLB
k3s does come with a basic load balancer which uses iptables rules to forward traffic to pods, but I wanted to try MetalLB, a “bare-metal load balancer for Kubernetes”. I am using it in the layer 2 configuration, where MetalLB responds to ARP requests with the host’s MAC address, providing an entry point for traffic into the cluster via an IP address for each service.
Installing MetalLB into the cluster
To install MetalLB into my cluster I copy-pasted the manifest installation instructions from the MetalLB website: https://metallb.universe.tf/installation/#installation-by-manifest
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
To check that things were installed correctly you can run the command kubectl get pods --all-namespaces:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
metallb-system speaker-cshbh 1/1 Running 2 28h
metallb-system speaker-dx8qz 1/1 Running 2 28h
metallb-system speaker-lqxmk 1/1 Running 2 28h
kube-system metrics-server-6d684c7b5-bcrg6 1/1 Running 2 28h
kube-system local-path-provisioner-58fb86bdfd-pn49b 1/1 Running 2 28h
metallb-system controller-5c9894b5cd-5xccd 1/1 Running 2 28h
metallb-system speaker-n7s97 1/1 Running 2 28h
kube-system coredns-d798c9dd-xb6ld 1/1 Running 2 28h
A few new pods were deployed into a new metallb-system namespace: a single “controller” pod deployed as a ReplicaSet, and a “speaker” pod deployed to each node as part of a DaemonSet.
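As an extra (optional) check, you can also query the DaemonSet and Deployment objects directly to confirm that a speaker is scheduled on every node and that the controller is available:

$ kubectl get daemonset,deployment -n metallb-system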
Configure MetalLB for Layer 2 routing
LoadBalancer services won’t work correctly until MetalLB has been configured. I chose the simplest option of the Layer 2 configuration, allowing the load balancer to assign IP addresses from my LAN address space to services in the cluster.
As my current LAN uses addresses in the 192.168.1.x space (aka 192.168.1.0/24), I chose to make the IP address range 192.168.1.240-192.168.1.250 available to MetalLB (I used the exact ConfigMap manifest from the MetalLB configuration docs).
This can be applied to the cluster in various ways, but I find it easiest to create it from stdin using cat <<EOF | kubectl apply -f -, as described here and also used in this YouTube video.
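For completeness, here is roughly what that looks like with my address range dropped into the layer 2 example from the MetalLB configuration docs (double-check the manifest format against the docs for the MetalLB version you install):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
EOF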
ToDo API
I needed an application to run in my cluster, so I put together a simple ToDo API application written in Go. I’m no expert in Go, in fact this would be my second project in the language (the first being my diskplayer project). I plan on using this API to test various changes to the infrastructure, including changing out my data storage technology and playing with different Kubernetes deployment types and objects: ReplicaSets, StatefulSets, PersistentVolumes, Jobs, etc. If it no longer makes sense to use this simple application, I’ll find something else to install into the cluster to serve my needs.
The ToDo API application exposes a RESTful API with basic create, retrieve, update and delete functionality. A ToDo resource currently contains a description and a “completed” boolean flag. There is no authentication. The application supports SQLite, MySQL and MongoDB databases.
You can find the code for this ToDo API application here: https://github.com/dinofizz/todo-api-go
Installing with Helm
To manage the resources in my k3s cluster I could use kubectl and run a bunch of commands sequentially (or as part of a script) to install the todo-go application and database. There is, however, a better way: Helm.
Helm is “the package manager for Kubernetes”. With Helm you can describe the expected configuration of your application across a group of related Kubernetes resource definitions (YAML files) called a “chart”. Helm can then manage the installation, upgrade and uninstallation of a chart. One great feature of using charts to manage Kubernetes resources is that Helm includes a templating language, so substitutions for resource names, environment variables, etc. can be made for each new release. Another benefit of using Helm is that the templated values inserted into the resources can be arranged in a hierarchy, allowing different values to be used for different environments, i.e. dev/staging/production.
Helm does not need to be installed into the cluster itself; it is installed on the laptop that I am using to manage the cluster “remotely”. Helm’s website offers various options for installation. I downloaded the latest release and moved the binary into /usr/local/bin:
curl -LO https://get.helm.sh/helm-v3.1.3-linux-amd64.tar.gz
tar -xf helm-v3.1.3-linux-amd64.tar.gz
chmod +x linux-amd64/helm
sudo mv linux-amd64/helm /usr/local/bin/helm
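As with k3sup, a quick sanity check that the binary ended up on the path:

$ helm version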
Chart Overview
The Helm chart is included in the source repository for the ToDo API application under the todo folder: https://github.com/dinofizz/todo-api-go
The following Helm chart files define the components of my ToDo application resources:
todo/values.yaml
This file contains the properties which are included in the various template files. It specifies the Docker repository and version of the ToDo app image, as well as some application configuration values such as the hostname, the port and the database to be used.
This default values.yaml is applied automatically, but it is possible to override specific values by appending one or more additional values files to the helm install command. For example, by default the chart will start the ToDo application with a SQLite database accessed via a volume mount to the NFS share. By modifying the helm install command with an additional -f values-mysql.yaml argument, the database connection string will be pulled from this latter values file, overriding the SQLite connection string in the original values.yaml. This change will be demonstrated in a later blog post.
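To give a feel for the shape of this file, here is an illustrative values.yaml fragment. The property names and image name below are placeholders of my own choosing, not the verbatim contents of the repository file:

# Illustrative sketch only - see todo/values.yaml in the repository for the real values
replicaCount: 1

image:
  repository: dinofizz/todo-api-go   # placeholder image name
  tag: latest

service:
  type: LoadBalancer
  port: 80

database:
  driver: sqlite3
  dsn: /data/todo.db   # SQLite file living on the NFS-backed volume mount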
todo/templates/deployment.yaml
This file specifies the container image that should be deployed to a pod, as well as how many running instances (pods) should be expected. In addition to some Kubernetes-specific properties, the container image specification has fields similar to the arguments of a docker run command: it details the image repository:version, exposes specific ports, sets the environment variables and lists the volume mounts that should be applied.
Some of the values for the deployment properties are defined in the values.yaml file - typically the values that may change as I play with the environment.
The number of desired running instances for this deployment is given in the replicas property towards the top of the file, whose value is defined in values.yaml.
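Purely as an illustration of the templating (reusing the placeholder value names from the sketch above, not the verbatim template), the interesting parts of a deployment template of this kind look something like:

# Illustrative sketch only - see todo/templates/deployment.yaml for the real template
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: {{ .Values.service.port }}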
todo/templates/service.yaml
This file specifies how the ToDo API is accessible from outside the cluster. I have specified a service of type “LoadBalancer”, which means that traffic to the ToDo app will be routed via the MetalLB load balancer (described above). MetalLB will assign an IP address from a pre-defined pool to any service of type LoadBalancer, making it really easy for devices on the network to communicate with services in the cluster.
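A sketch of what a LoadBalancer service template of this sort typically contains (again illustrative, using the same placeholder names, not the verbatim file):

# Illustrative sketch only - see todo/templates/service.yaml for the real template
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}
spec:
  type: LoadBalancer
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.port }}
  selector:
    app: {{ .Chart.Name }}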
Helm Install
I first created a new namespace within my Kubernetes cluster with:
$ kubectl create namespace todo
I then installed the ToDo chart into the cluster by running the following command from my project folder:
$ helm install --namespace todo todo-api todo/
The release is given the name “todo-api”, and a single pod (a ReplicaSet with 1 replica) is installed:
$ kubectl --namespace todo get pods
NAME READY STATUS RESTARTS AGE
todo-api-6748857db6-p4lcv 1/1 Running 0 2m31s
Testing the ToDo API
The ToDo API is accessible via an IP address provided by the LoadBalancer:
$ kubectl get service --namespace todo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
todo-api LoadBalancer 10.43.70.248 192.168.1.240 80:31354/TCP 7m11s
This shows that the service is available on the network at IP address 192.168.1.240 and port 80.
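Since MetalLB’s layer 2 mode answers ARP for this address, a quick (optional) way to see which node is currently announcing it is to ping the service IP from another Linux machine on the LAN and then inspect the ARP cache - the MAC address shown should belong to one of the Pi nodes:

$ ping -c 1 192.168.1.240
$ arp -n 192.168.1.240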
In the examples below I use curl to issue the HTTP requests and jq to present a nicely formatted response.
Create
curl -s -H "Content-Type: application/json" \
--request POST \
--data '{"description": "Buy milk", "completed": false}' \
http://192.168.1.240/todo | jq
Which produces the output:
{
"Id": "1",
"Description": "Buy milk",
"Completed": false
}
Retrieve
curl -s http://192.168.1.240/todo/1 | jq
Output:
{
"Id": "1",
"Description": "Buy milk",
"Completed": false
}
Update
curl -s -H "Content-Type: application/json" \
--request PUT \
--data '{"description": "Buy milk", "completed": true}' \
http://192.168.1.240/todo/1 | jq
Output:
{
"Id": "1",
"Description": "Buy milk",
"Completed": true
}
Delete
curl -s --request DELETE http://192.168.1.240/todo/1 | jq
Output:
{
"result": "success"
}
Playing with Kubernetes
Scaling the ReplicaSet
With the Helm chart installed using the default values.yaml, a ReplicaSet with 1 replica is deployed (a single pod):
$ kubectl get pods -o wide -n todo
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
todo-api-6748857db6-p4lcv 1/1 Running 1 47h 10.42.1.14 greenpi <none> <none>
With the above command I can see that the todo-api is running in a pod on the greenpi node.
Let’s see what happens when I increase the ReplicaSet to 12 replicas:
$ kubectl scale deployment --replicas 12 todo-api -n todo
deployment.apps/todo-api scaled
Checking the number of running pods I see 12 pods running the ToDo API, distributed across all the nodes:
$ kubectl get pods -n todo -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,Status:.status.phase --sort-by=.spec.nodeName
NAME NODE Status
todo-api-6748857db6-47rww bluepi Running
todo-api-6748857db6-5l95g bluepi Running
todo-api-6748857db6-t29mm bluepi Running
todo-api-6748857db6-z9ctd greenpi Running
todo-api-6748857db6-w7w76 greenpi Running
todo-api-6748857db6-kdq4p greenpi Running
todo-api-6748857db6-4fffk redpi Running
todo-api-6748857db6-vsvwb redpi Running
todo-api-6748857db6-gntck redpi Running
todo-api-6748857db6-qvt86 whitepi Running
todo-api-6748857db6-xmzwk whitepi Running
todo-api-6748857db6-j54nd whitepi Running
To scale back to a single pod:
$ kubectl scale deployment --replicas 1 todo-api -n todo
deployment.apps/todo-api scaled
And to confirm (I had to wait a few seconds while the 11 extra pods terminated):
$ kubectl get pods -n todo
NAME READY STATUS RESTARTS AGE
todo-api-6748857db6-z9ctd 1/1 Running 0 17m
The Ellis-Hanselman Test
The following presentation by Alex Ellis and Scott Hanselman was a big inspiration to me in building this cluster:
At around 26 minutes 50 seconds into the presentation Alex and Scott begin demonstrating the elastic high availability that is a core feature of Kubernetes. They have a 6 node Pi Cluster running a single instance of an application that blinks a row of LEDs attached to the Pi. The application is running in a pod that belongs to a ReplicaSet which specifies that there should be 1 replica running at all times. Alex then unplugs the network cable from the node running the application. After some time Kubernetes becomes aware that it can no longer communicate with the node running the replica, and spins up a new pod on an available node. When the network cable is plugged back into the first node, Kubernetes terminates the pod to satisfy the requirement for one and only one replica in the ReplicaSet.
Alex mentions in the video that they did have to tweak something to get Kubernetes to accelerate the recognition of a “lost node”. I’m not sure exactly what they tweaked, but I found that there is an “unreachable” toleration which defaults to 300 seconds (5 minutes) for pods in the cluster. I noticed that the Helm chart scaffolding automatically provides for the inclusion of custom-defined tolerations in the deployment.yaml, so I added the following to my values.yaml to drop the timeout to 10 seconds:
tolerations:
- key: "node.kubernetes.io/unreachable"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 10
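To confirm that the toleration actually made it into the running pod, you can describe the pods in the namespace and look for the Tolerations section (a quick check of my own, not something from the original talk):

$ kubectl describe pods -n todo | grep -A 3 Tolerations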
I wanted to perform the same test, but I don’t have any LEDs attached to my Pis. To observe the changes to the cluster I simply watch the pod events as I unplug the network cable from the node running my pod, wait for a new pod to start running, and then plug the network cable back in and watch Kubernetes delete the old pod (just like in Alex and Scott’s test).
My attempt
To start with I make sure that there is nothing currently running in my todo namespace:
$ kubectl get pods -n todo
No resources found in todo namespace.
I then run the kubectl get pods command, but add the -w watch flag so that I can follow changes to the pods in real time, as well as the --output-watch-events flag to view pod-related events. The -o wide argument allows me to see which node the pod is running on. I call the terminal window in which this command is running the “watch” window.
$ kubectl get pods -n todo -w -o wide --output-watch-events
# no output, as nothing is running yet
To kick things off I then install the ToDo API into the todo namespace via the Helm chart, by running the following command in a different terminal window from the ToDo API project root folder:
$ helm install --namespace todo todo-api todo/
NAME: todo-api
LAST DEPLOYED: Fri May 8 12:57:34 2020
NAMESPACE: todo
STATUS: deployed
REVISION: 1
NOTES:
1. Get the application URL by running these commands:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get --namespace todo svc -w todo-api'
export SERVICE_IP=$(kubectl get svc --namespace todo todo-api --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")
echo http://$SERVICE_IP:80
After running the install command above I see events logged in the “watch” window:
$ kubectl get pods -n todo -w -o wide --output-watch-events
EVENT NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ADDED todo-api-78b984d7ff-fnnqg 0/1 Pending 0 0s <none> <none> <none> <none>
MODIFIED todo-api-78b984d7ff-fnnqg 0/1 Pending 0 0s <none> <none> <none> <none>
MODIFIED todo-api-78b984d7ff-fnnqg 0/1 Pending 0 4s <none> bluepi <none> <none>
MODIFIED todo-api-78b984d7ff-fnnqg 0/1 ContainerCreating 0 4s <none> bluepi <none> <none>
MODIFIED todo-api-78b984d7ff-fnnqg 1/1 Running 0 8s 10.42.3.42 bluepi <none> <none>
From the watch window output we can see that Kubernetes added a pod into the namespace, and through a series of “modifications” scheduled it to run on the bluepi node. The container image was pulled and eventually the pod resolved to a running state.
I then pull out the Ethernet cable from the bluepi node (which is easy to identify as I’ve labelled the nodes according to the colour of the Ethernet cable connected to the router):
...
MODIFIED todo-api-78b984d7ff-fnnqg 1/1 Running 0 82s 10.42.3.42 bluepi <none> <none>
MODIFIED todo-api-78b984d7ff-fnnqg 1/1 Terminating 0 97s 10.42.3.42 bluepi <none> <none>
ADDED todo-api-78b984d7ff-r8649 0/1 Pending 0 0s <none> <none> <none> <none>
MODIFIED todo-api-78b984d7ff-r8649 0/1 Pending 0 0s <none> greenpi <none> <none>
MODIFIED todo-api-78b984d7ff-r8649 0/1 ContainerCreating 0 0s <none> greenpi <none> <none>
MODIFIED todo-api-78b984d7ff-r8649 1/1 Running 0 4s 10.42.1.31 greenpi <none> <none>
After some time Kubernetes realises that bluepi is unreachable. It is unsure of the status of the pod running on that node, and decides to terminate the pod running on bluepi. To maintain the ReplicaSet count of 1 replica it schedules a new pod to be added to the namespace on a new node - this time on greenpi.
Once the new pod on greenpi is in a running state, I then re-attach the network cable to bluepi:
...
MODIFIED todo-api-78b984d7ff-fnnqg 1/1 Terminating 0 2m 10.42.3.42 bluepi <none> <none>
MODIFIED todo-api-78b984d7ff-fnnqg 0/1 Terminating 0 2m1s 10.42.3.42 bluepi <none> <none>
MODIFIED todo-api-78b984d7ff-fnnqg 0/1 Terminating 0 2m2s 10.42.3.42 bluepi <none> <none>
DELETED todo-api-78b984d7ff-fnnqg 0/1 Terminating 0 2m2s 10.42.3.42 bluepi <none> <none>
Above we see that Kubernetes was able to resume communication with the node after the network cable was re-attached. It was then able to completely follow through on the earlier “terminate” event. After a series of pod modification events we see that we end up with the original pod on bluepi “DELETED”.
To confirm the final state of the helm release in the cluster:
$ kubectl get pods -n todo -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
todo-api-78b984d7ff-v7m6v 1/1 Running 0 3m 10.42.1.33 greenpi <none> <none>
And there we see that we have a single pod running on greenpi, just as we expected from a deployment with a ReplicaSet with 1 replica.
Conclusion
I learnt a great deal about running Kubernetes on a Raspberry Pi cluster. k3s makes the set-up really easy, and Helm makes it simple to deploy applications into the cluster. I’m now excited to move forward and begin experimenting with scale, storage and other features of Kubernetes.