Lab 6 - Kubernetes Storage

Welcome to lab 6. In this session, the following topics are covered:

  • Kubernetes storage basics
  • Persistent Volumes
  • Setting up persistent databases in Kubernetes
  • Setting up and configuring Longhorn persistent storage

Kubernetes storage basics

By default, Kubernetes only includes temporary storage options through volumes. If we want more permanent storage, we need to configure persistent Storage Classes, which often require deploying third-party provisioners.

In this lab, we will take a look at some of the Storage building blocks of Kubernetes:

Kubernetes storage objects (Source: link)

Generic Kubernetes Volumes

Our history data server deployment uses a MinIO Pod as a database, which currently stores its data in an ephemeral emptyDir volume.

Let's investigate what happens if MinIO Pods get replaced.

Complete

Make sure some data is stored in the History data server and send a GET request to fetch the data (use the history-server service IP).
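For example, you can look up the ClusterIP of the service and query it with curl (this assumes the service is named history-server and lives in the production namespace; the port and endpoint path depend on your history server setup):

kubectl -n production get service history-server
curl http://<HISTORY-SERVER-SERVICE-IP>:<PORT>/<your-endpoint>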

Now, delete the MinIO Pod. As we have configured it as a StatefulSet, Kubernetes will notice that the number of Pods is smaller than required (1) and will recreate the Pod automatically.
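For example (assuming the Pod is named minio-0 and runs in the production namespace, as in the previous labs):

kubectl -n production delete pod minio-0
kubectl -n production get pods --watch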

Verify

Send a GET request to the history data server again, and check what happened to the previously stored data.

As we can see, the MinIO database is not currently resilient to Pod failures.

We must set up persistent data storage for MinIO.

Info

In the fourth lab, we set up MinIO as a StatefulSet and defined the data Volume to be of type emptyDir:

    volumeMounts:
    - name: minio-storage
      mountPath: "/storage"
  volumes:
    - name: minio-storage
      emptyDir: {}

Inside the container, the MinIO data is stored under the /storage path. This path should not be changed, but we can modify which type of Volume is used.

Let's now create a Persistent Volume for our MinIO Pods.

Persistent Volumes

Before we start creating persistent volumes, we need to define which Storage Classes are available in our Kubernetes cluster.

Complete

Create a new Storage Class template with "local-storage" name and the following content:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer

Some explanations of the values used:

  • provisioner: kubernetes.io/no-provisioner means that we do not use an automated (dynamic) provisioner, so we will have to create the Persistent Volumes manually.
  • volumeBindingMode: WaitForFirstConsumer means that the PV is not bound until a Pod claiming it is scheduled.
  • reclaimPolicy: Retain means that the PV content should be kept after its claim is released. This setting has no real effect for us, as we are not using a provisioner.

Apply the template to Kubernetes.
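For example, assuming you saved the template as storageclass-local.yaml (the filename is up to you):

kubectl apply -f storageclass-local.yaml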

Verify

You can check that the storage class was created and is now available with the following command:

kubectl get storageclass

Complete

Create a "/mnt/data" folder in all Kubernetes nodes. We will store persistent volumes there.

Let's now create a new Persistent Volume minio-pv-0. Add the following content to a new pv-minio.yaml file:

apiVersion: v1
kind: PersistentVolume
metadata:
  name:  minio-pv-0
  labels:
    app: minio
    ver: minio-pv-v1-0
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/data/minio-pv-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - NODE-HOSTNAME

NODE-HOSTNAME must be replaced with the actual full hostname of one of your nodes (e.g., pelle-worker-a.cloud.ut.ee).

Apply the pv-minio.yaml template using kubectl.

Also, create a folder for the new volume at /mnt/data/minio-pv-0 on the same node.
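A quick sketch of both steps (run the mkdir command on the node you referenced under nodeAffinity):

kubectl apply -f pv-minio.yaml
sudo mkdir -p /mnt/data/minio-pv-0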

Verify

Check the list of Persistent Volumes:

kubectl get pv

You should now see that a new Persistent Volume is available to be used.

Complete

Let's now create a new Persistent Volume Claim (PVC) minio-pvc-0 and reconfigure our MinIO database stateful set to use it.

Add the following content to a new pvc-minio.yaml file:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pvc-0
spec:
  storageClassName: local-storage
  selector:
    matchLabels:  # Select a volume with these labels
      app: minio
      ver: minio-pv-v1-0
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Apply the pvc-minio.yaml template using kubectl. Note that the PVC must be created in the production namespace, as a Pod can only use claims from its own namespace.
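For example (applying into the production namespace, matching the verification step below):

kubectl -n production apply -f pvc-minio.yaml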

Modify the MinIO StatefulSet we created in lab 4 (statefulset-minio.yaml).

Replace the lines:

  volumes:
    - name: minio-storage
      emptyDir: {}

with:

  volumes:
    - name: minio-storage
      persistentVolumeClaim:
        claimName: minio-pvc-0

And apply the modified MinIO stateful set template (remember to use the production namespace!).
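For example:

kubectl -n production apply -f statefulset-minio.yaml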

Verify

Check that the claim was created:

kubectl get pvc -n production

Also check the content of the configured folder on the correct cluster node (the one you specified under node affinity):

ls /mnt/data/minio-pv-0 -al

Once MinIO is reconfigured and the history server has created the MinIO bucket, you should see that a new folder named price-data has been created there.

Info

If you run into issues with creating or testing PVs and PVCs, here are some debugging tips:

  • Use kubectl describe on PVs, PVCs, and Pods to get more information
  • Use kubectl logs minio-0 to check whether the MinIO container starts properly.

Complete

Make sure some data is stored in the History data server and send a GET request to fetch the data (use the history-server service IP).

Delete the minio-0 Pod and check that the database stays intact afterwards.

Verify

Send a GET request to the history data server again, and check what happened to the previously stored data.

If the price data is still there afterwards, this means that the Persistent Volume was created and is being used properly.

Info

As you can see, there were quite a few manual steps that had to be performed, like manual folder preparation and permission management.

Such steps can be automated with third-party local-storage provisioners, which take care of dynamically preparing folders and their permissions, but there are also other disadvantages to using local volumes.

For instance, Pods can only be deployed on the nodes where their required PVs are located, which can lead to unbalanced node load.

Let's next take a look at a more powerful Storage Class: Longhorn.

Preparing the nodes for Longhorn

In this task, we will prepare the nodes for the installation of the Longhorn storage controller, which will orchestrate the provisioning of reliable, replicated storage volumes in our Kubernetes cluster.

Complete

You'll be completing this section by following the Longhorn documentation.

Usually, installing Longhorn requires you to prepare the machine with additional packages and kernel modules, but thankfully Longhorn developers have published a tool that does this for you.

Switch to the root user and export the KUBECONFIG variable:

export KUBECONFIG=/etc/kubernetes/admin.conf

Download the Longhorn command line client:

curl -sSfL -o longhornctl https://github.com/longhorn/cli/releases/download/v1.7.1/longhornctl-linux-amd64

And set execution permissions: chmod +x longhornctl

After that, run the following command on the Controller node to install everything necessary on all the Kubernetes nodes:

./longhornctl install preflight

This tool will run for a while. If you are interested in what is happening, you can log into the Controller node in a different terminal. You should notice that this command has deployed Pods with elevated permissions on all the Kubernetes nodes; these Pods install the required libraries and kernel modules and configure the nodes as needed for running Longhorn.

Verify

Verifying things like this is usually not easy when Kubernetes is involved, but the same longhornctl tool also contains a step for verifying that everything was set up properly.

Run the checking command to verify everything has been installed properly:

./longhornctl check preflight

This tool will run for a while and let you know if there are any issues.

Danger

Never run scripts in this fashion if you don't understand them. This pattern is sadly very popular, but from a security standpoint, you should not download a file and immediately execute it without inspecting it first.

Installing Longhorn

In this task we will install Longhorn inside our cluster as a set of Kubernetes entities.

Complete

Do this part only if the verification of the previous step did not bring up any issues. Perform these tasks on the main server only.

There are several ways to install Longhorn, but we'll use the simplest and quickest one, which uses kubectl.

Download the Longhorn version 1.7.1 manifest:

wget https://raw.githubusercontent.com/longhorn/longhorn/v1.7.1/deploy/longhorn.yaml

Modify the longhorn.yaml file and change the numberOfReplicas parameter of the Longhorn StorageClass from 3 to 2; otherwise Longhorn volumes will use up too much of our cluster's storage space. It should look something like this:

parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"

Apply the modified manifest to install Longhorn in your Kubernetes cluster:

kubectl apply -f longhorn.yaml

Verify

You can verify the installation by checking the pods in the longhorn-system namespace. They should all reach the Running state within a minute or two.
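For example:

kubectl -n longhorn-system get pods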

You can also run the kubectl get storageclass command to see whether a default StorageClass was created. The StorageClass defines which storage system your deployments use when there are multiple in the cluster; the default one is used when no StorageClass is specified. It should show that the Longhorn storage class has the number of replicas set to 2.

First Longhorn workload

Let's now create a Pod that uses Longhorn volumes.

Complete

Create a namespace called storage, and apply the following manifest (see the example commands after the manifest):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-pvc
  namespace: storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: writer-pod
  namespace: storage
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["/bin/sh", "-c", "echo 'Hello, Persistent Storage!' >> /data/hello.txt; sleep 3600"]
    volumeMounts:
    - name: longhorn-volume
      mountPath: /data
  volumes:
  - name: longhorn-volume
    persistentVolumeClaim:
      claimName: longhorn-pvc
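A minimal sketch of the commands, assuming the manifest above is saved as longhorn-test.yaml (the filename is an assumption):

kubectl create namespace storage
kubectl apply -f longhorn-test.yaml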

Verify

You'll be able to see from kubectl -n storage get persistentvolumeclaim (kubectl -n storage get pvc for short) and kubectl get persistentvolume (kubectl get pv for short) that a persistent volume has been created for your pod.

When you exec into your pod, you'll see that the container has created the file /data/hello.txt. Every time the pod runs, a new line is appended to the file.
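For example:

kubectl -n storage exec -it writer-pod -- cat /data/hello.txt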

You can delete and recreate the pod until it goes to another node, and you'll still be able to see the file and its content there. You can also attach the PVC to another container, and you'll still see the file there.

NB! The pod is not automatically recreated as we did not define a Deployment or a StatefulSet for it.

Accessing the Longhorn user interface

In this task, we will check out the Longhorn user interface, which can be used to manage created volumes and their replicas, change the number of replicas for a volume, or manually clean up volumes that are no longer needed.

We will not open the Longhorn user interface port to the outside world because it does not have user authentication. Instead, we will set up Kubernetes port forwarding, which temporarily routes traffic between our local computer and a Kubernetes service.

Complete

Download the Kubernetes administrator config file (located at /etc/kubernetes/admin.conf) to your laptop or PC. You can use the scp command for downloading files from the Virtual Machine. As the file is initially only accessible to the root user, you may need to first copy it into the centos user's home folder and change its permissions before it can be fetched with scp.
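One possible way to do this (the VM address is a placeholder; the intermediate copy into the centos home folder is only needed because of the file permissions):

# on the Virtual Machine:
sudo cp /etc/kubernetes/admin.conf /home/centos/admin.conf
sudo chown centos:centos /home/centos/admin.conf
# on your laptop or PC:
scp centos@<VM-ADDRESS>:admin.conf .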

Install the kubectl tool on your laptop or PC.

Follow the kubectl documentation to make the configuration file that you downloaded accessible for the kubectl tool.
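One common way is to point the KUBECONFIG environment variable at the downloaded file, or to copy it to the default location (adjust the paths to where you saved the file):

export KUBECONFIG=$HOME/admin.conf
# or
mkdir -p ~/.kube && cp admin.conf ~/.kube/config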

Verify

Use typical kubectl commands to check that kubectl is working properly, for example, by listing all the pods.

Once kubectl is configured and working, you can continue using it directly from your computer and you no longer need to log into the Virtual Machine to use Kubernetes commands.

Complete

Look up the name of the Longhorn user interface service in the longhorn-system namespace.
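For example:

kubectl -n longhorn-system get services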

Set up port forwarding between a local port on your computer and the Longhorn user interface service in the Kubernetes cluster:

kubectl --namespace longhorn-system port-forward --address 127.0.0.1 service/<name_of_the_service> 5080:80

Verify

Access the configured port from a browser in your computer: http://localhost:5080/#/dashboard

In the future, you can similarly access Kubernetes services from your computer without having to explicitly expose them to everyone.

Migrating the use case database to use Longhorn volumes

The last task will be to migrate the MinIO database from local persistent volumes to Longhorn volumes. We will not create Persistent volume claims manually, and will instead specify a volume claim template in the stateful set manifest. Kubernetes controllers will automatically create a volume claim for every new Pod that is created for the StatefulSet.

Complete

Update the MinIO StatefulSet: remove the volumes: block and instead add a new volumeClaimTemplates: block under the StatefulSet spec: block (NB! Not under the container block!). You will find an example of how to define volume claim templates in the Longhorn GitHub repository.

Name the VolumeClaimTemplate minio-data. Do not use a selector: block. Set the storage space to 1Gi and use longhorn as the storage class.
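A minimal sketch of what the volumeClaimTemplates: block could look like, based on the requirements above (remember that the container's volumeMounts: name must match the template name):

  volumeClaimTemplates:
  - metadata:
      name: minio-data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: longhorn
      resources:
        requests:
          storage: 1Gi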

Apply the new StatefulSet manifest. You may need to delete the previous StatefulSet first.

Verify

Check that the MinIO Pod stays in the Running status.

Also, make sure the history data server works properly after the change.

NB! It would also be good to delete the history data server Pod to verify that the data is still there after the Pod is recreated.