
Lab 13 - Kubernetes Horizontal Pod Autoscaler

Overview

In this lab session we will look at how to enable HPA and configure it for our Ghostfolio deployment. This week's tasks are:

  • Increase the size of the cluster
    • Add a new worker node
  • Set up HPA
    • Install metrics server
    • Try out a simple HPA use case
    • Configure HPA for Ghostfolio

Horizontal Pod Autoscaler

Horizontal Pod Autoscaler (HPA) was covered in Lecture 11. HPA is a Kubernetes controller that can be configured to track the performance of workloads like Deployments and StatefulSets (by defining an HPA resource for them) and to scale the number of their replicas based on chosen performance metric targets and scaling behaviour policies.

HPA architecture (Source: link )

HPA requires that Pod metrics are made available through metrics.k8s.io, custom.metrics.k8s.io, or external.metrics.k8s.io API. We will deploy a Metrics server to achieve this.

Learn more about HPA: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Increasing the size of the cluster

As we have deployed new workloads and services in every previous lab, our cluster resources are reaching their limits. The first task is to add another VM and configure it as a second Kubernetes worker node.

Complete

Follow the steps in Lab 3 to add a new worker node.

Use m1.r6c4 for the VM type and set the System Volume size to be 50 GB.

Make sure to install the same Kubernetes version (1.28.4) that you used when upgrading Kubernetes in Lab 11. Also make sure to assign the label custom-role=worker to the new worker node, as we did in Lab 8 (see the example command below). NB! Also make sure to select the scratch (IOPS intensive SSD) system volume type. Otherwise you may run into disk errors that are difficult to debug.
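For reference, labeling the new worker node can be done like this (replace the placeholder with the actual node name):

kubectl label node <new-worker-node-name> custom-role=worker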

Also, complete the Preparing the nodes for Longhorn part of Lab 6 to make sure the node becomes available for Longhorn storage.

Verify

Check that the new node shows up in the list of nodes.

kubectl describe nodes
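You can also confirm that the custom-role=worker label is present on the new node:

kubectl get nodes --show-labels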

Setting up Metrics server

HPA needs access to Pod metrics to be able to decide when to scale the number of Pod replicas. The Metrics server aggregates resource usage values exposed by the Kubelet and exposes them in the metrics.k8s.io API to be used for scaling. By default the metrics server will track CPU and memory usage of Pods. For additional metrics, custom Metrics API implementations must be used.

Resource metrics pipeline (Source: link )

Complete

Download the metrics server deployment manifest:

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Before we deploy it, we need to change some parameters inside the manifest.

By default, the Metrics server is configured to connect to the node kubelet over verified TLS.

However, as we generated our server certificates ourselves during installation, they cannot be verified through globally trusted root certificates. We will enable insecure TLS for the Metrics server to get past this, but in production clusters verifiable certificates should be used, or the certificate authority that was used should be added to the list of trusted root certificates in the cluster.

We also need to modify how the Metrics server generates the connection URLs and enable only the Hostname address type, as the other options will likely result in failed requests in our setup.

Modify the downloaded components.yaml file and change the container args in the Deployment kind template.

Add --kubelet-insecure-tls to the container args.

Modify --kubelet-preferred-address-types to only include Hostname option.
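After these changes, the container args in the Deployment template should look roughly like the following (the exact set of other flags may differ between metrics-server releases; only the two changes described above matter):

        args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s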

Apply the components.yaml manifest with kubectl.
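For example:

kubectl apply -f components.yaml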

Complete

The metrics server will be deployed inside the kube-system namespace.

Verify that the metrics server Pod starts up properly.
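For example, check the Pod status and confirm that resource metrics are being collected (kubectl top only works once the Metrics server is up):

kubectl get pods -n kube-system | grep metrics-server
kubectl top nodes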

Trying out HPA example

Now that we have a running metrics server, we will try out a simple scenario of automatically scaling a Kubernetes deployment using HPA.

We are mainly following the Kubernetes documentation example for this task: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/

Complete

Create a new namespace hpa-test.
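For example:

kubectl create namespace hpa-test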

Deploy the following manifest in the hpa-test namespace:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 200m
          requests:
            cpu: 100m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
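Assuming the manifest was saved as php-apache.yaml (the filename is just an example), it can be applied with:

kubectl apply -f php-apache.yaml -n hpa-test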

This will deploy a simple PHP web app in an Apache web server container. One thing to note is that no replica count is configured, so the Deployment starts with a single replica.

Verify

Check that the pods are running. Also check that the web server responds properly:

curl http://php-apache.hpa-test.svc

PS! This service URL is only resolvable from inside the cluster (for example, from other containers). Use a direct service IP, a port forward, or some other container to access it. For example, using a temporary container:

kubectl run -i --tty --rm debug --image=curlimages/curl --restart=Never -- curl http://php-apache.hpa-test.svc

NB! As was demonstrated in the Security lecture, you should be careful about trusting containers from global registries

Complete

Next, we will create a HPA configuration for enabling automatic scaling of this deployment.

Create the following manifest for the HPA and apply it.

This will configure the HPA controller to track the performance of the php-apache deployment.

We configure the target metric to be CPU utilization and set the automatic scaling target to an average of 50% CPU utilization.

The HPA controller will try to keep the average CPU utilization around 50%, adding Pod replicas when it rises above this target and removing replicas when it drops below it.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: hpa-test
spec:
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 50
        type: Utilization
    type: Resource
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
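Assuming the manifest was saved as php-apache-hpa.yaml (again, the filename is just an example), apply it with (the namespace is already set in the metadata, so -n is not needed):

kubectl apply -f php-apache-hpa.yaml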

Verify

Let's now verify that the HPA works as required by generating synthetic traffic for the deployed application to trigger scaling operations.

Let's start by watching the current state of the HPA configuration. Run the following command to keep checking the state changes of the created HPA:

kubectl get hpa php-apache -n hpa-test --watch

Open a different terminal and start generating traffic:

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache.hpa-test.svc; done"

The first terminal window should show the state changing as the CPU usage rises above 50%, and after a while you should notice that the HPA starts scaling up the number of replicas.

Danger

Do not leave the synthetic traffic generator running for long periods. Verify that the number of replicas is scaled back to 1 after you stop the traffic generation (this can take a few minutes).

Configuring HPA for Ghostfolio

The final task of the lab is to configure HPA for the Ghostfolio deployment.

Complete

Create a new HPA for the Ghostfolio deployment that we set up earlier in the default namespace. NB! Not for the Ghostfolio Helm deployment in the lab9 namespace!

Set the number of minimum replicas to 1, maximum replicas to 10, and average CPU utilization to 60%.
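A minimal sketch of such an HPA is shown below, assuming the Deployment is named ghostfolio (check the actual name with kubectl get deployments -n default and adjust accordingly):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ghostfolio
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ghostfolio
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60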

If you notice that Ghostfolio scaling is too aggressive and want to tune the scaling behaviour, check the Configuring scaling behavior guide in the Kubernetes HPA documentation.

One reason why the default configuration may not work perfectly is that it takes some time before Ghostfolio is verified to be in the ready state (we configured readiness probes earlier).

You can change the default values of the HPA policy, including how long the scaling trigger condition (CPU higher than 60% in our case) must last before the HPA controller takes action, the duration used for computing the average, how many Pods can be added or removed at once, how often scaling decisions can be made, and several other things.
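For illustration, a behavior section like the following (the values are only examples, not required for this lab) could be added under the spec: of the HPA to make scaling less aggressive:

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # require 60s of sustained high load before scaling up
      policies:
      - type: Pods
        value: 2                       # add at most 2 Pods per period
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down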

Verify

Generate some traffic to the Ghostfolio deployment and verify that the number of Pods is scaled up and down as you create and stop generated traffic.

URL for testing access to the Ghostfolio service from inside another container: http://ghostfolio.default.svc:3333 (you can use this URL while generating traffic). Alternatively, you can use the Ghostfolio NodePort service.
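For example, the same load-generator approach as before can be reused (assuming the service URL above is reachable in your setup):

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://ghostfolio.default.svc:3333; done"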

Complete

Clean up the hpa-test namespace: remove the deployment, the service, and the php-apache HPA configuration.
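One simple way to do this is to delete the whole namespace, which removes everything inside it:

kubectl delete namespace hpa-test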