Lab 12 - Kubernetes security

Overview

Base security methods in Kubernetes
Pod Security Admission
Network policies
Service mesh
Defending
Detection with Tetragon
Enforcement with Kyverno

Base security methods

There are two main runtime security methods in Kubernetes that cluster administrators and application developers can use to enforce security rules upon the cluster, or specific namespaces.

Pod Security Admission rules
Network policies

There are also other tools and possibilities, like Pod security contexts, which define privilege and access control settings for Pods or containers, but these are more difficult to configure and enforce, as they're application specific. Later on in the lab, you'll see how to do that as well, but if you need to start off with Kubernetes security, these two methods should be first in line, as they cover the most ground.

Pod Security Admission

Pod Security Admission is a built-in admission controller that enforces a specific rule set on the Pods in every namespace where Pod Security Admission rules have been set up. Kubernetes defines three Pod Security Standards:

Privileged - Unrestricted policy, providing the widest possible level of permissions. This policy allows for known privilege escalations.
Baseline - Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
Restricted - Heavily restricted policy, following current Pod hardening best practices.

All of these standards have their own use cases:


The Baseline policy is aimed at ease of adoption for common containerized workloads while preventing known privilege escalations. This policy is targeted at application operators and developers of non-critical applications.

This standard should work for most applications, and defends against the known privilege escalations. If you don't know which policy the application needs, this one should be a good starting point.

The Privileged policy is purposely-open, and entirely unrestricted. This type of policy is typically aimed at system- and infrastructure-level workloads managed by privileged, trusted users.

Most applications in the cluster do not need access to system-level resources, such as directories mounted from the file system of a Kubernetes node. But some technologies, especially monitoring (node-exporter), storage (longhorn), GPU tools or even networking tools, might still need access to them. This is the policy to use in that case. It is the same as not using a policy, but defining it keeps the definitions explicit and will help with other tools you'll see later in the lab.

The Restricted policy is aimed at enforcing current Pod hardening best practices, at the expense of some compatibility. It is targeted at operators and developers of security-critical applications, as well as lower-trust users.

This is the most restrictive policy. On top of what baseline does, it requires the Pods to:

Run as non-root users.
Use only very specific volume types.
Define a specific seccomp profile.
Drop the ALL capability set in every container; the only capability that may be added back is NET_BIND_SERVICE.
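As a rough illustration of these requirements, the Pod below is a minimal sketch that should pass the Restricted level. The Pod name and the nginxinc/nginx-unprivileged image are assumptions chosen for the example (any image that runs as a non-root user would do); they are not part of the lab.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo                  # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                   # refuse to start containers running as UID 0
    seccompProfile:
      type: RuntimeDefault               # explicit seccomp profile
  containers:
  - name: web
    image: nginxinc/nginx-unprivileged   # assumption: an image built to run as non-root
    securityContext:
      allowPrivilegeEscalation: false    # also required by the Restricted level
      capabilities:
        drop: ["ALL"]                    # NET_BIND_SERVICE could be added back if needed
```

No volumes are defined here, so the volume type restriction is trivially satisfied.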

Adding a policy to a namespace is very straightforward. You need to add these labels to the Namespace manifest, like so:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    # The per-mode level label indicates which policy level to apply for the mode.
    #
    # MODE must be one of `enforce`, `audit`, or `warn`.
    # LEVEL must be one of `privileged`, `baseline`, or `restricted`.
    pod-security.kubernetes.io/<MODE>: <LEVEL>

    # Optional: per-mode version label that can be used to pin the policy to the
    # version that shipped with a given Kubernetes minor version (for example v1.28).
    #
    # MODE must be one of `enforce`, `audit`, or `warn`.
    # VERSION must be a valid Kubernetes minor version, or `latest`.
    pod-security.kubernetes.io/<MODE>-version: <VERSION>
  name: <namespace-name>

You need to substitute the mode with either enforce, audit or warn, depending on what you want to do: warn lets the user know when they deploy something that would violate the defined Pod security level, audit logs the violation for Kubernetes admins, and enforce rejects the workload at submission.
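The modes can be combined on one namespace. The manifest below is a small sketch of that: it enforces baseline while only warning and auditing against restricted, which is one common way to see what a stricter level would break before switching to it. The namespace name is hypothetical.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: psa-example                      # hypothetical namespace name
  labels:
    # Reject Pods that violate the baseline level.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # Only report (audit log entry and client warning) on restricted violations.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```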

Try setting up a new namespace, first without a Pod security standard defined.

Complete

Create a new namespace called pod-security-demo. Do not set pod-security rules yet. First, deploy a manifest there:

apiVersion: v1
kind: Pod
metadata:
  name: high-risk-pod
  namespace: pod-security-demo
spec:
  containers:
  - name: high-risk-container
    image: nginx
    volumeMounts:
    - name: shadow
      mountPath: /mnt/shadow
  volumes:
  - name: shadow
    hostPath:
      path: /etc/shadow

After deployment, you can exec into the container to run commands and see that it has, indeed, mounted the shadow file of the host system. This file contains the password hashes of the system, if there are any users with passwords. The same method could be used to mount, for example, the /root/.ssh/authorized_keys file, and you can even edit it through the container!

[root@teacher-test lab12]# kubectl exec -ti pod/high-risk-pod -n pod-security-demo -- /bin/bash
root@high-risk-pod:/# cat /mnt/shadow
...

This also works with any non-privileged Kubernetes account. If you have permissions to create Pods, Deployments, Jobs, CronJobs, StatefulSets, ReplicaSets or DaemonSets, you can, by default, use this method to try to get access to the Kubernetes hosts. Also, if a container has a mounted service account token that is allowed to deploy Pods in some way, anything running in that container can do the same. You can test this out by creating a new user and repeating the steps with it.

Now that you have established that, by default, Kubernetes doesn't really protect you from malicious workloads, try out the first layer of defense - Pod Security Admission.

Complete

Edit the Namespace manifest to include the following lines:

metadata:
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest

Then deploy the namespace. You'll be warned about Pods that violate the rules, but nothing happens to those Pods automatically.

Warning: existing pods in namespace "pod-security-demo" violate the new PodSecurity enforce level "baseline:latest"
Warning: high-risk-pod: hostPath volumes

You can still use the existing container maliciously. To finally fix this issue, you need to first delete the Pod, and then try to deploy it again.

Verify

When deploying it again, you should be denied with:

Error from server (Forbidden): error when creating "podsecurity.yaml": pods "high-risk-pod" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volume "shadow")

This means that Pod security is now working and disallows running such Pods in this namespace.

As said before, because this method is so easy to set up and protects so much, you should always start securing your cluster by setting up namespaces properly with Pod security rules. As you saw, this needs to be done the moment a namespace is created to be fully effective. Always make sure to choose the correct security level, though: things like kube-system, longhorn or even node-exporter need access to the host system, which means you'd have to use the privileged level.

Network policies

You've used some NetworkPolicy objects in previous labs, but so far it has been fairly random and one-off. Network policies in Kubernetes can be thought of as working like a firewall in other systems. If done properly, a system of NetworkPolicy rules goes a long way toward preventing initial intrusion, as you can define what should be able to access what.

As a malicious example, you can try connecting from an existing Kubernetes Pod in any namespace to the longhorn-ui Pod in your Longhorn namespace.

[root@teacher-test lab12]# kubectl exec -ti pod/high-risk-pod -n pod-security-demo -- /bin/bash
root@high-risk-pod:/# curl 10.0.0.58:8000
<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <!--[if lte IE 10]>
      <script
        src="https://as.alipayobjects.com/g/component/??console-polyfill/0.2.2/index.js,media-match/2.0.2/media.match.min.js"></script>
  <![endif]-->
  <style>
    ::-webkit-scrollbar {
      width: 10px; 
      height: 1px;
    }

    ::-webkit-scrollbar-thumb {
      border-radius: 10px;
      -webkit-box-shadow: inset 0 0 5px rgba(0,0,0,0.1);
      background: #535353;
    }
  </style>
<link href="./styles.css?c378196633c22a6f7698" rel="stylesheet"></head>

<body>
  <div id="root"></div>
<script type="text/javascript" src="./runtime~main.6d7bda24.js?c378196633c22a6f7698"></script><script type="text/javascript" src="./styles.f40bcd81.async.js?c378196633c22a6f7698"></script><script type="text/javascript" src="./main.4a045fc1.async.js?c378196633c22a6f7698"></script></body>

</html>

As you can see, it's up to the maintainer of the application to defend their application and namespace. Longhorn, by default, does none of this - there's no authentication or network policies - so anyone in the cluster can access the Longhorn UI, which allows configuring the whole storage layer, including saving backups to some remote location or deleting the volumes.

A solution to this is to have properly set up network policies in all namespaces. There are two steps to this: first, have network policies for all of your cluster infrastructure namespaces like kube-system, longhorn, prometheus and any other important ones; second, force all other namespaces to also have network policies.
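As a starting point for the first step, the policy below is a minimal sketch that only lets Pods inside the longhorn-system namespace talk to each other, so the curl from high-risk-pod shown earlier would no longer get through. The policy name is hypothetical, and the rule set is an assumption: a real Longhorn installation may also need ingress from other sources (for example an ingress controller in front of the UI), which you would have to allow explicitly.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only   # hypothetical name
  namespace: longhorn-system
spec:
  podSelector: {}                   # applies to every Pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}               # only Pods from this same namespace may connect
```

Egress is left untouched on purpose, so that the namespace's own outgoing connections keep working while you test the ingress side.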

Pod capabilities

Pod Capabilities in Kubernetes refer to specific privileges that can be granted to a pod, allowing it to perform operations that are normally restricted. Derived from Linux capabilities, these are used to provide fine-grained access control to system-level operations, enhancing security by limiting the privileges of pods.

In Kubernetes, you can assign or remove specific capabilities to a container within a pod, allowing you to adhere to the principle of least privilege. For instance, a common practice is to drop the NET_RAW capability, which reduces the risk of attacks using raw socket communication, like certain types of network spoofing. Another example is adding the SYS_TIME capability to a pod that requires permissions to modify the system clock, which is otherwise restricted.

Pod Security rules define the set of capabilities a container is allowed to add, which helps to adhere to the principle of least privilege.
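The two examples above can be sketched in a container securityContext as follows. The Pod and container names are hypothetical. Note that SYS_TIME is not among the capabilities the Baseline level allows adding, so this Pod would only be admitted in a namespace using the Privileged level.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: capabilities-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    securityContext:
      capabilities:
        drop: ["NET_RAW"]   # no raw sockets, which limits spoofing-style attacks
        add: ["SYS_TIME"]   # allow changing the system clock
```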

`sysctl`

Sysctls in Kubernetes are a mechanism for adjusting kernel parameters at runtime, important for tuning system behavior to enhance security and performance. They are divided into 'safe' and 'unsafe' categories, where safe sysctls are allowed by default, and unsafe sysctls require explicit enabling due to their potential impact on the overall system stability.

In the context of security, sysctls can be used to fine-tune network settings and manage resource limits, thereby mitigating certain types of attacks and resource exhaustion issues. For example, net.ipv4.tcp_syncookies sysctl is used to enable SYN cookies, helping protect against SYN flood attacks. Another example is kernel.shmmax, which can be adjusted to control the maximum size of shared memory segments, preventing excessive resource usage by a single pod. Kubernetes allows these settings to be configured at the pod level, offering a flexible way to apply security enhancements tailored to specific application needs.

Pod Security rules restrict which sysctl parameters a Pod may set, allowing only a 'safe' subset and rejecting the unsafe ones, thereby mitigating risks associated with altering kernel behavior.
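Sysctls are set per Pod through the Pod-level securityContext. The manifest below is a small sketch using the net.ipv4.tcp_syncookies example from above; the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-demo                    # hypothetical name
spec:
  securityContext:
    sysctls:
    - name: net.ipv4.tcp_syncookies    # enable SYN cookies for this Pod's network namespace
      value: "1"                       # sysctl values are always strings
  containers:
  - name: app
    image: nginx
```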

Service mesh

Your Kubernetes cluster is using Cilium as its CNI. Beyond offering networking, Cilium is able to provide service mesh functionality. In this lab we shall investigate how to add transparent encryption to the communication channels between pods and how to get extra visibility into the protocols used at OSI L7.

Info

The Open Systems Interconnection (OSI) model is a widely used conceptual model of networking. Its layers are:

1. Physical layer
2. Data link layer
3. Network layer
4. Transport layer
5. Session layer
6. Presentation layer
7. Application layer

Examples of L7 include protocols like HTTP, SSH, SMTP, DNS, SNMP and similar.

Enabling mTLS

Let's start by opening the Hubble UI and selecting the lab8 namespace, where we have deployed our use case.

use case initial

Communication between the use case microservice pods and MinIO is not authenticated. Let's try and change it to use mutual TLS (mTLS), where both the initiator and the receiver authenticate each other. Mutual TLS aims at providing authentication, confidentiality and integrity to service-to-service communications. Cilium relies on SPIFFE and its implementation SPIRE for establishing and attesting the identity of services.

Complete

Enable SPIRE server in Cilium deployment. If you are accessing Hubble over NodePort, make sure it is set as well in helm.

helm upgrade --reuse-values cilium cilium/cilium --version 1.16.1 --set authentication.mutual.spire.enabled=true --set authentication.mutual.spire.install.enabled=true --set hubble.ui.service.type=NodePort -n kube-system

We shall also need the Cilium CLI. If you haven't installed it yet, you can do it as follows:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

Validate that it works:

$ /usr/local/bin/cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled

Deployment             cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Deployment             hubble-relay       Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet              cilium             Desired: 2, Ready: 2/2, Available: 2/2
Deployment             hubble-ui          Desired: 1, Ready: 1/1, Available: 1/1
Containers:            hubble-ui          Running: 1
                       cilium-operator    Running: 1
                       hubble-relay       Running: 1
                       cilium             Running: 2
Cluster Pods:          76/77 managed by Cilium
Helm chart version:    1.14.2
Image versions         cilium             quay.io/cilium/cilium:v1.14.2@sha256:6263f3a3d5d63b267b538298dbeb5ae87da3efacf09a2c620446c873ba807d35: 2
                       hubble-ui          quay.io/cilium/hubble-ui:v0.12.0@sha256:1c876cfa1d5e35bc91e1025c9314f922041592a88b03313c22c1f97a5d2ba88f: 1
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.12.0@sha256:8a79a1aad4fc9c2aa2b3e4379af0af872a89fcec9d99e117188190671c66fc2e: 1
                       cilium-operator    quay.io/cilium/operator-generic:v1.14.2@sha256:52f70250dea22e506959439a7c4ea31b10fe8375db62f5c27ab746e3a2af866d: 1
                       hubble-relay       quay.io/cilium/hubble-relay:v1.14.2@sha256:a89030b31f333e8fb1c10d2473250399a1a537c27d022cd8becc1a65d1bef1d6: 1

Verify

Let's make sure that the base infrastructure is working.

Make sure that SPIRE components are deployed:

$ kubectl get all -n cilium-spire
NAME                    READY   STATUS    RESTARTS   AGE
pod/spire-agent-4mrbq   1/1     Running   0          3m33s
pod/spire-agent-9nz9r   1/1     Running   0          3m33s
pod/spire-server-0      2/2     Running   0          3m32s

NAME                   TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/spire-server   ClusterIP   10.96.251.8   <none>        8081/TCP   3m33s

NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/spire-agent   2         2         2       2            2           <none>          3m33s

NAME                            READY   AGE
statefulset.apps/spire-server   1/1     3m33s

Make sure that the server is up and running:

$ kubectl exec -n cilium-spire spire-server-0 -c spire-server -- /opt/spire/bin/spire-server healthcheck
Server is healthy.

Make sure that the use case pods have received a Cilium identity:

$ kubectl get cep -l microservice=application-server -n lab8 -o wide
NAME                                  SECURITY IDENTITY   INGRESS ENFORCEMENT   EGRESS ENFORCEMENT   VISIBILITY POLICY   ENDPOINT STATE   IPV4         IPV6
application-server-5d449c5769-bxkjv   2936                                                                               ready            10.0.2.233
application-server-5d449c5769-kmk5q   2936                                                                               ready            10.0.0.173
application-server-5d449c5769-m487p   2936                                                                               ready            10.0.2.145

Complete

Let's now enable transparent mTLS. For that we shall deploy a new pod into the lab8 environment and add a policy that requires all communication to the History Server port (for example 8080, or whichever port you use in your container) to be encrypted.

Use the definition below to launch the pod in the lab8 namespace.

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: pod-worker
  name: pod-worker
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot:latest
    command: ["sleep", "infinite"]

Next, execute a curl request from that Pod against a History Server pod. Make sure that you look up the IP of a History Server pod in your cluster.

$ kubectl exec -it pod-worker -n lab8 -- curl 10.0.1.110:3333

It should work - but checking Hubble we can see that traffic remains plaintext.

Hubble traffic

So, the next step is to add a policy that will actually force encryption when communicating with the History Server. Cilium policies are very similar to standard Kubernetes network policies. You can also use an online editor to generate a more precise rule.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: mutual-auth-demo
spec:
  endpointSelector:
    matchLabels:
      app: electricity-calculator
      microservice: history-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: pod-worker
    authentication:
      mode: "required"
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"

The secret ingredient here is authentication.mode: "required". For a brief time after applying it, you might see packets in the logs that were dropped because of the authentication requirement:

Hubble traffic

Now, try to run curl requests again and check Hubble.

Hubble mlauth enabled

Info

Behind the scenes, Cilium has helped to validate the certificates for mTLS. Check the logs of the cilium pod on the same Kubernetes node where the History Server pod is running.

2023-11-28T18:24:00.387901040-05:00 level=debug msg="Policy is requiring authentication" key="localIdentity=42955, remoteIdentity=51853, remoteNodeID=37176, authType=spire" subsys=auth
2023-11-28T18:24:00.436438988-05:00 level=debug msg="Validating Server SNI" SNI ID=51853 subsys=auth
2023-11-28T18:24:00.436873713-05:00 level=debug msg="Validated certificate" subsys=auth uri-san="[spiffe://spiffe.cilium/identity/51853]"
2023-11-28T18:24:00.439269832-05:00 level=debug msg="Successfully authenticated" key="localIdentity=42955, remoteIdentity=51853, remoteNodeID=37176, authType=spire" remote_node_ip=10.0.0.34 subsys=auth

Note the "remoteIdentity" in the logs. It corresponds to the originator of the traffic, in our case the pod-worker pod.

$ kubectl get cep -l app=pod-worker -n lab8 -o=jsonpath='{.items[0].status.identity.id}'
51853

Complete

Create and apply CiliumNetworkPolicy manifests that require authentication between the following use case components in the lab8 namespace:

Application Server and History Server
History Server and Minio

Monitoring

Let's try out more fine-grained monitoring of traffic. For that, we need to switch on L7 proxy visibility for the pods we are interested in.

Complete

Annotate a History Server pod with the visibility annotation for outgoing traffic. Make sure that the pod name and namespace are correct.

$ kubectl annotate pod history-server-646bb975b6-6pcvr -n lab8  policy.cilium.io/proxy-visibility="<Egress/53/UDP/DNS>,<Egress/80/TCP/HTTP>"

Make sure that this annotation remains after the lab as well.
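An annotation set directly on a pod is lost when that pod is recreated. One way to keep it, sketched below, is to set the same annotation in the Pod template of the workload that owns the pod. This assumes the History Server is managed by a Deployment (the pod name above suggests one called history-server); only the relevant fragment of that manifest is shown.

```yaml
# Fragment of the (assumed) history-server Deployment in the lab8 namespace.
spec:
  template:
    metadata:
      annotations:
        # Same value as in the kubectl annotate command above.
        policy.cilium.io/proxy-visibility: "<Egress/53/UDP/DNS>,<Egress/80/TCP/HTTP>"
```

Changing the Pod template triggers a rollout, so the new pods start with the annotation already in place.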

Verify

Execute a workload that triggers external communication, e.g. pulling new package metadata.

$ kubectl exec pod/history-server-646bb975b6-6pcvr -n lab8 -- apk update

NB! The command apk update works for Alpine containers. For Ubuntu containers, it could be apt update instead.

Make sure that you observe detailed information in Hubble:

Hubble schema

Hubble traffic

Defending

While you've seen a few very powerful defensive technologies, these still have quite a few gaps, especially if the Kubernetes cluster isn't built with these defenses in mind. On top of this:

Any malicious user or service account having access to the network policy rules can edit and remove them.
It's still impossible to have default rules - making mistakes during application or infrastructure setup is very simple, especially with Helm packaged things, as you can't be sure if it includes the rules or not.
Already existing resources might behave maliciously.
You can never be sure if these defenses manage to defend against everything.

Thankfully, all of these issues have solutions. You'll start with the last two by adding a tool that is capable of detecting all the activities in the cluster, including any commands run in the pods themselves.

The other two issues you'll fix by adding a policy engine, which is capable of enforcing specific rules on the cluster by telling the API server to validate all submitted manifests against it. It can even change those manifests automatically before they are deployed. This is called mutating and is done with mutating admission webhooks (MutatingWebhookConfiguration objects).

Mutating helps with things like adding nodeAffinity sections to all deployed pods automatically, allowing the cluster to be more balanced and failure tolerant without extra work from application developers or DevOps people.

Detection with Tetragon

To make sure that your deployed resources don't behave maliciously, and that no missed or zero-day issues are introduced to the cluster, you need continuous observability into all layers of your Kubernetes cluster.

You already have observability into your container application logs, Kubernetes audit logs, and Kubernetes and host metrics, but you're missing a big part of observability: information about what's actually running inside your containers, and what it is doing. You can think of this as audit logs for the operating system level, where every execution, file change and connection attempt is logged.

There are quite a few ways to do this, depending on how or what you're running on your machines. auditd is for base Linux, auditbeat for more specific backends (like Elasticsearch), and finally, in the case of Kubernetes, you have Tetragon. Tetragon is very new, made by the same team that made the Cilium CNI, and uses the same eBPF technology.

Start off by installing Tetragon.

Complete

Install Tetragon to your cluster, with the instructions here. It should be installed to kube-system namespace, as it needs to be very privileged to run:

helm repo add cilium https://helm.cilium.io
helm repo update
helm install tetragon ${EXTRA_HELM_FLAGS} cilium/tetragon -n kube-system
kubectl rollout status -n kube-system ds/tetragon

Tetragon starts up in the kube-system namespace, and instantly starts listening to specific events on all the nodes, in all the pods. By default, it listens to process lifecycle events - process_exec and process_exit - giving an overview of all the execution events in your cluster.

You can do way more with Tetragon, like monitoring network traffic, host system, file access, and even more, but that will require extra configuration, and will probably also require more performant machines than ours. Always take into consideration, that monitoring also takes resources, so you should weigh necessity against available resources (budget).

Tetragon starts writing logs about these process lifecycle events to container standard output, but also to the /var/run/cilium/tetragon/ folder on the node. This makes it very easy to ingest the logs into Loki.

Complete

Ingest the JSON logs at /var/run/cilium/tetragon/tetragon.log into Loki with Promtail.

Add this part to config.snippets.extraScrapeConfigs:

  - job_name: tetragon
    static_configs:
      - targets:
          - localhost
        labels:
          job: tetragon
          __path__: /var/run/cilium/tetragon/tetragon.log

    pipeline_stages:
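      # An empty expression uses the key name itself as the JMESPath expression.
      # process_exit is extracted here because a later stage uses it as a source.
      - json:
          expressions:
            time:
            node_name:
            process_exec:
            process_exit: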

      - json:
          expressions:
            process: process
          source: process_exec

      - json:
          expressions:
            process: process
          source: process_exit

      - json:
          expressions:
            exec_id: exec_id
            pid: pid
            uid: uid
            binary: binary
            arguments: arguments
            pod: pod
          source: process

      - json:
          expressions:
            pod_namespace: namespace
            pod_name: name
            container_id: container.id
            container_name: container.name
            container_image: container.image.name
            workload: workload
            workload_kind: workload_kind
          source: pod

      - match:
          selector: '{pod_namespace="longhorn-system"}'
          action: drop

      - labels:
          node_name:
          arguments:
          pod_namespace:
          pod_name:
          container_name:
          container_image:

      - timestamp:
          source: time
          format: RFC3339Nano

This is a very basic configuration, but it'll work for our purposes.

Also, do not forget to add necessary parts to the extraVolumes and extraVolumeMounts part in the Helm chart, otherwise the Promtail pods cannot access the specified folder. Mount the folder, not the log file directly.
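A minimal sketch of those two sections in the Promtail Helm values could look like the following. The volume name is hypothetical, and mounting the folder at the same path inside the container is an assumption made so that the __path__ in the scrape config above matches without changes.

```yaml
extraVolumes:
  - name: tetragon-logs                   # hypothetical volume name
    hostPath:
      path: /var/run/cilium/tetragon      # the folder on the node, not the log file

extraVolumeMounts:
  - name: tetragon-logs
    mountPath: /var/run/cilium/tetragon   # same path as __path__ expects
    readOnly: true                        # Promtail only needs to read the logs
```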

Verify

You can verify if everything works by checking if the pods come up, whether the file is a successful target in the Promtail UI, and whether logs start appearing with job: tetragon in Grafana.

The last part to try out is performing malicious actions from any pod in the cluster (like the high-risk-pod you set up in the first part of the lab). You should see the commands run inside the container in the logs, so if that container reads /etc/shadow or curls longhorn-ui, you should be able to see this.

Tetragon can also be configured to monitor network events, so all the malicious connections can be logged even better.

In the real world, you'd route the events from Tetragon and the Kubernetes API into a Security Information and Event Management (SIEM) system, which helps detect, analyze, and respond to security threats. Setting this up, and using a SIEM, are out of the scope of this course.

Enforcing rules with Kyverno

Kyverno is a policy engine that can validate, mutate, generate, and clean up Kubernetes resources, and verify image signatures and artifacts to help secure the software supply chain.

Once installed, Kyverno is configured with specific CRDs. For this lab, you'll be using ClusterPolicy objects.

Danger

When you install Kyverno, it deploys validating and mutating webhook controllers to the cluster, and from then on the Kubernetes API server checks all applied manifests against those controllers.

If you're not careful, this might cause a chicken-and-egg problem, where, if the Kyverno controllers are not accessible, you're not able to run any commands against the cluster, because the API server cannot validate the commands, and then denies the commands.

The easiest way to prevent this issue is to not run Kyverno on the kube-system namespace. Then, you might break everything else with your Kyverno policies, but you can at least fix it.

First, start with installing Kyverno.

Complete

Deploy Kyverno to a namespace called kyverno, with the instructions provided here. You can run Kyverno with one replica of every controller to save resources, but in production clusters, you should definitely run multiple replicas.

helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno -n kyverno --create-namespace

Verify

After deployment, make sure all the Kyverno pods are up, and the CRDs have been created.

Now that Kyverno is up, we can start by defining some policies. The Kyverno developers have really thought things through, and published a massive set of public policies that help you understand which policies are useful and how they work, and that you can draw inspiration from.

You'll be defining two policies for different purposes:

A policy that mutates the namespace creation manifest to always create namespaces with proper Pod security labels.
A policy that makes all namespaces have a default-deny network security policy. This makes sure only whitelisted traffic is allowed.

You should go through the list on the website to see the other very powerful policies possible with Kyverno.

Policy - add Pod Security Admission labels

Complete

Apply the first policy, which adds the Pod security labels if they do not exist yet. You can find this rule on the Kyverno policies page, but you'll also make a few changes to better integrate it with your cluster, and not break it.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-psa-labels
  annotations:
    policies.kyverno.io/title: Add PSA Labels
    policies.kyverno.io/category: Pod Security Admission
    policies.kyverno.io/severity: medium
    kyverno.io/kyverno-version: 1.7.1
    policies.kyverno.io/minversion: 1.6.0
    kyverno.io/kubernetes-version: "1.24"
    policies.kyverno.io/subject: Namespace
    policies.kyverno.io/description: >-
      Pod Security Admission (PSA) can be controlled via the assignment of labels
      at the Namespace level which define the Pod Security Standard (PSS) profile
      in use and the action to take. If not using a cluster-wide configuration
      via an AdmissionConfiguration file, Namespaces must be explicitly labeled.
      This policy assigns the labels `pod-security.kubernetes.io/enforce=baseline`
      and `pod-security.kubernetes.io/warn=restricted` to all new Namespaces if
      those labels are not included.
spec:
  rules:
  - name: add-baseline-enforce-restricted-warn
    match:
      any:
      - resources:
          kinds:
          - Namespace
    exclude: # (1)!
      resources:
        namespaces:
        - kube-system
        - longhorn-system
        - prometheus
        - cilium-spire
        - loki
    mutate: # (2)!
      patchStrategicMerge:
        metadata:
          labels:
            +(pod-security.kubernetes.io/enforce): baseline
            +(pod-security.kubernetes.io/warn): baseline

(1) The exclude block makes sure this rule won't be applied to namespaces that definitely need different rules. We could also avoid the mutation by adding the correct privileged labels to those namespaces first, but the exclusion is kept as a precaution.
(2) This is where the magic happens. The rule uses patchStrategicMerge with the +() anchor, which indicates that the label should be added only if it's not already present.
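
As a rough illustration of the add-if-not-present behaviour, consider a hypothetical namespace that already sets its own enforce label. The name psa-demo is made up for this sketch, and the comments describe the outcome one would expect from the policy above rather than captured output.

```yaml
# Hypothetical namespace submitted by a user (the name is illustrative).
apiVersion: v1
kind: Namespace
metadata:
  name: psa-demo
  labels:
    # Already present, so the +() anchor should leave it untouched.
    pod-security.kubernetes.io/enforce: privileged
    # The warn label is missing, so the policy is expected to add
    # pod-security.kubernetes.io/warn: baseline during admission.
```

Running kubectl get namespace psa-demo --show-labels after creating it is one way to compare the stored labels with what was submitted.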

Verify

After applying this ClusterPolicy to the cluster, try applying the manifest from the Pod Security Admission part of the lab in a newly created namespace, without specifying the Pod security labels yourself. Your workload should still be rejected, and the namespace should carry the Pod security labels at the baseline level.

This rule is a mutate rule, which means it runs only on new submissions to the cluster, which it may then change. The mutateExistingOnPolicyUpdate option also allows mutating existing resources, but it requires extra permissions and works asynchronously. It's easier to add the necessary labels to all existing namespaces manually.
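
For namespaces that existed before the policy was applied, the labels can be declared directly in the namespace manifest. This is a minimal sketch, assuming a hypothetical existing namespace called my-app. Pick the level each namespace actually needs.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app  # hypothetical name; replace with an existing namespace
  labels:
    # Same levels the Kyverno policy assigns to new namespaces
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: baseline
```

The same labels can also be set in place with kubectl label namespace, which avoids rewriting the manifest.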

Policy - add default blocking `NetworkPolicy`¶

Complete

Apply the second policy, which creates a NetworkPolicy object in each new namespace. This policy allows only intra-namespace traffic between Pods, so Pods in any other namespace in the cluster cannot connect. The rule comes from the Kyverno policies page, with a few changes so that it integrates with your cluster without breaking it.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-networkpolicy
  annotations:
    policies.kyverno.io/title: Add Network Policy
    policies.kyverno.io/category: Multi-Tenancy, EKS Best Practices
    policies.kyverno.io/subject: NetworkPolicy
    policies.kyverno.io/minversion: 1.6.0
    policies.kyverno.io/description: >-
      By default, Kubernetes allows communications across all Pods within a cluster.
      The NetworkPolicy resource and a CNI plug-in that supports NetworkPolicy must be used to restrict
      communications. A default NetworkPolicy should be configured for each Namespace to
      default deny all ingress and egress traffic to the Pods in the Namespace. Application
      teams can then configure additional NetworkPolicy resources to allow desired traffic
      to application Pods from select sources. This policy will create a new NetworkPolicy resource
      named `default-deny` which will deny all traffic anytime a new Namespace is created.      
spec:
  generateExisting: false # (1)!
  rules:
  - name: default-deny
    match:
      any:
      - resources:
          kinds:
          - Namespace
    exclude: # (2)!
      resources:
        namespaces:
        - kube-system
        - longhorn-system
        - kyverno
    generate: # (3)!
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: default-deny
      namespace: "{{request.object.metadata.name}}"
      synchronize: true # (4)!
      data:
        spec:
          # select all pods in the namespace
          podSelector: {}
          ingress:
            - from:
              - podSelector: {} # (5)!
          # restrict ingress only; ingress not allowed above is denied
          policyTypes:
          - Ingress # (6)!

(1) This flag controls whether the created or updated policy is also applied to existing resources. We leave it as false for now, because we want to test the policy first and make sure it won't break anything.
(2) Exclude the system namespaces, as blocking traffic to those can effectively shut your cluster down.
(3) This is where the magic happens. A generate rule creates new resources in the cluster when the policy's conditions match.
(4) With synchronize: true, a generated resource that gets edited or deleted is automatically restored to its intended state. This is what makes enforcement with this tool so easy, as namespace-bound users cannot delete the generated resources and circumvent the rules.
(5) This part makes all Pods in the namespace accessible to other Pods in the same namespace.
(6) Define only the Ingress (incoming) traffic rule and skip the Egress (outgoing) one to keep things simple. In high-security environments, you definitely want to define default egress rules as well.
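
For reference, a stricter default could also restrict egress. The fragment below is a sketch of how the data section of the generate rule above might look in that case, and it is not part of this lab's required setup. It assumes that cluster DNS runs in kube-system and listens on port 53. Adjust both for your cluster.

```yaml
data:
  spec:
    podSelector: {}
    ingress:
      - from:
        - podSelector: {}
    egress:
      # allow traffic to other Pods in the same namespace
      - to:
        - podSelector: {}
      # allow DNS lookups (assumption: DNS runs in kube-system on port 53)
      - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    policyTypes:
    - Ingress
    - Egress
```

With a default like this, workloads that call external services or other namespaces would need additional egress rules of their own.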

Verify

After applying the rule to the cluster, create a namespace. You should see right away that the namespace gets a NetworkPolicy object named default-deny.
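
For a hypothetical namespace named demo, the generated object should look roughly like the sketch below. Kyverno may also attach its own bookkeeping metadata (such as labels) to generated resources, which is omitted here.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: demo  # hypothetical namespace name
spec:
  # select all pods in the namespace
  podSelector: {}
  ingress:
    - from:
      # only pods from the same namespace are allowed in
      - podSelector: {}
  policyTypes:
  - Ingress
```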

If you deploy, for example, an nginx Pod inside the new namespace, you won't be able to access it from other namespaces, not even from kube-system!

This rule is a generate rule, which means it creates new objects in the cluster when the conditions in the policy are matched. By default, it works only when new objects match the conditions, but this behaviour can be changed with the generateExisting option.

Danger

Be very careful: when you apply a default deny rule to a namespace, no other namespace can access it, not even system-level components such as those in kube-system. This includes the ingress controller, the PostgreSQL operator (if you have deployed a cluster in the namespace) and Prometheus.

For example, CloudNativePG is very specific about needing network policy in this case.

The solution is to add rules that allow the necessary traffic from these system-level components. There are a few ways to handle this, depending on the capability of your users:

1. Leave it as it is, and tell your users to add the necessary rules for all components they plan to use.
   - This method is probably the most secure, as it's very explicit, but it's also very difficult for users, and requires you to have good documentation for them.
2. Write those rules into the default network policy that is deployed to all namespaces.
   - This is slightly less secure (but still tremendously better than not having rules!), because you'll be opening traffic to namespaces which do not require it.
3. Use Kyverno to automatically deploy network policy rules when a specific object is created.
   - This is the best compromise between usability and security, and puts the main responsibility on cluster admins. The difficulty falls only on cluster administrators, as the rules can get complicated.
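
As a sketch of the first option, a user could add a policy like the one below to allow scraping from the monitoring namespace. The namespace names my-app and prometheus, the app: my-app label, and port 8080 are assumptions made for illustration. The kubernetes.io/metadata.name label is set automatically on namespaces by recent Kubernetes versions.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape   # hypothetical name
  namespace: my-app               # hypothetical application namespace
spec:
  podSelector:
    matchLabels:
      app: my-app                 # assumed Pod label
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: prometheus  # assumed monitoring namespace
    ports:
    - protocol: TCP
      port: 8080                  # assumed metrics port
```

Because NetworkPolicy rules are additive, a policy like this adds to default-deny rather than replacing it.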

Here is an example of a Kyverno rule that creates a NetworkPolicy whenever a CloudNativePG cluster is defined in a namespace, allowing the CloudNativePG operator to connect to that cluster.

Kyverno rule

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-networkpolicy-for-cnpg-operator
spec:
  generateExisting: true
  rules:
  - name: check-labels-and-generate
    match:
      resources:
        kinds:
        - postgresql.cnpg.io/v1/Cluster
    generate:
      synchronize: true
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: "{{ request.object.metadata.name }}-cnpg-policy"
      namespace: "{{ request.object.metadata.namespace }}"
      data:
        spec:
          podSelector:
            matchLabels:
              cnpg.io/cluster: "{{ request.object.metadata.name }}"
          policyTypes:
          - Ingress
          ingress:
          - from:
              - namespaceSelector:
                  matchLabels:
                    kubernetes.io/metadata.name: cnpg-system # Namespace where the operator is deployed
                podSelector:
                  matchLabels:
                    app.kubernetes.io/name: cloudnative-pg # Matches the Operator pod
            ports:
            - protocol: TCP
              port: 5432  # Postgresql
            - protocol: TCP
              port: 8000  # Status

Cluster Security labs' tasks¶

This week's task is simple:

Set generateExisting to true for the generate rule.
Add appropriate Pod Security Admission labels to all namespaces.

This will break bits and pieces of your previous labs' functionality, which shows up in scoring as specific tests failing for previous weeks. Your task is to fix these tests.

Checks will verify that all your namespaces (except kube-system, longhorn-system and kyverno) have the default network policy named default-deny, and that all namespaces have the Pod Security Admission rules set up.

Important

By default, Cilium does not manage the Kubernetes control-plane components, like kube-apiserver. This is due to a chicken-and-egg problem: these components need to be running before Cilium is, but if Cilium managed them, they could not start before Cilium is up, while Cilium in turn cannot start up without them.

You can check which endpoints are managed in Cilium with the following commands:

kubectl -n kube-system exec ds/cilium -- cilium status --verbose
kubectl -n kube-system exec ds/cilium -- cilium identity list

Being unmanaged by Cilium means that you cannot create Kubernetes NetworkPolicy objects that allow or deny traffic for these specific components, yet the default policy still blocks access from them. The result is a situation where you cannot allow traffic from them.

This usually isn't a problem, but in labs, the Nagios tests rely on proxying requests through the API server. Solving this requires us to use a Cilium-specific CiliumClusterwideNetworkPolicy manifest, and tell it to allow access to all endpoints from everything in the cluster, including kube-apiserver:

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "allow-kubeapi"
spec:
  endpointSelector: {}
  ingress:
    - fromEntities:
      - kube-apiserver
      - all

Make sure to deploy this into your cluster, otherwise Nagios checks will not work for some resources.
