Lab 9 - Monitoring
Overview¶
- Metrics
  - node_exporter
  - Prometheus
- Visualizations
  - Grafana for Prometheus
- Kubernetes specific metrics
  - Kubernetes state metrics
  - cAdvisor
- Centralized logging
  - Loki
  - Promtail
  - Grafana for Loki
- Instrumenting your application
Introduction¶
Monitoring is the process of collecting, analyzing, and displaying data about the performance, availability, and health of systems, applications, and network infrastructure. By continuously observing the state of these components, monitoring enables operators and administrators to gain insights into the behavior of their environment, detect potential issues, and proactively resolve problems before they impact users or critical business processes.
The need for monitoring arises from the increasing complexity and scale of modern IT environments, which require effective tools and techniques to ensure their reliability, performance, and security. Monitoring helps organizations maintain a high level of service quality, adhere to service level agreements (SLAs), and optimize resource usage, all of which contribute to better user experience and overall operational efficiency.
In practice, monitoring involves gathering various types of data, such as metrics, logs, and traces, from different sources and presenting them in a meaningful way. This data is then used to analyze trends, detect anomalies, generate alerts, and drive informed decision-making. Additionally, monitoring supports capacity planning, performance tuning, incident management, and root cause analysis efforts, which are essential for maintaining the availability and performance of IT services.
Metrics¶
Metrics are numerical measurements that represent the behavior, performance, and health of systems, applications, and network infrastructure. They provide quantitative data that helps operators and administrators understand the current state of their environment and make data-driven decisions. By continuously collecting and analyzing metrics, sysadmins can identify trends, detect anomalies, and optimize their infrastructure to ensure a high level of service quality and performance.
A simple example of a metric is the amount of time it took for the web server to load the content of this page, or how much CPU the container running the web server uses.
In practice, metrics are gathered from different sources, such as operating systems, hardware components, and application instrumentation. This data is collected at regular intervals and stored in time-series databases, which enable efficient querying, aggregation, and analysis of historical and real-time data. Metrics can be visualized in dashboards, charts, and graphs, providing a clear and concise view of the monitored systems' performance and health.
In this part of the lab, you'll be setting up Prometheus - a time-series database made primarily for querying and storing metrics - and an exporter called node_exporter, which Prometheus can query for operating-system-level information. Then, as Kubernetes publishes several types of metrics natively in Prometheus format, you'll query those metrics to gain better visibility into your managed cluster and applications.
node_exporter¶
First you'll be setting up the node_exporter tool to become familiar with the general methodology, and also with the format of metrics used by Prometheus. Node Exporter is a widely used, open source monitoring tool that collects hardware and operating system metrics from Linux-based systems.
Developed as part of the Prometheus project, Node Exporter exposes these metrics in a format that can be easily scraped and ingested by a Prometheus server.
You'll be using a resource type called DaemonSet, which is very similar to Deployment - the difference is that while with a Deployment you can choose how many replicas are going to run, a DaemonSet always runs one replica on each node of the cluster. This is perfect for distributing monitoring tools across the cluster, as monitoring tools usually require a single agent per node. Additionally, when you add another node to the cluster, the DaemonSet will cause the monitoring agent to be scheduled on the new node automatically, reducing management overhead.
Complete
Set up the node_exporter as a DaemonSet resource in Kubernetes. Create a new namespace called monitoring for this task, and deploy the following manifest there.
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
containers:
- name: node-exporter
image: prom/node-exporter:latest
args:
- "--path.procfs=/host/proc"
- "--path.sysfs=/host/sys"
- "--path.rootfs=/host/root"
- "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)"
ports:
- containerPort: 9100
name: http-metrics
volumeMounts:
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
- name: root
mountPath: /host/root
readOnly: true
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
- name: root
hostPath:
path: /
A few things to note from this manifest:
- It exposes a containerPort 9100.
- It mounts several filesystems: /proc, /sys and even just /.
  - Even though it does so in read-only mode, this has serious security implications if you do not trust the software, as it can read critical system locations like /etc/passwd, /etc/shadow and /etc/kubernetes.
  - Sadly, monitoring solutions need access to most system-level locations to gather the information they report on.
    - Process info comes from /proc.
    - Mount info comes from /etc/fstab and /proc/mounts.
    - and so on.
Verify
You should be able to see the node_exporter containers starting up fairly soon. When you query the port using port forwarding, or with curl from inside the cluster, you'll get a response which directs you to /metrics instead.
If you now query /metrics, you'll see a format like this:
...
# HELP node_filefd_allocated File descriptor statistics: allocated.
# TYPE node_filefd_allocated gauge
node_filefd_allocated 2048
# HELP node_filefd_maximum File descriptor statistics: maximum.
# TYPE node_filefd_maximum gauge
node_filefd_maximum 9.223372036854776e+18
# HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/sda1",fstype="xfs",mountpoint="/"} 4.5791588352e+10
node_filesystem_avail_bytes{device="none",fstype="ramfs",mountpoint="/run/credentials/systemd-sysctl.service"} 0
...
This is the OpenMetrics format, which Prometheus and its exporters use for describing metrics. In short, the format is:
<metric name>{<label_key>=<label_value>} <metric_value>
The result is a list of metrics at the time of the query. For example, the response might contain the node_cpu_seconds_total metric with labels cpu="0" and mode="idle", and the metric value 2.19401742e+06.
Additionally, there are four important elements to a metric:
- The HELP comment, which provides a human-readable description of the metric that follows it (lower in the list).
- The TYPE comment, which specifies the metric's type, such as gauge, counter or histogram. It is mainly useful when querying and visualizing the metrics, as different types need to be visualized differently.
  - For example, a gauge is a value that can go up or down, like temperature. You can use this for visualization without extra elements.
  - A counter, on the other hand, is a metric that either grows forever or gets reset to zero. Counters are often used to represent cumulative metrics, like the total number of bytes transferred. As this value can get into the billions or trillions very easily, it's not human-readable without some tricks.
    - One of the easiest tricks is, instead of using the value itself, to use its rate of change - for example, how much the bytes-transferred metric has changed in the past minute (a derivative). This is usually much more human-readable. See the PromQL example after this list.
- The metric itself, in the format metric_name numeric_value.
- Some systems are more complicated, and usable monitoring requires more context - for that, you can add labels to every metric. These are key-value pairs enclosed in curly braces, and they provide additional dimensions for the metric.
  - In the case of node_filesystem_avail_bytes{device="/dev/sda1",fstype="xfs",mountpoint="/"} 4.5791588352e+10 - device, fstype and mountpoint are labels. This allows you to see the available bytes per mount point and filesystem type, not only as an aggregate over the whole system.
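As a small illustration of the rate-of-change trick (assuming you later have these metrics in Prometheus), this PromQL expression turns a raw byte counter into a per-second transfer rate averaged over the last minute:
rate(node_network_transmit_bytes_total[1m])
The raw counter is a huge, ever-growing number, while the rate is immediately readable as bytes per second.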
You'll also see that there are thousands of lines of these metrics. This is normal - the more complicated the system, the more metrics there are going to be. It might seem like a tremendous amount of information, but it's actually not, at least not for computers:
$ curl 10.0.1.76:9100/metrics | wc --bytes # (1)
129519
$ echo "129519.0/1024.0" | bc -l # (2)
126.48339843750000000000
- Count the bytes in the response.
- Convert the bytes into kilobytes.
This is less than the first response from google.com, not to mention any other website with actual graphics.
Prometheus¶
Now that you have an exporter in place, set up a system to query that exporter and provide a way to store and aggregate this data.
Prometheus is an open source monitoring and alerting toolkit designed for reliability and scalability, primarily used for monitoring containerized and distributed systems. Developed at SoundCloud and now a part of the Cloud Native Computing Foundation (CNCF), Prometheus has become a widely adopted monitoring solution in modern cloud-native environments. Its powerful query language, data model, and efficient storage design make it an ideal choice for organizations seeking to monitor their infrastructure, applications, and services.
Prometheus operates on a pull-based model, where it actively scrapes metrics from target systems or applications at regular intervals. This approach simplifies the overall architecture and makes it easier to scale and manage. You'll be configuring Prometheus to query the Node Exporter every 15 seconds.
Complete
Setting up Prometheus is a bit more work, as we also want to give it a configuration file. We will be doing this with a ConfigMap resource, which gets mounted to the Prometheus Pod in a place where the Prometheus software expects it.
Also, we will be using a StatefulSet resource here, instead of a Deployment or DaemonSet. It provides a better way of running stateful applications, as Kubernetes is less likely to try to move them away, and gives the resources handled by a StatefulSet better ordering and uniqueness guarantees.
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
data: # (1)
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-node-exporter'
static_configs:
- targets: ["10.0.1.72:9100"]
- targets: ["10.0.0.139:9100"]
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
spec:
serviceName: prometheus
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
initContainers: # (2)
- name: prometheus-data-permission-setup
image: busybox
command: ["/bin/chown","-R","65534:65534","/data"]
volumeMounts:
- name: data
mountPath: /data
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/data"
- "--web.enable-lifecycle"
ports:
- containerPort: 9090
name: http-metrics
volumeMounts:
- name: config
mountPath: /etc/prometheus
- name: data
mountPath: /data
volumes:
- name: config
configMap:
name: prometheus-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
---
- Change the IPs to be the cluster-internal IP addresses of your node-exporter pods. You'll deal with the problems this will cause a bit later.
- Init containers are special types of containers that get run before the main container runs. They are often used to do operations that must happen before the main container can start, like in this case, where we set the correct permissions on the Longhorn /data volume.
Things to note here:
- We will be using volumeClaimTemplates, which automatically generates a PVC for us.
- The container exposes the port 9090.
- We create a ConfigMap resource, which will get mounted to the Pod at the path /etc/prometheus.
- Data will be kept by Prometheus on the persistent volume, mounted to /data in the Pod.
- If the node-exporter pods get deleted, then this configuration currently goes out of sync.
Before we can deal with the node-exporter issue, create a NodePort service for Prometheus, so you can access it.
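A minimal sketch of such a service, assuming the labels from the StatefulSet above (the nodePort value 30090 is only an example - any free port in the NodePort range works, or you can leave the field out and let Kubernetes pick one):
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
    - name: http
      port: 9090
      targetPort: 9090
      nodePort: 30090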
Verify
You can make sure everything works properly by either:
- Going to the Status tab -> Targets. You should have one endpoint listed there, with the state UP.
- Going to the Graph tab, and inserting a PromQL query up{} into the expression input.
  - On pressing Enter or Execute, you should receive a result.
The second option is what querying Prometheus looks like. By using their own query language called PromQL, you can query the different metrics.
If you want to play around with more complicated queries, here are a few examples: Awesome Prometheus
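A few starting points, assuming the node_exporter job name used above - the first shows which scrape targets are up, the second approximates CPU usage percentage per instance, and the third shows free space on the root filesystem:
up{job="kubernetes-node-exporter"}
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
node_filesystem_avail_bytes{mountpoint="/"}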
Now, back to the issue of non-dynamic node-exporter targets. Your first reaction might be to just create a service for the node-exporter pods, and use that address in the Prometheus scrape target configuration.
The problem with this is, remember, that a service forwards each request to a random backend pod, and makes no guarantees about which one. So you can have a situation where the next 15 requests go only to the pod on the worker node, completely removing the control plane node from visibility.
As powerful as Kubernetes might be, there's no way to easily solve this with Kubernetes' own tools - Kubernetes does not provide a resource that automatically allows for that. Thankfully, Kubernetes and Prometheus allow for this using another methodology.
Prometheus can query the Kubernetes API to automatically find specific resources. This is called service discovery (SD). Doing this requires creating a Kubernetes service account, mounting it to the Prometheus container, giving the service account appropriate permissions, and using it to query the Kubernetes API.
Complete
---
apiVersion: v1
kind: ServiceAccount # (1)
metadata:
name: prometheus
namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole # (2)
metadata:
name: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- services
- endpoints
- pods
- nodes/metrics
- nodes/proxy
- nodes/stats
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding # (3)
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
data: # (4)
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-node-exporter'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
action: keep
regex: node-exporter
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
spec:
serviceName: prometheus
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
serviceAccountName: prometheus # (5)
initContainers:
- name: prometheus-data-permission-setup
image: busybox
command: ["/bin/chown","-R","65534:65534","/data"]
volumeMounts:
- name: data
mountPath: /data
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/data"
- "--web.enable-lifecycle"
ports:
- containerPort: 9090
name: http-metrics
volumeMounts:
- name: config
mountPath: /etc/prometheus
- name: data
mountPath: /data
volumes:
- name: config
configMap:
name: prometheus-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
---
- The ServiceAccount resource creates a service account with no permissions.
- The ClusterRole resource creates a role that is allowed to query the Kubernetes API for the list and description of nodes, pods, services and endpoints in the whole cluster.
- The ClusterRoleBinding resource gives the ClusterRole membership to specific subjects, in this case our ServiceAccount.
- The kubernetes_sd_configs does the magic here, and is configured to go over the list of all the pods in the cluster. The relabel_configs part configures Prometheus to keep the scrape data only if the app label of the pod is node-exporter.
- This option mounts the ServiceAccount secret to the pods, so applications running inside the Pod can query the Kubernetes API.
Verify
After the pod restarts, you can check whether everything works from the Status -> Targets and Status -> Service Discovery pages in the Prometheus UI. The Service Discovery page lists all resources that are found, together with all the information that Prometheus gets about them.
Visualizations with Grafana¶
While you can already use Prometheus to do some basic queries, using it is not very pleasant - you need to go to a page, type in the query, and so on. This would be a nightmare to use in a team, as you can't expect everyone to know the queries. Also, this methodology does not allow for incremental improvements by team members.
This is why monitoring systems usually come with some way to build visualisations, dashboards, graphs, and diagrams, as humans are way better at seeing and correlating patterns in images than in textual representations of numbers.
Grafana¶
Grafana is an open source, feature-rich visualization and analytics platform used for monitoring and observability. It provides an intuitive and customizable interface for visualizing time-series data from various data sources, such as Prometheus, InfluxDB, Elasticsearch, and many more. Grafana's popularity stems from its flexibility, ease of use, and extensive plugin ecosystem.
You'll be using Grafana as a visualisation frontend to Prometheus, and later Loki, mainly because Grafana is so widely used, feature-rich and open source.
Complete
Deploy Grafana to a namespace called grafana.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 1Gi # adjust size as needed
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
initContainers:
- name: grafana-data-permission-setup
image: busybox
command: ["/bin/chown","-R","472:472","/var/lib/grafana"]
volumeMounts:
- name: grafana-storage
mountPath: /var/lib/grafana
containers:
- name: grafana
image: grafana/grafana:latest
env:
- name: GF_SECURITY_ADMIN_USER
value: "admin"
- name: GF_SECURITY_ADMIN_PASSWORD
value: "password" # (1)
ports:
- name: http
containerPort: 3000
readinessProbe:
httpGet:
path: /api/health
port: http
initialDelaySeconds: 30
timeoutSeconds: 5
volumeMounts:
- name: grafana-storage
mountPath: /var/lib/grafana
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
---
- Make sure to set this password to something else, but be careful - it's not secret. Anyone with access to Kubernetes can see it.
Concerning the password: sometimes in the Kubernetes ecosystem you see deployments that set passwords this way, even though it would be fairly simple to convert them to a Secret-based approach. Here we use it this way just to keep the manifest as short as possible, and we will do it properly when we use operators to deploy this system in a later lab.
Make sure to expose Grafana on port 30000 using a NodePort service.
Complete
You can verify this service by going to the exposed port with your browser. You should be greeted with a graphical user interface asking you to log in. This is where the environment variables that we set up in the manifest come into play. Use those to log in.
Once you have logged into the system, you should have a fairly empty Grafana with no visualisations or dashboards. Fix this issue by importing a dashboard to view your metrics.
Add a Prometheus data source
In the left burger menu (three horizontal stripes), go to Connections -> Data Sources, and click on the + Add new data source button in the top left.
From the list of possible data sources, we now need to choose Prometheus, and click on it. In the new window, we only need to fill in the URL field with the cluster-internal DNS name for Prometheus. The format is <service_name>.<namespace>.svc:<service_port>.
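For example, if you named the Prometheus service prometheus, created it in the monitoring namespace and gave it the service port 9090, the URL would be:
http://prometheus.monitoring.svc:9090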
In the bottom of the page, there's a Save & Test button. Upon clicking it, you should get a message that says everything succeeded.
Import a dashboard
In the left burger menu (three horizontal stripes), click on Dashboards. A new view is available, with a blue New button on the right. Click on New -> Import.
There should be a text input box asking for a Grafana.com dashboard URL or ID. Insert the number 1860 there. This number corresponds to this dashboard from Grafana's page (check the lecture slides).
You can now click load. In the next screen, you need to fill in the Prometheus data source, and then you can click import.
After clicking import, you should have a nice screen with a lot of metrics visible to you. If something is broken, these metrics are going to be empty, in which case it's a good idea to ask teachers for help.
You can now browse the dashboard, and see the different node specific metrics about the different nodes in the cluster.
Kubernetes specific metrics¶
In this section, you'll be adding some metrics which allow obtaining information about Kubernetes itself as well. With microservices-based architectures, which Kubernetes is built for and is mainly meant to accommodate, good monitoring is of utmost importance, and often the only way to get a full picture of the cluster.
Kubernetes state metrics¶
kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects like Deployments, Pods, and Services.
For example, it can tell you the number of desired replicas for a Deployment, the current CPU requests or limits for a Pod, or even the number of nodes in specific conditions. These metrics are often essential for scaling, monitoring, and alerting purposes.
It exposes these metrics natively in the Prometheus format, making integration with Prometheus fairly simple. The only problem is that, because it tries to give information about the whole state of the cluster, it needs to have permissions over the whole cluster.
The repository for it is located here in GitHub.
Complete
Deploy the kube-state-metrics service from the repository, from the examples/standard folder. You will need to deploy all the YAML files there (except kustomization.yaml) - one way of doing this is shown after the file list below. As per usual, security-wise, you should understand why each file is deployed and what it does.
- cluster-role.yaml: Defines the permissions that kube-state-metrics needs to query the Kubernetes API server. Specifies what kinds of resources and actions (like get, list, watch) are permitted.
- cluster-role-binding.yaml: Binds the ClusterRole to a specific ServiceAccount, granting those permissions to any pod running under that ServiceAccount.
- service-account.yaml: Creates a ServiceAccount under which the kube-state-metrics pod will run, providing a way to control API access.
- deployment.yaml: Specifies the desired state for the kube-state-metrics Deployment.
- service.yaml: Creates a Kubernetes Service to expose the kube-state-metrics Deployment for internal access.
The kube-state-metrics is deployed to the kube-system namespace, as its security requirement is quite high. Deploying it into the system-level components' namespace keeps things understandable, as everyone knows that all resources there require the highest level of security.
Verify
If everything is working properly, you should be able to query the kube-state-metrics service with curl inside the cluster, or visit it from the browser using kubectl port-forward - see the example after the list below. It should give you a long list of metrics, some of which are:
- kube_pod_status_phase: Indicates the lifecycle phase of a pod (for example, Running, Pending, Failed). This is critical for understanding the state of workloads in your cluster.
- kube_node_status_condition: Indicates various conditions of a node like Ready, OutOfDisk, MemoryPressure, and so on. Essential for understanding the health of the nodes in the cluster.
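For example, with port forwarding - the Deployment in examples/standard is named kube-state-metrics and listens on port 8080, so adjust if yours differs:
kubectl -n kube-system port-forward deployment/kube-state-metrics 8080:8080
curl localhost:8080/metrics | grep kube_pod_status_phase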
These metrics are now ready to be gathered by Prometheus, and used for visualization, and in production clusters, alerting.
Now that you have the metrics, you need to also configure Prometheus to query this exporter.
Complete
Change the Prometheus ConfigMap appropriately.
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-node-exporter'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
action: keep
regex: node-exporter
- source_labels: [__meta_kubernetes_pod_label_app]
target_label: kubernetes_pod_label_app
- job_name: 'kubernetes-kube-state-metrics'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
action: keep
regex: kube-state-metrics
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
target_label: kubernetes_pod_label_app
When you make any changes to the Prometheus configuration, you need to either submit an HTTP POST request to the Prometheus /-/reload endpoint, or restart the container. Our simple version of Prometheus cannot detect configuration file changes.
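A sketch of both options, assuming the resource names used earlier and that Prometheus is exposed via a NodePort service (note that it can take a minute for an updated ConfigMap to appear inside the pod):
curl -X POST http://<node-ip>:<prometheus-nodeport>/-/reload
kubectl -n monitoring rollout restart statefulset prometheus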
Verify
Verify whether Prometheus can access the metrics by checking the list of targets in the UI.
You can use the Grafana dashboard 13332 to visualize the scraped data easily.
If the Grafana dashboard shows no information, but the targets show up in Prometheus, then make sure that the name of the data source (chosen when importing the dashboard) matches precisely (including letter capitalization) the name of the previously created data source. It is possible the dashboard import form assigns a wrong name with extra capital letters.
Now you have a monitoring system that shows metrics about all the workloads managed by Kubernetes. This is the basis of proper monitoring for Kubernetes - in a production Kubernetes cluster metrics like the state of nodes are of crucial value, and you can build whole monitoring systems based on these.
Adding useful labels¶
When you start deep-diving into the metrics, one of the more annoying things you're going to notice is that the metrics do not come with good labels. For example, the Host dropdown in the Node Exporter dashboard has the internal IP addresses and ports of the node-exporter pods.
This is not very useful, as you cannot directly tell which IP belongs to which node. Additionally, the IP changes on pod restart, meaning you cannot have a static graph about a node.
Thankfully, Prometheus relabel_configs, which you have already used to add the kubernetes_pod_label_app label, can also be used to fix this problem. Specifically, if you check the Service Discovery page in the UI, it shows all the fields and their values that Prometheus scrapes from Kubernetes.
One such field is __meta_kubernetes_pod_node_name.
Complete
Relabel the __meta_kubernetes_pod_node_name label into a Prometheus metric label instance for the node-exporter scrapes.
- source_labels: [__meta_kubernetes_pod_node_name]
target_label: instance
Add this part to the correct place and reload/restart Prometheus again.
Verify
When you go to the Node Exporter dashboard, the Host dropdown should now contain the correct node names. If the dashboards don't cooperate, reduce the time range from 24 hours to 15 minutes.
cAdvisor¶
cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. Integrated into the Kubelet agent on every Kubernetes node, cAdvisor collects and exposes metrics about CPU, memory, network, and disk usage for all containers running on that node. These metrics are useful for monitoring and performance tuning.
You might be asking what the difference is between node-exporter and cAdvisor - both of them show CPU, memory, network, and so on. The difference is that node-exporter shows these metrics aggregated over the whole host, while with cAdvisor you can see which pod is using how much CPU at a given point in time, which is immensely useful for debugging, but also for billing.
Kubernetes nodes expose cAdvisor metrics on the path <node_name>:<node_port>/metrics/cadvisor. The endpoint uses HTTPS and requires a valid TLS certificate. The only secure method is to both use the Kubernetes CA to validate the certificate, and some kind of service account or KUBECONFIG to authenticate against the endpoint. Thankfully, Kubernetes makes this rather simple.
Complete
Add this configuration to the scrape_configs part of the Prometheus ConfigMap.
- job_name: 'kubernetes/cadvisor'
scheme: https # (1)
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # (2)
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt # (3)
kubernetes_sd_configs:
- role: node # (4)
relabel_configs:
- target_label: __address__
replacement: kubernetes.default.svc:443 # (5)
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor # (6)
- source_labels: [__meta_kubernetes_node_name]
target_label: kubernetes_node_name
- source_labels: [pod]
target_label: pod_name
action: replace
- action: labelmap
regex: __meta_kubernetes_node_label_(.+) # (7)
- Use HTTPS to query the cAdvisor endpoint.
- Use the token from the service account which we made to use service discovery.
- Validate the HTTPS connection with the Kubernetes CA, which also gets mounted to the pod when you use a service account.
- Use node service discovery, meaning that you'll get a list of nodes instead of pods.
- Query the Kubernetes API endpoint, instead of going to the address of the node.
- Use Kubernetes API proxying to proxy the connection to the Kubelet node endpoint. This is just to keep the traffic inside the cluster.
- Convert all labels from the SD __meta_kubernetes_node_label_* fields into Prometheus labels.
This configuration is a bit more difficult, but it already takes care of very many issues that you'd run into trying to set up Prometheus to scrape the Kubelet manually.
Verify
As with other metrics, you can verify this by going to the UI and checking under targets. The targets should be there, with the state "UP". If they are in any other state, then there are issues with the permissions or certificates.
You can use the dashboard 15398 to look into these metrics. While this dashboard ties together with other metrics, the last two sections, Workload and Container CPU / Memory / Filesystem, show the metrics from cAdvisor.
Centralized logging¶
Centralized logging is the practice of aggregating logs from multiple sources, such as applications, services, and infrastructure components, into a single, centralized location. Deploying the logging solution in a distributed fashion prevents operators from losing visibility into their systems because of a single failure in the monitoring solution.
Newer logging solutions also allow for ways to aggregate logs with other information from their systems, like metrics. This kind of centralized logging aggregated with other systems also simplifies the troubleshooting process by providing a single, unified view of log data, reducing the time spent searching for relevant information across disparate systems. You'll be brushing past this very quickly, but if you want, you can play around inside the Grafana once you have both Prometheus and Loki connected with it.
Loki¶
Loki is an open source, horizontally scalable log aggregation system developed by Grafana Labs. Inspired by Prometheus, Loki is designed to be cost-effective, easy to operate, and efficient in handling large volumes of log data. It provides a simple yet powerful solution for centralizing logs from various sources, such as applications, services, and infrastructure components, making it easier to search, analyze, and visualize log data. Loki integrates seamlessly with the Grafana visualization platform, enabling users to explore logs alongside metrics for a comprehensive view of their systems.
The need for a tool like Loki arises from the challenges posed by modern, distributed environments, where logs are generated by numerous components running across multiple nodes. Traditional log management solutions can struggle to cope with the volume and complexity of log data produced by these systems, leading to increased operational overhead, storage costs, and difficulty in extracting meaningful insights. Loki addresses these challenges with a unique approach that indexes only the metadata of log data (for example labels and timestamps), rather than the log content itself. This results in a more efficient and cost-effective storage solution, while still providing fast and accurate log querying capabilities.
Even though Loki is designed to be simple to deploy and manage, it's not. It is simpler than other such distributed logging solutions, but it is definitely not simple. For simplicity's sake, you'll be deploying this application in a single-binary fashion - as a monolithic application - so you'll take no benefit from it being a distributed logging solution, but it's a good stepping stone, and way easier to do if you have only general knowledge of the system.
As logging systems usually follow the push methodology, where an agent pushes logs to the central system, you need to start by setting up the central system.
Setup Loki
Deploy Loki in a new namespace called loki. You'll also need to expose it via a NodePort service - dedicate the host port 30310 for this.
apiVersion: v1
kind: ConfigMap
metadata:
name: loki-config
namespace: loki
data:
loki.yaml: |
auth_enabled: false
server:
http_listen_port: 3100
common:
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
replication_factor: 1
path_prefix: /data/loki
ingester:
lifecycler:
address: 127.0.0.1
ring:
kvstore:
store: inmemory
replication_factor: 1
chunk_idle_period: 15m
chunk_retain_period: 5m
wal:
dir: /data/loki/wal
schema_config:
configs:
- from: 2024-01-01
schema: v13
store: tsdb
object_store: filesystem
index:
period: 24h
prefix: kubernetes_
storage_config:
filesystem:
directory: /data/loki/chunks
tsdb_shipper:
active_index_directory: /data/loki/index
cache_location: /data/loki/index_cache
limits_config:
reject_old_samples: true
reject_old_samples_max_age: 168h
table_manager:
retention_deletes_enabled: true
retention_period: 168h
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: loki
namespace: loki
spec:
serviceName: loki
replicas: 1
selector:
matchLabels:
app: loki
template:
metadata:
labels:
app: loki
spec:
initContainers:
- name: init-chown-data
image: busybox:1.33.1
command: ["sh", "-c", "chown -R 10001:10001 /data/loki"]
volumeMounts:
- name: data
mountPath: /data/loki
containers:
- name: loki
image: grafana/loki:latest
args:
- "-config.file=/etc/loki/loki.yaml"
ports:
- containerPort: 3100
name: http-metrics
volumeMounts:
- name: config
mountPath: /etc/loki
- name: data
mountPath: /data/loki
volumes:
- name: config
configMap:
name: loki-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
The deployment is similar to Prometheus, but the configuration is much more complicated. The main things to remember here are that we save the data onto a Longhorn volume, and keep it for a week, to prevent issues with disk storage. The other settings you'll learn about when they become relevant.
Verify
You can verify whether Loki started up by querying it over the exposed port with curl, by doing curl <hostname>:<port>/ready. By the Loki API specification, it should answer with ready after a bit of time.
Make sure to give it a minute or two the first time it starts up - it needs time to initialise the Longhorn disk and its internal processes.
Once you have the central system in place, you can continue with installing the log exporter called promtail.
Setup Promtail¶
Promtail is an open source log collection agent developed by Grafana Labs, specifically designed to integrate with the Loki log aggregation system. As a crucial component of the Loki ecosystem, Promtail is responsible for gathering log data from various sources, such as files, systemd/journald, or syslog, and forwarding it to a Loki instance.
First, while keeping it simple, you'll just send the container logs from /var/log/pods/*/*/*.log on the Kubernetes hosts to Loki using Promtail. When combined with Kubernetes service discovery, this already becomes fairly powerful.
Complete
Set up Promtail in the loki namespace. Also, configure it to use Kubernetes service discovery, so that the logs will be useful and have Kubernetes-related context (namespaces, pod names, and so on).
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: promtail-service-account
namespace: loki
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: promtail-role
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: promtail-role-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: promtail-role
subjects:
- kind: ServiceAccount
name: promtail-service-account
namespace: loki
---
apiVersion: v1
kind: ConfigMap
metadata:
name: promtail-config
namespace: loki
data:
promtail.yaml: |
server:
http_listen_port: 3101
positions:
filename: "/tmp/positions.yaml"
clients:
- url: http://loki-svc.loki.svc.cluster.local:3100/loki/api/v1/push
scrape_configs:
- job_name: pod-logs
kubernetes_sd_configs:
- role: pod
pipeline_stages:
- cri: {}
relabel_configs:
- source_labels:
- __meta_kubernetes_pod_node_name
target_label: __host__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
replacement: $1
separator: /
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_pod_name
target_label: job
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: pod
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- replacement: /var/log/pods/*$1/$2/*.log
separator: /
source_labels:
- __meta_kubernetes_pod_uid
- __meta_kubernetes_pod_container_name
target_label: __path__
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: promtail
namespace: loki
spec:
selector:
matchLabels:
name: promtail
template:
metadata:
labels:
name: promtail
spec:
serviceAccountName: promtail-service-account
containers:
- name: promtail
image: grafana/promtail:latest
ports:
- name: http-metrics
containerPort: 3101
protocol: TCP
args:
- "-config.file=/etc/promtail/promtail.yaml"
env:
- name: 'HOSTNAME'
valueFrom:
fieldRef:
fieldPath: 'spec.nodeName'
volumeMounts:
- name: config
mountPath: /etc/promtail
- name: varlog
mountPath: /var/log
volumes:
- name: config
configMap:
name: promtail-config
- name: varlog
hostPath:
path: /var/log
Keep in mind, you might need to change the clients URL in the ConfigMap, depending on which service name you used for Loki.
Verify
You can check whether everything works by looking at the Promtail UI on port 3101. Similarly to Prometheus, you can see the service discovery and targets information there.
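For example, with port forwarding to one of the Promtail pods (the pod name will differ in your cluster):
kubectl -n loki get pods -l name=promtail
kubectl -n loki port-forward <promtail-pod-name> 3101:3101
Then open http://localhost:3101/targets in your browser.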
Grafana¶
As the last thing, also make these logs visible in Grafana by adding Loki as a data source and importing a new dashboard.
Add a Loki data source and dashboard
In the left burger menu (three horizontal stripes), go to Connections -> Data Sources, and click on the + Add new data source button in the top left. From the list of possible data sources, we now need to choose Loki, and click on it.
In the new window, we only need to fill in the URL field with the cluster-internal DNS name for Loki. The format is the same as before: <service_name>.<namespace>.svc:<container_port>.
In the bottom of the page, there's a Save & Test button. Upon clicking it, you should get a message that says everything succeeded.
To check the logs, either use the Explore page in Grafana, or import a dashboard with the ID 13639.
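In the Explore view, logs are queried with LogQL, Loki's query language. Two simple examples, using the labels that the Promtail relabeling configuration above attaches to each log stream - the first returns all logs from the monitoring namespace, the second filters the Prometheus container's logs down to lines containing the word "error":
{namespace="monitoring"}
{namespace="monitoring", container="prometheus"} |= "error"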
You are now free to play around with your newfound monitoring capability.
Danger
In the future labs, the automated monitoring system might rely on your monitoring to detect workloads or pods, or to check logs.
Teachers will also ask you to add or improve logging, as visibility in micro-services architecture needs to go hand-in-hand with the workloads themselves.
Instrumenting your application¶
One of the more useful applications of observability is getting runtime information from your own software. The Prometheus ecosystem makes this easy by publishing client libraries that can be used to expose Prometheus metrics from applications on a single port: Prometheus client libraries
Complete
Your task is to extend your Application Server to expose Prometheus metrics. You can use the Prometheus client libraries, or build your own.
Your application should be able to expose these two metrics:
- application_server_start_time - a metric that gives the Unix timestamp of when the application started up. This metric gets set once, when the pod running the application starts. It should also have two labels:
  - version - this should be filled with a unique identifier of the application's version. It could be used to detect different versions of applications running.
  - container - this should be filled by using the Kubernetes Downward API and show the name (metadata.name field) of the container which was queried for metrics.
- application_server_requests_total - a metric that tracks the count of requests to your Application Server. Each request that your application handles should increment this metric by one. It should also have the labels from before, plus one unique one:
  - status_code - the HTTP status code of the response to the request (200 for OK and so on). This will be used to track how your application handles requests.
The port and path which your application exposes the metrics are up to you to decide, but you will need to configure Prometheus to scrape these metrics.
Example of these metrics:
# HELP application_server_start_time Unix timestamp of when the application started
# TYPE application_server_start_time gauge
application_server_start_time{version="v1.2.3",container="application-server-1"} 1699113600
# HELP application_server_requests_total Count of requests to the Application Server
# TYPE application_server_requests_total counter
application_server_requests_total{version="v1.2.3",container="application-server-1",status_code="200"} 1500
application_server_requests_total{version="v1.2.3",container="application-server-1",status_code="404"} 35
application_server_requests_total{version="v1.2.3",container="application-server-1",status_code="500"} 7
You can add more metrics which help you observe your applications, but these two need to exist. Scoring will check whether they exist by querying Prometheus.
To configure Prometheus to scrape your metrics, you need to change the ConfigMap to make Prometheus aware of your application, similarly to how it was done for node_exporter and kube-state-metrics. Keep in mind the relabel config which has action: keep - it is an important part of Prometheus ingesting the results.
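As a sketch, a scrape job for your application could look like the following. The app label value application-server is an assumption - adjust the regex to whatever labels your own pods carry, and set metrics_path in the job if you serve the metrics on a path other than /metrics:
- job_name: 'application-server'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: application-server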
Info
If you get stuck with implementing Prometheus metrics in your application server code, here are a few suggestions:
- Use the Prometheus client Gauge class to create the server start time metric when your application starts.
- Use the Counter class to create the request counter metric. Increment it every time your code returns a response.
- Create a new API endpoint /metrics with an HTTP GET method implementation in your server. The Prometheus client method generate_latest() can be used to automatically generate the full string of metrics that this method should return as an HTTP response (use 'text/plain; charset=utf-8' as the MIME type).
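A minimal sketch of this in Python, assuming a Flask-based Application Server and the official prometheus_client library (the environment variable names are examples - CONTAINER_NAME is meant to be injected through the Downward API in your Deployment):
import os
import time

from flask import Flask, Response
from prometheus_client import Counter, Gauge, generate_latest

app = Flask(__name__)

# Example label values: APP_VERSION is set at build/deploy time, CONTAINER_NAME
# is injected via the Kubernetes Downward API (fieldRef to metadata.name).
VERSION = os.environ.get("APP_VERSION", "dev")
CONTAINER = os.environ.get("CONTAINER_NAME", "unknown")

START_TIME = Gauge(
    "application_server_start_time",
    "Unix timestamp of when the application started",
    ["version", "container"],
)
REQUESTS = Counter(
    "application_server_requests_total",
    "Count of requests to the Application Server",
    ["version", "container", "status_code"],
)

# Set the start time once, when the process starts.
START_TIME.labels(version=VERSION, container=CONTAINER).set(time.time())

@app.after_request
def count_request(response):
    # Increment the counter for every response the application returns,
    # labeled with the HTTP status code (scrapes of /metrics are counted too).
    REQUESTS.labels(version=VERSION, container=CONTAINER,
                    status_code=str(response.status_code)).inc()
    return response

@app.route("/metrics")
def metrics():
    # Render every registered metric in the Prometheus text exposition format.
    return Response(generate_latest(), content_type="text/plain; charset=utf-8")
In the Deployment, the container name can be passed in with an env entry using valueFrom.fieldRef, the same pattern the Promtail manifest above uses for spec.nodeName.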
Verify
You can verify whether your application exposes these metrics by accessing the metrics port (similar to how you accessed node_exporter). As it's text information, you should see output similar to the example above.
In Prometheus, you can just query these metrics in the Explore view. These metric names should be unique to your Prometheus, so if no results are given, Prometheus is unable to scrape the metrics. In that case, open the Prometheus UI and see what's up.