Lab 9 - Monitoring
Overview¶
- Metrics
  - node_exporter
  - Prometheus
- Visualizations
  - Grafana for Prometheus
- Kubernetes specific metrics
  - Kubernetes state metrics
  - cAdvisor
- Centralized logging
  - Loki
  - Promtail
  - Grafana for Loki
- Instrumenting your application
Introduction¶
Monitoring is the process of collecting, analyzing, and displaying data about the performance, availability, and health of systems, applications, and network infrastructure. By continuously observing the state of these components, monitoring enables operators and administrators to gain insights into the behavior of their environment, detect potential issues, and proactively resolve problems before they impact users or critical business processes.
The need for monitoring arises from the increasing complexity and scale of modern IT environments, which require effective tools and techniques to ensure their reliability, performance, and security. Monitoring helps organizations maintain a high level of service quality, adhere to service level agreements (SLAs), and optimize resource usage, all of which contribute to better user experience and overall operational efficiency.
In practice, monitoring involves gathering various types of data, such as metrics, logs, and traces, from different sources and presenting them in a meaningful way. This data is then used to analyze trends, detect anomalies, generate alerts, and drive informed decision-making. Additionally, monitoring supports capacity planning, performance tuning, incident management, and root cause analysis efforts, which are essential for maintaining the availability and performance of IT services.
Metrics¶
Metrics are numerical measurements that represent the behavior, performance, and health of systems, applications, and network infrastructure. They provide quantitative data that helps operators and administrators understand the current state of their environment and make data-driven decisions. By continuously collecting and analyzing metrics, sysadmins can identify trends, detect anomalies, and optimize their infrastructure to ensure a high level of service quality and performance.
A simple example of a metric is the amount of time it took for the web server to load the content of this page, or how much CPU the container running the web server uses.
In practice, metrics are gathered from different sources, such as operating systems, hardware components, and application instrumentation. This data is collected at regular intervals and stored in time-series databases, which enable efficient querying, aggregation, and analysis of historical and real-time data. Metrics can be visualized in dashboards, charts, and graphs, providing a clear and concise view of the monitored systems' performance and health.
In this part of the lab, you'll be setting up Prometheus - a time-series database made primarily for querying and storing metrics - and an exporter called node_exporter, which Prometheus can query for operating-system-level information. Then, as Kubernetes publishes several types of metrics natively in Prometheus format, you'll query those metrics to gain better visibility into your managed cluster and applications.
node_exporter¶
First you'll be setting up the node_exporter tool to become familiar with the general methodology, and also with the format of metrics used by Prometheus. Node Exporter is a widely used, open source monitoring tool that collects hardware and operating system metrics from Linux-based systems.
Developed as part of the Prometheus project, Node Exporter exposes these metrics in a format that can be easily scraped and ingested by a Prometheus server.
You'll be using a resource type called DaemonSet, which is very similar to Deployment - the difference is that while with a Deployment you can choose how many replicas are going to run, a DaemonSet always runs one replica on each node of the cluster. This is perfect for distributing monitoring tools across the cluster, as monitoring tools usually require a single agent per node. Additionally, when you add another node to the cluster, the DaemonSet will cause the monitoring agent to be scheduled on the new node automatically, reducing management overhead.
Complete
Set up the node_exporter as a DaemonSet resource in Kubernetes. Create a new namespace called monitoring for this task, and deploy the following manifest there.
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
containers:
- name: node-exporter
image: prom/node-exporter:latest
args:
- "--path.procfs=/host/proc"
- "--path.sysfs=/host/sys"
- "--path.rootfs=/host/root"
- "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)"
ports:
- containerPort: 9100
name: http-metrics
volumeMounts:
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
- name: root
mountPath: /host/root
readOnly: true
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
- name: root
hostPath:
path: /
A few things to note from this manifest:
- It exposes a containerPort 9100.
- It mounts several filesystems: /proc, /sys and even just /.
  - Even though it does so in read-only mode, this has serious security implications if you do not trust the software, as it can read critical system locations like /etc/passwd, /etc/shadow and /etc/kubernetes.
  - Sadly, monitoring solutions need access to most system-level locations to gather the information they report on.
    - Process info comes from /proc.
    - Mount info comes from /etc/fstab and /proc/mounts.
    - and so on.
Verify
You should be able to see the node_exporter containers starting up fairly soon. When you query the port using port forwarding, or with curl from inside the cluster, you'll get a response which directs you to /metrics instead.
If you now query /metrics, you'll see a format like this:
...
# HELP node_filefd_allocated File descriptor statistics: allocated.
# TYPE node_filefd_allocated gauge
node_filefd_allocated 2048
# HELP node_filefd_maximum File descriptor statistics: maximum.
# TYPE node_filefd_maximum gauge
node_filefd_maximum 9.223372036854776e+18
# HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/sda1",fstype="xfs",mountpoint="/"} 4.5791588352e+10
node_filesystem_avail_bytes{device="none",fstype="ramfs",mountpoint="/run/credentials/systemd-sysctl.service"} 0
...
This is the OpenMetrics format, which Prometheus and its exporters use for describing metrics. In short, the format is:
<metric name>{<label_key>=<label_value>} <metric_value>
The result is a list of metrics at the time of the query. For example, the response might contain the node_cpu_seconds_total metric with labels cpu="0" and mode="idle", and the metric value 2.19401742e+06.
Additionally, there are four important elements to a metric:
- The HELP comment, which provides a human-readable description of the metric that follows it (lower in the list).
- The TYPE comment, which specifies the metric's type, such as gauge, counter or histogram. It is mainly useful when querying and visualizing the metrics, as different types need to be visualized differently.
  - For example, a gauge is a value that can go up or down, like temperature. You can use this for visualization without extra elements.
  - A counter, on the other hand, is a metric that either grows forever or gets reset to zero. Counters are often used to represent cumulative metrics, like the total number of bytes transferred. As this value can get into the billions or trillions very easily, it's not human-readable without some tricks.
    - One of the easiest tricks is, instead of using the value itself, to use its rate of change - for example, how much the bytes-transferred metric has changed in the past minute (a derivative). This is usually much more human-readable. See the PromQL example after this list.
- The metric itself, in the format metric_name numeric_value.
- Some systems are more complicated, and usable monitoring requires more context - for that, you can add labels to every metric. These are key-value pairs enclosed in curly braces, and they provide additional dimensions for the metric.
  - In the case of node_filesystem_avail_bytes{device="/dev/sda1",fstype="xfs",mountpoint="/"} 4.5791588352e+10 - device, fstype and mountpoint are labels. This allows you to see the available bytes per mount point and filesystem type, not only as an aggregate over the whole system.
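As a small illustration of the rate-of-change trick (assuming you later have these metrics in Prometheus), this PromQL expression turns a raw byte counter into a per-second transfer rate averaged over the last minute:
rate(node_network_transmit_bytes_total[1m])
The raw counter is a huge, ever-growing number, while the rate is immediately readable as bytes per second.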
You'll also see that there are thousands of lines of these metrics. This is normal - the more complicated the system, the more metrics there are going to be. It might seem like a tremendous amount of information, but it's actually not, at least not for computers:
$ curl 10.0.1.76:9100/metrics | wc --bytes # (1)
129519
$ echo "129519.0/1024.0" | bc -l # (2)
126.48339843750000000000
- Count the bytes in the response.
- Convert the bytes into kilobytes.
This is less than the first response from google.com, not to mention any other website with actual graphics.
Prometheus¶
Now that you have an exporter in place, set up a system to query that exporter and provide a way to store and aggregate this data.
Prometheus is an open source monitoring and alerting toolkit designed for reliability and scalability, primarily used for monitoring containerized and distributed systems. Developed at SoundCloud and now a part of the Cloud Native Computing Foundation (CNCF), Prometheus has become a widely adopted monitoring solution in modern cloud-native environments. Its powerful query language, data model, and efficient storage design make it an ideal choice for organizations seeking to monitor their infrastructure, applications, and services.
Prometheus operates on a pull-based model, where it actively scrapes metrics from target systems or applications at regular intervals. This approach simplifies the overall architecture and makes it easier to scale and manage. You'll be configuring Prometheus to query the Node Exporter every 15 seconds.
Complete
Setting up Prometheus is a bit more work, as we also want to give it a configuration file. We will be doing this with a ConfigMap resource, which gets mounted to the Prometheus Pod in a place where the Prometheus software expects it.
Also, we will be using a StatefulSet resource here, instead of a Deployment or DaemonSet. It provides a better way of running stateful applications, as Kubernetes is less likely to try to move them away, and gives the resources handled by a StatefulSet better ordering and uniqueness guarantees.
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
data: # (1)
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-node-exporter'
static_configs:
- targets: ["10.0.1.72:9100"]
- targets: ["10.0.0.139:9100"]
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
spec:
serviceName: prometheus
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
initContainers: # (2)
- name: prometheus-data-permission-setup
image: busybox
command: ["/bin/chown","-R","65534:65534","/data"]
volumeMounts:
- name: data
mountPath: /data
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/data"
- "--web.enable-lifecycle"
ports:
- containerPort: 9090
name: http-metrics
volumeMounts:
- name: config
mountPath: /etc/prometheus
- name: data
mountPath: /data
volumes:
- name: config
configMap:
name: prometheus-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
---
- Change the IPs to be the cluster-internal IP addresses of your node-exporter pods. You'll deal with the problems this will cause a bit later.
- Init containers are special types of containers that get run before the main container runs. They are often used to do operations that must happen before the main container can start, like in this case, where we set the correct permissions on the Longhorn /data volume.
Things to note here:
- We will be using volumeClaimTemplates, which automatically generates a PVC for us.
- The container exposes the port 9090.
- We create a ConfigMap resource, which will get mounted to the Pod at the path /etc/prometheus.
- Data will be kept by Prometheus on the persistent volume, mounted to /data in the Pod.
- If the node-exporter pods get deleted, then this configuration currently goes out of sync.
Before we can deal with the node-exporter issue, create a NodePort service for Prometheus, so you can access it.
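A minimal sketch of such a service, assuming the labels from the StatefulSet above (the nodePort value 30090 is only an example - any free port in the NodePort range works, or you can leave the field out and let Kubernetes pick one):
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
    - name: http
      port: 9090
      targetPort: 9090
      nodePort: 30090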
Verify
You can make sure everything works properly by either:
- Going to the Status tab -> Targets. You should have one endpoint listed there, with the state UP.
- Going to the Graph tab, and inserting a PromQL query up{} into the expression input.
  - On pressing Enter or Execute, you should receive a result.
The second option is what querying Prometheus looks like. By using their own query language called PromQL, you can query the different metrics.
If you want to play around with more complicated queries, here are a few examples: Awesome Prometheus
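A few starting points, assuming the node_exporter job name used above - the first shows which scrape targets are up, the second approximates CPU usage percentage per instance, and the third shows free space on the root filesystem:
up{job="kubernetes-node-exporter"}
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
node_filesystem_avail_bytes{mountpoint="/"}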
Now, back to the issue of non-dynamic node-exporter targets. Your first reaction might be to just create a service for the node-exporter pods, and use that address in the Prometheus scrape target configuration.
The problem with this is, remember, that a service forwards each request to a random backend pod, and makes no guarantees about which one. So you can have a situation where the next 15 requests go only to the pod on the worker node, completely removing the control plane node from visibility.
As powerful as Kubernetes might be, there's no way to easily solve this with Kubernetes' own tools - Kubernetes does not provide a resource that automatically allows for that. Thankfully, Kubernetes and Prometheus allow for this using another methodology.
Prometheus can query the Kubernetes API to automatically find specific resources. This is called service discovery (SD). Doing this requires creating a Kubernetes service account, mounting it to the Prometheus container, giving the service account appropriate permissions, and using it to query the Kubernetes API.
Complete
---
apiVersion: v1
kind: ServiceAccount # (1)
metadata:
name: prometheus
namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole # (2)
metadata:
name: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- services
- endpoints
- pods
- nodes/metrics
- nodes/proxy
- nodes/stats
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding # (3)
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
data: # (4)
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-node-exporter'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
action: keep
regex: node-exporter
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
spec:
serviceName: prometheus
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
serviceAccountName: prometheus # (5)
initContainers:
- name: prometheus-data-permission-setup
image: busybox
command: ["/bin/chown","-R","65534:65534","/data"]
volumeMounts:
- name: data
mountPath: /data
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/data"
- "--web.enable-lifecycle"
ports:
- containerPort: 9090
name: http-metrics
volumeMounts:
- name: config
mountPath: /etc/prometheus
- name: data
mountPath: /data
volumes:
- name: config
configMap:
name: prometheus-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
---
- The ServiceAccount resource creates a service account with no permissions.
- The ClusterRole resource creates a role that is allowed to query the Kubernetes API for the list and description of nodes, pods, services and endpoints in the whole cluster.
- The ClusterRoleBinding resource gives the ClusterRole membership to specific subjects, in this case our ServiceAccount.
- The kubernetes_sd_configs does the magic here, and is configured to go over the list of all the pods in the cluster. The relabel_configs part configures Prometheus to keep the scrape data only if the app label of the pod is node-exporter.
- This option mounts the ServiceAccount secret to the pods, so applications running inside the Pod can query the Kubernetes API.
Verify
After the pod restarts, you can check whether everything works from the Status -> Targets and Status -> Service Discovery pages in the Prometheus UI. The Service Discovery page lists all resources that are found, together with all the information that Prometheus gets about them.
Visualizations with Grafana¶
While you can already use Prometheus to do some basic queries, using it is not very pleasant - you need to go to a page, type in the query, and so on. This would be a nightmare to use in a team, as you can't expect everyone to know the queries. Also, this methodology does not allow for incremental improvements by team members.
This is why monitoring systems usually come with some way to build visualisations, dashboards, graphs, and diagrams, as humans are way better at seeing and correlating patterns in images than in textual representations of numbers.
Grafana¶
Grafana is an open source, feature-rich visualization and analytics platform used for monitoring and observability. It provides an intuitive and customizable interface for visualizing time-series data from various data sources, such as Prometheus, InfluxDB, Elasticsearch, and many more. Grafana's popularity stems from its flexibility, ease of use, and extensive plugin ecosystem.
You'll be using Grafana as a visualisation frontend to Prometheus, and later Loki, mainly because Grafana is so widely used, feature-rich and open source.
Complete
Deploy Grafana to a namespace called grafana.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 1Gi # adjust size as needed
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
initContainers:
- name: grafana-data-permission-setup
image: busybox
command: ["/bin/chown","-R","472:472","/var/lib/grafana"]
volumeMounts:
- name: grafana-storage
mountPath: /var/lib/grafana
containers:
- name: grafana
image: grafana/grafana:latest
env:
- name: GF_SECURITY_ADMIN_USER
value: "admin"
- name: GF_SECURITY_ADMIN_PASSWORD
value: "password" # (1)
ports:
- name: http
containerPort: 3000
readinessProbe:
httpGet:
path: /api/health
port: http
initialDelaySeconds: 30
timeoutSeconds: 5
volumeMounts:
- name: grafana-storage
mountPath: /var/lib/grafana
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
---
- Make sure to set this password to something else, but be careful - it's not secret. Anyone with access to Kubernetes can see it.
Concerning the password: sometimes in the Kubernetes ecosystem you see deployments that set passwords this way, even though it would be fairly simple to convert them to a Secret-based approach. Here we use it this way just to keep the manifest as short as possible, and we will do it properly when we use operators to deploy this system in a later lab.
Make sure to expose Grafana on port 30000 using a NodePort service.
Complete
You can verify this service by going to the exposed port with your browser. You should be greeted with a graphical user interface asking you to log in. This is where the environment variables that we set up in the manifest come into play. Use those to log in.
Once you have logged into the system, you should have a fairly empty Grafana with no visualisations or dashboards. Fix this issue by importing a dashboard to view your metrics.
Add a Prometheus data source
In the left burger menu (three horizontal stripes), go to Connections -> Data Sources, and click on the + Add new data source button in the top left.
From the list of possible data sources, we now need to choose Prometheus, and click on it. In the new window, we only need to fill in the URL field with the cluster-internal DNS name for Prometheus. The format is <service_name>.<namespace>.svc:<service_port>.
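For example, if you named the Prometheus service prometheus, created it in the monitoring namespace and gave it the service port 9090, the URL would be:
http://prometheus.monitoring.svc:9090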
In the bottom of the page, there's a Save & Test button. Upon clicking it, you should get a message that says everything succeeded.
Import a dashboard
In the left burger menu (three horizontal stripes), click on Dashboards. A new view is available, with a blue New button on the right. Click on New -> Import.
There should be a text input box asking for a Grafana.com dashboard URL or ID. Insert the number 1860 there. This number corresponds to this dashboard from Grafana's page (check the lecture slides).
You can now click load. In the next screen, you need to fill in the Prometheus data source, and then you can click import.
After clicking import, you should have a nice screen with a lot of metrics visible to you. If something is broken, these metrics are going to be empty, in which case it's a good idea to ask teachers for help.
You can now browse the dashboard, and see the different node specific metrics about the different nodes in the cluster.
Kubernetes specific metrics¶
In this section, you'll be adding some metrics which allow obtaining information about Kubernetes itself as well. With microservices-based architectures, which Kubernetes is built for and is mainly meant to accommodate, good monitoring is of utmost importance, and often the only way to get a full picture of the cluster.
Kubernetes state metrics¶
kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects like Deployments, Pods, and Services.
For example, it can tell you the number of desired replicas for a Deployment, the current CPU requests or limits for a Pod, or even the number of nodes in specific conditions. These metrics are often essential for scaling, monitoring, and alerting purposes.
It exposes these metrics natively in the Prometheus format, making integration with Prometheus fairly simple. The only problem is that, because it tries to give information about the whole state of the cluster, it needs to have permissions over the whole cluster.
The repository for it is located here in GitHub.
Complete
Deploy the kube-state-metrics service from the repository, from the examples/standard folder. You will need to deploy all the YAML files there (except kustomization.yaml) - one way of doing this is shown after the file list below. As per usual, security-wise, you should understand why each file is deployed and what it does.
- cluster-role.yaml: Defines the permissions that kube-state-metrics needs to query the Kubernetes API server. Specifies what kinds of resources and actions (like get, list, watch) are permitted.
- cluster-role-binding.yaml: Binds the ClusterRole to a specific ServiceAccount, granting those permissions to any pod running under that ServiceAccount.
- service-account.yaml: Creates a ServiceAccount under which the kube-state-metrics pod will run, providing a way to control API access.
- deployment.yaml: Specifies the desired state for the kube-state-metrics Deployment.
- service.yaml: Creates a Kubernetes Service to expose the kube-state-metrics Deployment for internal access.
The kube-state-metrics is deployed to the kube-system namespace, as its security requirement is quite high. Deploying it into the system-level components' namespace keeps things understandable, as everyone knows that all resources there require the highest level of security.
Verify
If everything is working properly, you should be able to query the kube-state-metrics service with curl inside the cluster, or visit it from the browser using kubectl port-forward - see the example after the list below. It should give you a long list of metrics, some of which are:
- kube_pod_status_phase: Indicates the lifecycle phase of a pod (for example, Running, Pending, Failed). This is critical for understanding the state of workloads in your cluster.
- kube_node_status_condition: Indicates various conditions of a node like Ready, OutOfDisk, MemoryPressure, and so on. Essential for understanding the health of the nodes in the cluster.
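For example, with port forwarding - the Deployment in examples/standard is named kube-state-metrics and listens on port 8080, so adjust if yours differs:
kubectl -n kube-system port-forward deployment/kube-state-metrics 8080:8080
curl localhost:8080/metrics | grep kube_pod_status_phase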
These metrics are now ready to be gathered by Prometheus, and used for visualization, and in production clusters, alerting.
Now that you have the metrics, you need to also configure Prometheus to query this exporter.
Complete
Change the Prometheus ConfigMap appropriately.
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-node-exporter'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
action: keep
regex: node-exporter
- source_labels: [__meta_kubernetes_pod_label_app]
target_label: kubernetes_pod_label_app
- job_name: 'kubernetes-kube-state-metrics'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
action: keep
regex: kube-state-metrics
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
target_label: kubernetes_pod_label_app
When you make any changes to the Prometheus configuration, you need to either submit an HTTP POST request to the Prometheus /-/reload endpoint, or restart the container. Our simple version of Prometheus cannot detect configuration file changes.
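A sketch of both options, assuming the resource names used earlier and that Prometheus is exposed via a NodePort service (note that it can take a minute for an updated ConfigMap to appear inside the pod):
curl -X POST http://<node-ip>:<prometheus-nodeport>/-/reload
kubectl -n monitoring rollout restart statefulset prometheus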
Verify
Verify whether Prometheus can access the metrics by checking the list of targets in the UI.
You can use the Grafana dashboard 13332 to visualize the scraped data easily.
If the Grafana dashboard shows no information, but the targets show up in Prometheus, then make sure that the name of the data source (chosen when importing the dashboard) matches precisely (including letter capitalization) the name of the previously created data source. It is possible the dashboard import form assigns a wrong name with extra capital letters.
Now you have a monitoring system that shows metrics about all the workloads managed by Kubernetes. This is the basis of proper monitoring for Kubernetes - in a production Kubernetes cluster metrics like the state of nodes are of crucial value, and you can build whole monitoring systems based on these.
Adding useful labels¶
When you start deep-diving into the metrics, one of the more annoying things you're going to notice is that the metrics do not come with good labels. For example, the Host dropdown in the Node Exporter dashboard has the internal IP addresses and ports of the node-exporter pods.
This is not very useful, as you cannot directly tell which IP belongs to which node. Additionally, the IP changes on pod restart, meaning you cannot have a static graph about a node.
Thankfully, Prometheus relabel_configs, which you have already used to add the kubernetes_pod_label_app label, can also be used to fix this problem. Specifically, if you check the Service Discovery page in the UI, it shows all the fields and their values that Prometheus scrapes from Kubernetes.
One such field is __meta_kubernetes_pod_node_name.
Complete
Relabel the __meta_kubernetes_pod_node_name label into a Prometheus metric label instance for the node-exporter scrapes.
- source_labels: [__meta_kubernetes_pod_node_name]
target_label: instance
Add this part to the correct place and reload/restart Prometheus again.
Verify
When you go to the Node Exporter dashboard, the Host dropdown should now contain the correct node names. If the dashboards don't cooperate, reduce the time range from 24 hours to 15 minutes.
cAdvisor¶
cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. Integrated into the Kubelet agent on every Kubernetes node, cAdvisor collects and exposes metrics about CPU, memory, network, and disk usage for all containers running on that node. These metrics are useful for monitoring and performance tuning.
You might be asking what the difference is between node-exporter and cAdvisor - both of them show CPU, memory, network, and so on. The difference is that node-exporter shows these metrics aggregated over the whole host, while with cAdvisor you can see which pod is using how much CPU at a given point in time, which is immensely useful for debugging, but also for billing.
Kubernetes nodes expose cAdvisor metrics on the path <node_name>:<node_port>/metrics/cadvisor. The endpoint uses HTTPS and requires a valid TLS certificate. The only secure method is to both use the Kubernetes CA to validate the certificate, and some kind of service account or KUBECONFIG to authenticate against the endpoint. Thankfully, Kubernetes makes this rather simple.
Complete
Add this configuration to the scrape_configs part of the Prometheus ConfigMap.
- job_name: 'kubernetes/cadvisor'
scheme: https # (1)
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # (2)
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt # (3)
kubernetes_sd_configs:
- role: node # (4)
relabel_configs:
- target_label: __address__
replacement: kubernetes.default.svc:443 # (5)
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor # (6)
- source_labels: [__meta_kubernetes_node_name]
target_label: kubernetes_node_name
- source_labels: [pod]
target_label: pod_name
action: replace
- action: labelmap
regex: __meta_kubernetes_node_label_(.+) # (7)
- Use HTTPS to query the cAdvisor endpoint.
- Use the token from the service account which we made to use service discovery.
- Validate the HTTPS connection with the Kubernetes CA, which also gets mounted to the pod when you use a service account.
- Use node service discovery, meaning that you'll get a list of nodes instead of pods.
- Query the Kubernetes API endpoint, instead of going to the address of the node.
- Use Kubernetes API proxying to proxy the connection to the Kubelet node endpoint. This is just to keep the traffic inside the cluster.
- Convert all labels from the SD __meta_kubernetes_node_label_* fields into Prometheus labels.
This configuration is a bit more difficult, but it already takes care of very many issues that you'd run into trying to set up Prometheus to scrape the Kubelet manually.
Verify
As with other metrics, you can verify this by going to the UI and checking under targets. The targets should be there, with the state "UP". If they are in any other state, then there are issues with the permissions or certificates.
You can use the dashboard 15398 to look into these metrics. While this dashboard ties together with other metrics, the last two sections, Workload and Container CPU / Memory / Filesystem, show the metrics from cAdvisor.
Centralized logging¶
Centralized logging is the practice of aggregating logs from multiple sources, such as applications, services, and infrastructure components, into a single, centralized location. Deploying the logging solution in a distributed fashion prevents operators from losing visibility into their systems because of a single failure in the monitoring solution.
Newer logging solutions also allow for ways to aggregate logs with other information from their systems, like metrics. This kind of centralized logging aggregated with other systems also simplifies the troubleshooting process by providing a single, unified view of log data, reducing the time spent searching for relevant information across disparate systems. You'll be brushing past this very quickly, but if you want, you can play around inside the Grafana once you have both Prometheus and Loki connected with it.
Loki¶
Loki is an open source, horizontally scalable log aggregation system developed by Grafana Labs. Inspired by Prometheus, Loki is designed to be cost-effective, easy to operate, and efficient in handling large volumes of log data. It provides a simple yet powerful solution for centralizing logs from various sources, such as applications, services, and infrastructure components, making it easier to search, analyze, and visualize log data. Loki integrates seamlessly with the Grafana visualization platform, enabling users to explore logs alongside metrics for a comprehensive view of their systems.
The need for a tool like Loki arises from the challenges posed by modern, distributed environments, where logs are generated by numerous components running across multiple nodes. Traditional log management solutions can struggle to cope with the volume and complexity of log data produced by these systems, leading to increased operational overhead, storage costs, and difficulty in extracting meaningful insights. Loki addresses these challenges with a unique approach that indexes only the metadata of log data (for example labels and timestamps), rather than the log content itself. This results in a more efficient and cost-effective storage solution, while still providing fast and accurate log querying capabilities.
Even though Loki is designed to be simple to deploy and manage, it's not. It is simpler than other such distributed logging solutions, but it is definitely not simple. For simplicity's sake, you'll be deploying this application in a single-binary fashion - as a monolithic application - so you'll take no benefit from it being a distributed logging solution, but it's a good stepping stone, and way easier to do if you have only general knowledge of the system.
As logging systems usually follow the push methodology, where an agent pushes logs to the central system, you need to start by setting up the central system.
Setup Loki
Deploy Loki in a new namespace called loki. You'll also need to expose it via a NodePort service - dedicate the host port 30310 for this.
apiVersion: v1
kind: ConfigMap
metadata:
name: loki-config
namespace: loki
data:
loki.yaml: |
auth_enabled: false
server:
http_listen_port: 3100
common:
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
replication_factor: 1
path_prefix: /data/loki
ingester:
lifecycler:
address: 127.0.0.1
ring:
kvstore:
store: inmemory
replication_factor: 1
chunk_idle_period: 15m
chunk_retain_period: 5m
wal:
dir: /data/loki/wal
schema_config:
configs:
- from: 2024-01-01
schema: v13
store: tsdb
object_store: filesystem
index:
period: 24h
prefix: kubernetes_
storage_config:
filesystem:
directory: /data/loki/chunks
tsdb_shipper:
active_index_directory: /data/loki/index
cache_location: /data/loki/index_cache
limits_config:
reject_old_samples: true
reject_old_samples_max_age: 168h
table_manager:
retention_deletes_enabled: true
retention_period: 168h
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: loki
namespace: loki
spec:
serviceName: loki
replicas: 1
selector:
matchLabels:
app: loki
template:
metadata:
labels:
app: loki
spec:
initContainers:
- name: init-chown-data
image: busybox:1.33.1
command: ["sh", "-c", "chown -R 10001:10001 /data/loki"]
volumeMounts:
- name: data
mountPath: /data/loki
containers:
- name: loki
image: grafana/loki:latest
args:
- "-config.file=/etc/loki/loki.yaml"
ports:
- containerPort: 3100
name: http-metrics
volumeMounts:
- name: config
mountPath: /etc/loki
- name: data
mountPath: /data/loki
volumes:
- name: config
configMap:
name: loki-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
The deployment is similar to Prometheus, but the configuration is much more complicated. The main things to remember here are that we save the data onto a Longhorn volume, and keep it for a week, to prevent issues with disk storage. The other settings you'll learn about when they become relevant.
Verify
You can verify whether Loki started up by querying it over the exposed port with curl, by doing curl <hostname>:<port>/ready. By the Loki API specification, it should answer with ready after a bit of time.
Make sure to give it a minute or two the first time it starts up - it needs time to initialise the Longhorn disk and its internal processes.
Once you have the central system in place, you can continue with installing the log exporter called promtail.
Setup Promtail¶
Promtail is an open source log collection agent developed by Grafana Labs, specifically designed to integrate with the Loki log aggregation system. As a crucial component of the Loki ecosystem, Promtail is responsible for gathering log data from various sources, such as files, systemd/journald, or syslog, and forwarding it to a Loki instance.
First, while keeping it simple, you'll just send the container logs from /var/log/pods/*/*/*.log on the Kubernetes hosts to Loki using Promtail. When combined with Kubernetes service discovery, this already becomes fairly powerful.
Complete
Set up Promtail in the loki namespace. Also, configure it to use Kubernetes service discovery, so that the logs will be useful and have Kubernetes-related context (namespaces, pod names, and so on).
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: promtail-service-account
namespace: loki
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: promtail-role
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: promtail-role-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: promtail-role
subjects:
- kind: ServiceAccount
name: promtail-service-account
namespace: loki
---
apiVersion: v1
kind: ConfigMap
metadata:
name: promtail-config
namespace: loki
data:
promtail.yaml: |
server:
http_listen_port: 3101
positions:
filename: "/tmp/positions.yaml"
clients:
- url: http://loki-svc.loki.svc.cluster.local:3100/loki/api/v1/push
scrape_configs:
- job_name: pod-logs
kubernetes_sd_configs:
- role: pod
pipeline_stages:
- cri: {}
relabel_configs:
- source_labels:
- __meta_kubernetes_pod_node_name
target_label: __host__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
replacement: $1
separator: /
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_pod_name
target_label: job
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: pod
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- replacement: /var/log/pods/*$1/$2/*.log
separator: /
source_labels:
- __meta_kubernetes_pod_uid
- __meta_kubernetes_pod_container_name
target_label: __path__
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: promtail
namespace: loki
spec:
selector:
matchLabels:
name: promtail
template:
metadata:
labels:
name: promtail
spec:
serviceAccountName: promtail-service-account
containers:
- name: promtail
image: grafana/promtail:latest
ports:
- name: http-metrics
containerPort: 3101
protocol: TCP
args:
- "-config.file=/etc/promtail/promtail.yaml"
env:
- name: 'HOSTNAME'
valueFrom:
fieldRef:
fieldPath: 'spec.nodeName'
volumeMounts:
- name: config
mountPath: /etc/promtail
- name: varlog
mountPath: /var/log
volumes:
- name: config
configMap:
name: promtail-config
- name: varlog
hostPath:
path: /var/log
Keep in mind, you might need to change the clients URL in the ConfigMap, depending on which service name you used for Loki.
Verify
You can check whether everything works by looking at the Promtail UI on port 3101. Similarly to Prometheus, you can see the service discovery and targets information there.
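For example, with port forwarding to one of the Promtail pods (the pod name will differ in your cluster):
kubectl -n loki get pods -l name=promtail
kubectl -n loki port-forward <promtail-pod-name> 3101:3101
Then open http://localhost:3101/targets in your browser.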
Grafana¶
As the last thing, also make these logs visible in Grafana by adding Loki as a data source and importing a new dashboard.
Add a Loki data source and dashboard
In the left burger menu (three horizontal stripes), go to Connections -> Data Sources, and click on the + Add new data source button in the top left. From the list of possible data sources, we now need to choose Loki, and click on it.
In the new window, we only need to fill in the URL field with the cluster-internal DNS name for Loki. The format is the same as before: <service_name>.<namespace>.svc:<container_port>.
In the bottom of the page, there's a Save & Test button. Upon clicking it, you should get a message that says everything succeeded.
To check the logs, either use the Explore page in Grafana, or import a dashboard with the ID 13639.
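In the Explore view, logs are queried with LogQL, Loki's query language. Two simple examples, using the labels that the Promtail relabeling configuration above attaches to each log stream - the first returns all logs from the monitoring namespace, the second filters the Prometheus container's logs down to lines containing the word "error":
{namespace="monitoring"}
{namespace="monitoring", container="prometheus"} |= "error"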
You are now free to play around with your newfound monitoring capability.
Danger
In the future labs, the automated monitoring system might rely on your monitoring to detect workloads or pods, or to check logs.
Teachers will also ask you to add or improve logging, as visibility in micro-services architecture needs to go hand-in-hand with the workloads themselves.
Instrumenting your application¶
One of the more useful applications of observability is getting runtime information from your own software. The Prometheus ecosystem makes this easy by publishing client libraries that can be used to expose Prometheus metrics from applications on a single port: Prometheus client libraries
Complete
Your task is to extend your Application Server to expose Prometheus metrics. You can use the Prometheus client libraries, or build your own.
Your application should be able to expose these two metrics:
- application_server_start_time - a metric that gives the Unix timestamp of when the application started up. This metric gets set once, when the pod running the application starts. It should also have two labels:
  - version - this should be filled with a unique identifier of the application's version. It could be used to detect different versions of applications running.
  - container - this should be filled by using the Kubernetes Downward API and show the name (metadata.name field) of the container which was queried for metrics.
- application_server_requests_total - a metric that tracks the count of requests to your Application Server. Each request that your application handles should increment this metric by one. It should also have the labels from before, plus one unique one:
  - status_code - the HTTP status code of the response to the request (200 for OK and so on). This will be used to track how your application handles requests.
The port and path which your application exposes the metrics are up to you to decide, but you will need to configure Prometheus to scrape these metrics.
Example of these metrics:
# HELP application_server_start_time Unix timestamp of when the application started
# TYPE application_server_start_time gauge
application_server_start_time{version="v1.2.3",container="application-server-1"} 1699113600
# HELP application_server_requests_total Count of requests to the Application Server
# TYPE application_server_requests_total counter
application_server_requests_total{version="v1.2.3",container="application-server-1",status_code="200"} 1500
application_server_requests_total{version="v1.2.3",container="application-server-1",status_code="404"} 35
application_server_requests_total{version="v1.2.3",container="application-server-1",status_code="500"} 7
You can add more metrics which help you observe your applications, but these two need to exist. Scoring will check whether they exist by querying Prometheus.
To configure Prometheus to scrape your metrics, you need to change the ConfigMap to make Prometheus aware of your application, similarly to how it was done for node_exporter and kube-state-metrics. Keep in mind the relabel config which has action: keep - it is an important part of Prometheus ingesting the results.
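As a sketch, a scrape job for your application could look like the following. The app label value application-server is an assumption - adjust the regex to whatever labels your own pods carry, and set metrics_path in the job if you serve the metrics on a path other than /metrics:
- job_name: 'application-server'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: application-server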
Info
If you get stuck with implementing Prometheus metrics in your application server code, here are a few suggestions:
- Use the Prometheus client Gauge class to create the server start time metric when your application starts.
- Use the Counter class to create the request counter metric. Increment it every time your code returns a response.
- Create a new API endpoint /metrics with an HTTP GET method implementation in your server. The Prometheus client method generate_latest() can be used to automatically generate the full string of metrics that this method should return as an HTTP response (use 'text/plain; charset=utf-8' as the MIME type).
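A minimal sketch of this in Python, assuming a Flask-based Application Server and the official prometheus_client library (the environment variable names are examples - CONTAINER_NAME is meant to be injected through the Downward API in your Deployment):
import os
import time

from flask import Flask, Response
from prometheus_client import Counter, Gauge, generate_latest

app = Flask(__name__)

# Example label values: APP_VERSION is set at build/deploy time, CONTAINER_NAME
# is injected via the Kubernetes Downward API (fieldRef to metadata.name).
VERSION = os.environ.get("APP_VERSION", "dev")
CONTAINER = os.environ.get("CONTAINER_NAME", "unknown")

START_TIME = Gauge(
    "application_server_start_time",
    "Unix timestamp of when the application started",
    ["version", "container"],
)
REQUESTS = Counter(
    "application_server_requests_total",
    "Count of requests to the Application Server",
    ["version", "container", "status_code"],
)

# Set the start time once, when the process starts.
START_TIME.labels(version=VERSION, container=CONTAINER).set(time.time())

@app.after_request
def count_request(response):
    # Increment the counter for every response the application returns,
    # labeled with the HTTP status code (scrapes of /metrics are counted too).
    REQUESTS.labels(version=VERSION, container=CONTAINER,
                    status_code=str(response.status_code)).inc()
    return response

@app.route("/metrics")
def metrics():
    # Render every registered metric in the Prometheus text exposition format.
    return Response(generate_latest(), content_type="text/plain; charset=utf-8")
In the Deployment, the container name can be passed in with an env entry using valueFrom.fieldRef, the same pattern the Promtail manifest above uses for spec.nodeName.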
Verify
You can verify whether your application exposes these metrics by accessing the metrics port (similar to how you accessed node_exporter). As it's text information, you should see output similar to the example above.
In Prometheus, you can just query these metrics in the Explore view. These metric names should be unique to your Prometheus, so if no results are given, Prometheus is unable to scrape the metrics. In that case, open the Prometheus UI and see what's up.