Lab 2 - Containerization technologies

Introduction¶

Welcome to the second lab. Here is a short description of stuff we will do during this lab:

Install and get familiar with containerd
Run a container with glibc
Write a Dockerfile and build an image from it with Kaniko
Push images to the registry

1. Introduction to containerd¶

Containers are a kind of logical units, that are separated from the underlying OS by an container engine that creates an abstraction layer between the two. This is not unlike a Virtual Machine, but the main difference here is the fact that a VM has a full operating system running inside it, while a container runs directly inside the OS with only necessary bits and a bit of compatibility code.

In this lab we are going to use containerd, which is a lightweight implementation of a container runtime. For a similar lab but using Docker, you are welcome to check System administration / Lab 9.

2. Installing containerd and nerdctl¶

Installation of containerd means installing the runtime and client packages, setting up the configuration and starting it.

We shall use RPM packages for installing containerd in CentOS9.

Note

Note that containerd RPM packages are built and shared via Docker repositories, which also contains other Docker utilities.

Login to your VM over SSH and run the following:

sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

This will setup the repositories on your machine.

Next, install start a containerd daemon and start it.

$ sudo dnf install containerd.io
$ sudo systemctl start containerd
$ sudo systemctl enable containerd  # to survive the reboots

And validate that it works:

# Pull the image
$ sudo ctr images pull docker.io/library/redis:alpine
# Run the container in detached mode
$ sudo ctr run -d docker.io/library/redis:alpine redis
# List containers
$ sudo ctr containers list
# Stop the container
$ sudo ctr task kill redis
# Delete the redis container
$ sudo ctr container delete redis

Warning

If you are seeing errors concerning hitting the limits of the Docker Hub, please check workarounds explained in System administration / Lab 9 / Docker images. TL;DR: Use mirror registry.hpc.ut.ee/mirror/library/.

Feel free to play around with ctr command to find out more information about the running service. For the lab it is also easier to setup another command line utility that emulates UX of Docker CLI: nerdctl

$ curl -L https://github.com/containerd/nerdctl/releases/download/v1.7.6/nerdctl-1.7.6-linux-amd64.tar.gz --output nerdctl-1.7.6-linux-amd64.tar.gz
$ sudo tar xvfz nerdctl-1.7.6-linux-amd64.tar.gz -C /usr/sbin/
$ sudo nerdctl ps

3. Network configuration¶

Containers are not very useful if applications running inside them cannot be accessed over the network.

Let's try to create a new basic container with web server answering on local port 80:

$ sudo nerdctl run -d --name nginx -p 80:80 nginx:alpine
FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time="2023-09-10T08:56:15-04:00" level=fatal msg="failed to call cni.Setup: plugin type=\"bridge\" failed (add): failed to find plugin \"bridge\" in path [/opt/cni/bin]"
Failed to write to log, write /var/lib/nerdctl/1935db59/containers/default/9e2f4a34f28eab5d242595870909c4f101d48087f0868e496cbddebac7c89880/oci-hook.createRuntime.log: file already closed: unknown

What this error tells us, is that when creating a container, nerdctl wanted to connect it with a CNI plugin bridge, however it couldn't have been found.

CNI plugins are responsible for the networking part of the containers and are crucial building blocks of Kubernetes systems.

In this lab we are going to use a default bridge plugin, but you are very welcome to investigate other ones that are maintained by the container networking team.

You can see details of the CNI plugins here: link

$ sudo dnf install iptables  # bridge plugin relies on the availability of iptables
$ curl -L https://github.com/containernetworking/plugins/releases/download/v1.5.1/cni-plugins-linux-amd64-v1.5.1.tgz --output cni-plugins-linux-amd64-v1.5.1.tgz
$ sudo mkdir -p /opt/cni/bin/
$ sudo tar xvfx cni-plugins-linux-amd64-v1.5.1.tgz -C /opt/cni/bin/

We can now run the container with network:

$ sudo nerdctl run --rm --name nginx -p 80:80 nginx:alpine
# ctrl-c to cancel and remove the container

Info

CNI plugin used when creating a container has also created a number of iptables rules. You can see them by running sudo iptables -L. The logic of the rules requires a better understanding of firewalling and reader is welcome to investigate them on her own.

4. Accessing containers¶

Running a container to be persistent - to stay up and consistently respond to queries, requires it to be run in detached mode. Let's take an example container that just prints the hostname, IP address and a bit more information about the environment when queried over HTTP. Run the container like so: sudo nerdctl run -d --name whoami -p 8001:80 registry.hpc.ut.ee/mirror/containous/whoami

After running it, you will get back a long cryptic ID. This is the ID of the running container. Because we specified --name whoami we can also refer to this container with the name of whoami. Checking sudo nerdctl ps should list you a running container.

$ sudo nerdctl ps
CONTAINER ID    IMAGE                                                 COMMAND      CREATED          STATUS    PORTS                   NAMES
2a1e6ec6a32d    registry.hpc.ut.ee/mirror/containous/whoami:latest    "/whoami"    2 seconds ago    Up        0.0.0.0:8001->80/tcp    whoami

You can see some information from the previous command. The main question now is, how to query it?

There are two options:

Option A: port mapping¶

It has a PORTS 0.0.0.0:8001->80/tcp defined, which hints that a host port 8001 is forwarded to container port 80. So if there is an application working on port 80 inside, it will respond on localhost port 8001.

Warning

By default opening a port from an application on all interfaces of the host is a dangerous operation. For details please check System Administration / Lab 9: Docker networking. As a rule of thumb for local testing try to open ports only on the localhost interface, that is nerdctl run -p 127.0.0.1:8001:80 .... This way you will only be able to access your container from inside your container host.

Option B: direct lookup¶

Lookup IP of the container and submit request directly to the IP.

When you create a container using bridge CNI, it creates a new network interface, usually called nerdctl0. You can see this network interface with the command ip a.

Its configuration is defined in /etc/cni/net.d/nerdctl-bridge.conflist. By default, ipam type (IP address management) is set to local-host, which allocates addresses from a specific range of IPs. Please check also other attributes of the plugin.

  ...
  "ipam": {
    "ranges": [
      [
        {
          "gateway": "10.4.0.1",
          "subnet": "10.4.0.0/24"
        }
      ]
    ],
    "routes": [
      {
        "dst": "0.0.0.0/0"
      }
    ],
    "type": "host-local"
  }
 },
 ...

Info

Format of notation x.x.x.x/xx is known as CIDR notation. Please make sure that you are familiar with it as it is often used for the definitions of network resources.

When you start a container, it is given an IP address in that specified range, in our case 10.4.0.0/24. To see which IP address your container got, check the command sudo nerdctl inspect whoami. You are interested in the NetworkSettings section.

"NetworkSettings": {
    "Ports": {
        "80/tcp": [
            {
                "HostIp": "0.0.0.0",
                "HostPort": "8001"
            }
        ]
    },
    "GlobalIPv6Address": "",
    "GlobalIPv6PrefixLen": 0,
    "IPAddress": "10.4.0.14",
    "IPPrefixLen": 24,
    "MacAddress": "2a:8f:e0:2c:a2:15",
    "Networks": {
        "unknown-eth0": {
            "IPAddress": "10.4.0.14",
            "IPPrefixLen": 24,
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "MacAddress": "2a:8f:e0:2c:a2:15"
        }
    }
}

This container got the IP address of 10.4.0.14. If we now query this IP address, we should get an appropriate response:

$ curl 10.4.0.14
Hostname: 2a1e6ec6a32d
IP: 127.0.0.1
IP: ::1
IP: 10.4.0.14
IP: fe80::288f:e0ff:fe2c:a215
RemoteAddr: 10.4.0.1:34052
GET / HTTP/1.1
Host: 10.4.0.14
User-Agent: curl/7.76.1
Accept: */*

Now we have a nice working container, that can be accessed from inside the machine itself. Getting access from the outside world is a bit more complicated, though, but we'll come back to it later.

To make lookup of the IP easier, we can also make parsing faster using jq utility. We are going to use it for evaluation of the lab so please make sure that it is installed in the system. Once it is, you can look up IP using, for example, a command like this:

$ sudo nerdctl inspect whoami | jq -r '.[]."NetworkSettings"."Networks"."unknown-eth0"."IPAddress"'

5. Building container images¶

Before we run containers visible to the outside world, we should learn how we can build, debug and audit containers ourselves. Public containers are a great resource, but if you are not careful, they are also a way for a bad actor to gain access to your machine. You should always know what is running inside your container, otherwise you will open up the possibility being targeted by several types of attacks, including but not limited by supply chain attacks, attacks against non-updated software, attacks against misconfiguration, etc.

One of the best ways to make sure you know what is happening inside your container, is to build it yourself. Building an image yourself is not black magic - anyone can do it. You need two things - a description of what and how to build - Dockerfile - and a tool that can build an image. In this lab we are going to use Kaniko as it is easier to run it in a containerized environment later on, that is in Kubernetes.

The Dockerfile is a new syntax that allows to put together a working container, which is then snapshotted. That snapshot is the image. The first step to building an image is to choose a base image. Think of a base image like an operating system, or a Linux distribution. The only difference is, that a base image might be any image, you could even use the image we used before, but as we are worried about unknown stuff inside our image and container, let's use an image from a trusted source - an image called alpine. This is a small Linux distribution that specializes in making small images. This is a benefit for containers, as larger containers require much more resources to be run. More information here: https://hub.docker.com/_/alpine

Let's set up our environment from building a container image. First, create a folder called dockerfile_lab (you can put it anywhere).

Inside that folder, create two files, one named Dockerfile and the second called server.py.

The logic will be following - we will build the container using the Dockerfile. Inside that Dockefile, there are instructions to install python3, python3-pip and to copy our Flask server.py file inside the container. Also, it is also instructed to expose the container port 5000 on startup, and run the server.py file.

server.py:

import socket

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/")
def hello():
    response = "Client IP: " + request.remote_addr + "\nHostname: " + socket.gethostname() + "\n"
    return response, 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Dockerfile:

FROM registry.hpc.ut.ee/mirror/library/alpine

RUN apk add --no-cache python3 py3-pip
RUN pip3 install flask --break-system-packages

COPY server.py /opt/server.py

EXPOSE 5000
CMD python3 /opt/server.py

After having both of these files inside the same folder, we need to build our container image, and to do that we are going to use Kaniko, moreover, we are going to use a container with Kaniko and mount required contents inside.

$ cd dockerfile_lab
$ sudo nerdctl run -v $PWD:/workspace gcr.io/kaniko-project/executor:latest --no-push --tar-path=/workspace/my.server.tar --destination=server:latest

After entering the command, you can see it starts running the commands we specified in order. Every step it runs creates something called a "layer." Every layer is one difference from the last layer, and this is how images are built. This provides some benefits, like being able to reuse layers when you run the build again if you are using caching.

Important to highlight is that default behaviour of kaniko is not just to build but also push the image to the registry. In our first iteration we skip it by providing --no-push parameter.

Please note also -v $PWD:/workspace part. It means that we mount content of the folder where we run the command into a container folder, where kaniko's executor will run and where it will expect to find Dockerfile by default.

Running nerdctl run gcr.io/kaniko-project/executor:latest will show you other options that can be used when building, including options for caching.

After the image has been built, we can find the built image in TAR format in our dockerfile_lab folder. Can you explain why it appeared in that specific folder?

The contents of the TAR archive is metadata of the image along with built layers. You can check what's inside, for example, using tar utility:

$ tar -tvf my.server.tar
-rw-r--r-- 0/0            1352 1969-12-31 19:00 sha256:7b56e51a41c4bcc61bb9a1163e499012584c69792f892bc916a8d02302403334
-rw-r--r-- 0/0         3401613 1969-12-31 19:00 7264a8db6415046d36d16ba98b79778e18accee6ffa71850405994cffa9be7de.tar.gz
-rw-r--r-- 0/0        26325594 1969-12-31 19:00 712004e0392c2b6ccf8b64ede3165a27076bac5e41d85a30f8391f9e2723e28a.tar.gz
-rw-r--r-- 0/0         4803342 1969-12-31 19:00 beaec8bdc3bd766e29d9ba2b3cbdd8fa6be8382b6f95734fe849111d959cb82f.tar.gz
-rw-r--r-- 0/0             568 1969-12-31 19:00 7ec7ded5ceb24ab9fd7927c693580ed1b17cee49ceff77a57c392fd2b54a6256.tar.gz
-rw-r--r-- 0/0             422 1969-12-31 19:00 manifest.json

Try extracting manifest.json from the file and checking its content.

To use it for running the containers, we must first import it:

$ sudo nerdctl load -i my.server.tar

You can check that the image we have built is now available locally:

$ sudo nerdctl image list

Now the only thing left to do is to run a container using the image we built. Let's run a detached container called dockerfile_lab from our image with the tag of server:latest.

$ sudo nerdctl run -d --name dockerfile_lab server:latest

After finding out the IP address of the container, and using curl against the container's exposed port, you should see output similar to the following:

Client IP: 10.4.0.1
Hostname: 7bfb5b81ce38

You can try deleting the container with sudo nerdctl rm -f dockerfile_lab, and rerunning it, to see the Hostname field change. The Client IP field stays the same, as this is the IP the container sees the client query come from, which will always be the IP of the host from where curl request originates.

6. Manipulating containers¶

In this part of the lab we will go over a few debugging methods. This is not mandatory, but will help you in the next lab.

You can view the logs about a container like so: nerdctl logs <container_name>

This prints out all the information the container has printed into its stdout.

You can actually execute commands inside the container. This only works sometimes, if container has bash or sh built into it.

The command looks like this: nerdctl exec -ti <container_name> /bin/bash OR nerdctl exec -ti <container_name> /bin/sh

If it worked, then you can traverse and use commands inside the container itself. Remember, the changes are not persistent - if you delete the container and then start a new one, it will be a fresh slate.

7. Publishing containers¶

When building a container, we most probably want to use it in other places as well. While we can just upload a built TAR archive to an FTP server, a better solution is to use a registry service. In particular, Kubernetes expects that images used by applications are coming from the registries.

There a number of solutions and services that provide registry capabilities, typically along with additional functionalities like image scanning or visual interfaces.

The most popular services for container registries are:

Docker Hub from Mirantis;
Amazon ECR from AWS;
ACR from MS Azure;
Google Container Registry from Google;
GitHub Container Registry from GitHub / MS;
Quay from Red Hat / IBM.

In addition, a number of projects exist that allow to run registry on premise, some of them are:

Harbor - https://goharbor.io/;
Quay - https://www.projectquay.io/;
Artifactory - https://jfrog.com/artifactory/;
GitLab Container Registry - https://docs.gitlab.com/ee/user/packages/container_registry/.

In this lab we shall use a minimal Registry implementation - https://github.com/distribution/distribution. You can find details in its documentation.

Launching registry with basic set of configurations is very easy:

$ sudo nerdctl run -d --restart=always --name registry registry.hpc.ut.ee/mirror/library/registry:2

Once the registry is running, we can try building our image again, but this time pushing directly to the registry. Don't forget to look up the IP of the created container registry.

sudo nerdctl run -v $PWD:/workspace gcr.io/kaniko-project/executor:latest --destination=10.4.0.28:5000/server:latest

If all goes well, you should see output similar to this:

...
INFO[0018] Adding exposed port: 5000/tcp
INFO[0018] CMD python3 /opt/server.py
INFO[0018] Pushing image to 10.4.0.28:5000/server:latest
INFO[0019] Pushed 10.4.0.28:5000/server@sha256:d2f1d1d287e53954edb540a0a83442c582d99a3ee9d35221f37e035de11eabbe

So, where did the image go? We can use nerdctl inspect again on the registry container and this time pay attention to the section Mounts:

    ...
    "Mounts": [
        {
            "Type": "volume",
            "Name": "978cb1f5abba612561e47c75c9c3e2d8c32794bff74e7c2fb3f31002ac6b2d71",
            "Source": "/var/lib/nerdctl/1935db59/volumes/default/978cb1f5abba612561e47c75c9c3e2d8c32794bff74e7c2fb3f31002ac6b2d71/_data",
            "Destination": "/var/lib/registry",
            "Driver": "local",
            "Mode": "",
            "RW": true,
            "Propagation": ""
        }
    ],
    ...

By default, path inside the registry container /var/lib/registry is mounted as a containerd volume with a local mount from the folder on the host machine listed under Source attribute. Go ahead and browse that folder. Can you recognise the SHA256 hashes in that folder?

As a next step, we can use the image that we pushed into the registry when creating a new container. To do that, we do exactly as we do when we are creating a container from a non-Docker Hub registry:

$ nerdctl run 10.4.0.28:5000/server
INFO[0000] trying next host                              error="failed to do request: Head \"https://10.4.0.28:5000/v2/server/manifests/latest\": http: server gave HTTP response to HTTPS client" host="10.4.0.28:5000"
ERRO[0000] server "10.4.0.28:5000" does not seem to support HTTPS  error="failed to resolve reference \"10.4.0.28:5000/server:latest\": failed to do request: Head \"https://10.4.0.28:5000/v2/server/manifests/latest\": http: server gave HTTP response to HTTPS client"
INFO[0000] Hint: you may want to try --insecure-registry to allow plain HTTP (if you are in a trusted network)
FATA[0000] failed to resolve reference "10.4.0.28:5000/server:latest": failed to do request: Head "https://10.4.0.28:5000/v2/server/manifests/latest": http: server gave HTTP response to HTTPS client

It almost worked - the error is caused by the fact that we are using insecure deployment of the registry. You can disable check as is described in the documentation.

8. Lab Use Case development¶

In this lab, we will create a container image for the electricity price fetcher service that we created in the previous lab and run it using containerd. We will configure the service to run as a foreground Cron job, meaning the container will stay running and execute the script once the configured schedule is triggered.

Complete

Modify your electricity price fetcher application, so that the created CSV file is stored in /tmp/pricedata/ folder.

Create a new Dockerfile for your electricity price fetcher application.

You can use the following Dockerfile as a basis, which will launch a foreground Cron job that is configured inside the cronjobs text file:

FROM registry.hpc.ut.ee/mirror/library/alpine

# Install any prerequisite libraries required

# Copy the application files into the container

# Create the data folder for CSV files
RUN mkdir -p /tmp/price-data/

# Copy the cronjobs file for root user
COPY cronjobs /etc/crontabs/root

# Assign ownership of the crontabs file to root user
RUN chown root:root /etc/crontabs/root

# Start crond in foreground, and output logs to stderr
CMD ["crond", "-f", "-d", "8"]

Also prepare the cronjobs file. It should contain the same Cron configuration that you used in the previous lab. For example, something like this:

0 12 * * * /usr/bin/python /root/script.py

Update the Dockerfile with statements that move the application files inside the container, prepare the runtime environment and necessary prerequisite libraries. If you used Python then you can use the same approach than we used as in the Flask Dockerfile example in one the earlier tasks.

Also, feel free to replace the base image from bland Alpine to a different one where the required runtime environment has already been prepared (For example Python or Java). It should also be fine to use docker images directly from Dockerhub instead of registry.hpc.ut.ee.

Complete

Build the container image in the local registry and name it as: price-fetcher-service:latest.

Create a folder inside the Virtual machine at /mnt/price-data/.

Deploy the created image as a container in detached mode (-d) and name it container price-fetcher. Also mount the external folder /mnt/price-data/ to the internal folder /tmp/price-data/ inside the container.

Danger

If you rebuild the docker image and push it to register, do not forget to pull it again from the registry! Otherwise your local image might not be updated.

Verify

Check that you are able to run the Cron command (For example /usr/bin/python /root/script.py) directly inside the container. This will show whether the runtime has been configured properly and everything necessary has been installed inside the container.
Check that the output CSV file has been created successfully.
Check that the Cron job has been configured properly for the root user, by running the following command inside the container:
- crontab -l
Also, check back later that the Cron job has run correctly as scheduled. To test this right away, you can temporarily configure the Cron job to run at a different time.

9. Scoring¶

To pass this lab, make sure that you have a container registry (name must be unique) with loaded image with the name price-fetcher-service:latest. Also, the Cron job running inside the container must have created the CSV file that is accessible at /mnt/price-data/prices.csv inside the VM. The CSV file should contain the original CSV header "Ajatempel (UTC)";"Kuupäev (Eesti aeg)";"NPS Eesti".