0. Overview of Lab 8:
Welcome to the 8th lab. Here is a short overview of what we will do during this lab:
- Install and get familiar with Docker
- Run a Docker container
- Write a Dockerfile and build an image from it
- Try different methods of routing traffic to a running container
1. Introduction to Docker
Docker is a containerisation solution that separates your application layer from the underlying infrastructure, so developers can build, deliver and run software without focusing too much on the underlying operating system or hardware.
Containers are logical units that are separated from the underlying OS by a container engine (we will use Docker), which creates an abstraction layer between the two. This is not unlike a virtual machine, but the main difference is that a VM runs a full operating system inside it, while a container runs directly on the host OS with only the necessary bits and a little compatibility code.
2. Installing Docker
Installation of Docker means installing the Docker runtime and client packages, setting up the configuration and starting it.
Because Docker's default network ranges unfortunately overlap with the University network, it is very important that you do not start Docker before you have set up the configuration file.
Create the file /etc/docker/daemon.json and put the following inside it:
{
  "bip": "192.168.67.1/24",
  "fixed-cidr": "192.168.67.0/24",
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "default-address-pools": [
    { "base": "192.168.167.1/24", "size": 24 },
    { "base": "192.168.168.1/24", "size": 24 },
    { "base": "192.168.169.1/24", "size": 24 },
    { "base": "192.168.170.1/24", "size": 24 },
    { "base": "192.168.171.1/24", "size": 24 },
    { "base": "192.168.172.1/24", "size": 24 },
    { "base": "192.168.173.1/24", "size": 24 },
    { "base": "192.168.174.1/24", "size": 24 }
  ]
}
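Before starting the service, it can be worth checking that the file is valid JSON, since a malformed daemon.json will prevent the Docker daemon from starting. One quick way to do this, assuming python3 is available on the VM:
sudo python3 -m json.tool /etc/docker/daemon.json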
Then follow this guide to install Docker: https://docs.docker.com/engine/install/centos/. Our recommendation is to use the Repository installation method and just install the latest Docker version you can get from there.
If you did the configuration step appropriately first, then you can start up your Docker service.
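For reference, one way to do that on a systemd-based machine like our VM (the status command is just there to check that the daemon came up cleanly):
sudo systemctl enable --now docker
sudo systemctl status docker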
The first step to test whether everything works is to run a Hello World container:
docker run hello-world
You should get a "Hello from Docker!" message. If, instead, you get an error like this:
docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit.
Then run this command to run hello-world instead:
docker run registry.hpc.ut.ee/mirror/library/hello-world:latest
This uses the same image, just fetched from a different registry. We'll talk about registries a bit later.
You can see some fun information with: docker info
3. Docker images
Another way containers are very different from VMs is that they usually utilize something called images. An image is a static, inert, immutable file that is essentially a snapshot or a template of a container.
Containers are always started from an image: the image is taken, and a copy of it is started up. This is what we did with the docker run hello-world command. You can list the images on your machine with the command:
docker image ls
Because running a container from an image in no way alters the image, you can spin up thousands of containers from the same image if you want to, and they always start from the same initial state - the image.
You can see all the containers that are currently running with the command docker ps. As we did not start any persistent containers (hello-world only prints some text and then exits), it won't show anything. Appending the -a flag to the command shows all containers, including stopped ones. You should see your hello-world container there.
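If you want to clean up stopped containers like the hello-world one, you can remove them by the ID or name shown by docker ps -a (the placeholder below follows the same convention used elsewhere in this lab):
docker ps -a
docker rm <container_id_or_name>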
Docker images are hosted in something called a registry. Registries are basically customized object storage servers that know how to deal with images. There are also public registries - https://hub.docker.com/ or AWS Public ECR - but they have different policies which sometimes make using them difficult.
This is why we use the University of Tartu cache registry, https://registry.hpc.ut.ee, that pulls and caches images.
If you run your container without specifying a registry, it will use the default one, hub.docker.com. As that has a maximum limit of 60 pulls per 4 hours for the whole University internal network, it is unlikely that you will be able to pull anything.
This is why, instead of running a container like this: docker run centos:latest
do the following instead: docker run registry.hpc.ut.ee/mirror/library/centos
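As the image names used throughout this lab suggest, official Docker Hub images map to the mirror/library/ path on the cache registry, while user or organisation images keep their own namespace under mirror/. A couple of illustrative examples:
# centos:latest            -> registry.hpc.ut.ee/mirror/library/centos:latest
# containous/whoami:latest -> registry.hpc.ut.ee/mirror/containous/whoami:latest
docker pull registry.hpc.ut.ee/mirror/library/centos:latest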
4. Running Docker containers
For a Docker container to be persistent - to stay up and consistently respond to queries - it needs to be run in detached mode.
Let's take an example container that just prints the hostname, IP address and a bit more information about the environment when queried over HTTP.
Run the container like so: docker run -d --name whoami registry.hpc.ut.ee/mirror/containous/whoami
After running it, you will get back a long cryptic ID. This is the ID of the running container. Because we specified --name whoami, we can also refer to this container by the name whoami.
Checking docker ps should now list a running container.
$ docker ps
CONTAINER ID   IMAGE                                         COMMAND     CREATED      STATUS      PORTS     NAMES
eb16cf413128   registry.hpc.ut.ee/mirror/containous/whoami   "/whoami"   2 days ago   Up 2 days   80/tcp    whoami
You can see some information from the previous command. The main question now is: how do we query it? It has PORTS 80/tcp defined.
This means that the container itself listens on port 80, but not the port 80 of your machine's main network interface.
When you set up Docker, it creates a new network interface, usually called docker0. You can see this network interface with the command ip a.
We specified it to get the IP address 192.168.67.1/24 in the /etc/docker/daemon.json file, but it could be any valid IP address range.
When you start a container, it is given an IP address in that specified range, in our case 192.168.67.0/24. To see which IP address your container got, run docker inspect whoami. You are interested in the NetworkSettings section.
"NetworkSettings": { "Bridge": "", "SandboxID": "fa03bbe4998f048e5c2a78cf7aa27dad8f262ddf5dcecf363d899d7a958eb08f", "HairpinMode": false, "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "Ports": { "80/tcp": null }, "SandboxKey": "/var/run/docker/netns/fa03bbe4998f", "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null, "EndpointID": "0467b498ba8fe20bcd86f052ef80230744c518ebdccaaefba48e5f472189a59d", "Gateway": "192.168.67.1", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "IPAddress": "192.168.67.2", "IPPrefixLen": 24, "IPv6Gateway": "", "MacAddress": "02:42:c0:a8:43:02", "Networks": { "bridge": { "IPAMConfig": null, "Links": null, "Aliases": null, "NetworkID": "2054a8ccadfe58741bc92b4ca9212e459e43e23b8a36eac9dbc7798fb725240a", "EndpointID": "0467b498ba8fe20bcd86f052ef80230744c518ebdccaaefba48e5f472189a59d", "Gateway": "192.168.67.1", "IPAddress": "192.168.67.2", "IPPrefixLen": 24, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "02:42:c0:a8:43:02", "DriverOpts": null } } }
This container got the IP address 192.168.67.2. If we now query this IP address, we should get an appropriate response:
$ curl 192.168.67.2
Hostname: 8052649b4dcb
IP: 127.0.0.1
IP: 192.168.67.2
RemoteAddr: 192.168.67.1:49636
GET / HTTP/1.1
Host: 192.168.67.2
User-Agent: curl/7.61.1
Accept: */*
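Instead of reading through the whole docker inspect output, you can also extract just the IP address with a Go template. A small shortcut, assuming the container is attached to the default bridge network:
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' whoami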
Now we have a working container that can be accessed from inside the machine itself. Getting access from the outside world is a bit more complicated, but we'll come back to that later.
5. Building Docker images
Before we run containers visible to the outside world, we should learn how to build, debug and audit containers ourselves. Public containers are a great resource, but if you are not careful, they are also a way for a bad actor to gain access to your machine. You should always know what is running inside your container, otherwise you open up the possibility of being targeted by several types of attacks, including but not limited to supply chain attacks, attacks against outdated software, attacks against misconfiguration, etc.
One of the best ways to make sure you know what is happening inside your container is to build it yourself. Building an image yourself is not black magic - anyone can do it. You need two things - the docker build command and a Dockerfile, which is basically the code for the image.
The Dockerfile uses its own syntax that allows you to put together a working container, which is then snapshotted. That snapshot is the image. The first step of building an image is to choose a base image. Think of a base image like an operating system or a Linux distribution. The only difference is that a base image can be any image - you could even use the image we used before - but as we are worried about unknown stuff inside our image and container, let's use an image from a trusted source, one called alpine. This is a small Linux distribution that specializes in small images, which is a benefit for Docker, as larger images require more resources to ship and run. More information here: https://hub.docker.com/_/alpine
Let's set up our environment for building a Docker image. First, create a folder called docker_lab (you can put it anywhere).
Inside that folder, create two files, one named Dockerfile and the other named server.py.
The logic will be the following: we build the image using the Dockerfile. Inside that Dockerfile there are instructions to install python3 and python3-pip and to copy our Flask server.py file into the container. It is also instructed to expose container port 5000 and to run the server.py file on startup.
server.py:
#!/bin/env python3
from flask import Flask, request, jsonify
import socket

app = Flask(__name__)

@app.route("/")
def hello():
    response = "Client IP: " + request.remote_addr + "\nHostname: " + socket.gethostname() + "\n"
    return response, 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
Dockerfile:
FROM registry.hpc.ut.ee/mirror/library/alpine
RUN apk add --no-cache python3 py3-pip
RUN pip3 install flask
COPY server.py /opt/server.py
EXPOSE 5000
CMD python3 /opt/server.py
Once both of these files are in the same folder, we need to build our container image with the command:
docker build -t docker_lab .
This line means: build a container image with the tag docker_lab, using the Dockerfile in the current directory (the trailing dot).
After entering the command, you can see it run the instructions we specified, in order. Every step creates something called a "layer". Every layer is one difference from the previous layer, and this is how images are built. This provides some benefits, like being able to reuse layers when you run the build again. On a second run you should see the cache being used for every layer, and the command finishes almost instantaneously.
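Once the build has finished, you can check that the new image exists (and see its size) with:
docker image ls docker_lab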
After the image has been built, the only thing left to do is to run a container using the image we built. Let's run a detached container called docker_lab from the image tagged docker_lab.
docker run -d --name docker_lab docker_lab
After finding out the IP address of the container and using curl against its exposed port, you should see output similar to the following:
Client IP: 192.168.67.1
Hostname: d1409f26cb5f
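As a side note, the two steps can be combined into one line, again assuming the container sits on the default bridge network:
curl "$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' docker_lab):5000"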
You can try deleting the container with docker rm -f docker_lab and rerunning it, to see the **Hostname** field change.
The Client IP field stays the same, as this is the address the container sees the query coming from - when we query from the host or from outside the Docker network, it will always be the Docker bridge IP address.
6. Docker networking
We have now run containers in different ways, but running containers that are only accessible from inside the machine itself is of limited use, as most services need to be reachable by their users somehow.
This is why Docker and its ecosystem support several ways to publish a container to the network. We are going to introduce three different ways: two of them in this lab, and the last one in the next (Kubernetes) lab.
- Opening a port from host port to container port.
- Using a static proxy to route traffic from the public network (in our case the VM's external network) to the container.
- Using a dynamic proxy, but with the autodetection of containers.
Opening a port from the outside world
Before we continue with this part, make sure to open port 5005/tcp in both firewalld and the ETAIS firewall.
The easiest way to publish a container to the network is to simply open a port between a port on the host's public interface and the container's port.
To test this out, let's deploy a new container from the same image, but this time with one more flag. Since the name docker_lab is already taken by the previous container, either give this one a different name with --name or, as in the command below, leave the flag off and let Docker generate a random name:
docker run -d -p 5005:5000 docker_lab
After running this, you should be able to access the service via your machine's name or IP address, on port 5005.
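A quick way to verify this, first locally and then over the public name (the hostname is a placeholder for your own VM's name, and the external check only works once the firewall ports from above are open):
curl http://localhost:5005/
curl http://<vm-name>.sa.cs.ut.ee:5005/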
The problem with this approach is, first, that you need to open arbitrary ports, which is not always possible, and second, that users do not really want to type arbitrary ports into their browsers.
Using a static proxy to route traffic from the public network to the container
The same way we proxied an application in the Web lab, we can proxy a container.
Let's proxy our own built container through our Apache server.
Before we continue with this part, point a DNS name container-proxy.<vm-name>.sa.cs.ut.ee to your machine.
Create a virtual host in your web server.
<VirtualHost *:80>
    ServerName container-proxy.<vm_name>.sa.cs.ut.ee
    # ServerName sets the name to listen for with requests

    ErrorLog /var/log/httpd/container-proxy-error_log
    CustomLog /var/log/httpd/container-proxy-access_log common
    ForensicLog /var/log/httpd/container-proxy-forensic_log

    ProxyPreserveHost On
    ProxyPass / http://<container_ip>:5000/
    ProxyPassReverse / http://<container_ip>:5000/
</VirtualHost>
After restarting the webserver, you should be able to access the container on the specified name.
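A minimal check, assuming your web server from the earlier labs is Apache running as the httpd service:
sudo systemctl restart httpd
curl http://container-proxy.<vm-name>.sa.cs.ut.ee/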
This approach is better than the port opening one, because here you can also add TLS, extra security configuration and logging at the web server level.
The problem with this approach is that every time you recreate the container, its IP address may change and you need to update the proxy configuration.
7. Manipulating Docker
In this part of the lab we will go over a few debugging methods. This is not mandatory, but will help you in the next lab.
You can view a container's logs like so: docker logs <container_name>
This prints out everything the container has written to its stdout.
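For long-running containers it is often more convenient to limit or follow the output; both flags below are standard docker logs options:
docker logs --tail 20 <container_name>
docker logs -f <container_name>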
You can actually execute commands inside the container. This only works if the container has bash or sh built into it.
The command looks like this: docker exec -ti <container_name> /bin/bash
OR docker exec -ti <container_name> /bin/sh
If it worked, then you can traverse and use commands inside the container itself. Remember, the changes are not persistent - if you delete the container and then start a new one, it will be a fresh slate.
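For example, the alpine base image we used ships sh but not bash, so for our own container the /bin/sh form is the one that works. A small illustrative session:
docker exec -ti docker_lab /bin/sh
# inside the container:
cat /opt/server.py
exit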
8. Ansible and Docker
Putting the whole lab into Ansible is not something that would result in an idempotent playbook. So we will list the things that can be automated in an idempotent way, but some parts of the lab will require manual intervention by the user. The following suggestions and steps are just guidelines; if you think you can do better, feel free to do so.
- Create the /etc/docker directory
- Copy the daemon.json file into that directory
- Perform the necessary steps to install Docker on your VM according to this.
- Open the internal port
- Create a separate directory for the Flask app.
- Copy server.py and the necessary Dockerfile into it.
- Copy the VirtualHost file into the appropriate place.
Pay attention here. We don't really suggest building and starting containers with Ansible, as that would result in a non-idempotent playbook. But setting up all the necessary files so that only a few manual commands are left is welcome.
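For reference, the manual commands left over after the playbook has laid down the files could look roughly like this (the directory name is a placeholder for wherever your playbook put the Flask app and Dockerfile):
cd <flask_app_directory>
docker build -t docker_lab .
docker run -d --name docker_lab docker_lab
docker run -d -p 5005:5000 docker_lab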
9. Keep your playbook safe
As per usual, always push your playbook to the course's GitLab.
- In your ansible playbook directory:
git add .
git commit -m "Docker lab"
git push -u origin master
Go to your GitLab page to check that your latest push reached the repository. If you play around with your repository and make changes to the Ansible code that you wish to use in the future, always remember to commit and push them.