Dan Walsh wrote a blog post a couple of years ago on running containers in read-only mode. He stressed that when you run containers in production, you really do not want the processes inside the container to be able to modify the content from the image. You need to modify images while you are building your containers, but not while you are running the apps.

There are a couple of things you need to understand about containers. Containers are processes running on a Linux system, and they usually run on top of a rootfs, a directory on the system that looks like / on a Linux system. This directory is usually provided by a container image pulled from a container registry like Quay.io or Docker Hub. Most container runtimes allow the container processes to write content to all directories in this container image by using a copy-on-write (COW) file system (OverlayFS, devicemapper, ...). Volumes are usually bind mounts from directories outside of the rootfs on the file system. In read-only mode, the container processes can write only to the volumes mounted into the rootfs and to tmpfs file systems mounted on the rootfs. This means that in read-only mode a container process would be blocked from writing to /usr, but would be allowed to write to /tmp, /var/tmp, and /run, since these are tmpfs file systems mounted on the rootfs. The container processes would also be allowed to write to /var/log/myapp.log if /var/log were a volume.
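To see which of these paths a process can actually write to, a small probe run from a shell inside the container is enough. This is a minimal sketch, not part of any runtime: probe_writable is a hypothetical helper name, and the probe simply tries to create and then delete a scratch file in each directory.

```shell
# probe_writable DIR...
# For each DIR, try to create a scratch file and report the result.
# The helper name and the scratch file name are illustrative only.
probe_writable() {
  for dir in "$@"; do
    probe="$dir/.rw-probe.$$"
    # The subshell keeps a failed redirection from aborting a POSIX sh.
    if ( : > "$probe" ) 2>/dev/null; then
      rm -f "$probe"
      echo "$dir: writable"
    else
      echo "$dir: not writable"
    fi
  done
}

# Usage inside a container:
#   probe_writable /usr /tmp /var/tmp /run
```

Going by the description above, a container in read-only mode should report /usr as not writable and the tmpfs paths as writable.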

CRI-O

CRI-O is the dedicated container runtime for Kubernetes workloads, replacing the Docker daemon. One of our goals with CRI-O is to make it the most secure way to run OCI containers. We wanted to make running containers in read-only mode by default a feature of CRI-O. By running containers in read-only mode, we remove an attack vector, stymieing attackers who want to inject exploits into a container. As an added bonus, we might just prevent data loss as well.

The development process of an application includes building and updating it with dependencies and code particular to it. Once developers are done with this process, the application is handed off to quality engineers for testing and eventually moved into production. At that point, the developers expect the content of the application to be treated as read-only and immutable.

From a security point of view, this is a great feature. Imagine you are running a containerized application that gets hacked. Hackers often want to put a back door in place, so that if the application gets started a second time, it will already be running the hacker's code. Running your containers in read-only mode prevents the hacker from modifying the application, since /usr of the container is immutable. The hacker can’t write an exploit into the application. In the case of script kiddies, the lack of places to write and execute code could block the hack altogether.

Other container runtimes currently combine the development process of building container images and the running of containers in production into the same runtime. We believe that these two systems should be separate, and that your development environment should run with different privileges than your production environment.

The CRI-O daemon now supports this feature with the “--read-only” flag, which forces all unprivileged containers to run in read-only mode. This means that the process within a container cannot write to just any path in the container. It can write only to external volumes and the following tmpfs mounts: “/tmp”, “/var/tmp”, “/dev/shm”, and “/run”.

You can start up CRI-O in read-only mode, like this:

# crio --log-level debug --log my.log --read-only
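The same setting can presumably be made persistent in configuration rather than passed on the command line each time. The sketch below assumes a CRI-O release that reads drop-in files from /etc/crio/crio.conf.d and accepts a read_only option in the [crio.runtime] table. The helper name and the 10-read-only.conf file name are made up for illustration, so check the crio.conf documentation for your version before relying on it.

```shell
# write_read_only_dropin DIR
# Writes a drop-in config file into DIR that turns on read-only mode.
# Assumption: CRI-O honors read_only under [crio.runtime] in drop-ins.
write_read_only_dropin() {
  dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/10-read-only.conf" <<'EOF'
[crio.runtime]
read_only = true
EOF
}

# Usage (as root), followed by a restart of the CRI-O daemon:
#   write_read_only_dropin /etc/crio/crio.conf.d
```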

A simple test is to try a dnf install in a Fedora container. The install fails because dnf expects to write logs to “/var/log/dnf.log”.

Snippet of the container config:

{
  "metadata": {
    "name": "podsandbox1-fedora"
  },
  "image": {
    "image": "registry.fedoraproject.org/fedora"
  },
  "args": [
    "/bin/bash"
  ],
  "working_dir": "/",

}

Create and run the container:


# POD_ID=$(crictl runp test/testdata/sandbox_config.json)
# CTR=$(crictl create $POD_ID test/testdata/container_fedora.json test/testdata/sandbox_config.json)
# crictl start $CTR
1729127d10bee8c4bd7d920988bbddf364469c59b6876ffe8c50d4dd22c3f46a
# crictl exec -i -t $CTR sh
sh-4.4# dnf install nc
dnf install nc
Config error: Read-only file system: '/var/log/dnf.log'

If I start up CRI-O without the --read-only flag, a dnf install inside the container succeeds. Make sure to clear out the old pod and create a new one before doing this.

Start up CRI-O:

# crio --log-level debug --log my.log
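With the daemon back up, clearing out the old pod and creating a new one might look like the following. This is a sketch using the crictl stopp, rmp, and runp subcommands. The recreate_pod helper and the CRICTL override variable are hypothetical conveniences, not something crictl provides.

```shell
# CRICTL lets the helper be pointed at a wrapper; it defaults to crictl.
CRICTL="${CRICTL:-crictl}"

# recreate_pod OLD_POD_ID SANDBOX_CONFIG
# Stops and removes the old pod sandbox, then prints the new pod's ID.
recreate_pod() {
  old_pod="$1"
  sandbox_config="$2"
  "$CRICTL" stopp "$old_pod" >/dev/null || return 1
  "$CRICTL" rmp "$old_pod" >/dev/null || return 1
  "$CRICTL" runp "$sandbox_config"
}

# Usage, as root like the other commands here:
#   POD_ID=$(recreate_pod "$POD_ID" test/testdata/sandbox_config.json)
```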

# CTR=$(crictl create $POD_ID test/testdata/container_fedora.json test/testdata/sandbox_config.json)
# crictl start $CTR
6ebeb39a4472c298b4a6cc01b85201a71b0cfcd4f0db5e5a83175a32e7f2b24c
# crictl exec -i -t $CTR sh
sh-4.4# dnf install nc
dnf install nc
Fedora 28 - x86_64 - Updates 8.3 MB/s | 10 MB 00:01
Fedora 28 - x86_64 13 MB/s | 60 MB 00:04
Last metadata expiration check: 0:00:02 ago on Thu May 24 17:24:21 2018.
Dependencies resolved.

Installed:
nmap-ncat.x86_64 2:7.60-12.fc28
Complete!

Data Loss

Beyond security, running CRI-O with the --read-only flag also helps you diagnose problems where a container is accidentally writing to the COW image, where valuable data could be lost.

For example, imagine you are running a container that writes logs to /var/log or data to /data, and these directories are in the running container's image, not in a volume mounted into it. When Kubernetes moves a container from one node to another, the container is destroyed and this data is lost. If you ran the same container in read-only mode, it would fail to write data to /data or /var/log. You would discover this early in your development process and be able to fix the application or mount a volume at the correct path.
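One quick way to check whether a path such as /data or /var/log is backed by a mount, rather than living in the container's COW image, is to look for it in the process's mount table. The helper below is a rough sketch with a hypothetical name. It only matches exact mount targets, so a path that sits underneath a mounted parent directory is not detected.

```shell
# is_mount_point PATH [MOUNTS_FILE]
# Succeeds if PATH appears as a mount target in MOUNTS_FILE, which
# defaults to /proc/self/mounts (the caller's own mount namespace).
# Limitation: paths containing spaces are escaped in that file and
# will not match as written here.
is_mount_point() {
  path="$1"
  mounts="${2:-/proc/self/mounts}"
  awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$mounts"
}

# Usage inside the container:
#   is_mount_point /var/log || echo "/var/log is not a separate mount"
```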

Quota

A third reason to run in read-only mode applies if your system relies on a quota mechanism to control the amount of disk space a user can allocate. Most quota systems ignore the size of the image on the COW file system and focus only on the volumes mounted into the container. Containers writing to the COW image could get around the quota limits.

Bottom Line

OpenShift/Kubernetes users running containers in production should take advantage of CRI-O's security features and force all non-privileged containers to run in read-only mode.