How Docker Changed the Way We Develop and Release OpenShift Online

By now you’ve probably heard about Docker, the lightweight container management project. As the Docker website puts it, “the same container that a developer builds and tests on a laptop can run at scale, in production, on VMs” and beyond. The OpenShift Online engineers and operations team saw this as an opportunity to change the way we develop and release the independent parts of our product.

How did it work before?

The management console (the web-based user interface of OpenShift) and the broker (the REST API) are two self-contained components in the OpenShift stack. Traditionally, in our online deployment, the broker and management console have lived on the same instances, so we have several systems and each one runs both the broker and management console. This keeps the number of maintained systems down and allows a console instance to talk to a broker instance without needing to go across the network, keeping our response times shorter.

However, this kind of deployment architecture does come with its downsides. Since the two components share a filesystem, any change to one component and its dependencies may affect the other. Because of this, releases of the broker and console are not independent: they occur on the same schedule so that full regression testing of both components can be performed on staging servers. You may be asking, what’s so bad about that? The answer: it prevents minor visual updates to the console from being released on a faster cycle. If a change isn’t urgent, it has to wait for the next release cycle.

Additionally, OpenShift engineers work on development instances that not only contain the console and broker, but also act as the node (the part of the backend where user applications live). This means that engineers’ environments are fundamentally different than staging and production environments.

How is Docker changing things?

Consistent images from development to production

Because the console and broker are independent components we were able to package them as separate Docker images. They are first built to run inside an engineer’s development environment as part of the continuous integration process. These same images will then be tested, tagged, promoted to staging, and finally placed into production.

Let’s look at things from the perspective of the management console on a new Docker-based development instance. As far as it knows, it is the only thing running in its environment, and everything installed on its system is to meet its dependencies. It is told what the URL is for its broker and where to log its output.

Now let’s look at things from the perspective of the management console on a new Docker-based production instance. As far as it knows, it is the only thing running in its environment, and everything installed on its system is to meet its dependencies. It is told what the URL is for its broker and where to log its output. Sound familiar? So what was the difference between these environments? Configuration files.

Docker makes local development easier for OpenShift engineers

Traditionally OpenShift development environments have been cloud instances, and engineers sync their code changes out to these instances to make changes. These instances are launched from images that are built several times a day as part of the continuous integration process. This keeps developers from needing to go through a lengthy setup process on their local systems, and reduces the need for developers to keep local pre-reqs up to date.

With Docker images, engineers can docker pull the images to their local systems and have self-contained environments with all the latest pre-reqs already installed. With the local images, there is no longer a need to sync code changes out to a remote system. Instead, developers can run containers with their source mounted in directly, so changes to the source are seen immediately inside the container.
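As a rough sketch of that workflow (the registry, image name, port, and paths below are placeholders, not the actual OpenShift values):

    # pull the latest console image produced by continuous integration
    docker pull registry.example.com/openshift/console:latest

    # run it with a local source checkout bind-mounted into the container,
    # so edits made on the host show up inside the container immediately
    docker run -d --name console-dev \
        -p 8118:8118 \
        -v ~/src/openshift-console:/opt/openshift/console \
        registry.example.com/openshift/console:latest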

You may be thinking, “now wait a minute, you can do that with local VMs…” Well sure, but from past experience there are several annoying things about using local VMs. You have to guess how much hard drive space the VM will need, how many CPUs to give it, how much RAM to give it, and so on. Whatever you do give it is no longer available to your host system, which is typically a laptop. The images are often very large (many GBs), so pulling them down every day takes a long time and eats network bandwidth (rough for remote developers).

With containers, there is no more guessing: the container uses exactly what it needs. The images are typically much smaller, usually under 1 GB. And with Docker, developers who want to can build the image themselves locally in minutes using a checked-in Dockerfile.
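When a developer would rather build than pull, the checked-in Dockerfile makes that a one-liner (the repository path and image tag here are illustrative):

    # build the console image locally from the repository's Dockerfile
    cd ~/src/openshift-console && docker build -t openshift/console:dev .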

Rollbacks are faster and easier

With Docker images, we can keep both the previous image and the new image on an instance at the same time. This gives us an advantage if we hit an unexpected issue during a release. Since the previous image is still on the instance, a rollback is just a matter of restarting the container pointing at the previous image. Before Docker, this would have required stopping the processes, downgrading all the packages, and then starting them again.
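Roughly, with illustrative container names, ports, and tags, a rollback looks like this:

    # stop and remove the container running the problematic release
    docker stop console && docker rm console

    # the previous image is still on the instance, so just start it again
    docker run -d --name console -p 8118:8118 \
        registry.example.com/openshift/console:previous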

Update all instances without ever reducing traffic capacity

We already have zero-downtime updates today by only bringing down a portion of our broker/console instances at a time. With Docker images, we have the potential to change this process and never take down our instances at all. This is possible by leaving the existing broker/console containers running, pulling the new images to the system, starting containers for the new images, spot testing against those containers with something like curl, switching traffic over to the new containers, and finally stopping and removing the old containers.
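A sketch of that sequence, with hypothetical names, ports, and a stand-in health-check URL for the spot tests:

    # pull the new image while the old container keeps serving traffic
    docker pull registry.example.com/openshift/console:new

    # start a container for the new image on a different host port
    docker run -d --name console-new -p 8119:8118 \
        registry.example.com/openshift/console:new

    # spot test the new container before it receives real traffic
    curl -f http://localhost:8119/healthcheck

    # switch the load balancer to port 8119, then retire the old container
    docker stop console-old && docker rm console-old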

What did we learn?

Configuration files are OK, but environment variables are better!

As everyone knows, environments are different for good reasons. You don’t want every developer to spin up multiple instances just to do their day-to-day work. Your staging hostname is not the same as your production hostname. Your secret keys are not the same. And so on. So when we decided to containerize the console and broker, we knew we still needed a way to inject configuration into the running containers. To ease the transition into a Docker-based operational environment, we wanted to reduce the number of changes both our engineers and our operations team would face. So it seemed logical to keep our existing method of configuration: config files.

What does this mean with Docker? Well, the configuration has to be “mounted in” to the running container, so the configuration files actually live on the host system, not inside the image’s filesystem. This means the broker and console images are not actually 100% self-contained.
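In practice, “mounted in” just means the run command bind-mounts the host’s configuration directory over the path the application expects (the paths and image name below are illustrative):

    # the config files live on the host and are mounted read-only into the container
    docker run -d --name broker \
        -v /etc/openshift/broker:/etc/openshift/broker:ro \
        registry.example.com/openshift/broker:latest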

So we asked ourselves, how could we make this even better in the future? Environment variables, of course! Passing environment variables to a Docker container on a run command has been around for a while, but it requires specifying each individual variable with its own flag. The problem? We have a lot of configuration options in our console and broker, so passing them all individually was impractical. The OpenShift team saw this as an opportunity to contribute to Docker upstream and added the ability to specify these environment variables with a single file.

OK, so how is an environment file any different than a config file? The key is when you look at distributed Docker, with a local client talking to a remote daemon. The configuration file no longer has to live on every host running one of the containers. Instead, the environment file only needs to exist on the client where the docker run command is called.
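A minimal sketch of the env-file approach (the variable names and values are made up): the file sits on the client machine, and docker run passes its contents into the container.

    # create the environment file on the client machine (values are examples)
    cat > console.env <<'EOF'
    BROKER_URL=https://broker.example.com
    LOG_LEVEL=info
    EOF

    # one --env-file flag replaces a long list of individual -e options
    docker run -d --env-file ./console.env registry.example.com/openshift/console:latest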

Sticking with file-based logging is a pain

OpenShift traditionally has logged to files in a standard location that all of our engineers, operations, and customer enablement team members are familiar with. So again to ease the transition, we left that alone. But what happens when you log to a file inside a container? Well that file only exists inside the container. We chose to mount the OpenShift log directory from the host into the container, allowing the existing infrastructure around log rotation and developer access to logs on staging systems to all remain unchanged.

Seems like it’s not that bad, right? Except it gets tricky when you start looking at filesystem permissions. OpenShift log files are set up with very specific permissions: as you might expect, only certain users and groups have access to read them, and even fewer to write to them. The problem is that the Docker container has its own view of the world when it comes to “users” and “groups.”

The container has its own copy of /etc/passwd that it relies on to decide who is who. So if the configuration says, “Start this process as user X,” the container starts the process as whoever it thinks user X is. When mounting write-restricted log files from the host into the container, we therefore have to guarantee that user X in the container has the same uid as user X on the host. Annoying? Yes!
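The practical consequence (the uid value and paths below are examples only) is that the numeric ids have to be kept in sync by hand:

    # on the host: the log directory is owned by a specific numeric uid/gid
    ls -ln /var/log/openshift/console      # e.g. drwxr-x--- ... 480 480 ...

    # the image must create its service user with that same numeric uid,
    # or the process in the container cannot write to the mounted log files
    docker run -d \
        -v /var/log/openshift/console:/var/log/openshift/console \
        registry.example.com/openshift/console:latest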

Fortunately, there are other options for logging. For example, in the geard project we are using journald to aggregate logs. Unfortunately, there are many other use cases for mounted volumes where the permissions are still a concern. With the addition of user namespaces to Docker, we will have good solutions.

Common patterns emerge around controlling services in containers

Our operations team is accustomed to running service start/stop/restart commands to control our components. We didn’t want that to change, and we certainly didn’t want them running gigantic docker run commands by hand, with mounts and ports and every other little thing that has to be included. So we wrote new init scripts to do the heavy lifting. In the process, we noticed that the init scripts for our two components were almost identical, differing only in details like which image to start, which port to expose it on, and which directory to mount in for logging.
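A stripped-down sketch of one of those init scripts (the service name, image, port, and paths are placeholders, not the real ones):

    #!/bin/bash
    # /etc/init.d/openshift-console -- hides the docker run details from operators
    IMAGE=registry.example.com/openshift/console:latest
    NAME=openshift-console

    case "$1" in
      start)
        docker run -d --name "$NAME" \
            -p 8118:8118 \
            -v /etc/openshift/console:/etc/openshift/console:ro \
            -v /var/log/openshift/console:/var/log/openshift/console \
            "$IMAGE"
        ;;
      stop)
        docker stop "$NAME" && docker rm "$NAME"
        ;;
      restart)
        "$0" stop
        "$0" start
        ;;
      *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
    esac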

It was clear very early on that there should be an easier, more automated way of controlling and linking services running inside Docker containers. This led to the ideas behind some of the core features of the geard project.

OpenShift as an early adopter of Docker on RHEL, thanks to Project Atomic

OpenShift Online runs on RHEL 6.5 today, and the docker-io package has been in the epel-testing repo for months now. OpenShift engineers began working with these packages in December and have been putting load on the devicemapper-based filesystem with our existing automated test suites for several months. As of March, all OpenShift Online development instances are running Docker, including all instances launched during continuous integration testing.

The process of early adoption has helped stabilize Docker on RHEL 6.5. This would never have been possible without the RHEL engineers focused on Project Atomic and RHEL + Docker integration. OpenShift engineers are working closely with the Project Atomic community to design the next-generation DevOps stack.
