Using an Image Source to reduce build times

In a prior post, Jorge Morales described a number of techniques for reducing build times for a Java-based application when using OpenShift. Since then there have been a number of releases of the upstream OpenShift Origin project.

In the 1.1.2 release of Origin, a new feature called an Image Source was added to builds. It can also help reduce build times by offloading repetitive build steps to a separate build process. This mechanism can, for example, be used to pre-build assets which don’t change often, and then have them automatically made available within the application image when it is being built.

To illustrate how this works, I am going to use an example from the Python world, using some experimental S2I builders for Python I have been working on. I will be using the All-In-One VM we make available for running OpenShift Origin on your laptop or desktop PC.

Deploying a Python CMS

The example I am going to start with is the deployment of a CMS called Wagtail. This web application is implemented using the popular Django web framework for Python.

Normally Wagtail would require a database to be configured for storage of data. As I am more concerned with the build process here rather than seeing the site running, I am going to skip the database setup for now.

To create the initial deployment for our Wagtail CMS site, we need to create a project, import the Docker image for the S2I builder I am going to use, and then create the actual application.

$ oc new-project image-source
Now using project "image-source" on server "https://10.2.2.2:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    $ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-hello-world.git

to build a new hello-world application in Ruby.

$ oc import-image grahamdumpleton/warp0-debian8-python27 --confirm
The import completed successfully.

Name:           warp0-debian8-python27
Created:        Less than a second ago
Labels:         <none>
Annotations:        openshift.io/image.dockerRepositoryCheck=2016-03-02T00:14:37Z
Docker Pull Spec:   172.30.118.161:5000/image-source/warp0-debian8-python27

Tag Spec                    Created         PullSpec                            Image
latest  grahamdumpleton/warp0-debian8-python27  Less than a second ago  grahamdumpleton/warp0-debian8-python27@sha256:ae947cc679d2c1... <same>

$ oc new-app warp0-debian8-python27~https://github.com/GrahamDumpleton/wagtail-demo-site.git
--> Found image b75e1fa (4 hours old) in image stream warp0-debian8-python27 under tag "latest" for "warp0-debian8-python27"

    Python 2.7 (Warp Drive)
    -----------------------
    S2I builder for Python web applications.

    Tags: builder, python, python27, warpdrive, warpdrive-python27

    * A source build using source code from https://github.com/GrahamDumpleton/wagtail-demo-site.git will be created
      * The resulting image will be pushed to image stream "wagtail-demo-site:latest"
    * This image will be deployed in deployment config "wagtail-demo-site"
    * Port 8080/tcp will be load balanced by service "wagtail-demo-site"
      * Other containers can access this service through the hostname "wagtail-demo-site"

--> Creating resources with label app=wagtail-demo-site ...
    imagestream "wagtail-demo-site" created
    buildconfig "wagtail-demo-site" created
    deploymentconfig "wagtail-demo-site" created
    service "wagtail-demo-site" created
--> Success
    Build scheduled for "wagtail-demo-site", use 'oc logs' to track its progress.
    Run 'oc status' to view your app.

The initial build and deployment of the Wagtail site will take a little while for a few reasons. The first is that because we didn’t already have the S2I builder loaded into our OpenShift cluster, it needs to download it from the Docker Hub registry where it resides. Because I live down in Australia where our Internet is only marginally better than using two tin cans joined by a piece of wet string, this can take some time.

The next most time-consuming part of the process is one which needs to be run every time we do a build: downloading all the Python packages that the Wagtail CMS application requires. This includes Wagtail itself, Django, as well as database clients, image manipulation software and so on.

Many of the required packages are pure Python code, so it is just a matter of downloading the code and installing it. Others, such as the database client and image manipulation software, contain C extension modules which first need to be compiled into dynamically loadable object libraries.

The delay points are therefore the time taken to download the packages from the Python Package Index, followed by the actual code compilation time.

A final source of extra delay for the initial deploy is pushing the image out to the nodes in the OpenShift cluster so that the application can then be started. This takes a little bit of extra time on the first deploy as all the layers of the base image for this S2I builder will not be present on each node. Subsequent deploys will not see this delay unless the S2I builder image itself is updated.

When finally done, for me down here in this Internet-deprived land we call OZ, that took a total of just under 15 minutes. This included around 5 minutes to pull down the S2I builder the first time and about 5 minutes to push the final image out to the OpenShift nodes the first time.

The actual build of the Wagtail application itself, consisting of the pulling down and compilation of the required Python packages, therefore took about 5 minutes.

Because we are using an S2I builder, which downloads the application code from the Git repository, then downloads, compiles and installs any Python packages, all in one step, we have no way of speeding things up by using separate layers in Docker. Well, we could, but it would mean creating a custom version of the S2I builder which had all the Python packages we required preinstalled into the base image. Although technically possible, this would not be the preferred option.

Using a Python Wheelhouse

If we were using Docker directly, an alternative available with Python is what is called a wheelhouse.

What this entails is downloading and pre-building all the Python packages we require, producing what are called Python wheels. These are stored in a directory called a ‘wheelhouse’.

When we then build our Python application, we point the Python ‘pip’ program used to install packages at the directory of wheels we pre-built. Rather than downloading the packages and building them again, ‘pip’ will use our pre-built wheels instead. We therefore skip the time taken to download and compile everything, reducing the time taken to build the Docker image.
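As a rough sketch of what those two steps look like with ‘pip’ itself, the following wraps the two commands in small helper functions. The function names, the requirements file and the wheelhouse directory are assumptions for illustration only, and this is not necessarily how the S2I builder used here does it internally. It also assumes a ‘pip’ with wheel support available.

```shell
# Sketch only: helper names, requirements file and directory are assumptions.

# Pre-build wheels for every package listed in the requirements file
# and store them in the wheelhouse directory.
build_wheelhouse() {
    requirements="$1"   # e.g. requirements.txt
    wheelhouse="$2"     # e.g. wheelhouse
    pip wheel --wheel-dir "$wheelhouse" -r "$requirements"
}

# Install using only the pre-built wheels, without contacting the
# Python Package Index.
install_from_wheelhouse() {
    requirements="$1"
    wheelhouse="$2"
    pip install --no-index --find-links "$wheelhouse" -r "$requirements"
}
```

You would run something like build_wheelhouse requirements.txt wheelhouse once, and then install_from_wheelhouse requirements.txt wheelhouse each time the application is built.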

Integrating the use of a wheelhouse directory into a build process when using Docker directly can be quite fiddly and involves a number of steps. Using the capabilities of OpenShift, we can however make that a very simple process.

All we need is an S2I builder for Python which is set up to be able to use a wheelhouse directory, as well as a way of constructing the wheelhouse directory in the first place. Having that, we can then use the ‘Image Source’ feature of OpenShift to combine the two.

As it happens, the S2I builder I have been using here has both these capabilities, so let’s see how that can work.

So we already have our Wagtail CMS application running with the name ‘wagtail-demo-site’.

The next step is to create that wheelhouse. To do this we are going to use oc new-build with the same S2I builder and Git repository as we used before, but we are going to set an environment variable to have the S2I builder create a wheelhouse instead of preparing the image for our application.

$ oc new-build warp0-debian8-python27~https://github.com/GrahamDumpleton/wagtail-demo-site.git --env WARPDRIVE_BUILD_TARGET=wheelhouse --name wagtail-wheelhouse
--> Found image b75e1fa (4 hours old) in image stream warp0-debian8-python27 under tag "latest" for "warp0-debian8-python27"

    Python 2.7 (Warp Drive)
    -----------------------
    S2I builder for Python web applications.

    Tags: builder, python, python27, warpdrive, warpdrive-python27

    * A source build using source code from https://github.com/GrahamDumpleton/wagtail-demo-site.git will be created
      * The resulting image will be pushed to image stream "wagtail-wheelhouse:latest"

--> Creating resources with label build=wagtail-wheelhouse ...
    imagestream "wagtail-wheelhouse" created
    buildconfig "wagtail-wheelhouse" created
--> Success
    Build configuration "wagtail-wheelhouse" created and build triggered.
    Run 'oc logs -f bc/wagtail-wheelhouse' to stream the build progress.

Since we have already downloaded the S2I builder when initially deploying the application, and because we aren’t deploying anything, just building an image, this should take about 5 minutes. This is equivalent to what we saw for installing the packages as part of the application build.

Loading files using Image Source

Right now the wheelhouse build and the application build are separate. The next step is to link these together so that the application build can use the by-products of the wheelhouse build.

To do this we are going to edit the build configuration for the application. To see the current build configuration from the command line, you can run oc get bc wagtail-demo-site -o yaml. We are only going to be concerned with a part of that configuration, so I am only quoting the source and strategy sections.

  source:
    git:
      uri: https://github.com/GrahamDumpleton/wagtail-demo-site.git
    secrets: []
    type: Git
  strategy:
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: warp0-debian8-python27:latest
        namespace: image-source
    type: Source

The main change we are going to make is to enable the Image Source feature. To do this we are going to change the source section. This can be done using oc edit bc wagtail-demo-site. We are going to change the section to read:

  source:
    git:
      uri: https://github.com/GrahamDumpleton/wagtail-demo-site.git
    images:
    - from:
        kind: ImageStreamTag
        name: wagtail-wheelhouse:latest
        namespace: image-source
      paths:
      - destinationDir: .warpdrive/wheelhouse
        sourcePath: /opt/warpdrive/.warpdrive/wheelhouse/.
    secrets: []
    type: Git

What we have added is the images sub section. Here we have linked the application build to our wheelhouse image, called wagtail-wheelhouse. Under paths we have described where the pre-built files we want copied from the wheelhouse image are located, and where they should go. They are in the directory /opt/warpdrive/.warpdrive/wheelhouse/. in the wheelhouse image, and we want them copied into the directory .warpdrive/wheelhouse, relative to our application code directory.
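If you would rather not make the change in an interactive editor, one possible alternative is to apply the images sub section using oc patch. The sketch below is only that: the helper names image_source_patch and add_image_source are made up for this example, and it assumes that oc patch in your version of Origin accepts a JSON patch of this shape for a build configuration.

```shell
# Sketch only: helper names are hypothetical.

# Print a JSON patch adding the wheelhouse image as an Image Source.
# $1 = image stream tag of the wheelhouse, $2 = project it lives in.
image_source_patch() {
    printf '{"spec": {"source": {"images": [{
  "from": {"kind": "ImageStreamTag", "name": "%s", "namespace": "%s"},
  "paths": [{"destinationDir": ".warpdrive/wheelhouse",
             "sourcePath": "/opt/warpdrive/.warpdrive/wheelhouse/."}]
}]}}}\n' "$1" "$2"
}

# Apply the patch to the named build configuration.
# $1 = build configuration, $2 = wheelhouse image stream tag, $3 = project.
add_image_source() {
    oc patch bc "$1" -p "$(image_source_patch "$2" "$3")"
}
```

For the example in this post, that would be invoked as add_image_source wagtail-demo-site wagtail-wheelhouse:latest image-source. Checking the result with oc get bc wagtail-demo-site -o yaml afterwards is a good idea.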

A second change we make, although it is optional, concerns ‘pip’. Since we have pre-built all the packages we know are needed, ‘pip’ need not bother checking with the Python Package Index (PyPI) at all. We can therefore tell it to trust that the package versions in the wheelhouse are exactly what we need. We do this by setting an environment variable in the sourceStrategy sub section.

  strategy:
    sourceStrategy:
      env:
      - name: WARPDRIVE_PIP_NO_INDEX
        value: "1"
      from:
        kind: ImageStreamTag
        name: warp0-debian8-python27:latest
        namespace: image-source
    type: Source

Having made these changes we can now trigger a rebuild and see whether things have improved.
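A minimal sketch of triggering that rebuild from the command line follows. It uses oc start-build to start a new build from the build configuration and oc logs -f to stream the output. The helper name rebuild_and_follow is hypothetical.

```shell
# Sketch only: the helper name is hypothetical.

# Start a new build for the given build configuration and, if that
# succeeds, stream the build log until the build finishes.
rebuild_and_follow() {
    bc_name="$1"    # e.g. wagtail-demo-site
    oc start-build "$bc_name" && oc logs -f "bc/$bc_name"
}
```

For the application in this post, that would be rebuild_and_follow wagtail-demo-site.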

Tracking build times

The best visual way of tracking build times is the build view in the OpenShift web interface. Using this, what we find as our end result is the following.

[Figure: tracking build times]

Ignoring our initial build, which as explained will take longer due to needing to first download the S2I builder and distribute it to nodes, our build time for the application turned out to be a bit under 5 minutes.

We would have expected the build time to be about that for every application code change we made, even though we hadn’t changed what packages needed to be installed.

Once we introduced the wheelhouse image and linked our application build to it so that the pre-built packages could be used, the build time for the application dropped to about a minute and a half. Hardly enough time to go get a fresh cup of coffee.
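To put rough numbers on that, here is a small back-of-the-envelope calculation using the approximate timings quoted in this post. The figures are rounded and will differ on your own network and hardware, so treat it as an illustration rather than a benchmark.

```shell
# Approximate timings from this post, in seconds.
build_before=300      # application build without the wheelhouse (~5 minutes)
build_after=90        # application build using the wheelhouse (~1.5 minutes)
wheelhouse_build=300  # one-off build of the wheelhouse image (~5 minutes)

# Time saved on each application build, absolute and as a percentage.
saved=$((build_before - build_after))
percent=$((saved * 100 / build_before))

# Smallest number of application builds whose combined savings cover
# the cost of building the wheelhouse once (integer ceiling division).
break_even=$(( (wheelhouse_build + saved - 1) / saved ))

echo "saved per build: ${saved}s (${percent}%)"
echo "wheelhouse build recovered after ${break_even} application builds"
```

With these figures each application build is about 70% shorter, and the time spent building the wheelhouse is recovered after two application builds.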

Wheelhouse build time

We have successfully offloaded the more time-consuming parts of the application image build to the wheelhouse image. Because the wheelhouse is only concerned with pre-building any required Python packages, it doesn’t need to be rebuilt every time an application change is made. You only need to trigger a rebuild of it when you want to change which packages are built, or which versions of them.

Having to rebuild the wheelhouse would therefore generally be a rare event. Even so, there is actually a way we can reduce how long it takes to be rebuilt as well. This is by using an optional feature of S2I builds called incremental builds.

Support for incremental builds is already implemented in the special S2I builder for Python I am using, so all we need to do is edit the build configuration for the wheelhouse and enable it. In this case we amend the sourceStrategy sub section, adding the incremental setting with the value true.

  strategy:
    sourceStrategy:
      env:
      - name: WARPDRIVE_BUILD_TARGET
        value: wheelhouse
      from:
        kind: ImageStreamTag
        name: warp0-debian8-python27:latest
        namespace: image-source
      incremental: true
    type: Source

With this in place, when the wheelhouse is rebuilt, the ‘wheelhouse’ directory will first be copied over from the prior version of the wheelhouse image.

Similar to how the application build time was sped up, ‘pip’ will realise that it already has pre-built versions of the packages it is interested in and skip rebuilding them. It only needs to go out and download a package if it is a new package that has been added, or the required version has changed.
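The incremental setting could also be switched on without opening an editor. The following is a sketch under the assumption that oc patch in your version of Origin accepts a JSON patch like this for a build configuration. The helper name enable_incremental is made up for the example.

```shell
# Sketch only: the helper name is hypothetical.

# Turn on incremental builds for the named build configuration by
# setting incremental to true in its sourceStrategy sub section.
enable_incremental() {
    oc patch bc "$1" -p '{"spec": {"strategy": {"sourceStrategy": {"incremental": true}}}}'
}
```

For the wheelhouse in this post, that would be enable_incremental wagtail-wheelhouse.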

The end result is that by using both the Image Source feature of builds and incremental builds, we have not only reduced how long it takes to build our application image, but also reduced how long it takes to rebuild the wheelhouse image that contains our pre-built packages.

Experimental S2I builder

As indicated above, this has all been done using an experimental S2I Python builder. It is not the default S2I Python builder that comes with OpenShift. The main point of this post hasn’t been to promote this experimental builder, but to highlight the Image Source feature of builds in OpenShift and provide an example of how it might be used.

The experimental builder only exists at this point as a means for me personally to experiment with better ways of handling Python builds with OpenShift. What I learn from this is being fed back to the OpenShift developers so they can determine what direction the default S2I Python builder will take.

If you are interested in the experiments I am doing with my own S2I Python builder, and how that can fit into a broader system for making Python web application deployments easier, I would suggest keeping an eye on my personal blog site. I have recently written two blog posts about some of my work that may be of interest.

You can drop me any comments if you have feedback about that separate project via Twitter (@GrahamDumpleton).
