Jupyter on OpenShift Part 6: Running as an Assigned User ID

When you deploy an application to OpenShift, by default it will be run with an assigned user ID unique to the project the application is running in. This user ID will override whatever user ID a Docker-formatted image may declare as the user it should be run as.

Running the applications in each project under a user ID different from that used in any other project is part of the multi-layered approach to security used in OpenShift. This matters on a multi-tenant platform such as OpenShift. It provides an extra layer of separation between applications run by different users, and between the parts of a complex system that is deployed across multiple projects and whose parts should have limited visibility of one another.

A consequence of applications being forced to run as a specific assigned user ID is that if you pull down an arbitrary Docker-formatted container image from a public registry such as Docker Hub, there is a chance that the application in it will not run.

This can occur where the image expects to be run as the root user, or even as a non-root user listed in the UNIX password file of the image. The problem that usually arises is that the application, when run as an assigned user ID different from the one the image expects, does not have read/write access to the parts of the container file system it requires.

To ensure portability of images to different deployment environments, it is good practice to design the image so that it can be run as an arbitrary user ID not appearing in the UNIX password file.

In situations where this isn’t possible, running such an image in OpenShift requires overriding the default security policy of OpenShift to allow the image to be run as the user ID it specifies. In the second post of this series on running Jupyter Notebooks on OpenShift, this is what was done to allow the Jupyter Notebook images provided by the Jupyter Project to be run.

In this post, we will delve more into the topic of user IDs, as well as what changes would need to be made to the Jupyter Notebook image being used to enable it to run as the user ID OpenShift assigns to it.

Assigned User ID of a Project

When people discuss running applications under OpenShift, you will hear it said that applications are run as a random user ID. As far as what you should assume when creating an image containing an application, this is a reasonable view to take, but in practice it is not entirely accurate. Saying that a random user ID is used can give the impression that a different user ID is assigned each time an application is restarted, or to each replica where multiple replicas are run. This is not the case.

What actually occurs is that each project created in OpenShift is assigned a range of user IDs it can use. By default any application deployed within that project will use the lowest numbered user ID in that range. You can see what the range of user IDs assigned to a project is by querying the details of the project.

$ oc describe project myproject
Name:           myproject
Namespace:      <none>
Created:        45 hours ago
Labels:         <none>
Annotations:        openshift.io/description=Initial developer project
            openshift.io/display-name=My Project
            openshift.io/sa.scc.uid-range=1000040000/10000
Display Name:       My Project
Description:        Initial developer project
Status:         Active
Node Selector:      <none>
Quota:          <none>
Resource limits:    <none>

In this example, the annotation openshift.io/sa.scc.uid-range indicates that the project is assigned the user ID range starting at 1000040000 and ending at 1000049999.
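As a minimal sketch of how that range is worked out, the following assumes the annotation value has the form start/size, for example 1000040000/10000. The function name parse_uid_range is only for illustration and is not part of any OpenShift tooling.

```python
def parse_uid_range(annotation):
    """Return (first_uid, last_uid) for a value such as '1000040000/10000'.

    Assumes the 'start/size' form of the uid-range annotation.
    """
    start_text, size_text = annotation.split("/", 1)
    start, size = int(start_text), int(size_text)
    if size < 1:
        raise ValueError("range size must be positive")
    # The range is inclusive, so the last ID is start + size - 1.
    return start, start + size - 1


first_uid, last_uid = parse_uid_range("1000040000/10000")
print(first_uid, last_uid)  # 1000040000 1000049999
```

The first value is the user ID an application in the project would run as by default.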

Unless you override the user ID for a specific deployment, applications in this project will run as user ID 1000040000, the lowest in the range. You can confirm this by accessing a running application and running the id command.

$ oc rsh myapp-1-36clr id
uid=1000040000 gid=0(root) groups=0(root),1000040000

You can also query the resource object for the pod to see what OpenShift assigned to the application.

$ oc get pod/myapp-1-36clr \
    -o 'jsonpath="{.spec.containers[0].securityContext.runAsUser}"'

If for some reason you want to run different applications in the same project with different user IDs, you can set the property spec.template.spec.containers.securityContext.runAsUser on the deployment configuration resource object. The user ID you use must come from the range of user IDs allocated to the project. If you attempt to use a user ID outside of the range, the deployment will be blocked and fail.

Error creating: pods "myapp-5-" is forbidden: unable to validate against any
security context constraint: [securityContext.runAsUser: Invalid value: 1000:
UID on container myapp does not match required range. Found 1000, required
min: 1000040000 max: 1000049999]
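To make the range check concrete, here is a rough sketch that builds a patch setting runAsUser for one container, and refuses a user ID outside the project range before OpenShift would. The helper name run_as_user_patch is hypothetical, and the range bounds are passed in rather than looked up.

```python
import json


def run_as_user_patch(container, uid, first_uid, last_uid):
    """Build a patch setting runAsUser for the named container.

    Raises ValueError if uid is outside the inclusive project range,
    mirroring the check OpenShift applies when creating the pod.
    """
    if not first_uid <= uid <= last_uid:
        raise ValueError(
            "UID %d is outside the range %d-%d" % (uid, first_uid, last_uid)
        )
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": container, "securityContext": {"runAsUser": uid}}
                    ]
                }
            }
        }
    }


patch = run_as_user_patch("myapp", 1000040001, 1000040000, 1000049999)
print(json.dumps(patch))
```

The resulting JSON could then be supplied to a command such as oc patch dc/myapp -p '...'. Asking for user ID 1000, as in the error above, would raise ValueError instead.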

Filesystem Access Permissions

Returning to the contents of our first post on running Jupyter Notebooks on OpenShift, we deployed the jupyter/minimal-notebook image from the Jupyter Project by running from the command line:

$ oc new-app jupyter/minimal-notebook:latest
--> Found Docker image acba6ac (4 days old) from Docker Hub for "jupyter/minimal-notebook:latest"

    * An image stream will be created as "minimal-notebook:latest" that will track this image
    * This image will be deployed in deployment config "minimal-notebook"
    * Port 8888/tcp will be load balanced by service "minimal-notebook"
      * Other containers can access this service through the hostname "minimal-notebook"

--> Creating resources ...
    imagestream "minimal-notebook" created
    deploymentconfig "minimal-notebook" created
    service "minimal-notebook" created
--> Success
    Run 'oc status' to view your app.

This ultimately failed though, with the Jupyter Notebook application failing to start up due to the error:

  File "/opt/conda/lib/python3.5/site-packages/jupyter_core/migrate.py", line 241, in migrate
    with open(os.path.join(env['jupyter_config'], 'migrated'), 'w') as f:
PermissionError: [Errno 13] Permission denied: '/home/jovyan/.jupyter/migrated'

The problem here was that OpenShift overrode the image’s request to run as the jovyan user, and instead ran it as a user ID from the range of user IDs allocated to the project. As a result, the application couldn’t write files to the directory it uses.

Looking at the ownership and permissions of the directory /home/jovyan we find:

$ ls -las
total 44
4 drwxr-xr-x 9 jovyan users 4096 Mar 23 05:04 .
4 drwxr-xr-x 8 root   root  4096 Mar 23 05:04 ..
4 -rw-r--r-- 1 jovyan users  220 Nov  5 21:22 .bash_logout
4 -rw-r--r-- 1 jovyan users 3515 Nov  5 21:22 .bashrc
4 drwxr-xr-x 2 jovyan users 4096 Mar  4 02:56 .continuum
4 -rw-r--r-- 1 jovyan users   42 Mar  4 02:56 .curlrc
4 drwxr-xr-x 2 jovyan users 4096 Mar 17 18:18 .jupyter
4 drwx------ 3 jovyan users 4096 Mar 23 05:04 .local
4 -rw-r--r-- 1 jovyan users  675 Nov  5 21:22 .profile
4 drwxr-xr-x 2 jovyan users 4096 Mar  4 02:56 work

All directories and files are owned by the jovyan user, with group users. Members of the group can read files but not write them, and the same is true of all other users.

We know from before that the application is running as:

uid=1000040000 gid=0(root) groups=0(root),1000040000

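Being neither the jovyan user nor a member of the users group, it is subject to the permissions for other users, which do not include write access. It therefore doesn’t have the required access rights over the .jupyter directory.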

Granting Group Write Access

The prior solution for running the Jupyter Notebook images was to grant the anyuid security context constraint (SCC) to the service account under which the application was deployed. Enabling this meant that the application would run as the user jovyan, which the image had declared it wanted to be run as.

uid=1000(jovyan) gid=100(users) groups=100(users)

This obviously works because the user ID the application runs as is the same as the owner of the directories and files the application needs write access to.

The question now is whether it is possible to change the permissions of the directories and files such that the image can still be run as jovyan, but also work when run as the assigned user ID OpenShift uses. Keep in mind that what that user ID will be is not going to be known in advance. All that is known is that it will be a user ID which does not correspond to any existing user, nor will the user ID be listed as being a member of any UNIX group defined by the image.

It is this last fact which is actually the answer. Specifically, if an application is run with a user ID that does not have a corresponding user name, nor is a member of any group, although the application will still run as that user ID, the group will always be set as 0, corresponding to the root group. This is why the id command showed gid=0(root) groups=0(root).

We can use this fact to enable the application to work, changing the group associated with the files and directories, and granting members of the group write access.

The current version of the Dockerfile we used for the S2I enabled version of the Jupyter Notebook image was as follows.

FROM jupyter/minimal-notebook:latest

# Switch to the root user so we can install additional packages.

USER root

# Install additional libraries required by Python packages which are in
# the minimal base image. Also install 'rsync' so the 'oc rsync' command
# can be used to copy files into the running container.

RUN apt-get update && \
    apt-get install -y --no-install-recommends libav-tools rsync && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Add labels so OpenShift recognises this as an S2I builder image.

LABEL io.k8s.description="S2I builder for Jupyter (minimal-notebook)." \
      io.k8s.display-name="Jupyter (minimal-notebook)" \
      io.openshift.expose-services="8888:http" \
      io.openshift.tags="builder,python,jupyter" \
      io.openshift.s2i.scripts-url="image:///opt/app-root/s2i/bin"

# Copy in S2I builder scripts for installing Python packages and copying
# in of notebooks and data files.

COPY s2i /opt/app-root/s2i

# Revert the user, but set it as an integer user ID. Otherwise the S2I
# build process will reject the builder image, as it can't tell whether
# a user name really maps to the user ID for root.

USER 1000

# Override command to startup Jupyter notebook. The original is wrapped
# so we can set an environment variable for notebook password.

CMD [ "/opt/app-root/s2i/bin/run" ]

To this, just before we revert to the jovyan user (user ID 1000), we insert:

# Adjust permissions on home directory so writable by group root.

RUN chgrp -Rf root /home/$NB_USER && chmod -Rf g+w /home/$NB_USER

This will recursively change the group to root for all directories and files under the home directory used by the Jupyter Notebook application. It will also ensure that members of the group root can make changes to the directories and files.

With this change made, the idea is that if the image is run as the jovyan user, it will still be able to make changes because the jovyan user is the owner of the directories and files. If it is instead run as an assigned user ID not in the UNIX password file for the image, the application will be able to make changes because the group ID of the running application is root.
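The reasoning can be checked with a small model of how UNIX decides write access. This is a simplified sketch only: it ignores the special treatment of user ID 0, ACLs and the permissions of parent directories. The numeric IDs for jovyan (1000) and users (100) are the ones shown by the id command earlier.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Simplified UNIX write check: owner bits, else group bits, else other bits."""
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


JOVYAN, USERS, ROOT_GROUP = 1000, 100, 0
ASSIGNED_UID = 1000040000
ASSIGNED_GROUPS = {0, 1000040000}

# Original image: drwxr-xr-x jovyan users
print(can_write(0o755, JOVYAN, USERS, ASSIGNED_UID, ASSIGNED_GROUPS))       # False
# After chgrp root and chmod g+w: drwxrwxr-x jovyan root
print(can_write(0o775, JOVYAN, ROOT_GROUP, ASSIGNED_UID, ASSIGNED_GROUPS))  # True
# Still writable when run as jovyan, through ownership
print(can_write(0o775, JOVYAN, ROOT_GROUP, JOVYAN, {USERS}))                # True
```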

Providing an Identity for the User

Trying the updated image, the Jupyter Notebook application does indeed now appear to start up correctly, being able to write to the home directory. To be sure everything is okay, we can go back and repeat the steps from the previous two posts: attaching a persistent volume, and then creating a Python virtual environment in the persistent volume.

Doing that, everything still seems to be okay, but there is one thing which does stand out as being a bit odd. This is the value displayed in the prompt for the interactive shell. Specifically, it displays:

I have no name!

The reason for this derives from the fact that when running as the assigned user ID, there is no entry for that user ID in the UNIX password file. This means that anything that attempts to look up details for the user by the user ID will fail. This could be UNIX shell commands such as whoami:

I have no name!@notebook-6-7vn4f:~/volume$ whoami
whoami: cannot find name for user ID 1000040000

It could also be Python code:

I have no name!@notebook-6-7vn4f:~/volume$ python
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, pwd
>>> pwd.getpwuid(os.getuid())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'getpwuid(): uid not found: 1000040000'

This turns out to be the tip of the iceberg for potential problems that could arise, and at various times there have been Python packages that would fail when used in an application which is run as a user ID with no entry in the UNIX password file.
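For code you control, one way to guard against this is to treat a missing password entry as an expected case rather than an error. The following is a minimal sketch. The function name user_name_for and the uid-N fallback format are our own choices for illustration, not a convention of any library.

```python
import os
import pwd


def user_name_for(uid, lookup=pwd.getpwuid):
    """Return the user name for uid, or a placeholder if it has no entry.

    The lookup argument exists only so the function can be exercised
    without depending on the contents of the real password file.
    """
    try:
        return lookup(uid).pw_name
    except KeyError:
        # No entry in the password file, as with an assigned user ID.
        return "uid-%d" % uid


print(user_name_for(os.getuid()))
```

This only helps where you can change the code doing the lookup. Third party packages calling pwd.getpwuid() directly would still fail.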

To avoid the potential for problems, what is necessary is to somehow ensure that when operating system libraries are used to look up UNIX password details, a valid entry is returned for whatever the assigned user ID is.

One method which can be used to do this, and which is used in some of the builtin S2I builders provided by OpenShift, is to use a package called nss_wrapper.

What this package does is provide a shared library which is forcibly preloaded into any applications run in the container, and which intercepts any calls which look up details of a user and returns a valid entry. This works by virtue of using a copy of the UNIX password file, created when the image is run, which has had an additional user added corresponding to the assigned user ID.

This method can be a little bit complicated to set up, especially with the Jupyter Project images used, as they are based on Debian and there is no package for nss_wrapper in the stable Debian package repositories.

One could build nss_wrapper from source code, but it turns out there is a simpler way of getting around this problem that doesn’t require any additional package to be installed. That is, make the UNIX password file group-writable in the Dockerfile when creating the image, and then add the additional user to it directly, before any application is started.

We therefore first add to the Dockerfile the following:

# Adjust permissions on /etc/passwd so writable by group root.

RUN chmod g+w /etc/passwd

Then in the run script used to start the Jupyter Notebook application we add:

# Ensure that assigned uid has entry in /etc/passwd.

if [ `id -u` -ge 10000 ]; then
    cat /etc/passwd | sed -e "s/^$NB_USER:/builder:/" > /tmp/passwd
    echo "$NB_USER:x:`id -u`:`id -g`:,,,:/home/$NB_USER:/bin/bash" >> /tmp/passwd
    cat /tmp/passwd > /etc/passwd
    rm /tmp/passwd
fi

This will first modify the existing user entry, the name of which is stored in the NB_USER environment variable, changing the user name to builder. This is done so we can easily distinguish what were files created as part of the S2I build process. A new user entry is then created, using the current user ID and group ID that the image is being run as.
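To see the effect of that rewrite without touching a real /etc/passwd, here is a rough Python equivalent operating on text. The function name add_assigned_user and the two sample entries are made up for illustration. The real file in the image has more entries.

```python
def add_assigned_user(passwd_text, nb_user, uid, gid, old_name="builder"):
    """Rename the existing nb_user entry and append one for the assigned IDs."""
    prefix = nb_user + ":"
    lines = []
    for line in passwd_text.splitlines():
        if line.startswith(prefix):
            # Equivalent of: sed -e "s/^$NB_USER:/builder:/"
            line = old_name + ":" + line[len(prefix):]
        lines.append(line)
    # Equivalent of the echo appending the entry for the assigned user ID.
    lines.append(
        "%s:x:%d:%d:,,,:/home/%s:/bin/bash" % (nb_user, uid, gid, nb_user)
    )
    return "\n".join(lines) + "\n"


# Hypothetical sample contents, not the real file from the image.
sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "jovyan:x:1000:100::/home/jovyan:/bin/bash\n"
)
print(add_assigned_user(sample, "jovyan", 1000040000, 0), end="")
# root:x:0:0:root:/root:/bin/bash
# builder:x:1000:100::/home/jovyan:/bin/bash
# jovyan:x:1000040000:0:,,,:/home/jovyan:/bin/bash
```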

With this change done, things are starting to look a bit better.

jovyan@notebook-7-4ks5n:~$ id
uid=1000040000(jovyan) gid=0(root) groups=0(root),1000040000

jovyan@notebook-7-4ks5n:~$ whoami
jovyan

jovyan@notebook-7-4ks5n:~$ ls -las
total 44
4 drwxrwxr-x 12 builder root 4096 Mar 27 10:53 .
4 drwxr-xr-x  9 root    root 4096 Mar 27 10:53 ..
4 -rw-rw-r--  1 builder root  220 Nov  5 21:22 .bash_logout
4 -rw-rw-r--  1 builder root 3515 Nov  5 21:22 .bashrc
4 drwxrwxr-x  2 builder root 4096 Mar  4 02:56 .continuum
4 -rw-rw-r--  1 builder root   42 Mar  4 02:56 .curlrc
4 drwxrwxr-x  2 builder root 4096 Mar 27 10:53 .jupyter
4 drwx------  3 jovyan  root 4096 Mar 27 10:53 .local
4 -rw-rw-r--  1 builder root  675 Nov  5 21:22 .profile
4 drwxrwxrwx  3 root    root 4096 Mar 27 10:52 volume
4 drwxrwxr-x  2 builder root 4096 Mar  4 02:56 work

Installing Extra Packages

Now that the user the Jupyter Notebook application runs as has a complete identity, it looks like we should be all good to go. There is one more thing that needs to be checked, though: the ad-hoc installation of additional Python packages.

We already know that, because these are installed into the container file system, they will be lost if the container is restarted. This can still be convenient in some situations, such as when testing or working out what packages are required.

Attempting to install an additional Python package, we do hit a further problem though.

$ conda install matplotlib
Fetching package metadata .........
Solving package specifications: ..........

Package plan for installation in environment /opt/conda:

The following packages will be downloaded:

    package                    |            build
    expat-2.1.0                |                2         402 KB  conda-forge

The following NEW packages will be INSTALLED:

    cycler:           0.10.0-py35_0     conda-forge (soft-link)

CondaIOError: IO error: Missing write permissions in: /opt/conda
# You don't appear to have the necessary permissions to install packages
# into the install area '/opt/conda'.
# However you can clone this environment into your home directory and
# then make changes to it.
# This may be done using the command:
# $ conda create -n my_root --clone=/opt/conda

The reason this fails is that we only fixed up the permissions on the home directory of the application, with the change that was made to the Dockerfile for our image. We did not fix up the permissions of the /opt/conda directory where the Anaconda Python installation was located.

The permissions on the /opt/conda directory could have been changed as well, but doing that exposes an ugly side of how Docker-formatted images work.

Specifically, because a Docker-formatted image consists of layers for every set of changes made, changing the permissions of everything under the /opt/conda directory would cause a complete copy of those directories and files to be made in a new layer. This is even though the contents of the files aren’t changed and only the permissions on the files are changed.

The consequence of changing the permissions on /opt/conda would therefore have been to increase the size of the image by an additional 400MB.

b00ad8753768  3 minutes ago  /bin/sh -c chgrp -Rf root /opt/conda && chmod  414 MB

So although fixing up the permissions on the home directory used by the Jupyter Notebook application was seen as being okay, as that directory was effectively empty, fixing the permissions on the /opt/conda directory has the risk of causing problems due to the increased size of the image.
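If you want a feel for the cost before committing to such a change, a rough upper bound is the total size of the regular files under the directory, since each file whose metadata changes is copied into the new layer. The sketch below is only an approximation under that assumption. It ignores hard links, file system overhead and any compression of layers, and the function name is our own.

```python
import os
import stat


def bytes_copied_by_recursive_chmod(root):
    """Sum the sizes of all regular files under root (symlinks not followed)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


if __name__ == "__main__":
    # Run inside the image; prints 0.0 if the directory does not exist.
    print("%.1f MB" % (bytes_copied_by_recursive_chmod("/opt/conda") / 1e6))
```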

This is where attempting to fix up problems in base images in a derived image can only take you so far. Certain issues such as incorrect permissions really need to be fixed by setting the correct permissions in the first place, in the same layer in which the directories were created, whether explicitly or as a result of installing a package.

Using the Builder Image

The files for this version of the Jupyter Project minimal notebook can be found on the s2i-assigned-uid branch of the Git repository found at:

https://github.com/getwarped/s2i-minimal-notebook

To build the image using OpenShift you can use the command:

oc new-build https://github.com/getwarped/s2i-minimal-notebook#s2i-assigned-uid \
    --name s2i-minimal-notebook

Unlike before, there is no need to grant the anyuid security context constraint to the default service account for the project. If you had already granted it, you can remove it by having an administrator run:

oc adm policy remove-scc-from-user anyuid -z default -n myproject

To deploy the image to create an empty environment in which to start working on a notebook, along with an attached persistent volume, you can run:

oc new-app s2i-minimal-notebook \
    --env PERSISTENT_VOLUME_ROOTDIR=/home/jovyan/volume \
    --name notebook

oc set volume dc/notebook --add --mount-path /home/jovyan/volume --claim-size=1G

oc expose svc/notebook

As discussed above, you cannot install additional Python packages into the root Python environment of a running container. This is because fixing the permissions on the root Python environment to enable that would cause the size of the image to increase dramatically.

In order to install additional Python packages, you should use the image as an S2I builder to pre-install any required packages into the image. Alternatively, attach a persistent volume, create a Python environment on it, and use that instead.
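As a rough sketch of the second option, a virtual environment can be created under the mount point using the venv module from the Python standard library. The helper name create_volume_env and the environment name venv are assumptions for illustration. This is not necessarily the mechanism used by the image or in the earlier posts.

```python
import os
import venv


def create_volume_env(volume_dir, name="venv", with_pip=True):
    """Create a virtual environment under the persistent volume directory."""
    env_dir = os.path.join(volume_dir, name)
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Called as create_volume_env("/home/jovyan/volume"), this would place the environment on the volume mounted above, so packages installed into it survive a restart of the container.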

Deploying From the Web Console

With the changes described in this post, we now have a version of the S2I enabled image for Jupyter Notebooks that can be deployed on a standard OpenShift installation without needing to make changes to the default security model.

The instructions provided to use the image as an S2I builder required the use of the command line. In order to be able to use the S2I builder image from the web console, an extra step is required. This entails creating an annotated image stream definition so that OpenShift knows the image is a builder image. I will look at how to do this in the next blog post in this series.
