The quickest way to run a Jupyter Notebook instance in a containerised environment such as OpenShift is to use the Docker-formatted images provided by the Jupyter Project developers. Unfortunately, the Jupyter Project images do not run out of the box with the typical default configuration of an OpenShift cluster.
In this second post of the series about running Jupyter Notebooks on OpenShift, I detail the steps required to run the Jupyter Notebook software on OpenShift.
Jupyter Project Images
The original code for the Jupyter Project images can be found on GitHub, but the images are also hosted on the Docker Hub Registry.
The Jupyter Project provides a number of images with different capabilities and packages pre-installed. These are:
jupyter/minimal-notebook – Base image with support for working with Python 3.
jupyter/scipy-notebook – Builds on minimal-notebook, adding Python packages commonly used in data analysis and visualisation, including matplotlib. Also adds support for Python 2.
jupyter/tensorflow-notebook – Builds on scipy-notebook, adding packages for working with TensorFlow.
jupyter/datascience-notebook – Builds on scipy-notebook, adding support for Julia and R.
jupyter/pyspark-notebook – Builds on scipy-notebook, adding support for working with Spark and Hadoop clusters.
jupyter/all-spark-notebook – Builds on pyspark-notebook, adding support for Scala and R.
jupyter/r-notebook – Base image with support for working with R.
Deploying Images to OpenShift
The Docker-formatted images from the Jupyter Project can be deployed to OpenShift using the web console Deploy Image page.
Alternatively, you can deploy the jupyter/minimal-notebook image from the command line using the oc new-app command:
$ oc new-app jupyter/minimal-notebook:latest
--> Found Docker image acba6ac (4 days old) from Docker Hub for "jupyter/minimal-notebook:latest"

    * An image stream will be created as "minimal-notebook:latest" that will track this image
    * This image will be deployed in deployment config "minimal-notebook"
    * Port 8888/tcp will be load balanced by service "minimal-notebook"
      * Other containers can access this service through the hostname "minimal-notebook"

--> Creating resources ...
    imagestream "minimal-notebook" created
    deploymentconfig "minimal-notebook" created
    service "minimal-notebook" created
--> Success
    Run 'oc status' to view your app.
To expose the Jupyter Notebook so that it will be accessible via a public URL, from the Overview page in the web console you can select Create Route. If using the command line, you can run:
$ oc expose svc/minimal-notebook
route "minimal-notebook" exposed
Having performed these steps, once the image has been pulled down and deployed, you will find that the image fails to start. Digging into the logs for the failed deployment, you will find an error:
  File "/opt/conda/lib/python3.5/site-packages/jupyter_core/migrate.py", line 241, in migrate
    with open(os.path.join(env['jupyter_config'], 'migrated'), 'w') as f:
PermissionError: [Errno 13] Permission denied: '/home/jovyan/.jupyter/migrated'
This failure is caused by one aspect of the default security model that OpenShift applies to ensure that, in a multi-tenant environment, one user cannot interfere with another.
In such a multi-tenant environment, applications in different projects are run with different assigned user IDs. This is enforced by running an image as the assigned user ID, rather than any user ID the image itself says it wants to run as.
The image fails to start in this case because it has not been constructed so that it can be started as an arbitrary user ID.
The good thing about the Jupyter Project images, at least, is that they don’t expect to run as the root user. Instead, they have been built with the expectation that they run as the jovyan user, with a user ID of 1000. However, the group that the jovyan user is a member of, and how permissions have been set up on directories and files, mean the image will not work in an environment that applies the stricter security regime required of a multi-tenant system. The issues with the Jupyter Project images have been reported, but not all of the problems that prevent them from running in a more secure multi-tenant environment have been addressed.
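To make the failure more concrete, here is a minimal sketch of the classic Unix owner/group/other write check. It is not taken from OpenShift or from the Jupyter images. The can_write function is hypothetical, and the values in the example (group ID 100 for jovyan, a directory mode of 755, and an assigned user ID of 1000080000 running with group ID 0) are assumptions chosen for illustration, so check the real values in your own image and cluster.

```shell
# Hypothetical helper: succeed if a process running as UID/GID may write to a
# directory with the given owner, group and octal mode (classic Unix rules,
# ignoring root, supplementary groups and ACLs).
# Usage: can_write UID GID OWNER_UID OWNER_GID MODE
can_write() (
    uid=$1; gid=$2; owner=$3; group=$4
    perms=$((0$5))                    # leading 0 makes the shell read MODE as octal
    if [ "$uid" -eq "$owner" ]; then
        bit=$((perms & 0200))         # owner write bit
    elif [ "$gid" -eq "$group" ]; then
        bit=$((perms & 0020))         # group write bit
    else
        bit=$((perms & 0002))         # other write bit
    fi
    [ "$bit" -ne 0 ]
)

# Assumed values: directory owned by jovyan (1000), group 100, mode 755.
can_write 1000 100 1000 100 755 && echo "jovyan can write"
can_write 1000080000 0 1000 100 755 || echo "assigned user ID cannot write"
```

With these assumed values, the assigned user ID matches neither the owner nor the group of the directory, so only the "other" permission bits apply and the write is refused. That is the same kind of refusal that surfaces as the PermissionError in the log.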
Overriding the User an Image Runs As
In a situation where an image has not been constructed to allow it to be run as an assigned user ID, one can override OpenShift and configure it to allow running of images as any user ID. This is done using the oc adm policy add-scc-to-user command, with the security context constraint of anyuid being added to the service account the image is run as.
$ oc adm policy add-scc-to-user anyuid -z default
Error from server: User "developer" cannot get securitycontextconstraints at the cluster scope
As shown, this command will fail if you attempt to run it as a normal user. This is because only an administrator has the ability to override the security context constraints.
The reason for this is that giving a user the ability to run an image as any user ID also allows them to run images as the root user. In this case the image declares that it will run as the jovyan user, so it will not run as the root user. Before enabling the ability for a user to run images as any user ID, an administrator should first ensure that the user is trusted, that the source of any images is known, and that the images themselves are trusted.
Presuming the administrator is satisfied, the administrator of the OpenShift cluster should run the command:
# oc adm policy add-scc-to-user anyuid -z default -n myproject
The -n option and the argument that follows declare which project the command should be applied to. In this case it is applied to the project called myproject.
Logging in to Jupyter Notebook
Having enabled the ability to run the Jupyter Notebook image as the jovyan user, trigger a redeployment and the image should now start up.
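One way to trigger the redeployment from the command line is sketched below. It assumes a version of the oc client that provides the oc rollout subcommands. The redeploy function name is my own and exists only to keep the two steps together.

```shell
# Hypothetical helper: start a new deployment of the named deployment config
# and then wait until the rollout has finished (or failed).
# Usage: redeploy DEPLOYMENT_CONFIG_NAME
redeploy() {
    oc rollout latest "dc/$1" &&
    oc rollout status "dc/$1"
}

# Example, using the deployment config created earlier by oc new-app:
#   redeploy minimal-notebook
```

The same thing can be done from the web console by using the Deploy button on the deployment config, if your version of the console offers it.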
To get the URL for the Jupyter Notebook, you can look up the hostname using oc get routes:
$ oc get routes
NAME               HOST/PORT                                          PATH      SERVICES           PORT       TERMINATION
minimal-notebook   minimal-notebook-myproject.192.168.99.100.xip.io             minimal-notebook   8888-tcp
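If you want just the hostname, for example to use in a script, a jsonpath query against the route avoids parsing the table. This is a small sketch rather than a required step. The notebook_url function name is hypothetical, and it assumes the route has no TLS termination (as in the output above), so the URL uses http.

```shell
# Hypothetical helper: print the public URL for the named route.
# Assumes an unsecured (non-TLS) route, hence the http scheme.
# Usage: notebook_url ROUTE_NAME
notebook_url() {
    _host=$(oc get route "$1" -o jsonpath='{.spec.host}') || return 1
    printf 'http://%s/\n' "$_host"
}

# Example:
#   notebook_url minimal-notebook
```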
Access the Jupyter Notebook from the browser using the hostname and you will be presented with a login page.
This is the default login page for Jupyter Notebook. As we did not specify a password when we deployed the application, Jupyter Notebook will generate a secret token to be used when logging in. The value of this token is output in the logs for the Jupyter Notebook application.
To view the logs you can run the oc get pods command to get a list of any pods running and then use oc logs on the name of the pod for the running application.
$ oc get pods --selector app=minimal-notebook
NAME                       READY     STATUS    RESTARTS   AGE
minimal-notebook-7-6dwp8   1/1       Running   0          1h
$ oc logs minimal-notebook-7-6dwp8
...
Copy/paste this URL into your browser when you connect for the first time, to login with a token:
    http://localhost:8888/?token=10c88f9dab876869b46884443e1157e5eb199ac615fb33e5
...
As described on the login page, you can also use the jupyter notebook list command to show the running servers. This needs to be run inside of the container running the Jupyter Notebook instance. You can do this using the oc rsh command:
$ oc rsh minimal-notebook-7-6dwp8 jupyter notebook list
Currently running servers:
http://localhost:8888/?token=10c88f9dab876869b46884443e1157e5eb199ac615fb33e5 :: /home/jovyan/work
Copy just the token from the URL which is shown in the logs or output from the jupyter notebook list command and use that in the login page for Jupyter Notebook in your browser. You should then be presented with the Jupyter Notebook dashboard.
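Rather than picking the token out by eye, you can filter it out of the output. The following is a minimal sketch. The extract_token function name is my own, and it assumes the token is a string of lowercase hexadecimal digits following token= in the URL, as in the output shown above.

```shell
# Hypothetical helper: read log or "jupyter notebook list" output on stdin
# and print the first token found in a URL of the form ...?token=<hex>.
extract_token() {
    sed -n 's/.*[?&]token=\([0-9a-f][0-9a-f]*\).*/\1/p' | head -n 1
}

# Examples, using the pod name from the output above:
#   oc logs minimal-notebook-7-6dwp8 | extract_token
#   oc rsh minimal-notebook-7-6dwp8 jupyter notebook list | extract_token
```

Replace the pod name with the one reported by oc get pods for your own deployment.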
Adding a Persistent Volume
When you work with a Jupyter Notebook, you can create new notebooks or upload an existing notebook. Any changes you make will be saved to the local file system within the container. As a result, if the container running the Jupyter Notebook instance is restarted, all your work will be lost.
To avoid this, if your OpenShift cluster is configured with persistent volumes, you should use a persistent volume claim in conjunction with the Jupyter Notebook instance. To claim and mount the persistent volume, you can use the oc set volume command. The directory at which the persistent volume should be mounted inside of the container is /home/jovyan/work:
$ oc set volume dc/minimal-notebook --add --mount-path /home/jovyan/work --claim-size=1G
info: Generated volume name: volume-tnjug
persistentvolumeclaims/pvc-acnzs
deploymentconfig "minimal-notebook" updated
A persistent volume claim can also be made, and associated with the Jupyter Notebook application, from the web console by going to the Deployment Config for the Jupyter Notebook application. The Add Storage option can be found in the Actions drop-down menu.
Installing Additional Packages
Which packages for Python are available for you to use from your Jupyter Notebook instance will depend on which of the Jupyter Project images you chose. If you choose the minimal image, or are using an uncommon package, you will need to install it yourself from a terminal created from the Jupyter Notebook dashboard, or from within a notebook. Because everything is discarded when the container running the Jupyter Notebook instance is restarted, you would have to do this each time.
A way to avoid this is to extend the image and add support to it for running it as a Source-to-Image (S2I) builder. Using S2I, you can then build up a custom image which incorporates the packages you need. An S2I builder can also be used to pre-populate an image with notebooks and data files you may need.
I will explain how to create an S2I builder from the Jupyter Project images in the next post in this series.