Customizing the JupyterHub environment (Kubernetes)¶
The Data 8 course uses a collection of Python modules and open-source technology for course infrastructure as well as teaching. We’ve provided all of these materials as a Docker image that you can connect to your JupyterHub so that all users will have the environment needed for the class. This page describes how to set up your JupyterHub environment on Kubernetes using Helm so that it serves the environment used in Data 8.
UC Berkeley deploys many JupyterHubs for its courses. You can see the documentation that we use to coordinate information across JupyterHub teams in the UC Berkeley JupyterHubs guide.
Deploying a change to your JupyterHub configuration¶
First things first, to make changes to your JupyterHub configuration, you
need to know how to deploy those changes. This section covers how to deploy
a change to your configuration file config.yml
in Kubernetes.
All modifications to the base JupyterHub deployment are made with
changes to your config.yaml
file. When we first created our JupyterHub,
we created this file with a single value inside that contained our secret token.
Once you’ve made a change to config.yaml
you can deploy it with the following
steps:
Double-check the syntax of your
config.yaml
file. For example, you may have copy/pasted two code snippets that both hadsingleuser:
as the top header, and should be merged under a single header. YAML is picky about how whitespaces/indentation are used, and can throw cryptic errors if you have small errors like type-os.
Deploy your change to the JupyterHub. Use the following command:
helm upgrade <YOUR-HUB-NAMESPACE> jupyterhub/jupyterhub --version=<YOUR-VERSION> -f config.yaml
This runs a “Helm Upgrade”, which tells Kubernetes to update its deployment to match the values that you’ve placed in
config.yaml
. The value in<YOUR-HUB-NAMESPACE>
should be whatever you chose when creating the JupyterHub.Most hardware modifications to your Kubernetes deployment will be done in this way.
Common errors¶
“time out waiting for the condition”. It took too long to
pull the Docker image onto the JupyterHub. You can generally resolve this by including
a --timeout=9999999
flag to your helm upgrade
command. This will prevent Helm
from stopping too early.
Customizing your Hub environment¶
The environment provided to your students is defined by a Docker image
that JupyterHub serves to new user sessions. Which image to use is configured
in your config.yml
file.
Using the Data 8 Docker image¶
First, we’ll tell the JupyterHub to connect user sessions with the
Docker image used by Data 8. This is done by adding the following to your
config.yaml
file:
singleuser:
image:
name: berkeleydsep/datahub-user
tag: 21be6ff
To deploy the change, save the file, then run a helm upgrade:
helm upgrade data8 jupyterhub/jupyterhub --version=v0.6 -f config.yaml
Note that this will take a while if you’re using the data8 image, perhaps upwards of 10 minutes, as it pulls the image into your Kubernetes deployment.
Extending each user’s environment¶
If you need to extend the base Docker image, you can do so interactively from
a JupyterHub session. For example students can pip
or conda
install packages
into their own user directories, and these will persist over time.
Modify the resources available to your users¶
In this section we’ll define what kind of resources your students have. For example, how much RAM / CPU / etc. See the Data 8 user resource configuration for example.
Memory (RAM)¶
The following snippet will give each user 1 gig of ram,
which is the amount given to Data 8 students at Berkeley. Students will
always have at least the amount specified in guarantee
and their Jupyter
server will restart if they use more than limit
.
singleuser:
memory:
guarantee: 1G
limit: 1G
Disk Storage¶
The following snippet gives students 2 gigs of persistent storage:
singleuser:
storage:
capacity: 2Gi
Authorization for your hub¶
Authorization allows you to control who has access to your JupyterHub, as well as keep track of who is accessing the hub.
There are many options for authorization with JupyterHub. The Data 8 deployment uses the Berkeley CalNet authentication system, which is part of the Google “G Suite”. Your authentication approach depends on how your university authenticates. You can also choose authentication with popular services like GitHub.
See the Zero to JupyterHub Authentication guide for information about various authentication options and how to enable them.
For this guide, we’ll show you how to authenticate using GitHub usernames, as this is a free service that is more widely-accessible than the particular authentication system that any one university uses.
Authenticating with GitHub¶
To authenticate with GitHub, take the following the steps outlined in the Zero to JupyterHub Authentication Guide.
Note: The DSEP authorization configuration can be found here.
Adding admin users¶
JupyterHub has an administrator page that can be used to see all of the
active sessions currently on the hub, as well as perform some simple actions
to help debug and fix student problems. To add a list of admin usernames,
add the following to the auth
section of your config.yaml
file:
auth:
admin:
users:
- <LIST>
- <OF>
- <ADMIN>
- <USERNAMES>
An example config.yaml
file¶
If you’ve followed all of the instructions on this page
(including authenticating with GitHub), your config.yaml
file should now
look something like this:
proxy:
secretToken: "<< output of openssl rand -hex 32 >>"
singleuser: # This defines the user environment
image:
name: berkeleydsep/datahub-user
tag: da80cb1
memory:
guarantee: 2G
limit: 2G
storage:
capacity: 2Gi
auth:
type: github
github:
clientId: "<< YOUR-CLIENT-ID >>"
clientSecret: "<< YOUR-CLIENT-SECRET >>"
callbackUrl: "http://<< YOUR-HUB-IP-ADDRESS >>/hub/oauth_callback"
admin:
users:
- <LIST>
- <OF>
- <ADMIN>
- <USERNAMES>
Confirm that your environment works¶
To confirm that you’re running the correct environment needed for Data 8, take the following steps:
Go to your JupyterHub’s public IP address. You can find this address with:
kubectl --namespace=data8 get svc proxy-public
Log in with your username/password (if you haven’t logged in yet, use whatever username/password you’d like)
Click “Start My Server”, then create a new Jupyter Notebook.
Run the following Python code
import datascience print(datascience.__version__)
You should see the version for the datascience
package printed below the cell.
If this worked, then congratulations! Your JupyterHub is ready to go.