This repository was archived by the owner on Feb 3, 2021. It is now read-only.

Feature: JupyterLab plugin #459

Merged · 8 commits · Mar 29, 2018
Changes from 4 commits
1 change: 1 addition & 0 deletions aztk/models/plugins/internal/plugin_manager.py
@@ -17,6 +17,7 @@ class PluginManager:
# Indexing of all the predefined plugins
plugins = dict(
jupyter=plugins.JupyterPlugin,
jupyter_lab=plugins.JupyterLabPlugin,
rstudio_server=plugins.RStudioServerPlugin,
hdfs=plugins.HDFSPlugin,
)
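For context, a minimal sketch of how a registered name resolves to a plugin instance. The get_plugin accessor and the stub classes below are assumptions for illustration, not taken from this diff:

class JupyterPlugin: pass
class JupyterLabPlugin: pass

class PluginManager:
    # Indexing of all the predefined plugins (as in the diff above)
    plugins = dict(
        jupyter=JupyterPlugin,
        jupyter_lab=JupyterLabPlugin,
    )

    def get_plugin(self, name):
        # Hypothetical accessor: look the class up by key and instantiate it
        if name not in self.plugins:
            raise ValueError("Unknown plugin: {}".format(name))
        return self.plugins[name]()

manager = PluginManager()
plugin = manager.get_plugin("jupyter_lab")  # -> JupyterLabPlugin instance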
1 change: 1 addition & 0 deletions aztk/spark/models/plugins/__init__.py
@@ -1,3 +1,4 @@
from .hdfs import *
from .jupyter import *
from .jupyter_lab import *
from .rstudio_server import *
1 change: 1 addition & 0 deletions aztk/spark/models/plugins/jupyter_lab/__init__.py
@@ -0,0 +1 @@
from .configuration import *
23 changes: 23 additions & 0 deletions aztk/spark/models/plugins/jupyter_lab/configuration.py
@@ -0,0 +1,23 @@
import os
from aztk.models.plugins.plugin_configuration import PluginConfiguration, PluginPort, PluginRunTarget
from aztk.models.plugins.plugin_file import PluginFile
from aztk.utils import constants

dir_path = os.path.dirname(os.path.realpath(__file__))

class JupyterLabPlugin(PluginConfiguration):
def __init__(self):
super().__init__(
name="jupyter_lab",
Member:
Sorry to be super pedantic, but I think this might be better as jupyterlab since they brand it as one word. I think we should also make plugins case-insensitive (not in this PR, just in general).
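A minimal sketch of the case-insensitive lookup suggested here, assuming the plugins registry dict from plugin_manager.py; the normalization rules and alias table are assumptions, not part of this PR:

def resolve_plugin(plugins, name):
    # Normalize so "JupyterLab", "jupyterlab", and "jupyter_lab" all match
    key = name.strip().lower().replace("-", "_")
    aliases = {"jupyterlab": "jupyter_lab"}  # hypothetical alias table
    key = aliases.get(key, key)
    if key not in plugins:
        raise ValueError("Unknown plugin: {}".format(name))
    return plugins[key]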

ports=[
PluginPort(
internal=8889,
public=True,
),
],
run_on=PluginRunTarget.All,
execute="jupyter_lab.sh",
files=[
PluginFile("jupyter_lab.sh", os.path.join(dir_path, "jupyter_lab.sh")),
],
)
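A hedged usage sketch for this configuration class: the plugin would typically be attached to a cluster configuration, with port 8889 then exposed on the master. The ClusterConfiguration field names below are assumptions for illustration, not taken from this diff:

from aztk.spark.models import ClusterConfiguration
from aztk.spark.models.plugins import JupyterLabPlugin

config = ClusterConfiguration(
    cluster_id="jupyterlab-demo",  # assumed field name
    vm_count=2,                    # assumed field name
    plugins=[JupyterLabPlugin()],  # opens public port 8889 on the master
)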
57 changes: 57 additions & 0 deletions aztk/spark/models/plugins/jupyter_lab/jupyter_lab.sh
@@ -0,0 +1,57 @@
#!/bin/bash

# This custom script only works on images where jupyter is pre-installed on the Docker image
Member:
We should change all of the plugins to be self-contained. Not sure that should be done in this PR, but we will have to change it soon anyways.

Contributor Author:
Agreed, although this is no longer true since I'm installing jupyter lab from conda. I copied the file over, so that comment was unnecessary.

#
# This custom script has been tested to work on the following docker images:
# - aztk/python:spark2.2.0-python3.6.2-base
# - aztk/python:spark2.2.0-python3.6.2-gpu
# - aztk/python:spark2.1.0-python3.6.2-base
# - aztk/python:spark2.1.0-python3.6.2-gpu

if [ "$IS_MASTER" = "1" ]; then
conda install -y -c conda-forge jupyterlab

PYSPARK_DRIVER_PYTHON="/.pyenv/versions/${USER_PYTHON_VERSION}/bin/jupyter"
JUPYTER_KERNELS="/.pyenv/versions/${USER_PYTHON_VERSION}/share/jupyter/kernels"

# disable password/token on jupyter notebook
jupyter lab --generate-config --allow-root
JUPYTER_CONFIG='/.jupyter/jupyter_notebook_config.py'
echo >> $JUPYTER_CONFIG
echo -e 'c.NotebookApp.token=""' >> $JUPYTER_CONFIG
echo -e 'c.NotebookApp.password=""' >> $JUPYTER_CONFIG
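# Note: JupyterLab of this era runs on the classic notebook server, which is
# why the NotebookApp token/password settings above are the ones that apply.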

# get master ip
MASTER_IP=$(hostname -i)

# remove existing kernels
rm -rf $JUPYTER_KERNELS/*

# set up jupyter to use pyspark
mkdir $JUPYTER_KERNELS/pyspark
touch $JUPYTER_KERNELS/pyspark/kernel.json
cat << EOF > $JUPYTER_KERNELS/pyspark/kernel.json
{
"display_name": "PySpark",
"language": "python",
"argv": [
"python",
"-m",
"ipykernel",
"-f",
"{connection_file}"
],
"env": {
"SPARK_HOME": "$SPARK_HOME",
"PYSPARK_PYTHON": "python",
"PYSPARK_SUBMIT_ARGS": "--master spark://$MASTER_IP:7077 pyspark-shell"
}
}
EOF
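# Optional sanity check (jupyter kernelspec list is a standard Jupyter
# command): the listing should now include the "pyspark" kernel created above.
jupyter kernelspec list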

# start jupyter lab from /mnt - this is where we recommend you put your Azure Files mount point as well
Member:
Does jupyterlab have the same restriction as jupyter where you can't navigate up the directory structure? Right now /mnt is cluttered with a lot of things; it would be nice to have jupyterlab's default directory point somewhere else.

Contributor Author:
Same limitation - you cannot navigate 'up' from /mnt. Since this is where we place samples, it seems like a reasonable place to put it - where else do you think it could work?

Member:
No, that's fine. /mnt/ is the best place since that is where their Azure Files shares will be mounted.

cd /mnt
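# The next line starts pyspark with JupyterLab as the driver REPL: jupyter is
# used as PYSPARK_DRIVER_PYTHON with "lab ..." as its options, and the
# parenthesized subshell with & detaches it so node setup can continue.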
(PYSPARK_DRIVER_PYTHON=$PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS="lab --no-browser --port=8889 --allow-root" pyspark &)
fi


6 changes: 5 additions & 1 deletion node_scripts/install/install.py
@@ -27,6 +27,11 @@ def setup_node():
plugins.setup_plugins(is_master=True, is_worker=True)
scripts.run_custom_scripts(is_master=True, is_worker=True)

# Write sentinel file on disk so the host knows this node is the master
path = os.environ['PWD'] + '/MASTER'
print('Writing master file to: {}'.format(path))
with open(path, 'w') as f:
    f.write(master_node.ip_address)
Member @jafreck commented Mar 27, 2018:
This is already done here:

master_file.write("{0}\n".format(master_node.ip_address))

Contributor Author:
Removed

else:
setup_as_worker()
plugins.setup_plugins(is_master=False, is_worker=True)
@@ -42,7 +47,6 @@ def setup_as_master():
if os.environ["WORKER_ON_MASTER"] == "True":
spark.start_spark_worker()


def setup_as_worker():
print("Setting up as worker.")
spark.setup_connection()
7 changes: 5 additions & 2 deletions node_scripts/setup_node.sh
@@ -4,23 +4,26 @@
# Usage:
# setup_node.sh [container_name] [gpu_enabled] [docker_repo] [docker_cmd]


container_name=$1
gpu_enabled=$2
repo_name=$3
docker_run_cmd=$4

echo "Installing pre-reqs"
apt-get -y install linux-image-extra-$(uname -r) linux-image-extra-virtual
apt-get -y install apt-transport-https
apt-get -y install curl
apt-get -y install ca-certificates
apt-get -y install software-properties-common
echo "Done installing pre-reqs"

# Install docker
echo "Installing Docker"
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get -y update
apt-get -y install docker-ce
echo "Done installing Docker"

if [ "$gpu_enabled" == "True" ]; then
echo "running nvidia install"
@@ -71,7 +74,7 @@ else
until [ "`/usr/bin/docker inspect -f {{.State.Running}} $container_name`"=="true" ]; do
sleep 0.1;
done;

# wait until container setup is complete
docker exec spark /bin/bash -c 'python $DOCKER_WORKING_DIR/wait_until_setup_complete.py'
