docs: update interactive environments documentation
fix #1647
lorenzo-cavazzi committed Nov 25, 2020
1 parent 9971e6c commit 6a0c028
Showing 4 changed files with 143 additions and 69 deletions.
1 change: 1 addition & 0 deletions docs/spelling_wordlist.txt
@@ -149,6 +149,7 @@ Ubuntu
ui
untracked
untracked
url
username
versioned
versioning
111 changes: 58 additions & 53 deletions docs/user/interactive_basics.rst
@@ -15,40 +15,38 @@ of code before combining everything into a (reproducible) workflow.
You can run JupyterLab or RStudio within a project independently from RenkuLab,
but RenkuLab offers the following advantages:

* environments hosted in the cloud with a configurable amount of resources
(memory, CPU, and sometimes GPU)
* Environments hosted in the cloud with a configurable amount of resources
(memory, CPU, and sometimes GPU).

* environments are defined using Docker, so they can be shared and reproducibly
re-created
* Environments are defined using Docker, so they can be shared and reproducibly re-created.

* auto-saving of work back to RenkuLab, so you can recover in the event of a
crash
* Auto-saving of work back to RenkuLab, so you can recover in the event of a crash.

* a git client pre-configured with your credentials to easily push your changes
back to the server
* A git client pre-configured with your credentials to easily push your changes
back to the server.

* the functionality provided by the renku-python_ command-line interface (CLI)
is automatically available
* The functionality provided by the renku-python_ command-line interface (CLI)
is automatically available.


What's in my Interactive Environment?
-------------------------------------

* your project, which is cloned into the environment on startup
* Your project, which is cloned into the environment on startup.

* your data (if the option ``Automatically fetch LFS data`` is selected)
files that are stored in git LFS*)
* Your data files (if the option ``Automatically fetch LFS data`` is selected)
that are stored in git LFS*.

* all the software required to launch the environment and common tools for
working with code (``git``, ``git LFS``, ``vim``, etc.)
* All the software required to launch the environment and common tools for
working with code (``git``, ``git LFS``, ``vim``, etc.).

* any dependencies you specified via conda (requirements.txt), using
* Any dependencies you specified via conda (``environment.yml``), using
language-specific dependency-management facilities (``requirements.txt``,
``install.R``, etc.) or installed in the ``Dockerfile``
``install.R``, etc.) or installed in the ``Dockerfile``.

* the renku command-line interface renku-python_.
* The renku command-line interface renku-python_.

* the amount of CPUs, memory, and (possibly) GPUs that you configured before launch
* The amount of CPUs, memory, and (possibly) GPUs that you configured before launch.

For adding or changing software installed into your project's interactive environment,
check out :ref:`customizing`
@@ -78,54 +76,58 @@ configuration options.
+------------------------------+-------------------------------------------------------------------------------------------+
| Option | Description |
+==============================+===========================================================================================+
| branch | default master, but if you're doing work on another branch, switch! |
| Branch | Default is ``master``. You can switch if you are working on another branch. |
+------------------------------+-------------------------------------------------------------------------------------------+
| commit | default latest, but you can launch the environment from an earlier commit; |
| | |
| | also useful if your latest commit's build failed (see below). |
| Commit | Default is the latest, but you can launch the environment from an earlier commit. This is |
| | especially useful if your latest commit's build failed (see below) or you have unsaved |
| | work that was automatically recovered. |
+------------------------------+-------------------------------------------------------------------------------------------+
| environment | ``lab``: JupyterLab; ``rstudio``: RStudio; if you're using a python template, |
| | |
| | the ``rstudio`` endpoint will not work. |
| Default Image | This provides information about the Docker image used by the Interactive Environment. |
| | When it fails, you can try to rebuild it, or you can check the GitLab job logs. |
| | An image can also be pinned so that new commits will not require a new image |
| | each time. |
+------------------------------+-------------------------------------------------------------------------------------------+
| # CPUs | the number of CPUs available; resources are shared, so please select the lowest amount |
| | that will work for your use case. |
| Default environment | Default is ``/lab``, which loads the JupyterLab interface. If you are working with ``R``, |
| | you may want to use ``/rstudio`` for RStudio. Mind that the corresponding packages need |
| | to be installed in the image. If you're using a python template, the ``rstudio`` endpoint |
| | will not work. |
+------------------------------+-------------------------------------------------------------------------------------------+
| memory | the amount of RAM available; resources are shared, so please select the lowest amount |
| | that will work for your use case. |
| Number of CPUs | The number of CPUs available, or the quota. Resources are shared, so please select the |
| | lowest amount that will work for your use case. Usually, the default value works well. |
+------------------------------+-------------------------------------------------------------------------------------------+
| # GPUs | the number of GPUs available; You might have to wait for GPUs to free up in |
| | |
| | order to be able to launch an environment. |
| Amount of Memory | The amount of RAM available. Resources are shared, so please select the lowest amount |
| | that will work for your use case. Usually, the default value works well. |
+------------------------------+-------------------------------------------------------------------------------------------+
| Automatically fetch LFS data | Leave off by default. If you find that workflows |
| | you used to be able to run have stopped working, |
| | check the contents of the file(s) -- if plain text and contains |
| | strings that are not your data, run ``renku storage pull <filepath>`` |
| | to get the relevant files, or ``git lfs pull`` to get all of the |
| | files at once. |
| Number of GPUs | The number of GPUs available. If you can't select any number, no GPUs are available in |
| | the RenkuLab deployment you are using. If you request any, you might need to wait for GPUs |
| | to free up in order to be able to launch an environment. |
+------------------------------+-------------------------------------------------------------------------------------------+
| Automatically fetch LFS data | Default is off. If turned on, all the LFS data is fetched automatically. This is |
| | convenient, but it may considerably slow down the start time if the project contains a |
| | lot of data. Refer to :ref:`Data in Renku <data>` for further information. |
+------------------------------+-------------------------------------------------------------------------------------------+


What if the Docker image is not available?
------------------------------------------

Interactive environments are backed by Docker images. When launching a new
interactive environment a container is created from the image that matches the
interactive environment, a container is created from the image that matches the
selected ``branch`` and ``commit``.

A GitLab CI/CD pipeline automatically builds a new image using the project's
``Dockerfile`` when any of the following happens:

* creating of a project
* forking a project (in which the new build happens for the fork)
* pushing changes to the project
* Creating a project.
* Forking a project (in which case the new build happens for the fork).
* Pushing changes to the project.

(This is defined in the project's ``.gitlab-ci.yml`` file.)
This is defined in the project's :ref:`.gitlab-ci.yml file <gitlab_ci_yml>`. If the project
references a pinned image, the UI will not check whether the image is available: a pinned image
is usually provided by the project's maintainer and does not change with every new commit.

It can sometimes take some time to build an image for various reasons, but if
you've just created the project on RenkuLab from one of the templates it should
take less than a minute.
It may take a long time to build an image for various reasons, but if you've just created the
project on RenkuLab from one of the templates, it generally takes less than a minute or two.


The Docker image is still building
@@ -144,31 +146,34 @@ The Docker image build failed
If this happens, it's best to click the link to view the logs on GitLab so you
can see what happened. Here are some common reasons for build failure:

* Software installation failure
Software installation failure
*****************************

**problem** You added a new software library to ``requirements.txt``, ``environment.yml``,
**Problem:** You added a new software library to ``requirements.txt``, ``environment.yml``,
or ``install.R``, but something was wrong with the installation (e.g. typo in
the name, extra dependencies required for the library but unavailable).

**how to fix this**
**How to fix this:**
You can use the GitLab editor or clone your project locally to fix the installation,
possibly by adding the extra dependencies it asks for into the ``Dockerfile``
(the commented out section in the file explains how to do this). As an alternative,
you can start an interactive environment from an earlier commit.

**how to avoid this** First try installing into your running interactive environment,
**How to avoid this:** First try installing into your running interactive environment,
e.g. by running ``pip install -r requirements.txt`` in the terminal on JupyterLab.
You might not have needed to install extra dependencies when installing on your
local machine, but the operating system (OS) defined in the ``Dockerfile`` has
minimal dependencies to keep it lightweight.
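
As a rough sketch of that check, assuming your project uses the file names mentioned above
and that ``pip`` and ``conda`` are available in the environment's terminal, you could try the
installation steps by hand before committing::

    # Python packages listed in requirements.txt
    pip install -r requirements.txt

    # conda packages listed in environment.yml (only if your project uses it)
    conda env update -q -f environment.yml

If either command fails here, the image build would most likely fail for the same reason.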

* The build timed out
The build timed out
*******************

By default, image builds are configured to time out after an hour. If your build
takes longer than that, you might want to check out the section on :ref:`customizing`
interactive environments before increasing the timeout.

* Your project could not be cloned
Your project could not be cloned
********************************

If you accidentally added 100s of MBs or GBs of data to your repo and didn't
specify that it should be stored in git LFS, it might take too long to clone. In
94 changes: 78 additions & 16 deletions docs/user/interactive_customizing.rst
@@ -3,9 +3,9 @@
Customizing interactive environments
====================================

Very quickly you will want to make changes to the default configuration of your
interactive sessions. The default environments we provide are pretty bare-bones
so if you want to have easy access to your preferred packages, some simple steps
Very soon, you will want to make changes to the default configuration of your
interactive sessions. The default environments we provide are pretty bare-bones.
If you want to have easy access to your preferred packages, some simple steps
at the start of your project will get you on the way quickly.


@@ -14,14 +14,18 @@ Important files

The launch is enabled by the content in the following files in your project:

* language-specific files like ``requirements.txt`` or ``install.R``

* ``Dockerfile``: defines the type of interactive environment and other software
installed in the environment, including the ``renku`` command-line installation.

* ``.gitlab-ci.yml``: controls the docker build of the image based on the project's
``Dockerfile``.

* ``requirements.txt`` or ``install.R``: language-specific files controlling the
libraries.

* ``.renku/renku.ini``: renku project configurations containing a
``[renku "interactive"]`` section.

The most basic modifications are installations of additional packages. This can be
done automatically for Python and R projects if you add the packages you want
to ``requirements.txt`` and ``install.R`` respectively.
@@ -31,26 +35,32 @@ Dockerfile structure
--------------------

The project's ``Dockerfile`` lives in the top level of the project directory. In
the default ``Dockerfile`` provided in the template, the first line is a ``FROM``
statement that specifies a `versioned base docker image <https://github.com/SwissDataScienceCenter/renku-jupyter>`_.
the default ``Dockerfile`` provided in the template, the first line is a
``RENKU_BASE_IMAGE`` argument used to feed the following ``FROM`` instruction.
It specifies a
`versioned base docker image <https://github.com/SwissDataScienceCenter/renku-jupyter>`_.
We add new versions periodically, but the heart of it is the set of installations
of jupyterlab/rstudio, git, and renku::

FROM renku/singleuser:0.3.5-renku0.5.2
ARG RENKU_BASE_IMAGE=renku/renkulab-py:3.7-0.7.3

# or, for RStudio in the build
# or, for RStudio

FROM renku/singleuser-r:0.3.5-renku0.5.2
ARG RENKU_BASE_IMAGE=renku/renkulab-r:4.0.0-0.7.3
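
A minimal sketch of how the argument can feed the ``FROM`` instruction, assuming the template
follows the standard Docker pattern of declaring an ``ARG`` before ``FROM`` (the exact lines in
your project's ``Dockerfile`` may differ)::

    # Default base image; declared before FROM so it can be referenced there.
    ARG RENKU_BASE_IMAGE=renku/renkulab-py:3.7-0.7.3
    # Assumption: the template consumes the argument like this.
    FROM ${RENKU_BASE_IMAGE}

With this pattern, the base image could also be overridden at build time, e.g. with
``docker build --build-arg RENKU_BASE_IMAGE=<image> .``, without editing the file.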

The next two statements install user-specified libraries from ``environment.yml``
and ``requirements.txt``::

# install the python dependencies
COPY requirements.txt environment.yml /tmp/
RUN conda env update -q -f /tmp/environment.yml && \
/opt/conda/bin/pip install -r /tmp/requirements.txt && \
conda clean -y --all && \
conda env export -n "root"
/opt/conda/bin/pip install -r /tmp/requirements.txt && \
conda clean -y --all && \
conda env export -n "root"

Then we specify the renku version to be installed through ``pipx``::

ARG RENKU_VERSION=0.12.1

You can add to this ``Dockerfile`` in any way you'd like.

@@ -62,7 +72,7 @@ Dockerfile development
Before we get into modifying Dockerfiles, if you want to know how to update
the base version of your renkulab image, see `Upgrading Renku <upgrading_renku>`_.

If you're going to be making simple modifications to the ``Dockerfile`` (i.e. changing
If you're going to make simple modifications to the ``Dockerfile`` (i.e. changing
the base Docker image version number), you can use the following steps to update
and re-build the image:

@@ -73,8 +83,8 @@ and re-build the image:
#. When you're satisfied with the edits, scroll down and write a meaningful **commit message** (you'll thank yourself later).
#. Click the green **Commit changes** button.

You may find the [official docker documentation](https://docs.docker.com/engine/reference/builder/) useful
during this process.
You may find the `official docker documentation <https://docs.docker.com/engine/reference/builder/>`_
useful during this process.

Now you have committed the changes to your ``Dockerfile``. Since you have made a commit,
the CI/CD pipeline will kick off (pre-configured for you as a ``renkulab-runner``
@@ -164,6 +174,58 @@ these base ``Dockerfile`` s and add the ``renku``, ``git``, and ``jupyter``
parts to another base image that you might have.


Renku project configurations
----------------------------

When starting a new Interactive Environment, most of the options can be manually
changed by the user. Depending on the specific RenkuLab deployment, you can select
more RAM, a higher CPU quota, etc.

Your project may even include a package with an advanced UI (like
`Streamlit <https://renku.discourse.group/t/how-to-deploy-streamlit-in-renku/169>`_)
and you probably want to choose it as default.

It's possible to set a default value for all these options using the project
configurations stored in the ``.renku/renku.ini`` file.
Once you do that, each time a user tries to start a new environment, those options will
be pre-selected.

.. note::

Manually modifying the ``renku.ini`` file is not recommended.
You can use the
`renku config command <https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.config>`_
from an interactive environment.

renku config set interactive.default_url "/tree"

We are working on adding a user friendly solution to set default options on
the project's settings page.

**What are the specific options?**

You can find a comprehensive list of options :ref:`on this page <renku_ini>`. Most commonly,
you may want to change the ``default_url`` or set a specific ``image``.

The first case is useful when you prefer to show a different default UI, like the standard
Jupyter interface ``/tree``, or when you need support for a different interface,
like RStudio ``/rstudio`` or ``/streamlit`` (not included in the standard Python template).

The ``image`` is useful when you settle on a Docker image and you don't need to change it
anymore. The benefit is particularly evident when building a new image takes a lot of time
(e.g. you added big packages) or when you expect the project to be used by a lot of people
over a short period of time (e.g. you use it in a presentation or a lecture).

Even if it's common to start the environment with the default values, setting a default value
doesn't prevent a user from changing it.

.. note::

Mind that not all RenkuLab deployments offer the same set of options or allow you to choose
the same values. If no GPUs are available, setting the default number to ``1`` will not work.
Should this be the case, a warning will show before starting a new environment.
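
As an illustration only, assuming that ``renku config set interactive.<option>`` stores each
value under the ``[renku "interactive"]`` section described earlier, the relevant part of
``.renku/renku.ini`` might look roughly like this. The image reference is a hypothetical
placeholder, and the exact layout may differ between renku versions::

    [renku "interactive"]
    ; open RStudio instead of JupyterLab by default
    default_url = /rstudio
    ; hypothetical pinned image, replace with your own
    image = registry.example.com/my-group/my-project:pinned

Prefer ``renku config set`` over editing the file by hand, as recommended above.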


Getting Help
------------
