Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to JupyterLab 3.0 #204

Merged
merged 10 commits into from
Jun 4, 2021
Merged

Conversation

GeorgianaElena
Copy link
Member

@GeorgianaElena GeorgianaElena commented Feb 8, 2021

Closes #197

  • Update2: The jupyter-server-proxy version has been bumped. This only needs building the new user image and some manual testing.
    • Build the new image and test this
  • Update: This was tested and it works, just needs a new jupyter-server-proxy release 🌞

@consideRatio
Copy link
Contributor

consideRatio commented Feb 8, 2021

@GeorgianaElena this is also needed I think.

singleuser:
  extraEnv:
    # ref: https://github.com/jupyterhub/jupyterhub/blob/master/docs/source/reference/config-user-env.md#switching-to-jupyter-server
    JUPYTERHUB_SINGLEUSER_APP: jupyter_server.serverapp.ServerApp

@yuvipanda
Copy link
Member

Hmm, I'd like us to not switch to serverapp yet, and continue using the notebook package for now. The situation around jupter_server, nbclassic, notebook packages is very confusing to me :D And since the primary interface used in teaching is still the classic notebook package, I want us to stick to that as closely as possible for now. I'm also already running JupterLab 3.0 with the notebook package at berkeley, and it seems to be doing ok.

@consideRatio
Copy link
Contributor

consideRatio commented Feb 8, 2021

I'm also already running JupterLab 3.0 with the notebook package at berkeley, and it seems to be doing ok.

Oh so using ServerApp wasn't required? Ah!

I think the issues I mentioned for jupyter-server-proxy related to ServerApp specifically, and then there is no need for the 1.6.0 release.

@yuvipanda
Copy link
Member

Nope! Check it out - https://github.com/berkeley-dsep-infra/datahub/blob/staging/deployments/datahub/images/default/infra-requirements.txt. Jupyterlab continues to function fine as a notebook extension

@yuvipanda
Copy link
Member

This LGTM, @GeorgianaElena :)

@GeorgianaElena
Copy link
Member Author

Thanks @yuvipanda! I'm building the image right now and want to test it on staging before merging.

@GeorgianaElena
Copy link
Member Author

For some reason, when trying to build the image, it gets stuck in:

Step 30/38 : RUN conda env update -p ${CONDA_DIR} -f /tmp/environment.yml
 ---> Running in 0b0a7ff246ae
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... 
Warning: 2 possible package resolutions (only showing differing packages):
  - conda-forge/linux-64::pyrsistent-0.17.3-py38h497a2fe_2, conda-forge/noarch::tableschema-1.19.3-pyh9f0ad1d_0
  - conda-forge/linux-64::pyrsistent-0.16.0-py38h1e0a361_0, conda-forge/noarch::tableschema-1.20.0-pyh9f0ad1d_0done

Downloading and Extracting Packages
hdf4-4.2.13          | 951 KB    | ########## | 100% 
(...)
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

:( @consideRatio or @yuvipanda, did you encounter this before and know what the issue may be?

@yuvipanda
Copy link
Member

hmm, how long is it stuck there? Maybe the docker image is full?

I unfortunately do not have enough conda experience to debug this :( This is why I always just use pip, though. So if we can't figure this out in a bit, let's just move the packages to pip.

@GeorgianaElena
Copy link
Member Author

I think yesterday I left it more than an hour and nothing happened.

The thing is that it used to work before bumping the version of some packages (I got some version conflicts) and now nothing :(
Pkgs version is probably unrelated though... I'll try again with the ones that gave conflicts, just to double check and because staring at a screen that's stuck makes me do desperate things 😅

@GeorgianaElena
Copy link
Member Author

@yuvipanda, when pining pip to 20.2 (as per this comment conda/conda#6986 (comment)), it got pass the conda env update step 🤷, but got stuck because of the nodejs version requirement of jupyterlab3.

Few questions:

  • Does it make sense to have pip pinned? (I'm running conda env update with -v flag to see if I get any useful info about why it gets stuck with the newest pip)
  • jupyterlab3 requires nodejs >=12.0.0, but we have this comment in our Dockerfile:
# Our pre-built R packages from rspm are built against system libs in focal
# rstan takes forever to compile from source, and needs libnodejs
# So we install older (10.x) nodejs from apt rather than newer from conda

I don't see it pinned to 10.x anywhere, but I guess this is the newest version available from apt, right?
I'm not sure what to do about this.

@yuvipanda
Copy link
Member

mmmm, fun.. Here's what I suggest:

  1. Get rid of our environment.yaml file completely. We use python 3.8, which also comes by default with our miniforge install. We don't need to upgrade that right now, so we can let that go..
  2. Move all the packages in environment.yml to a requirements.txt. This means we will install them with pip instead of conda, and that's ok.
  3. JupyterLab only requires 12.x for installing extensions with jupyter labextension - it doesn't need nodejs at all for extensions installed via pip. So let's find the extensions that can't be installed via pip yet, and just ignore those for now. If those turn out to be super important, we can instead disable the rstan package for now.

How does this sound?

@GeorgianaElena
Copy link
Member Author

How does this sound?

Sounds like a plan 🦸‍♀️

JupyterLab only requires 12.x for installing extensions with jupyter labextension - it doesn't need nodejs at all for extensions installed via pip. So let's find the extensions that can't be installed via pip yet, and just ignore those for now. If those turn out to be super important, we can instead disable the rstan package for now.

So the labextentions we had before are:

    @jupyter-widgets/jupyterlab-manager@2 \
    @jupyterlab/[email protected] \
    @[email protected]

I believe that only jupyter-server-proxy needs nodejs still, but there's an open PR for this: jupyterhub/jupyter-server-proxy#245 🎉

@consideRatio
Copy link
Contributor

consideRatio commented Feb 10, 2021

In a image having quite a bit of conda packages, I got a speedup from 60 minute builds to 40 minute builds using mamba. The time invested in conda command was cut very significantly, and I wouldn't be that surprised if conda would stall for 60+ minutes if it was a bit starved of RAM or something like that.

For reference, this is how I've most recently installed the base python environment using mamba instead of conda, and it works great for a large image with lots of dependencies!

# Install a minimal conda based python disitribution and ensure we use a
# specific python version. Instead of using conda, we use mamba which increase
# performance. Instead of the default channel, we use conda-forge by default as
# well.
#
# Ref for latest versions and SHA: https://github.com/conda-forge/miniforge/releases
ENV MAMBAFORGE_VERSION=4.9.2-5 \
    MAMBAFORGE_SHA256SUM=7f0ad0c2f367751f7878d25a7bc1b4aa48b8dcea864daf9bc09acb595102368b \
    PYTHON_VERSION=3.7
ENV MAMBAFORGE_FILENAME=Mambaforge-${MAMBAFORGE_VERSION}-Linux-x86_64.sh
ENV MAMBAFORGE_URL=https://github.com/conda-forge/miniforge/releases/download/${MAMBAFORGE_VERSION}/${MAMBAFORGE_FILENAME}
RUN cd /tmp \
    # Download, verify checksum, install, cleanup
 && wget --quiet $MAMBAFORGE_URL \
 && echo "${MAMBAFORGE_SHA256SUM}  ${MAMBAFORGE_FILENAME}" | sha256sum --quiet -c - \
 && /bin/bash "${MAMBAFORGE_FILENAME}" -f -b -p $CONDA_DIR \
 && rm "${MAMBAFORGE_FILENAME}" \
    # Install desired Python version, cleanup
 && mamba install python=${PYTHON_VERSION} --quiet --yes \
 && mamba clean --all --yes -f \
    # Preserve behavior of miniconda - packages come from conda-forge + defaults
 && conda config --system --append channels defaults \
 && fix-permissions $CONDA_DIR $HOME

Note that fix-permissions and $CONDA_DIR relate to jupyter/docker-stacks images

@GeorgianaElena
Copy link
Member Author

I was aware of mamba, but didn't get to use it yet. The speedup seems really impressive 🎉 Thanks for sharing @consideRatio!

I won't mind giving mamba a try for this. @yuvipanda, what do you think, go mama 💃 or stick with pip? :D

@GeorgianaElena
Copy link
Member Author

Also, running conda env update with -v flag showed nothing, but with -vvv, the last debug message before hanging was this one:

DEBUG conda.common.signals:signal_handler(58): de-registering handler for Signals.SIGABRT
DEBUG conda.common.signals:signal_handler(58): de-registering handler for Signals.SIGINT
DEBUG conda.common.signals:signal_handler(58): de-registering handler for Signals.SIGTERM
DEBUG conda.common.signals:signal_handler(58): de-registering handler for Signals.SIGQUIT

It really doesn't make any sense for me, and I don't see any connection between pip and de-registering signal handlers 🤷

@consideRatio
Copy link
Contributor

consideRatio commented Feb 10, 2021

The approach I've evolved to is to rely on pip as much as possible, but when needed, let individual packages be installed with mamba using the conda-forge channel.

To first transition to using pip for everything is a good first step towards such approach btw, because then that is the default and only what is needed can be migrated to be conda/mamba installed.

Motivation

  • conda packages releases follow pip releases typically, but sometimes the delay in between are too long for certain packages
  • conda/mamba is slower to build than pip
  • I understand pip packages and the dependencies far better than conda packages, so issues are easier to resolve
  • sometimes a package must be installed with conda/mamba, and then so be it

@yuvipanda
Copy link
Member

@GeorgianaElena I definitely want us to explore mamba! But I want us to finish this up as soon as we possibly can and move on to other tasks, here and in UToronto, so let's open an issue for that instead?

@GeorgianaElena
Copy link
Member Author

Thanks for sharing all this knowledge @consideRatio <3

@yuvipanda
Copy link
Member

My approach to packaging is also exactly @consideRatio's :D I used conda here as an experiment to see what kinda complexity I can deal with, and unfortunately it looks like the amount of knowledge I have isn't good enough to deal with conda packaging complexities...

@GeorgianaElena
Copy link
Member Author

@yuvipanda, do you want to wait for a new release of jupyter-server-proxy that doesn't require nodejs anymore or just go without it until then?

@yuvipanda
Copy link
Member

@GeorgianaElena if we can test that everything else in this PR works, we can just wait until that release is made. Not urgent :D

@GeorgianaElena
Copy link
Member Author

GeorgianaElena commented Feb 10, 2021

I built the image (as it is, without jupyter-server-proxy extension enabled for jupyterlab, but pkg installed), pushed it, then deployed it to staging.

What happened:

  • we have jupyterlab3
  • jupytext seems to work
  • ipywidgets seems to work
  • health check failed with a timeout error and saw this warning related to jupyter-server-proxy in the pod logs:
[W 2021-02-10 16:07:43.160 SingleUserNotebookApp notebookapp:1950] Error loading server extension jupyter_server_proxy
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/notebook/notebookapp.py", line 1945, in init_server_extensions
        func(self)
      File "/opt/conda/lib/python3.7/site-packages/jupyter_server_proxy/__init__.py", line 27, in load_jupyter_server_extension
        server_processes += get_entrypoint_server_processes()
      File "/opt/conda/lib/python3.7/site-packages/jupyter_server_proxy/config.py", line 75, in get_entrypoint_server_processes
        make_server_process(entry_point.name, entry_point.load()())
      File "/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2460, in load
        self.require(*args, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2483, in require
        items = working_set.resolve(reqs, env, installer, extras=self.extras)
      File "/opt/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 790, in resolve
        raise VersionConflict(dist, req).with_context(dependent_req)
    pkg_resources.ContextualVersionConflict: (nbformat 4.4.0 (/opt/conda/lib/python3.7/site-packages), Requirement.parse('nbformat>=5.0'), {'nbclient'})
[W 2021-02-10 16:07:44.255 LabApp] 'terminado_settings' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2021-02-10 16:07:44.255 LabApp] 'terminado_settings' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
  • Update: when opening or creating a notebook from the clasical notebook interface I get a 500 Internal Server Error and see this in the logs:
AttributeError: 'InteractExporter' object has no attribute 'template_path'

@yuvipanda
Copy link
Member

ah, fun! I think the 'InteractExporter' is from nbinteract, which we can remove - I don't think anyone is actively using that, and Jupyyterbook's thebe does similar things I think.

The other one is concerning :|

@GeorgianaElena
Copy link
Member Author

Removing nbinteract from my single user server seemed to have solved the 500 error issue.
Though in the logs I'm still seeing bunch of similar warnings like:

[W 2021-02-10 17:36:16.906 SingleUserNotebookApp configurable:168] Config option `template_path` not recognized by `ExporterCollapsibleHeadings`.  Did you mean one of: `extra_template_paths, template_name, template_paths`?

@choldgraf
Copy link
Member

Hey all - what is the state of this one? Are we blocked on the error messages reported above?

@yuvipanda
Copy link
Member

All the labextensions that were giving us trouble have now been upgraded to work with JupyterLab 3.0, so this can be picked up again.

@consideRatio
Copy link
Contributor

fwiw Berkeley has been running JupyterLab 3.0 with nbgitpuller for many months now with no issues

Are you running notebook servers or jupyter_server servers?

@damianavila
Copy link
Contributor

Are you running notebook servers or jupyter_server servers?

That's the key... see my comment here: pangeo-data/pangeo-docker-images#206 (comment)

Particularly this piece (pasted here for a quick look):

AFAIK, if you start with jupyter notebook, that is calling the old server whereas if you start with jupyterlab, that is calling the new one. So you may end up in a situation where you started with lab (using the underlying new server) but then redirected to /tree to use the legacy notebook view.
In that scenario, if you have a server extension that is not compatible with the new server, it will fail... I think...

The fix for this sort of scenarios is to make your extension also compatible with the new server: https://jupyter-server.readthedocs.io/en/latest/developers/extensions.html#migrating-an-extension-to-use-jupyter-server

And support both, the legacy and the new server, for at least some time, until the transition completes...

@consideRatio
Copy link
Contributor

@damianavila do you think what I observed (jupyterhub/nbgitpuller#176) as a web-frontend issue is a sympton of this, or do you think there are two separate issues to report in nbgitpuller?

@damianavila
Copy link
Contributor

I think it is a symptom... if you control that environment, try launching the legacy notebook app vs the jupyterlab app and check if things behave in a different way.

@consideRatio
Copy link
Contributor

consideRatio commented May 5, 2021

@damianavila thanks, I opened jupyterhub/nbgitpuller#177!

@GeorgianaElena then I think with regards to nbgitpuller, as long as we don't start using jupyter_server as part of this PR, then we are fine. @damianavila is that your understanding as well?

@damianavila
Copy link
Contributor

@damianavila is that your understanding as well?

IIRC JupyterLab 3 actually has the new jupyter_server as a requirement but, I think, we are defaulting to /tree in our hubs: https://github.com/2i2c-org/pilot-hubs/blob/master/hub-templates/base-hub/values.yaml#L144

So we should be fine, maybe? We should definitely test nbgitpuller on top of the current PR and see how it behaves...

@choldgraf
Copy link
Member

Hey all - what are the blockers to getting this one merged?

@damianavila
Copy link
Contributor

Hey all - what are the blockers to getting this one merged?

⬇️

We should definitely test nbgitpuller on top of the current PR and see how it behaves...

I guess it is just a matter of testing if the nbgitpuller works as expected. And solve the conflicts.
I left some other general comments about package version specification but those should not be a blocker, IMHO.

@GeorgianaElena
Copy link
Member Author

I've just fixed the conflicts and re-built the user image. I also tried to test out nbgitpuller and it seemed to be working.

What I did:

  • created nbgitpuller links that were either for lab or for the classical interface. The repo got cloned ok in both cases.
  • defaulted the staging hub to /lab and then tried the same thing from above again. Repos got cloned ok.

A strage thing I noticed was that running jupyter server extension list (as per pangeo-data/pangeo-docker-images#206 (comment)), didn't show nbgitpuller at all when the hub was defaulted to /lab.
nbgitpuller only showed up when opening up a terminal from lab, when the hub was defaulting to /tree.

I would highly appreciate if this nbgitpuller situation got another pair of 👀 If anyone wants to give this a try, the staging hub (https://staging.pilot.2i2c.cloud) is using lab v3 and it's defaulting to this interface.

@damianavila
Copy link
Contributor

I've just fixed the conflicts and re-built the user image. I also tried to test out nbgitpuller and it seemed to be working.

Super!

defaulted the staging hub to /lab and then tried the same thing from above again. Repos got cloned ok.

Interesting... is the cloning capability actually calling the server-side extension to perform the action?
I was expecting that but maybe nbgitpuller is working on another way? @yuvipanda, do you have more context?

A strage thing I noticed was that running jupyter server extension list (as per pangeo-data/pangeo-docker-images#206 (comment)), didn't show nbgitpuller at all when the hub was defaulted to /lab.

Yep, that is expected because AFAIK, nbgitpuller is not yet compatible with the new server...

nbgitpuller only showed up when opening up a terminal from lab, when the hub was defaulting to /tree.

Yep, expected as well...

Wondering if other actions that nbgitpuller performs that involve server-side code is actually working on the system defaulted to /lab (I am not a nbgitpuller user so not sure what those actions would be although I guess the syncing capabilities should have a server-side component). I would expect a failure because you do not have a server-side counterpart installed/enabled... but maybe somehow the new server is still loading old stuff (I don't think so, but I can always be wrong 😉?

@GeorgianaElena
Copy link
Member Author

I haven't used nbgitpuller much either. So I cannot be sure that everything works with this new lab3 setup.

I'm thinking of two approaches, in order to unblock this PR:

  • Merge this tomorrow morning if nobody has any objection until then. Then watch for user bug reports and try to solve them, or revert this entirely
  • Implement @consideRatio's suggestion, then merge this PR and solve any issues that may come up.

What do you think?

@damianavila
Copy link
Contributor

damianavila commented Jun 1, 2021

Merge this tomorrow morning if nobody has any objection until then. Then watch for user bug reports and try to solve them, or revert this entirely

I am not sure to what extent our users are actually using nbgitpuller, but I think this ⬆️ is an OK approach since we have some previous reports that things are going OK with this in other places and we have already identified the possible issue, so if anything arises, we should be able to quickly fix it/workaround it/revert it.

@GeorgianaElena
Copy link
Member Author

Thanks a lot @damianavila!
Merging this 🤞

@GeorgianaElena GeorgianaElena merged commit 55e083c into 2i2c-org:master Jun 4, 2021
@choldgraf
Copy link
Member

wahooo!!! congrats @GeorgianaElena and thanks everybody else for helping out :-)

GeorgianaElena added a commit to GeorgianaElena/jupyterhub-deploy that referenced this pull request Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to JupyterLab 3.0
5 participants