[New Hub] Jack Eddy Symposium #1329

Closed
9 tasks done
damianavila opened this issue May 25, 2022 · 60 comments

@damianavila
Contributor

damianavila commented May 25, 2022

Hub Description

The request is for the launch or use of a research hub with Dask. Daniel Marsh, one of the co-organizers, wishes to share tutorial notebooks. He plans to use "intake-esm to access all the CMIP6 climate runs which are hosted in zarr format on AWS. We can pull down all the time series data and regress it against the solar forcing used to derive the solar response."
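The core analysis step described above is a linear regression of a model time series against the solar forcing. A minimal sketch in plain Python, assuming both series are already aligned in time (same length, same timestamps); the function name `regress_on_forcing` is hypothetical and not part of intake-esm or any other library.

```python
from __future__ import annotations

from statistics import fmean


def regress_on_forcing(forcing: list[float], response: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of response = intercept + slope * forcing.

    Both series must already be aligned in time (same length, same timestamps).
    Returns (slope, intercept); the slope is the response per unit of forcing.
    """
    if len(forcing) != len(response) or len(forcing) < 2:
        raise ValueError("need two equal-length series with at least two points")
    x_mean = fmean(forcing)
    y_mean = fmean(response)
    # Squared deviations of the forcing, and cross-deviations with the response.
    sxx = sum((x - x_mean) ** 2 for x in forcing)
    if sxx == 0:
        raise ValueError("forcing is constant, so the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(forcing, response))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept
```

With NumPy available, `numpy.polyfit(forcing, response, 1)` returns the same two coefficients (slope first); the hand-written version just makes the arithmetic explicit.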

Community Representative(s)

@colliand, can you give us the contact information for the community representatives.
Looking at the lead issue, it seems the contacts would be Daniel Marsh and/or Ryan McGranaghan. If that is the case, do you have their contact information? Any GitHub handles?

Important dates

  • Required start date: Jun 1st
  • Target start date: ASAP
  • Any important dates for usage:
    • Start date for the event: June 6
    • End date for the event: June 10
    • The active times for the event (e.g., 9am to 5pm US/Pacific): 8 AM MT - 6 PM MT
    • How many people will attend the event? ~100-150
    • Do you need hub infrastructure to be pre-initialized before the event? Not a requirement, but a nice-to-have

Hub Authentication Type

Other (may not be possible, please specify in comments)

Hub logo information

Hub user image

  • Repository for user image: { REPO LINK IF IT EXISTS }
  • User image registry: { REGISTRY IF ONE ALREADY EXISTS }
  • User image tag and name: { NAME AND TAG IF IT EXISTS }

Extra features you'd like to enable

  • Specific cloud provider or datacenter: AWS
  • Dedicated Kubernetes cluster
  • Scalable Dask Cluster

Other relevant information

I presume GitHub auth would be OK but we need to confirm it.
It seems they might need to interact with datasets on AWS, so it might make sense to deploy in that cloud provider...
From the lead description, it is not clear to me how this hub will be paid for, so there are some things to figure out in case we need a new AWS account to deploy into...

Hub URL

jackeddy.2i2c.cloud

Hub Type

daskhub

Tasks to deploy the hub

  • Engineer who will deploy the hub is assigned
  • Deploy information filled in above
  • Initial Hub deployment PR:
  • Administrators able to log on
  • Community Representative satisfied with hub environment
  • Hub now in steady-state
@colliand
Contributor

colliand commented May 25, 2022

Yes! I invite @rmcgranaghan and @dan800 to collaborate with us in establishing this hub for the Eddy Symposium. Please note information requests in the anchor entry of this issue.

(While searching for Dan, I found this repo in his collection: https://github.com/dan800/intake-esm.)

@dan800

dan800 commented May 25, 2022

Thanks for the invite! Essentially, intake-esm provides a data access layer that lets you pick a model, experiment and variable. Regarding using it to access datasets on AWS, this was the example I found most useful:

https://github.com/hdrake/cmip6-temperature-demo/blob/master/notebooks/01_calculate_ECS_Gregory_method.ipynb

It's pulling down the surface temperature and radiative balance for the CMIP models for 4xCO2 experiments to calculate climate sensitivity. I haven't run it in a while, but the catalog is here:
https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv

For our project, the 'hist-sol' runs (solar variability only) might be the first place to look.
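The catalog linked above is a CSV with one row per Zarr store, and picking "a model, experiment and variable" amounts to filtering those rows. intake-esm does this (and much more) through `intake.open_esm_datastore(...)` followed by `.search(...)`; the stdlib-only sketch below just shows what that filtering does. It assumes the CSV header includes `experiment_id`, `variable_id`, `table_id` and `zstore` columns; the function names and the `tas`/`Amon` choice are illustrative.

```python
from __future__ import annotations

import csv
import urllib.request
from typing import Iterable, Optional

# Catalog URL quoted in this thread.
CATALOG_URL = "https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv"


def fetch_catalog_lines(url: str = CATALOG_URL) -> list[str]:
    """Download the CSV catalog and return it as a list of text lines."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8").splitlines()


def find_zarr_stores(
    catalog_lines: Iterable[str],
    experiment_id: str,
    variable_id: str,
    table_id: Optional[str] = None,
) -> list[dict[str, str]]:
    """Return catalog rows matching an experiment, a variable and optionally a table."""
    matches = []
    for row in csv.DictReader(catalog_lines):
        if row["experiment_id"] != experiment_id:
            continue
        if row["variable_id"] != variable_id:
            continue
        if table_id is not None and row["table_id"] != table_id:
            continue
        matches.append(row)
    return matches
```

On the hub, something like `find_zarr_stores(fetch_catalog_lines(), "hist-sol", "tas", table_id="Amon")` would list the solar-only runs; each row's `zstore` value is the location to open with xarray.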

@colliand
Contributor

Hi @damianavila! There appear to be two distinct usages of the word image in the information request. The Hub logo image might be a .jpg or .png file. The Hub user image refers to a Docker image (see Jupyter Docker Stacks for more information).

Unless @dan800 or @rmcgranaghan intervene with other advice, I suggest that we use the image below for the hub logo and that the 2i2c team choose the Docker image for a Daskhub similar to those used by the Pangeo community.

[image: eddy-symposium logo]

@damianavila
Contributor Author

@dan800 is there any preference in the cloud provider? From the top description, it seems you needed to access some data in AWS land, but in your most recent update, it seems the data lives in GCP. Are you OK with the hub deployed on GCP land?

cc @yuvipanda, this might be relevant to decide where we deploy the hub.

@dan800

dan800 commented May 25, 2022

@damianavila No preference - I may have had it wrong and it's GCP! Please choose whatever you think is most efficient/cost effective.

@colliand
Contributor

colliand commented May 25, 2022

I enjoyed an excellent Zoom call with Ryan and Dan earlier today. We exchanged a lot of information! Updates that I hope will be helpful to the 2i2c team are shared next.

Heliophysics software environment for Eddy Symposium?

Ryan pointed me to this excellent talk by Brian Thomas of NASA on HelioCloud (starting at 22:28). The talk describes a vision for collaborative and accelerated research for the heliophysics community by making the curated tools and data more easily accessible. (Thanks @brianthomas for the talk and for openly sharing your work! It would be great to link up with you to see if we can collaborate.)

Brian's talk pointed at the following resources:

(These images were cloned and modified from the pangeo-data/pangeo-docker-images project. FYI @rabernat.)

Before learning about HelioCloud, we (Ryan McGranaghan, Dan Marsh, @fperez and I) planned to deploy a DaskHub with a Pangeo-style software environment for usage at the upcoming Jack Eddy Symposium, with future plans to customize the environment to better support heliophysics research teams. HelioCloud may allow us to advance faster! While there may be too little time to get this all in place, I ask @yuvipanda anyway... can you deploy a DaskHub with the HelioCloud environment for us?

Access control for the Symposium?

Ryan McGranaghan created the Jack Eddy Symposium GitHub Organization. Dan, Ryan and others will use this structure to host some repositories with notebooks to be used at the Symposium. We'd also like to use this organization to define the allow-list of users who should be authorized to access the hub at jackeddy.2i2c.cloud. The content will be shared during the Symposium using nbgitpuller.
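nbgitpuller distributes notebooks through a link that, when clicked, clones or updates a repository in the user's home directory and then opens a page on their server. A small sketch that builds such a link with the `repo`, `branch` and `urlpath` query parameters nbgitpuller accepts; the repository name `tutorials` is hypothetical.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo: str, branch: str, urlpath: str) -> str:
    """Build a hub link that pulls `repo` at `branch` and then opens `urlpath`.

    user-redirect sends the click to the logged-in user's own server, where
    nbgitpuller's git-pull handler does the clone/update.
    """
    query = urlencode({"repo": repo, "branch": branch, "urlpath": urlpath})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"
```

For example, `nbgitpuller_link("https://jackeddy.2i2c.cloud", "https://github.com/jack-eddy-symposium/tutorials", "main", "lab/tree/tutorials")` yields a URL that could be pasted into a README or slide. Because the hub only admits members of the organization, the link works only for people on the allow-list.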

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue May 26, 2022
- Put in GCP as the CMIP6 data desired is on GCP as well.
  However, I think GCP's CMIP6 dataset is more limited than AWS' -
  we can move this to AWS if needed.
- SCRATCH_BUCKET is setup
- Secure push to GitHub via gh-scoped-creds is setup
- GitHub teams is used for auth
- Default memory limits are 1Gi Limit + 256Mi Guarantee. I *think*
  this won't be enough - let's customize this as needed.
- I tried to use https://gallery.ecr.aws/q3h7b4o8/helio-notebook-py
  built from https://git.mysmce.com/heliocloud/heliocloud-docker-images,
  but it doesn't work - primarily because it has JupyterHub 2.0 in
  there, and we're still at 1.5.  Need to debug that. In the meantime,
  just using the default pangeo image.

Ref 2i2c-org#1329
@yuvipanda
Member

Ready for testing at https://jackeddy.2i2c.cloud/! Access control is set up so that anyone part of the https://github.com/jack-eddy-symposium organization can log in. https://github.com/yuvipanda/gh-scoped-creds/ is set up for secure pushing to GitHub (see this blog post for details). There's also a shared/ drive, with write access for admins. And a SCRATCH_BUCKET for users to temporarily store stuff in object storage. https://docs.2i2c.org/en/latest/user/storage.html has more information on these features.
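Code that writes intermediate results to the scratch bucket typically builds paths from the `SCRATCH_BUCKET` environment variable rather than hard-coding a bucket name. A minimal sketch, assuming the variable holds a per-user prefix such as a `gs://` URL on this GCP hub; the helper name `scratch_path` is hypothetical.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def scratch_path(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return an object-storage URL for `name` under the user's scratch prefix.

    Reads SCRATCH_BUCKET from the environment; pass `env` explicitly to
    override it (handy for testing outside the hub).
    """
    env = os.environ if env is None else env
    prefix = env.get("SCRATCH_BUCKET")
    if not prefix:
        raise RuntimeError("SCRATCH_BUCKET is not set in this environment")
    # Join with exactly one slash regardless of stray leading/trailing slashes.
    return prefix.rstrip("/") + "/" + name.lstrip("/")
```

With xarray and gcsfs installed, a subset could then be saved with something like `ds.to_zarr(scratch_path("tas_subset.zarr"))`.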

dask-gateway is also set up.

In short, this is fully set up as a Pangeo environment.

I tried to use the image https://gallery.ecr.aws/q3h7b4o8/helio-notebook-py, but unfortunately it failed. I'm not sure why - it needs a bit of a deeper investigation. I'll try to do that soon, but any information about whether those images are currently being used in other hubs will be helpful. I also can't create an account on the gitlab instance (https://git.mysmce.com/heliocloud/heliocloud-docker-images) where the image is hosted, so not sure how to contribute.

I've set a 1G memory limit, a 1 CPU guarantee / 2 CPU limit. We can tweak these as needed.

@colliand @dan800 please try it out and let me know what needs to change!

@yuvipanda
Member

We can also prewarm the cluster before the symposium starts so users can get on much quicker.

@colliand
Contributor

Thanks @yuvipanda. The hub does not appear to complete the spawning process. Here is the error message I received:

2022-05-26T19:55:12Z [Normal] pod didn't trigger scale-up: 1 node(s) had taint {k8s.dask.org_dedicated: worker}, that the pod didn't tolerate, 1 node(s) didn't match Pod's node affinity/selector, 1 in backoff after failed scale-up
Event log
Server requested
2022-05-26T19:55:09Z [Warning] 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector.
2022-05-26T19:55:12Z [Normal] pod didn't trigger scale-up: 1 node(s) had taint {k8s.dask.org_dedicated: worker}, that the pod didn't tolerate, 1 node(s) didn't match Pod's node affinity/selector, 1 in backoff after failed scale-up

@dan800

dan800 commented May 26, 2022

Same for me.

@yuvipanda
Member

Looking in the console, I see:

[screenshot]

Investigating...

@yuvipanda
Member

@colliand @dan800 based on my reading of https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-vm-creation#resource_availability, the cloud was just full for a bit! It's working again now.

@colliand
Contributor

Wow! Thanks Yuvi! I see Jovian moons orbiting the logo... and there's the lab interface. Merci!

@dan800

dan800 commented May 27, 2022

Hello both. Thanks for working on this after hours. I was able to access the server and tried a very simple notebook. It crashed when I tried to access a slice of CMIP6 model output - I think because it takes up 3 or 4 GB of RAM. Just reading the CMIP6 catalog uses a large amount of memory. The notebook is here:
shared-readwrite/intake_example.ipynb
Probably down to me not knowing enough to chunk the data efficiently.
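Chunking matters because chunked stores such as Zarr are read a whole chunk at a time: pulling the time series at one grid point still loads every chunk that contains that point. A back-of-the-envelope sketch, assuming a (time, ...) array whose first axis is time; the function names and example shapes are hypothetical, and the result counts bytes read and decompressed, not peak memory (dask may release each chunk after slicing it).

```python
from __future__ import annotations

from math import ceil, prod


def array_nbytes(shape: tuple[int, ...], itemsize: int) -> int:
    """Size in bytes of a dense array with this shape and element size."""
    return prod(shape) * itemsize


def point_series_bytes_read(shape: tuple[int, ...], chunks: tuple[int, ...], itemsize: int) -> int:
    """Approximate bytes read to extract the full series along axis 0 at a
    single index of every other axis.

    Every chunk along axis 0 is loaded in full, but only one chunk along each
    of the other axes (the one containing the requested point).
    """
    n_chunks_along_time = ceil(shape[0] / chunks[0])
    return n_chunks_along_time * array_nbytes(chunks, itemsize)
```

For a hypothetical (time, level, lat, lon) float32 array of shape (1980, 19, 192, 288) stored in chunks of (600, 19, 192, 288), extracting one point would read four chunks, roughly 10 GB, to obtain 1980 values (about 8 KB). Chunks that are long in time and small in space make point extraction far cheaper, at the cost of making global maps at a single time step more expensive.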

@yuvipanda
Member

@dan800 ok, I've set up a new nodepool with higher resources for this. You should have access to about 24G of RAM and about 4 CPUs now. Give it a shot?

@yuvipanda
Member

We can tone down the resource requests for the actual event if necessary.

@dan800

dan800 commented May 27, 2022

@yuvipanda Great, that works! Thanks for getting this working. The memory usage was around 1.25 GB. This simple example extracted 165 years of monthly mean temperatures at a particular location in the atmosphere from a climate model simulation. It's a good starting point for applying time series analysis.
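165 years of monthly means is a series of 1,980 values dominated by the seasonal cycle, so a common first step in time series analysis is to subtract the monthly climatology before regressing on a forcing. A minimal sketch of that step, assuming a gap-free monthly series; it is one conventional choice, not necessarily what the tutorial notebooks do, and the function name is hypothetical.

```python
from __future__ import annotations

from statistics import fmean
from typing import Sequence


def monthly_anomalies(values: Sequence[float], period: int = 12) -> list[float]:
    """Remove the mean annual cycle from a gap-free monthly series.

    The climatology for each position in the cycle is the mean over all years
    at that position, so any starting month works as long as no months are
    missing.
    """
    if len(values) < period:
        raise ValueError("need at least one full cycle of data")
    climatology = [fmean(values[m::period]) for m in range(period)]
    return [v - climatology[i % period] for i, v in enumerate(values)]
```

With xarray, grouping by `"time.month"` and subtracting the grouped mean achieves the same result while keeping the time coordinate attached.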

@rmcgranaghan

Brilliant! Thank you for the excellent work @yuvipanda. I'm able to spin up an instance and am creating material now. I've added a directory for tutorials and populated it with a README markdown file and some useful starter notebooks. Please add any helpful tutorials there and update the README as you do. This will likely be the starting place for the participants, so we want it to be instructive.

@brianthomas

Hi all,

I can add folks to our gitlab to contribute (that would be awesome). I'd be happy to work with you all to figure out why the container crashed. @yuvipanda please contact @rmcgranaghan by email to get my email and contact me. I'll then be able to add you to our gitlab.

-brian

@colliand
Contributor

I've shared Yuvi's email with Ryan to complete the link back to Brian. Thanks to all involved for sharing expertise and contributing toward the success of the Eddy Symposium!

@fperez
Contributor

fperez commented May 28, 2022

Thx a lot @yuvipanda! Quick q - did you base the image on the docker/apt/env.yml/etc files they had for their heliocloud, but moved over to one of our repos?

I'd love for one goal of this event to be harmonizing how we all do this type of thing. Right now we all do very similar things (JMTE hubs, Berkeley ones, 2i2c ones, HelioCloud, ...) but with ever so slight tweaks in the workflow (where some files go, workflow for updates, etc). I think there's an opportunity to adopt some common, more standardized practices here.

I'd be happy to take some notes together with @rmcgranaghan @brianthomas @colliand et al in Vail on this, linking up with the 2i2c team as needed (but without imposing on any of you our CO schedule).

@fperez
Contributor

fperez commented May 28, 2022

Actually, question: where is the set of config files now for this particular hub, in case I want to suggest any other tools/updates?

@fperez
Contributor

fperez commented May 28, 2022

@yuvipanda - you know what I'm going to ask for :) Basically the same s159/JMTE toy set - VNC, show hidden files, extensions like git & jupyterlab-favorites, url proxy support, syncthing, node selector on landing, etc.

My talk will be about "living la vida nube" in s159/jmte with a uniform workflow; it would be fantastic for the attendees to have access to the same set of default toys that I used this semester to achieve this smooth environment in teaching and research.

I'm also talking at the EarthCube meeting the week immediately after - same story. I want to use these events as an opportunity to streamline these patterns as much as possible.

Thanks for all the work folks!!

@yuvipanda
Member

@fperez so this currently uses the default pangeo image, which now already has gh-scoped-creds. https://git.mysmce.com/heliocloud/heliocloud-docker-images is the possible image that is going to be used - although it's currently failing and needs debugging. So the current image used is just https://github.com/pangeo-data/pangeo-docker-images/tree/master/pangeo-notebook. Config is at #1337

So if I hear this correctly, the things you want in the image would be:

  1. VNC
  2. jupyterlab-git
  3. jupyterlab-favorites
  4. jupyter-server-proxy (already installed)
  5. syncthing
  6. profile_list offering multiple size options
  7. show hidden files on the hub (this is a config option, I'll just turn that on)

I think some of these (perhaps jupyterlab-git?) can just go in the default pangeo image, while others probably need some custom image made for this event.
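Item 6, offering multiple size options, corresponds to KubeSpawner's `profile_list` setting. A hypothetical sketch whose two sizes mirror the resources mentioned earlier in this thread (a 1G limit with 1 CPU guaranteed / 2 CPU limit, and a larger ~24G / 4 CPU option); display names and descriptions are illustrative. 2i2c presumably expresses this through the hub's Helm values rather than a `jupyterhub_config.py`, so treat this as a picture of the data structure, not the actual deployment change.

```python
from __future__ import annotations

# Hypothetical profiles; keys follow KubeSpawner's profile_list format.
PROFILE_LIST = [
    {
        "display_name": "Small",
        "description": "~1 GB RAM, 1 CPU guaranteed / 2 CPU limit",
        "default": True,
        "kubespawner_override": {
            "mem_guarantee": "256M",
            "mem_limit": "1G",
            "cpu_guarantee": 1,
            "cpu_limit": 2,
        },
    },
    {
        "display_name": "Large",
        "description": "~24 GB RAM, up to 4 CPUs",
        "kubespawner_override": {
            "mem_limit": "24G",
            "cpu_limit": 4,
        },
    },
]


def default_profile(profiles: list[dict]) -> dict:
    """Return the profile marked as default, else the first one in the list."""
    for profile in profiles:
        if profile.get("default"):
            return profile
    return profiles[0]


# In a jupyterhub_config.py this would be wired up as:
# c.KubeSpawner.profile_list = PROFILE_LIST
```

The larger profile only helps if a node pool with enough capacity exists, as with the higher-resource nodepool added earlier for the CMIP6 notebook.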

@dan800

dan800 commented Jun 2, 2022

@yuvipanda Has something changed? I don't seem to have access to intake now:


ModuleNotFoundError Traceback (most recent call last)
Input In [1], in <cell line: 5>()
3 import pandas as pd
4 import xarray as xr
----> 5 import intake

ModuleNotFoundError: No module named 'intake'

@yuvipanda
Member

@dan800 I used the heliocloud image (https://git.mysmce.com/heliocloud/heliocloud-docker-images/-/tree/main/helio-notebook-py) as the base, and that doesn't seem to have intake. I'm adding it now.

@yuvipanda
Member

Done in 8d93017776082c1ea834096e1e72fd0b5dfb78c4, will update after the build is complete

@yuvipanda
Member

I can also base it off the pangeo image instead of the heliocloud image if you want.

@yuvipanda
Member

@dan800 I added intake https://github.com/2i2c-org/jackeddy-image/blob/main/environment.yml, on top of the base in https://git.mysmce.com/heliocloud/heliocloud-docker-images/-/blob/main/helio-notebook-py/environment.yml. https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml is the env file for pangeo. LMK if you want me to just add specific packages to our image, or base it off the pangeo image instead of the heliocloud image.

@damianavila
Contributor Author

We should put that request in the event issue once it is created.

FYI, I have created a dedicated issue for the event: #1384

@dan800

dan800 commented Jun 7, 2022

@yuvipanda Intake now loads - thanks! This may have already been flagged, but with the latest kernel I get the following error when trying to load the data from the cloud:

ImportError: Please install gcsfs to access Google Storage

@fperez
Contributor

fperez commented Jun 7, 2022

@dan800 - I updated the configurator with the image that has gcsfs after adding it to the environment, so you should now be good to go on this!

@fperez
Contributor

fperez commented Jun 7, 2022

On the other hand, I have a question for @yuvipanda - @colliand tried to add @rmcgranaghan as an admin in #1389, and I merged that, but something is not happy at the deploy stage. I'd be happy to try and fix the issue but I'm unfortunately beyond my comfort zone here, so I'll need a bit of input/help if possible.

@yuvipanda
Member

@fperez I see @rmcgranaghan is an admin now!

@rmcgranaghan

Thanks, @yuvipanda - confirming that I have access to the shared-readwrite!

@dan800

dan800 commented Jun 7, 2022

@fperez It seems the switch from the pangeo stack to the heliocloud package has meant some of the standard packages need updating.
ImportError: Plotting of arrays of cftime.datetime objects or arrays indexed by cftime.datetime objects requires the optional nc-time-axis (v1.2.0 or later) package.

@rmcgranaghan This might be highlighting a problem we have: these packages (which perhaps started from similar bases) have diverged, making it difficult to work across disciplines. It will also make sharing packages across the platforms harder than it could be.

@rmcgranaghan

@dan800 thanks for investigating. I think you've identified an issue that we can work on to improve HelioCloud to be better equipped to connect Helio and climate communities.

It is my understanding that @brianthomas plans to update the HelioCloud image to begin from the latest version of the Pangeo image. I suggest we enumerate the missing packages that are needed for Helio-climate collaboration and share them with Brian.
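One way to enumerate missing packages is to probe each image for the modules the notebooks import. A minimal sketch using `importlib.util.find_spec`, seeded with the packages that came up missing in this thread plus intake-esm; the mapping and function name are illustrative, and conda package names differ from import names (`nc-time-axis` provides `nc_time_axis`).

```python
from __future__ import annotations

from importlib.util import find_spec
from typing import Mapping

# Conda package name -> top-level module it provides.
REQUIRED = {
    "intake": "intake",
    "intake-esm": "intake_esm",
    "gcsfs": "gcsfs",
    "nc-time-axis": "nc_time_axis",
}


def missing_packages(required: Mapping[str, str]) -> list[str]:
    """Return package names whose top-level module cannot be found.

    Only pass top-level module names: find_spec on a dotted name imports the
    parent package first and raises if that parent is absent.
    """
    return [pkg for pkg, module in required.items() if find_spec(module) is None]
```

Running `missing_packages(REQUIRED)` in a notebook on each image gives a list that could be shared with Brian or fed to `mamba install -c conda-forge ...` for a quick session-level fix.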

@dan800

dan800 commented Jun 7, 2022

BTW, I did try to install it locally for the session but it essentially hung - I ctrl-c'd after 30 minutes:
'''
Welcome to HelioCloud DaskHub
(notebook) jovyan@jupyter-dan800:~$ conda install -c conda-forge nc-time-axis
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \ failed

CondaError: KeyboardInterrupt

(notebook) jovyan@jupyter-dan800:~$
'''

@yuvipanda
Member

I've also opened 2i2c-org/jackeddy-image#2 which would base the jackeddy image off the latest pangeo image rather than the heliocloud image. However, this might mean you might lose some of the functionality that comes from the heliocloud image...

@yuvipanda
Member

@dan800 also, I suggest trying mamba instead of conda to install packages (it is a drop-in replacement). I just tried !mamba install -c conda-forge nc-time-axis -y and it immediately worked.

@damianavila damianavila moved this from In progress to Complete in DEPRECATED Engineering and Product Backlog Jun 13, 2022
@damianavila
Contributor Author

I think we can close this issue by now since the new hub was deployed on time and the event is already finished.
Also marking this one as part of the previous iteration/cycle because it belongs to that one.

Thanks for all your help with this one, @yuvipanda!

@fperez
Contributor

fperez commented Jun 13, 2022

Huge, huge thanks and kudos to the team on this one, I think it was a great success.

@rmcgranaghan

Indeed - it would be hard to overstate how brilliant a job you all did. Genuine thanks.

@yuvipanda
Member

Glad to help, @fperez @rmcgranaghan @colliand @dan800!

Labels
None yet
Projects
No open projects
Development

No branches or pull requests

7 participants