[New Hub] Jack Eddy Symposium #1329
Comments
Yes! I invite @rmcgranaghan and @dan800 to collaborate with us in establishing this hub for the Eddy Symposium. Please note information requests in the anchor entry of this issue. (While searching for Dan, I found this repo in his collection: https://github.com/dan800/intake-esm.) |
Thanks for the invite! Essentially, intake-esm provides a data access layer that lets you pick a model, experiment and variable. Regarding using it to access datasets on AWS, this was the example I found most useful: It's pulling down the surface temperature and radiative balance for the CMIP models for 4xCO2 experiments to calculate climate sensitivity. I haven't run it in a while, but the catalog is here: For our project, the 'hist-sol' runs (solar variability only) might be the first place to look. |
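For readers following along, here is a minimal sketch of the "pick a model, experiment and variable" pattern described above. It assumes the intake-esm package is installed and that the catalog uses the usual CMIP6 facet columns (`experiment_id`, `variable_id`, `table_id`). The function names, the `ta`/`Amon` defaults and the `catalog_url` argument are illustrative assumptions, and the catalog link from the comment is not reproduced here.

```python
def build_query(experiment_id="hist-sol", variable_id="ta", table_id="Amon"):
    """Return the search facets as a plain dict.

    The facet names are assumed to match the catalog's column names;
    'ta' (air temperature) and 'Amon' (monthly atmosphere) are example values.
    """
    return {
        "experiment_id": experiment_id,
        "variable_id": variable_id,
        "table_id": table_id,
    }


def open_datasets(catalog_url, query):
    """Open an ESM catalog, filter it, and load the matches with xarray.

    `catalog_url` must point at an intake-esm JSON catalog; it is a
    placeholder argument here, not a value taken from the thread.
    """
    # Imported lazily so build_query() works without intake-esm installed.
    import intake

    catalog = intake.open_esm_datastore(catalog_url)
    subset = catalog.search(**query)
    # Returns a dict mapping dataset keys to xarray Datasets.
    return subset.to_dataset_dict()


QUERY = build_query()
```

Calling `open_datasets(catalog_url, QUERY)` on the hub would then return one dataset per matching model, which is the starting point for the kind of analysis discussed later in the thread.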
Hi @damianavila! There appear to be two distinct usages of the word Unless @dan800 or @rmcgranaghan intervene with other advice, I suggest that we use the image below for the hub logo and that the 2i2c team choose the Docker image for a Daskhub similar to those used by the Pangeo community. |
@dan800 is there any preference in the cloud provider? From the top description, it seems you needed to access some data in AWS land, but in your most recent update, it seems the data lives in GCP. Are you OK with the hub deployed on GCP land? cc @yuvipanda, this might be relevant to decide where we deploy the hub. |
@damianavila No preference - I may have had it wrong and it's GCP! Please choose whatever you think is most efficient/cost effective. |
I enjoyed an excellent Zoom call with Ryan and Dan earlier today. We exchanged a lot of information! Updates that I hope will be helpful to the 2i2c team are shared next. Heliophysics software environment for Eddy Symposium? Ryan pointed me to this excellent talk by Brian Thomas of NASA on HelioCloud (starting at 22:28). The talk describes a vision for collaborative and accelerated research for the heliophysics community by making the curated tools and data more easily accessible. (Thanks @brianthomas for the talk and for openly sharing your work! It would be great to link up with you to see if we can collaborate.) Brian's talk pointed at the following resources: (These images were cloned and modified from the pangeo-data/pangeo-docker-images project. FYI @rabernat.) Before learning about HelioCloud, we (Ryan McGranaghan, Dan Marsh, @fperez and me) planned to deploy a DaskHub with a Pangeo-style software environment for usage at the upcoming Jack Eddy Symposium with future plans to customize the environment to better support heliophysics research teams. HelioCloud may allow us to advance faster! While there may be too little time to get this all in place, I ask @yuvipanda anyway... can you deploy a DaskHub with the HelioCloud environment for us? Access control for the Symposium? Ryan McGranaghan created the Jack Eddy Symposium GitHub Organization. Dan, Ryan and others will use this structure to host some repositories with notebooks to be used at the Symposium. We'd also like to use this organization to define the |
- Put in GCP as the CMIP6 data desired is on GCP as well. However, I think GCP's CMIP6 dataset is more limited than AWS' - we can move this to AWS if needed.
- SCRATCH_BUCKET is set up
- Secure push to GitHub via gh-scoped-creds is set up
- GitHub teams are used for auth
- Default memory limits are 1Gi Limit + 256Mi Guarantee. I *think* this won't be enough - let's customize this as needed.
- I tried to use https://gallery.ecr.aws/q3h7b4o8/helio-notebook-py built from https://git.mysmce.com/heliocloud/heliocloud-docker-images, but it doesn't work - primarily because it has JupyterHub 2.0 in there, and we're still at 1.5. Need to debug that. In the meantime, just using the default pangeo image.

Ref 2i2c-org#1329
Ready for testing at https://jackeddy.2i2c.cloud/! Access control is set up so that anyone who is part of the https://github.com/jack-eddy-symposium organization can log in. https://github.com/yuvipanda/gh-scoped-creds/ is set up for secure pushing to GitHub (see this blog post for details). A dask-gateway is also set up. In short, this is fully set up as a Pangeo environment. I tried to use the image https://gallery.ecr.aws/q3h7b4o8/helio-notebook-py, but unfortunately it failed. I'm not sure why - it needs a bit of a deeper investigation. I'll try to do that soon, but any information about whether those images are currently being used in other hubs will be helpful. I also can't create an account on the gitlab instance (https://git.mysmce.com/heliocloud/heliocloud-docker-images) where the image is hosted, so I'm not sure how to contribute. I've set a 1G memory limit, a 1 CPU guarantee / 2 CPU limit. We can tweak these as needed. @colliand @dan800 please try it out and let me know what needs to change! |
We can also prewarm the cluster before the symposium starts so users can get on much quicker. |
Thanks @yuvipanda. The hub does not appear to complete the spawning process. Here is the error message I received:
|
Same for me. |
@colliand @dan800 based on my reading of https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-vm-creation#resource_availability, the cloud was just full for a bit! It's working again now. |
Wow! Thanks Yuvi! I see Jovian moons orbiting the logo....and there's the lab interface. Merci! |
Hello both. Thanks for working on this after hours. So I was able to access the server and tried a very simple notebook. It crashed when I tried to access a slice of CMIP6 model output - I think because it takes up 3 or 4 GB of RAM. Just reading the CMIP6 catalog does use a large amount of memory. The notebook is here: |
@dan800 ok, I've set up a new nodepool with higher resources for this. You should have access to about 24G of RAM and about 4 CPUs now. Give it a shot? |
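As a rough illustration of why a notebook needing a few GB gets killed under the default limit, the sketch below converts Kubernetes-style memory quantities to bytes and compares them. The helper names are hypothetical, only a handful of suffixes are handled, and the 3 GB figure is the lower end of the estimate in the comment above rather than a measurement.

```python
# Multipliers for a subset of Kubernetes quantity suffixes:
# binary (Ki, Mi, Gi) and decimal (k, M, G).
UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def to_bytes(quantity):
    """Convert a quantity string such as '1Gi' or '256Mi' to bytes."""
    # Check two-letter suffixes before one-letter ones.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = float(quantity[: -len(suffix)])
            return int(number * UNITS[suffix])
    # No suffix: the quantity is already a byte count.
    return int(quantity)


def fits(required, limit):
    """Return True if the required memory is within the limit."""
    return to_bytes(required) <= to_bytes(limit)


# A ~3 GB working set does not fit in a 1Gi limit.
NEEDS_MORE_MEMORY = not fits("3G", "1Gi")
```

With the limit raised well above the working set, `fits` returns `True` and the kernel should no longer be terminated for exceeding its memory limit.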
We can tone down the resource requests for the actual event if necessary. |
@yuvipanda Great, that works! Thanks for getting this working. The memory usage was around 1.25 GB. This simple example extracted 165 years of monthly mean temperatures at a particular location in the atmosphere from a climate model simulation. It's a good starting point for applying time series analysis. |
Brilliant! Thank you for the excellent work @yuvipanda. I'm able to spin up an instance and am creating material now. I've added a directory for tutorials and populated with a README markdown file and some useful starter notebooks. Please add any helpful tutorials there and update the README as you do. This will likely be the starting place for the participants, so we want it to be instructive |
Hi all, I can add folks to our gitlab to contribute (that would be awesome). I'd be happy to work with you all to figure out why the container crashed. @yuvipanda please contact @rmcgranaghan by email to get my email and contact me. I'll then be able to add you to our gitlab. -brian
|
I've shared Yuvi's email with Ryan to complete the link back to Brian. Thanks to all involved for sharing expertise and contributing toward the success of the Eddy Symposium! |
Thx a lot @yuvipanda! Quick q - did you base the image on the docker/apt/env.yml/etc files they had for their heliocloud, but moved over to one of our repos? I'd love as a goal of this event to further harmonize how we all do this type of thing. Right now we all do very similar things (JMTE hubs, Berkeley ones, 2i2c ones, HelioCloud, ...) but with ever so slight tweaks in the workflow (where some files go, workflow for updates, etc). I think there's an opportunity to adopt some common, more standardized practices on this. I'd be happy to take some notes together with @rmcgranaghan @brianthomas @colliand et al in Vail on this, linking up with the 2i2c team as needed (but without imposing on any of you our CO schedule). |
Actually, question: where is the set of config files now for this particular hub, in case I want to suggest any other tools/updates? |
@yuvipanda - you know what I'm going to ask for :) Basically the same s159/JMTE toy set - VNC, show hidden files, extensions like git & jupyterlab-favorites, url proxy support, syncthing, node selector on landing, etc. My talk will be about "living la vida nube" in s159/jmte with a uniform workflow, would be fantastic for the attendees to have access to the same set of default toys that I used this semester to achieve this smooth environment in teaching and research. I'm also talking at the EarthCube meeting the week immediately after - same story. I want to use these events as an opportunity to streamline these patterns as much as possible. Thanks for all the work folks!! |
@fperez so this currently uses the default pangeo image, which now already has gh-scoped-creds. https://git.mysmce.com/heliocloud/heliocloud-docker-images is the possible image that is going to be used - although it's currently failing and needs debugging. So the current image used is just https://github.com/pangeo-data/pangeo-docker-images/tree/master/pangeo-notebook. Config is at #1337. So if I hear this correctly, things you want in the image used would be:
I think some of these (perhaps jupyterlab-git?) can just go in the default pangeo image, while others probably need some custom image made for this event. |
@yuvipanda Has something changed? I don't seem to have access to intake now: ModuleNotFoundError Traceback (most recent call last) ModuleNotFoundError: No module named 'intake' |
@dan800 I used the heliocloud image (https://git.mysmce.com/heliocloud/heliocloud-docker-images/-/tree/main/helio-notebook-py) as the base, and that doesn't seem to have intake. I'm adding it now. |
Done in 8d93017776082c1ea834096e1e72fd0b5dfb78c4, will update after the build is complete |
I can base it off the pangeo image instead of the heliocloud image if you want too |
@dan800 I added intake https://github.com/2i2c-org/jackeddy-image/blob/main/environment.yml, on top of the base in https://git.mysmce.com/heliocloud/heliocloud-docker-images/-/blob/main/helio-notebook-py/environment.yml. https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml is the env file for pangeo. LMK if you want me to just add specific packages to our image, or base it off the pangeo image instead of the heliocloud image. |
FYI, I have created a dedicated issue for the event: #1384 |
@yuvipanda Intake now loads - thanks! This may have already been flagged, but with the latest kernel I get the following error when trying to load the data from the cloud: ImportError: Please install gcsfs to access Google Storage |
@dan800 - I updated the configurator with the image that has gcsfs after adding it to the environment, so you should be now good to go on this! |
On the other hand, I have a question for @yuvipanda - @colliand tried to add @rmcgranaghan as an admin in #1389, and I merged that, but something is not happy at the deploy stage. I'd be happy to try and fix the issue but I'm unfortunately beyond my comfort zone here, so I'll need a bit of input/help if possible. |
@fperez I see @rmcgranaghan is an admin now! |
Thanks, @yuvipanda - confirming that I have access to the shared-readwrite! |
@fperez It seems the switch from the pangeo stack to the heliocloud package has meant some of the standard packages need updating. @rmcgranaghan This might be highlighting a problem we have in that these packages (that perhaps started from similar bases) have diverged making working between the disciplines difficult. It also will make sharing packages across the platforms harder than it could be. |
@dan800 thanks for investigating. I think you've identified an issue that we can work on to improve HelioCloud so it is better equipped to connect the Helio and climate communities. It is my understanding that @brianthomas plans to update the HelioCloud image to begin from the latest version of the Pangeo image. I suggest we enumerate the missing packages that are needed for Helio-climate collaboration and share the list with Brian. |
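One way to enumerate the missing packages is to check, from inside a running kernel, which importable module names cannot be found. The sketch below uses only the standard library; the function name is hypothetical, and the list contains just the two modules whose absence was reported in this thread (`intake` and `gcsfs`), so it would need extending with whatever else the tutorials import.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be imported here.

    Note that these are import names, which can differ from the conda or
    pip package names (for example, intake-esm is imported as intake_esm).
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules reported as missing at some point in this thread.
REQUIRED = ["intake", "gcsfs"]
MISSING = missing_packages(REQUIRED)
```

Running this in both the pangeo-based and heliocloud-based images and comparing the two `MISSING` lists would give a concrete list to share.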
BTW, I did try to install it locally for the session but it essentially hung - I ctrl-c'd after 30 minutes: CondaError: KeyboardInterrupt (notebook) jovyan@jupyter-dan800:~$ |
I've also opened 2i2c-org/jackeddy-image#2 which would base the jackeddy image off the latest pangeo image rather than the heliocloud image. However, this might mean you might lose some of the functionality that comes from the heliocloud image... |
@dan800 also, I suggest trying |
I think we can close this issue by now since the new hub was deployed on time and the event is already finished. Thanks for all your help with this one, @yuvipanda! |
Huge, huge thanks and kudos to the team on this one, I think it was a great success. |
Indeed - it would be hard to overstate how brilliant a job you all did. Genuine thanks. |
Glad to help, @fperez @rmcgranaghan @colliand @dan800! |
Hub Description
The request is for the launch or use of a research hub with Dask. (Daniel Marsh, one of the co-organizers, wishes to share tutorial notebooks. He plans to use "intake-esm to access all the CMIP6 climate runs which are hosted in zarr format on aws. We can pull down all the time series data and regress it against the solar forcing used to derive the solar response.")
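The regression step in the quoted plan can be sketched with an ordinary least-squares fit of a temperature series against a forcing series. This is only an illustration under stated assumptions: the function name is hypothetical, the two series are synthetic values made up for the example, and a real analysis would use the CMIP6 output and the actual solar forcing, likely with additional regressors.

```python
def least_squares(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, hypothetical data: an idealized forcing cycle and a
# temperature series that responds linearly to it.
forcing = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
temperature = [250.0 + 0.2 * f for f in forcing]

slope, intercept = least_squares(forcing, temperature)
```

Here `slope` plays the role of the response per unit forcing and `intercept` the mean state. Because the synthetic data are exactly linear, the fit recovers the values used to generate them.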
Community Representative(s)
@colliand, can you give us the contact information for the community representatives?
Looking at the lead issue, it seems the contacts would be Daniel Marsh and/or Ryan McGranaghan. If that is the case, do you have their contact information? Any GitHub handles?
Important dates
Hub Authentication Type
Other (may not be possible, please specify in comments)
Hub logo information
Hub user image
Extra features you'd like to enable
Other relevant information
I presume GitHub auth would be OK but we need to confirm it.
It seems they might need to interact with datasets on AWS, so it might make sense to deploy in that cloud provider...
From the lead description, it is not clear to me how this hub will be paid for, so there is some stuff to figure out in case we need a new AWS land to deploy into...
Hub URL
jackeddy.2i2c.cloud
Hub Type
daskhub
Tasks to deploy the hub