[New Hub] Alabama Water Institute CIROH hub #1444
Comments
Based on today's call with @jameshalgren, I suggest the following onboarding process. CIROH and AWI have ambitious plans so it's important we get the initial conditions right.
|
Suggested plan LGTM, @colliand. @jameshalgren, we will ping you soon with some questions about the specifics of the hub deployment. |
Thanks @colliand, @damianavila. Processing, will respond soon. |
A few questions, possibly specialized, probably going beyond the scope of this issue. Tagging @colliand to ask for redirection or moderation if necessary.
|
Tagging @whitelightning450 @karnesh for situational awareness. |
Hi James! Yes 2i2c has experience with the real-time-collaboration features in upstream Jupyter. Experiments have shown that feature is not ready for production deployments. There is ongoing work there and 2i2c will support RTC when we can do so securely and robustly. Yes, our team is contributing to the "tantalizing future" you referenced. The pioneering work of the Pangeo community is an inspiration for the founding of 2i2c. We are in the process of on-boarding a new team member @jmunroe who has technical and community experience with big data geosciences. I spent some time briefing him on CIROH/AWI today and expect he will be an excellent resource for our collaboration. |
@jameshalgren I haven't seen (which doesn't mean they don't exist, obviously) examples of hubs tightly integrated with ODCs. But from a quick look at the ODC setup, I see a key element of this is having an accessible Postgres server to manage the actual data catalogs and serving. Coincidentally, as part of the Jupyter Meets the Earth effort, with @consideRatio and @yuvipanda we're looking right now at how to most cleanly set up a persistent, robust and cost-effective Postgres server that can be accessed by all the users of a Hub. We happen to need that for one of our research projects, and our current solution (via SQLite) is sub-optimal. We'll be happy to share any progress we make on that front back with the rest of the team - just today I was discussing with @consideRatio how this was very likely a use case that many others would encounter. So I'm delighted to see that intuition confirmed by your needs, and it means it's all the more timely that we make progress on it :) |
Assuming the National Water Model will be a key dataset used by this hub, I'll note a few other links
This is in addition to the NWM data store on GCP linked above. I am interested in identifying other key datasets that the community anticipates using on this hub, to ensure it is set up so that accessing that data is straightforward for users. |
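For anyone wanting to check how straightforward that access would be, here is a minimal sketch of building the object path for one short-range channel routing file. The directory and file naming (`nwm.YYYYMMDD/short_range/nwm.tHHz.short_range.channel_rt.fFFF.conus.nc`) is an assumption based on the layout commonly seen for operational NWM output, and `gs://example-nwm-bucket` is a hypothetical placeholder, not one of the stores linked in this thread.

```python
from datetime import datetime


def short_range_channel_path(bucket, cycle, forecast_hour):
    """Build the path of one short-range channel routing file.

    bucket        -- bucket prefix such as "gs://example-nwm-bucket" (hypothetical)
    cycle         -- datetime of the forecast cycle (date and hour are used)
    forecast_hour -- lead time in hours, written as a zero-padded 3-digit field
    """
    if forecast_hour < 1:
        raise ValueError("forecast_hour must be 1 or greater")
    day = cycle.strftime("%Y%m%d")
    hour = cycle.strftime("%H")
    # Assumed naming convention; verify against the actual bucket listing.
    name = f"nwm.t{hour}z.short_range.channel_rt.f{forecast_hour:03d}.conus.nc"
    return f"{bucket}/nwm.{day}/short_range/{name}"


print(short_range_channel_path("gs://example-nwm-bucket", datetime(2022, 8, 1, 6), 3))
```

A path built this way could then be handed to whichever filesystem library the hub image provides; the naming should be confirmed against a real bucket listing before relying on it.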
Thanks @jmunroe! I'll add @jameshalgren here in case he can share any other input on important data sets for the emerging CIROH community. |
Thanks @colliand and @jmunroe. I've jotted down a few thoughts/responses to launch the weekend:
It will be the key dataset used in this hub, together with observation data initially from USGS, but from any valid source.
I think it is http only at this point. There are ftp-versions using the LDM protocol for direct sharing of data between NWS offices, but that's probably not relevant here for the moment.
There is a 1.2 GCP bucket of the same data (they use the label 'reanalysis' which is technically incorrect...). The AWS version of that data is more complete, with the 1.2, 2.0, and 2.1 versions of the retrospective data, along with experimental (?) versions with subsets of the data in zarr formats. The GCP bucket mentioned is a superset of the S3 resource, with the analysis, short (on s3), medium, and long-range output. In fact, only a handful of specific derived products appear to be missing from the GCP bucket relative to what is available on the direct download from NOMADS.
Hopefully, some of what we make here can allow for Dr. Maidment's work to be more easily contributed back to the broader NWM community. He and his team were critical influencers in the initiation of the project and continue to generate great work!
I mentioned USGS data. There is a useful toolset for accessing USGS data and we may use that or replicate a portion into storage on the cloud backend. I am aware of a similar script by @groutr. Those observed streamflow (which are really observed stream-stage data converted to estimated flow -- but the convention is to call them streamflow...) data will be the key initial dataset because they are the key output from the model. As we continue, additional variables will be examined and we will have to identify or create repositories of validation data to use for exploration. |
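As a rough illustration of the kind of request such a toolset wraps, the sketch below builds a URL for the public USGS Instantaneous Values web service. It is not the toolset or the script mentioned above. The helper name `build_streamflow_url` and the sample site number are illustrative assumptions; parameter code `00060` is the USGS code for discharge in cubic feet per second.

```python
from urllib.parse import urlencode

# Public USGS Instantaneous Values endpoint.
NWIS_IV_URL = "https://waterservices.usgs.gov/nwis/iv/"


def build_streamflow_url(site_ids, period="P1D"):
    """Return a request URL for recent discharge at the given USGS sites.

    site_ids -- iterable of USGS site numbers as strings
    period   -- ISO 8601 duration counted back from now (default: one day)
    """
    params = {
        "format": "json",
        "sites": ",".join(site_ids),
        "parameterCd": "00060",  # discharge, cubic feet per second
        "period": period,
    }
    # Keep commas literal so the site list stays readable in the URL.
    return NWIS_IV_URL + "?" + urlencode(params, safe=",")


# Sample site number used purely for illustration.
print(build_streamflow_url(["01646500"]))
```

The function only builds the URL; fetching and parsing the JSON response is left to whichever HTTP client is installed in the hub image.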
A few questions for all of you 😉
OK, so starting with the pangeo-notebook image is enough to start with, I presume. Can you confirm?
IIRC, @yuvipanda set up this feature for the Jack Eddy symposium.
Are we talking about a dedicated cluster here? Or are you OK with the hub being deployed in a shared cluster? |
@yuvipanda it would be great if we could take this opportunity to document how to set up this feature in the hub features docs |
For reference, I think this is solely something to set up in the user image. This is what JMTE has done to support this functionality. It is then represented as the "Desktop" icon in the JupyterLab launcher. |
Yes, I suggest that the AWI/CIROH hub be set up on a dedicated GKE cluster in the data center where the NWM data is hosted. I suggest that 2i2c manage the billing account for the cluster with the monthly cloud usage costs passed through to AWI. AWI/CIROH may choose to take over the billing account as the service and their devops capacity expands. I like the advice shared by @consideRatio that we set this hub to resemble the JMTE hub. The suite of integrated tools in that hub is tuned to support collaborations like those envisioned by CIROH. |
This sounds like we should create a new billing account and not just use the two-eye-two-see one, no? P.S. It also looks like I don't manage the two-eye-two-see billing account, so I can't create a project attached to that one in the interim |
Link to that hub for reference? |
@whitelightning450, @hellkite500, @aaraney, @karnesh, @mgdenno -- have been meaning to loop you in here so you can follow the development here. @quebbs -- hello! -- tagging you ahead of upcoming discussion. This may be a tool to put to use. |
Ok, I have created a new GCP account to deploy this into. I have connected the 2i2c billing account for now, and we can decide to change that later if needed. (Big gold star ⭐ to Chris for figuring that out!) |
Can we be a bit more specific about this please? The NWM data is multi-regional in the US: so is |
@colliand was that piece part of the conversation? @jameshalgren, any input about this one? |
I think we want to avoid f-35 syndrome. Let me check with a couple of others but I think we can do plenty with GPCPUs for now. Having the option in the future might be useful. What are the trade-offs for going to the data center where GPUs are available? |
I am struggling to install TurboVNC with the provided code snippet and receiving the following error:
|
@sgibson91 seems like you have the exact same code snippet and a similar base image as in https://github.com/pangeo-data/jupyter-earth/blob/master/hub.jupytearth.org-image/Dockerfile. So, maybe the Hmmm, googling on the errors, I see notes about |
Thanks @consideRatio. I added the clean-up step to the earlier Final commit looks like this: 2i2c-org/awi-ciroh-image@ |
@sgibson91 -- Thanks! |
These are organizations, I was under the impression you wanted specific teams to have access? E.g. the tech-team that is a member of the 2i2c org -> https://github.com/orgs/2i2c-org/teams/tech-team |
Ah pardon me, I think I'm misremembering another hub setup issue where a question was raised about subteams |
|
Absolutely. The hubs are available here: Please note these docs about authorising the GitHub app for the first time: https://infrastructure.2i2c.org/en/latest/howto/configure/auth-management.html#follow-up-github-organization-administrators-must-grant-access |
@consideRatio are there any other setup steps regarding the VNC/Linux desktop? I would've expected a button on the Lab Launcher saying "Desktop", but it's not there. Also changing Image repo: https://github.com/2i2c-org/awi-ciroh-image |
This is what is done in the JMTE image, which is based on a pangeo-notebook base image: #1444 (comment). I don't think anything else is needed! |
🤔 Hmmm ok, maybe Yuvi can help me debug when he's online then |
@sgibson91 I would suspect CIROH-UA/awi-ciroh-image@7b080be#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R32-R33 could be to blame. I don't understand how jupyter-server-proxy registers things to show up in jupyterlab and start up properly, but jupyterlab presents icons for notebook / kernels etc, and maybe there is a common mechanism in play related to removing Hmmm, thinking about it, if you don't succeed in accessing /user/some-name/desktop, it makes me think that jupyter-server-proxy has failed to start. That I know from experience can happen if some other jupyter-server-proxy package fails to load properly. So, something else registering itself with jupyter-server-proxy may be to blame. |
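One way to test the "another package failed to load" theory from inside the user server is to try loading every registered server-proxy entry point and see which one raises. This is a minimal sketch, assuming the packages register themselves under the `jupyter_serverproxy_servers` entry-point group; the helper name `check_server_proxy_entry_points` is made up for illustration, and servers configured only through traitlets config would not appear here.

```python
from importlib.metadata import entry_points


def check_server_proxy_entry_points(group="jupyter_serverproxy_servers"):
    """Try to load each entry point in `group` and report the outcome.

    Returns a dict mapping entry-point name to "ok" or a failure message.
    """
    eps = entry_points()
    # Newer Pythons expose .select(); older ones return a plain dict.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])

    results = {}
    for ep in selected:
        try:
            ep.load()  # imports the module and resolves the setup function
            results[ep.name] = "ok"
        except Exception as exc:  # report any import-time failure
            results[ep.name] = f"failed: {exc!r}"
    return results


for name, status in check_server_proxy_entry_points().items():
    print(f"{name}: {status}")
```

Any entry marked `failed` would be a candidate for whatever is breaking the `/desktop` endpoint; an empty result suggests nothing is registered through entry points in that environment.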
Yeah, tbh, I'm just guessing and used https://github.com/2i2c-org/coessing-image/blob/main/Dockerfile as a starting point (before the Julia addition :D) |
Awesome! Does this mean we can get in and start trying things out (I assume this will begin to incur cloud costs...)? |
@jameshalgren yes and yes :) I'm still trying to figure out the VNC/Linux desktop feature though |
I made some progress in PR CIROH-UA/awi-ciroh-image#3 I now have the Desktop icon on JupyterLab's launcher (I'm testing this on the staging hub). However when I click on it, I see "Something went wrong, connection is closed" Logs from my user server (
|
@GeorgianaElena suggested some missing packages in CIROH-UA/awi-ciroh-image#3 (review) and now the desktop feature is available! |
Thanks @jameshalgren. I fixed the link to point to the intended slide deck created by Fernando. |
Now that the production and staging hubs are available, I suggest to @jameshalgren that we organize a kickoff event for CIROH personnel who will manage the hub with @jmunroe @fperez (and perhaps others on the 2i2c team). Perhaps we can link up for a phone call to discuss some launch planning? |
@colliand -- targeting 23 August for a technically focused demo. |
I think we can close this issue (new hub setup) now, since I believe it is complete, and continue the conversation in new issues. |
Thanks @damianavila -- new issues (as needed) are still posted under this repository, correct? |
@jameshalgren, for follow-up questions/requests I would suggest using our support email channel. |
Hub Description
The Alabama Water Institute (AWI) is convening a consortium of 28 university partners to improve water management for the USA. The announcement of the award to support the collaboration called CIROH is available here.
2i2c has been engaged to provide interactive computing service supporting this collaboration.
The service will initially use GitHub auth with an allow list based on membership in the AWI GitHub organization. As the service evolves, I anticipate we may move over to CILogon.
Community Representative(s)
@jameshalgren
Important dates
Notes: dates are updated according to new information and prioritization.
Hub Authentication Type
GitHub Authentication (e.g., @MyGitHubHandle)
Hub logo information
URL to Hub Image:
URL for Image Link: {{ URL HERE }}
Hub user image
Extra features you'd like to enable
Other relevant information
Let's get started with a Pangeo-style Daskhub. The capacity of the team at AWI is increasing and a customized software environment will likely be ready later in the year.
I suggest this hub offer the VNC/Linux desktop feature.
This hub should be hosted on GCP in a data center that hosts the National Water Model Data.
Hub URL
ciroh.awi.2i2c.cloud
Hub Type
daskhub
Tasks to deploy the hub