Integration with external ESGF data sources #420
Comments
Hi @lila. Thank you for bringing these points up.
> pangeo uses oauth2 authentication with github
This is purely for convenience. If there is a good reason to change this, we certainly could do so. We are also accustomed to authenticating with other services once inside a Pangeo JupyterHub, so accessing ESGF is certainly possible while using GitHub authentication.
> does it make sense then to just run both sets of services in the same cluster
This is an interesting idea. We could certainly set up a Pangeo JupyterHub in the same k8s cluster as ESGF. Ideally this wouldn't be necessary: we would sort out the authentication issues so that ESGF doesn't have to broker access to the compute cluster and users don't have to stand up their own ESGF.
> is there an alternative to using openDAP for data access?
In a very ideal world, we wouldn't need to authenticate with ESGF directly, but that seems unlikely to happen in the short term. I think people have tried OPeNDAP frontends for accessing object stores. We have also been bouncing around the idea of a more cloud-native OPeNDAP service that can read from stores like Zarr.
More broadly, it sounds like it would be useful for ESGF and Pangeo to have some conversations to discuss these sorts of details. If you think that would be useful, let us know and we can get something set up. |
Hi @lila, @jhamman, the latest version of ESGF supports OAuth 2.0, but it is not rolled out to all nodes in the federation yet. You can use OAuth 2.0 to sign in and also to get a delegated X.509 certificate. You can't use OAuth 2.0 access tokens directly to access data, but this is in the pipeline.
Without knowing too much about what you are doing with Pangeo, OAuth 2.0 with GitHub is probably a good starting point. You could also consider authentication with ORCID iDs, as this also uses OAuth 2.0. There are also many good initiatives in research federations, in ESGF and elsewhere, to plug into. I believe there are plans to define a profile for the use of OpenID Connect in research federations. This would be a natural progression from OAuth 2.0.
Most of ESGF uses THREDDS for direct HTTP file serving rather than OPeNDAP. There is also GridFTP support. At CEDA, we have been doing work overlaying netCDF over object storage.
I mentioned on another thread that there was interest in inviting someone from Pangeo to present at the ESGF Face-to-Face meeting in Washington DC in December. It sounds like there are a number of areas where it would be good to talk more. |
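For readers less familiar with the OAuth 2.0 sign-in mentioned above, here is a minimal sketch of the first leg of the standard authorization-code flow: building the URL the user's browser is redirected to. The endpoint, client id, redirect URI, and scope below are all hypothetical placeholders, not real ESGF or Pangeo values; only the query parameter names come from the OAuth 2.0 specification.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical authorization endpoint, for illustration only.
AUTHORIZE_URL = "https://esgf-idp.example.org/oauth2/authorize"


def build_authorization_url(client_id, redirect_uri, scope, state=None):
    """Return (url, state) for the first leg of the authorization-code flow."""
    # `state` is an opaque anti-CSRF value the client checks on the callback.
    state = state or secrets.token_urlsafe(16)
    query = urlencode(
        {
            "response_type": "code",  # ask for an authorization code
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": scope,
            "state": state,
        }
    )
    return f"{AUTHORIZE_URL}?{query}", state


# Example with placeholder values (assumptions, not real registrations).
url, state = build_authorization_url(
    client_id="pangeo-hub",
    redirect_uri="https://hub.example.org/hub/oauth_callback",
    scope="profile",
)
print(url)
```

After the user approves, the provider redirects back with a `code` parameter that the client exchanges for an access token; that second leg depends on the provider and is not shown here.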
It might be useful to note that when we're talking about Pangeo and authentication, all we're talking about is JupyterHub. We're not adding any additional functionality beyond what is described in
https://jupyterhub.readthedocs.io/en/stable/reference/authenticators.html
If changes needed to be made, they would almost certainly be made in that project (which is pretty receptive).
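As a rough illustration of how little is involved on the Pangeo side, a GitHub OAuth setup in JupyterHub is typically a few lines of configuration. The fragment below is a sketch that assumes the separate `oauthenticator` package is installed; the callback URL and environment variable names are placeholders, not the actual Pangeo deployment values.

```python
# jupyterhub_config.py (fragment; loaded by JupyterHub, not run directly)
import os

c = get_config()  # noqa: F821 -- provided by JupyterHub's config loader

# Use the GitHub authenticator from the `oauthenticator` package.
c.JupyterHub.authenticator_class = "oauthenticator.github.GitHubOAuthenticator"

# Placeholder values: register an OAuth app with GitHub to obtain real ones.
c.GitHubOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"
c.GitHubOAuthenticator.client_id = os.environ["GITHUB_CLIENT_ID"]
c.GitHubOAuthenticator.client_secret = os.environ["GITHUB_CLIENT_SECRET"]
```

Swapping in a different identity provider would mostly mean pointing `authenticator_class` at another authenticator, which is why changes would land in JupyterHub rather than in Pangeo itself.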
|
What are the dates for this meeting? Do you want to touch base over email? I'm sure we can find a Pangeo member that can attend. |
It would be really great if someone from Pangeo could make it to the ESGF face-to-face meeting (http://www.cvent.com/events/8th-annual-esgf-f2f-conference-esgf-2018-/custom-22-d2c9d372ed56433cab3e5b80c9541c50.aspx). I can't do it because it is my last week of classes. Maybe @jhamman or @rsignell-usgs is game? |
I agree, we should see if someone could get to this meeting. I doubt it can be me with AGU the following week. Also, it looks like we missed the boat on registration/abstracts. |
According to @balaji-gfdl, there is flexibility here. They seem eager to have a contribution from us. |
No we did not, for with great foresight (and presumption) I put in an abstract on your behalf. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date. |
Hi everyone,
I just wanted to capture any issues we may have regarding connecting to and accessing data from external data sources, specifically ESGF data nodes. ESGF now supports k8s as well. So some questions that have come up:
Alternatively, it could be done by whitelisting the Pangeo IPs. However, this approach would not support multiple instances of the Pangeo system being deployed and used.
Given you can now run both ESGF data storage/search and data analysis in k8s, does it make sense then to just run both sets of services in the same cluster?
Is there an alternative to using OPeNDAP for data access? If the data is stored in a GCS bucket, then for public datasets you could use Requester Pays buckets to limit potential egress costs for providing the data. But direct access to GCS would lack authentication. GCS signed URLs might be sufficient.
Would love any thoughts or feedback ...
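To make the signed-URL idea concrete, here is a minimal, generic sketch of how a time-limited signed URL works: the data provider signs the path and an expiry time with a secret, and the server rejects requests whose signature is missing, altered, or expired. This is not the actual GCS signing scheme (which has its own format and credentials); the secret, host, and function names below are hypothetical and only illustrate the concept.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode, urlparse, parse_qs

SECRET = b"demo-secret"  # hypothetical key held only by the data provider


def _signature(path, expires):
    # Sign the HTTP method, object path, and expiry together.
    message = f"GET\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base, path, ttl_seconds, now=None):
    """Return a URL for `path` that is valid for `ttl_seconds`."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{base}{path}?{query}"


def verify_url(url, now=None):
    """Return True only if the signature matches and the URL has not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False  # unsigned or malformed request
    if now > expires:
        return False  # link has expired
    return hmac.compare_digest(signature, _signature(parsed.path, expires))
```

The point for this discussion is that the holder of the URL needs no account with the storage provider, so an authenticated service (ESGF or a hub) could hand out short-lived links to otherwise private objects.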