Integration with external ESGF data sources #420

Closed
lila opened this issue Oct 5, 2018 · 10 comments

@lila

lila commented Oct 5, 2018

Hi everyone,

I just wanted to capture any issues we may have regarding connecting to and accessing data from external data sources, specifically ESGF datanodes. ESGF now supports k8s as well. So some questions that have come up:

  • pangeo uses oauth2 authentication with github. esgf uses its own user certificates. is it possible to combine them somehow? i.e., somehow upload/save/define esgf users in pangeo, who then use their user cert to connect to an esgf datanode and download data securely.

alternatively it could be done by whitelisting the pangeo IPs. however, this approach would not support multiple instances of the pangeo system being deployed and used.

  • given you can now run both esgf data storage/search and data analysis in k8s, does it make sense then to just run both sets of services in the same cluster?

  • is there an alternative to using openDAP for data access? if the data is stored in a GCS bucket, then for public datasets you could use requestor-pays buckets to limit potential egress costs for providing the data. But direct access to GCS would lack authentication. GCS Signed URLs might be sufficient.
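
To make the signed-URL idea concrete, here is a minimal sketch of the general technique: the data provider appends an expiry time and an HMAC signature to a link, so that anyone holding the link can fetch the object until it expires, without a user account. This is only an illustration of the concept and is not the actual GCS signing scheme; the secret key, the URL, and the function names `sign_url` and `verify_url` are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical shared secret, known only to the data provider.
SECRET_KEY = b"example-secret-not-for-production"


def sign_url(base_url: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Return base_url with an expiry timestamp and an HMAC-SHA256 signature appended."""
    message = f"{base_url}\n{expires_at}".encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"{base_url}?{query}"


def verify_url(signed_url: str, now: int, key: bytes = SECRET_KEY) -> bool:
    """Return True only if the signature matches and the link has not expired."""
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        # Missing or malformed parameters: treat the link as invalid.
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    message = f"{base_url}\n{expires_at}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(signature.encode(), expected.encode()) and now < expires_at


if __name__ == "__main__":
    # Hypothetical dataset URL, valid for one hour from now.
    link = sign_url("https://data.example.org/cmip6/tas.nc", int(time.time()) + 3600)
    print(link)
    print(verify_url(link, now=int(time.time())))
```

With a scheme like this, access control moves from "who is the user" to "who was handed a link", which is why it might be sufficient for time-limited sharing but does not replace per-user authentication.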

would love any thoughts or feedback ...

@jhamman
Member

jhamman commented Oct 8, 2018

Hi @lila. Thank you for bringing these points up.

> pangeo uses oauth2 authentication with github

This is purely for convenience. If there is good reason to change this, we certainly could do so. We are also accustomed to authenticating with other services once inside a pangeo jupyterhub. So accessing ESGF is certainly possible while using github.

> does it make sense then to just run both sets of services in the same cluster

This is an interesting idea. We could certainly set up a pangeo jupyterhub in the same k8s cluster as ESGF. Ideally this wouldn't be necessary, and we could sort out the authentication issues so that ESGF doesn't have to broker access to the compute cluster and users don't have to stand up their own ESGF.

> is there an alternative to using openDAP for data access?

In a very ideal world, we wouldn't need to authenticate with ESGF directly, but it seems that is unlikely to happen in the short term. I think people have tried openDAP frontends for accessing object stores. We have also been bouncing around the idea of a more cloud-native openDAP service that can read from stores like Zarr.

More broadly, it sounds like it would be useful for ESGF and Pangeo to have some conversations to discuss these sorts of details. If you think that would be useful, let us know and we can get something set up.

@philipkershaw

Hi @lila, @jhamman, the latest version of ESGF supports OAuth 2.0 but it is not rolled out to all nodes in the federation yet. You can use OAuth 2.0 to sign in and also to get a delegated X.509 certificate. You can't use OAuth 2.0 access tokens directly to access data but this is in the pipeline.

Without knowing too much about what you are doing with Pangeo, OAuth 2.0 with github is probably a good starting point. You could also consider authentication with ORCID iDs, as this also uses OAuth 2.0. There are also many good initiatives with research federations, in ESGF and elsewhere, to plug into.

I believe there are plans to make a profile for the use of OpenID Connect for research federations. This would be a natural progression from OAuth 2.0.

Most of ESGF uses THREDDS for direct HTTP file serving rather than OPeNDAP. There is also GridFTP support. At CEDA, we have been doing work overlaying netCDF over object storage.

I mentioned on another thread that there was interest in inviting someone from Pangeo to present at the ESGF Face-to-Face meeting in Washington DC in December. It sounds like there are a number of areas where it would be good to talk more.

@mrocklin
Member

mrocklin commented Oct 8, 2018 via email

@jhamman
Member

jhamman commented Oct 8, 2018

> I mentioned on another thread that there was interest in inviting someone from Pangeo to present at the ESGF Face-to-Face meeting in Washington DC in December. It sounds like there are a number of areas where it would be good to talk more

What are the dates for this meeting? Do you want to touch base over email? I'm sure we can find a Pangeo member that can attend.

@rabernat
Member

It would be really great if someone from Pangeo could make it to the ESGF face to face meeting: http://www.cvent.com/events/8th-annual-esgf-f2f-conference-esgf-2018-/custom-22-d2c9d372ed56433cab3e5b80c9541c50.aspx

I can't do it because it is my last week of classes. Maybe @jhamman or @rsignell-usgs is game?

@jhamman
Member

jhamman commented Oct 23, 2018

I agree, we should see if someone could get to this meeting. I doubt it can be me with AGU the following week. Also, it looks like we missed the boat on registration/abstracts.

@rabernat
Member

> Also, it looks like we missed the boat on registration/abstracts.

According to @balaji-gfdl, there is flexibility here. They seem eager to have a contribution from us.

@balaji-gfdl

> Also, it looks like we missed the boat on registration/abstracts.
>
> According to @balaji-gfdl, there is flexibility here. They seem eager to have a contribution from us.

No, we did not, for with great foresight (and presumption) I put in an abstract on your behalf.

@stale

stale bot commented Dec 23, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 23, 2018
@stale

stale bot commented Dec 30, 2018

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

@stale stale bot closed this as completed Dec 30, 2018
6 participants