JPL OPeNDAP service is retired; access to new source seems restricted #138

Open
observingClouds opened this issue May 18, 2023 · 16 comments
Labels
data source changed Fix for vanished source locations.

Comments

@observingClouds
Collaborator

observingClouds commented May 18, 2023

The JPL OPeNDAP service, which provided e.g. the Saildrone datasets, has been retired. Having followed the instructions on how to shift to the new system, I fear that access is now restricted by username and password, which would be a bummer.

Here is for example the new link to the SD-1060 dataset: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/SAILDRONE_ATOMIC/saildrone-gen_5-atomic_eurec4a_2020-sd1060-20200117T000000-20200302T235959-5_minutes-v1.1595997115384.nc
(can be found here).

However, I can only open it after entering credentials.

@observingClouds observingClouds added the data source changed Fix for vanished source locations. label May 18, 2023
@observingClouds
Collaborator Author

observingClouds commented May 18, 2023

@RobertPincus any ideas how to go about this? Is the data also stored somewhere else, where we would have easier access?

@observingClouds observingClouds changed the title JPL opendap service is retired JPL OPeNDAP service is retired; access to new source seems restricted May 18, 2023
@observingClouds
Collaborator Author

I sent an email to PO.DAAC and hope they have an idea of how to solve this issue.

@observingClouds
Collaborator Author

This worries me the most:

An “Earthdata Login” is required to access data files from within OPeNDAP-in-the-cloud. This service provided by the EOSDIS program is openly available to all free of charge except where governed by internal agreements. If you access OPeNDAP without being logged in, your Earthdata username and password will be requested.

source: https://podaac.jpl.nasa.gov/OPeNDAP-in-the-Cloud

@RobertPincus
Contributor

@observingClouds Thanks for digging into this. I've accessed data using an Earthdata login in scripts in other projects. This relies on having an environment variable set with the token from Earthdata.

Could we create a CI account for this repo/organization, generate a token, and use it as a GitHub secret?
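
A minimal sketch of the environment-variable approach, for illustration only. The variable name `EARTHDATA_TOKEN` and the helper `authenticated_request` are assumptions, not something defined by Earthdata or by this repo; in CI the variable would be populated from a GitHub secret. It also assumes the server accepts the token as a bearer token in the `Authorization` header.

```python
import os
import urllib.request


def authenticated_request(url, env_var="EARTHDATA_TOKEN"):
    """Build (but do not send) a request that carries the token as a bearer header.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a readable message instead of a 401 later on
        raise RuntimeError(f"Set {env_var} to an Earthdata token first")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

The returned object could then be passed to `urllib.request.urlopen`; whether a given data endpoint honours the header would need to be checked against that service.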

@observingClouds
Collaborator Author

Technically it is probably possible, but every user of the catalog would need to create a token as well. If several services request such tokens, it adds a huge burden on everyone and makes the catalog much less convenient to use. For non-interactive usage of the catalog, for example, one would need to know beforehand which tokens need to be created. In this particular case, the tokens also seem to be valid for only one hour, so within one workflow you might need to request a new token.

@RobertPincus
Contributor

@observingClouds Of course it's nicest if the data doesn't require authorization or credentials.

Users only need to provide credentials if they're going to use the data, of course. Having to refresh the credentials hourly will be a pain - that's an especially unfortunate choice at JPL.

@jjmcnelis

jjmcnelis commented May 23, 2023

Hi folks, this is Jack McNelis from the PO.DAAC ([email protected]). I want to help you find a workable solution for maintaining this interface now that our datasets are hosted in the cloud.

I'm not familiar with your software, so it's hard to know what to recommend. An approach like the one mentioned by @RobertPincus should work. There's good documentation describing how to set up Earthdata Login authentication at this link: https://docs.opendap.org/index.php/DAP_Clients_-_Authentication

@RobertPincus
Contributor

@jjmcnelis Thanks for being in touch. This repo contains an intake catalog - a map to remotely accessible resources that abstracts away the particular access details for Python users.

A couple of questions about getting Earthdata tokens:

  1. Does the PO.DAAC and/or Earthdata have the concept of organizational, rather than personal, accounts?
  2. Is it possible to refresh the authorization token programmatically, or does one have to go through a GUI?

@jjmcnelis

1. Does the PO.DAAC and/or Earthdata have the concept of organizational, rather than personal, accounts?

Yes, you're permitted to register an Earthdata Login account for an organization and/or service.

2. Is it possible to refresh the authorization token programmatically, or does one have to go through a GUI?

Indeed, check out: https://urs.earthdata.nasa.gov/documentation/for_users/user_token#api I'm happy to share some python code if you'd rather not bother implementing it yourself.
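
To make the programmatic route a bit more concrete, here is a rough sketch that only builds the HTTP requests with the standard library, without sending them. The endpoint paths `/api/users/token` and `/api/users/revoke_token` are assumptions based on the linked page and should be verified there; the helper names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

URS = "https://urs.earthdata.nasa.gov"  # host taken from the documentation link


def _basic_auth(username, password):
    """Encode credentials for an HTTP Basic `Authorization` header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def create_token_request(username, password):
    """Request object asking for a new user token (assumed endpoint)."""
    return urllib.request.Request(
        f"{URS}/api/users/token",
        method="POST",
        headers={"Authorization": _basic_auth(username, password)},
    )


def revoke_token_request(username, password, token):
    """Request object revoking `token` (assumed endpoint and query parameter)."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{URS}/api/users/revoke_token?{query}",
        method="POST",
        headers={"Authorization": _basic_auth(username, password)},
    )
```

Sending either object with `urllib.request.urlopen` would perform the call; the shape of the response should be taken from the documentation rather than from this sketch.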

@RobertPincus
Contributor

@jjmcnelis We access the PO.DAAC regularly (at least once a week) to ensure we are still pointing to valid data. If you have Python examples of how to request a token, use it to access the data, and revoke it (so we don't ask for too many at once) in a single script, that would fit our use case perfectly.
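
The request/use/revoke pattern described here could be wrapped in a context manager so the token is revoked even if the data check fails. This is only a sketch of the control flow: `create` and `revoke` are hypothetical callables that would wrap the actual Earthdata calls.

```python
from contextlib import contextmanager


@contextmanager
def temporary_token(create, revoke):
    """Yield a fresh token and revoke it afterwards, even if the body raises.

    `create` returns a token string; `revoke` takes that string.
    Both are placeholders for real Earthdata Login requests.
    """
    token = create()
    try:
        yield token
    finally:
        # Runs on success and on error, so tokens do not pile up
        revoke(token)
```

A weekly check would then look like `with temporary_token(create, revoke) as token: ...`, with the catalog test inside the `with` block.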

@jjmcnelis

Thanks, @RobertPincus. Will you please share an example endpoint you're hitting to do this? Is it CMR or OPeNDAP? That'll help me identify the most appropriate resource for this use case.

@RobertPincus
Contributor

@jjmcnelis Here's an example:
https://www.ncei.noaa.gov/thredds-ocean/dodsC/psl/atomic/p3

Used in this leaf of the catalog: https://github.com/eurec4a/eurec4a-intake/blob/master/P3/axbts.yaml

This is an OPeNDAP endpoint; I think most of our data is hosted behind one OPeNDAP server or another.

@observingClouds
Collaborator Author

@jjmcnelis, thank you for your help! I really wish access to the data were more straightforward, which is what we try to accomplish with this catalog. I hope PO.DAAC will change this again, because it was very easy beforehand.

While I would like to have the original source integrated in the catalog, I think in the short term the easiest option is to find a different resource or host the data elsewhere, where access is not restricted. I found some of the files at https://github.com/cgentemann/paper_software/tree/master/2020_ATOMIC_Salinity/data . We can easily access those files through intake. Unfortunately, these are not all of the Saildrone files.

@cgentemann, do you know an alternative source by any chance? I also just want to raise your awareness that the data access to this particular set of data got more restrictive in the Year of Open Science. Maybe this is something NASA TOPS could address?

@cgentemann

@observingClouds I'm sorry, but I don't know of an alternative source. Saildrone data are scattered around in part because of who funded what data and what licensing agreements were applied. The NASA funded data is open and freely available, but open doesn't always mean easy to access and this is a challenge for all datasets, not just Saildrone. Thanks for your comments, I will pass them along.

@observingClouds
Collaborator Author

observingClouds commented Jun 2, 2023

There is another issue with the new OPeNDAP server that currently makes it less than straightforward to use with pydap and intake: pydap/pydap#188

import os
from pydap.client import open_url
from pydap.cas.urs import setup_session

url = "https://opendap.earthdata.nasa.gov/collections/C2491772162-POCLOUD/granules/saildrone-gen_5-atomic_eurec4a_2020-sd1026-20200117T000000-20200302T235959-5_minutes-v1.1595997001389"
setup_session(os.environ['DAP_USER'], os.environ['DAP_PASSWORD'], check_url=url)

fails with

UserWarning: Navigate to https://opendap.earthdata.nasa.gov/collections/C2491772162-POCLOUD/granules/saildrone-gen_5-atomic_eurec4a_2020-sd1026-20200117T000000-20200302T235959-5_minutes-v1.1595997001389, login and follow instructions. It is likely that you have to perform some one-time registration steps before acessing this data.

@observingClouds
Collaborator Author

Something that does work but requires additional code and is not performant, because the entire dataset has to be downloaded, is:

import netrc, fsspec, aiohttp
import intake
from intake.catalog import Catalog
from intake.catalog.local import LocalCatalogEntry

# Read the Earthdata Login credentials from ~/.netrc
(username, account, password) = netrc.netrc().authenticators("urs.earthdata.nasa.gov")
# Send HTTP basic auth with every https request that fsspec makes
fsspec.config.conf['https'] = dict(client_kwargs={'auth': aiohttp.BasicAuth(username, password)})

d={"SD-1060":LocalCatalogEntry('5min','',args={'urlpath':'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/SAILDRONE_ATOMIC/saildrone-gen_5-atomic_eurec4a_2020-sd1060-20200117T000000-20200302T235959-5_minutes-v1.1595997115384.nc'}, driver='netcdf')}
# Wrap the entry in a catalog so it can be looked up by name (assumes intake's Catalog.from_dict)
cat = Catalog.from_dict(d)
cat['SD-1060'].to_dask()
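
One fragile spot in the snippet above is the netrc lookup: `authenticators` returns `None` when the host has no entry, and the tuple unpacking then fails with an unhelpful `TypeError`. A small sketch of a more defensive lookup follows; the helper name `earthdata_credentials` is hypothetical.

```python
import netrc


def earthdata_credentials(netrc_file=None, host="urs.earthdata.nasa.gov"):
    """Return (username, password) for `host`, or raise a readable error.

    With `netrc_file=None` the default ~/.netrc is read.
    """
    entry = netrc.netrc(netrc_file).authenticators(host)
    if entry is None:
        raise LookupError(f"No entry for {host} in the netrc file")
    username, _account, password = entry
    return username, password
```

The result could be fed to `aiohttp.BasicAuth(username, password)` exactly as in the snippet above.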
