Error when opening MERRA in xarray via earthaccess but same code is successful using s3fs #616

Closed
asteiker opened this issue Jun 25, 2024 · 3 comments


@asteiker
Member

I'm trying to understand why I get an error when I open an S3 URL with earthaccess and then feed the result to xarray, yet have no issue when using s3fs directly.

I'm working from a modified version of the tutorial at https://nasa-openscapes.github.io/earthdata-cloud-cookbook/tutorials/Hurricanes_Wind_and_Sea_Surface_Temperature.html. This code opens the dataset successfully in xarray:

import earthaccess
import requests
import s3fs
import xarray as xr

auth = earthaccess.login(strategy="netrc")  # works if the EDL login has already been persisted to a .netrc file
if not auth.authenticated:
    # ask for EDL credentials and persist them in a .netrc file
    auth = earthaccess.login(strategy="interactive", persist=True)

# Define a function for S3 access credentials

def begin_s3_direct_access(daac_url):

    # Retrieve the token as a JSON
    response = requests.get(daac_url).json()

    print(response['accessKeyId'])
    print(response['secretAccessKey'])
    print(response['sessionToken'])

    # Mount the bucket and return it as an S3FileSystem object
    return s3fs.S3FileSystem(key=response['accessKeyId'],
                            secret=response['secretAccessKey'],
                            token=response['sessionToken'],
                            client_kwargs={'region_name':'us-west-2'})

# Open S3 file systems with S3FS

fs = begin_s3_direct_access("https://data.gesdisc.earthdata.nasa.gov/s3credentials")

# Check that the file system is intact as an S3FileSystem object, which means that token is valid

type(fs)

# Open datasets with S3FS

print(fs.ls('gesdisc-cumulus-prod-protected/MERRA2'))

ds = xr.open_dataset(fs.open("s3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2012/10/MERRA2_400.tavg1_2d_slv_Nx.20121025.nc4"))
min_lon = -89
min_lat = 14
max_lon = -67
max_lat = 31

ds = ds.sel(lat=slice(min_lat,max_lat), lon=slice(min_lon,max_lon))

ds

However, I get an engine error using earthaccess.open(), and specifying "netcdf4" or "h5netcdf" produces other errors:

files = earthaccess.open(["s3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2012/10/MERRA2_400.tavg1_2d_slv_Nx.20121025.nc4"], provider="GES_DISC")
ds = xr.open_dataset(files)
min_lon = -89
min_lat = 14
max_lon = -67
max_lat = 31

ds = ds.sel(lat=slice(min_lat,max_lat), lon=slice(min_lon,max_lon))

ds

I know this is not necessarily an earthaccess problem, but it would be helpful to understand what is happening under the hood and whether/how we need to caveat any use cases for those who would like to simplify their code away from s3fs.

@mfisher87
Collaborator

Hey @asteiker can you attach the engine error you're receiving?

@itcarroll
Collaborator

files = earthaccess.open(["s3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2012/10/MERRA2_400.tavg1_2d_slv_Nx.20121025.nc4"], provider="GES_DISC")
ds = xr.open_dataset(files)

I think you forgot that files is a list:

ds = xr.open_dataset(files[0])
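
To make the list-versus-file distinction explicit, here is a minimal sketch. It assumes, as this thread indicates, that `earthaccess.open()` returns a list with one file-like object per input URL. The helper `as_single_file` is a hypothetical name used for illustration; it is not part of earthaccess or xarray.

```python
from typing import Any


def as_single_file(files: Any) -> Any:
    """Unwrap a one-element list or tuple of file-like objects.

    Hypothetical helper: anything that is not a list or tuple is
    returned unchanged, and a list with zero or several entries
    raises instead of silently picking one.
    """
    if not isinstance(files, (list, tuple)):
        return files  # already a single object
    if len(files) != 1:
        raise ValueError(
            f"expected exactly one file, got {len(files)}; "
            "index the list or open the files together instead"
        )
    return files[0]


# Intended use (needs Earthdata Login credentials, so it is left commented out):
# files = earthaccess.open(["s3://.../some_granule.nc4"], provider="GES_DISC")
# ds = xr.open_dataset(as_single_file(files))
```

For several granules at once, passing the whole list to `xr.open_mfdataset(files)` is probably the better fit, since that function is designed to take a list of inputs.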

@asteiker
Member Author

@itcarroll Ah!! Thank you so much! The xarray error threw me off. It asked to specify an engine because it couldn't open a list. Duh. I really appreciate your help.
