Allow other fsspec protocols than local and s3 #126
Hey @TomAugspurger, I'm not getting that failure when running the tests on your branch, which is probably due to some AWS credential magic on my end. Updating the tests to run against minio or something similar would probably help with that.
Thanks for the PR @TomAugspurger. I just tried this out and am running into some unexpected behavior:
For the setup:
from virtualizarr import open_virtual_dataset
from virtualizarr.kerchunk import FileType
urls = [
"http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_201501-201912.nc",
"http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_202001-202412.nc",
"http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_202501-202912.nc",
"http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_203001-203412.nc",
]
Then I naively tried to do this:
vds_list = []
for url in urls:
    vds = open_virtual_dataset(
        url, indexes={}
    )
    vds_list.append(vds)
which failed with this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/implementations/http.py:422, in HTTPFileSystem._info(self, url, **kwargs)
420 try:
421 info.update(
--> 422 await _file_info(
423 self.encode_url(url),
424 size_policy=policy,
425 session=session,
426 **self.kwargs,
427 **kwargs,
428 )
429 )
430 if info.get("size") is not None:
File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/implementations/http.py:831, in _file_info(url, session, size_policy, **kwargs)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/aiohttp/client.py:978, in ClientSession.get(self, url, allow_redirects, **kwargs)
TypeError: ClientSession._request() got an unexpected keyword argument 'key'

The above exception was the direct cause of the following exception:

FileNotFoundError Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/virtualizarr/xarray.py:108, in open_virtual_dataset(filepath, filetype, drop_variables, loadable_variables, indexes, virtual_array_class, reader_options)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/virtualizarr/kerchunk.py:76, in read_kerchunk_references_from_file(filepath, filetype, reader_options)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/virtualizarr/kerchunk.py:117, in _automatically_determine_filetype(filepath, reader_options)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/virtualizarr/utils.py:58, in _fsspec_openfile_from_filepath(filepath, reader_options)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/spec.py:1298, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/implementations/http.py:361, in HTTPFileSystem._open(self, path, mode, block_size, autocommit, cache_type, cache_options, size, **kwargs)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/fsspec/implementations/http.py:435, in HTTPFileSystem._info(self, url, **kwargs)
Makes me think that the protocol is not properly detected? When I add reader_options={}:
vds_list = []
for url in tqdm(urls):
    vds = open_virtual_dataset(
        url, indexes={}, reader_options={}
    )
    vds_list.append(vds)
I believe that the default values for reader options are basically invalidating this logic.
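To illustrate the suspicion above, here is a minimal sketch of how S3-style defaults can break an HTTP read. Everything in it is an assumption for illustration: `S3_STYLE_DEFAULTS`, `fake_http_request`, `storage_options_for`, and the `storage_options` key are hypothetical stand-ins, not the actual virtualizarr or fsspec code. It only mirrors the shape of the `unexpected keyword argument 'key'` error in the traceback.

```python
# Hypothetical S3-style defaults; the real defaults in virtualizarr may differ.
S3_STYLE_DEFAULTS = {"key": "", "secret": "", "anon": True}


def fake_http_request(url, *, timeout=None):
    """Stand-in for an HTTP client call that knows nothing about S3 credentials."""
    return f"GET {url}"


def storage_options_for(protocol, reader_options=None):
    """Return storage options, applying the S3-style defaults only for S3."""
    # Assumption: user options live under a "storage_options" key.
    user_options = dict((reader_options or {}).get("storage_options", {}))
    if protocol == "s3":
        return {**S3_STYLE_DEFAULTS, **user_options}
    return user_options


url = "http://example.com/data.nc"

# Forwarding S3-style defaults to the HTTP stand-in raises TypeError,
# the same kind of failure as in the traceback.
try:
    fake_http_request(url, **S3_STYLE_DEFAULTS)
    leaked_defaults_failed = False
except TypeError:
    leaked_defaults_failed = True

# Options chosen per protocol pass through cleanly.
response = fake_http_request(url, **storage_options_for("http"))
```

If something like this is what happens, the defaults would need to be applied per protocol rather than unconditionally, which is what `storage_options_for` sketches.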
Thanks for this contribution @TomAugspurger! If @norlandrhagen is happy with this (including @jbusecke's fix), then I am happy to merge it.
100% Thanks for the fixes @TomAugspurger and @jbusecke!
Co-authored-by: Julius Busecke <[email protected]>
Oh weird. My test case is still failing after pulling from main? My fix might not have been sufficient?
Damn - is there a test/reproducer for this issue @jbusecke (that you can raise in a new issue)?
Yeah I have that on my list, but very busy this week, so might have to push to next week. Please ping me as needed 😆
@TomNicholas see #135
This change simplifies the handling of filepaths in _fsspec_openfile_from_filepath, and removes some restrictions around what can be passed in. Most notably, it allows the use of filepaths other than S3 and local ones.
I see that virtualizarr/tests/test_xarray.py::test_anon_read_s3 covers this, but that's failing for me on main. Is that just a local configuration issue for me, or is it failing for others as well?
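As a rough illustration of accepting filepaths beyond S3 and local ones, the sketch below derives a protocol name from the path itself instead of checking against a fixed list. `detect_protocol` is a hypothetical helper written with the standard library only; it is not the code in this PR, and fsspec has its own protocol handling.

```python
from urllib.parse import urlsplit


def detect_protocol(filepath: str) -> str:
    """Guess the protocol of a filepath, falling back to "file" for local paths."""
    scheme = urlsplit(filepath).scheme
    # No scheme means a plain local path. A single-letter scheme is treated
    # as a Windows drive letter (e.g. "C:\\data\\x.nc"), so also local.
    if len(scheme) <= 1:
        return "file"
    return scheme


for path in ["s3://bucket/tas.nc", "http://example.com/tas.nc", "/tmp/tas.nc"]:
    print(path, "->", detect_protocol(path))
```

With the protocol taken from the path, any scheme a filesystem library recognises could be passed through, rather than only the ones hard-coded in the caller.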