Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support request_kwargs param for HTTPFilesystem #1142

Closed
yuvipanda opened this issue Dec 22, 2022 · 5 comments · Fixed by #1164
Closed

Support request_kwargs param for HTTPFilesystem #1142

yuvipanda opened this issue Dec 22, 2022 · 5 comments · Fixed by #1164

Comments

@yuvipanda
Copy link

HTTPFilesystem supports a client_kwargs param, which is passed to the aiohttp ClientSession object constructed for all requests. This allowed us to pass auth credentials to all requests made in the session, which worked when trying to access data behind auth (such as EarthData login).

However, this sent the credentials to all requests from the session, and it was not possible to restrict which requests the credentials get sent for. While this leaked credentials, it was fine for most cases as the extra Authorization header present does not make a big difference.

However, if you are making calls to AWS S3 presigned URLs, they will reject your request saying there is now an additional unexpected Authorization header present. This was the primary issue causing earthdata login and fsspec to not play well together when the user was running code from inside AWS us-west-2, as they will be eventually redirected to using presigned S3 URLs. More detailed information about this problem is captured in nsidc/earthaccess#188.

The solution is to use earthdata tokens, but only send them on the initial request. If we use the existing client_kwargs param, they will be sent on every request, including redirects. Instead, we need a way to find a way to modify the params of the originating request only, and let redirects play out as they might.

Adding a request_kwargs that is merged into the .get calls (like here:

async with session.get(self.encode_url(rpath), **self.kwargs) as r:
) and any other place a request is made would solve this issue.

@yuvipanda
Copy link
Author

ok after writing all this maybe storage_options is the param I want and it already exists?!

@yuvipanda
Copy link
Author

Looks like we can just pass headers as a param, which gets shoved into storage_options or something, and is exactly what would be called request_kwargs! Sorry for the confusion!

@martindurant
Copy link
Member

Correct, get() and related calls take extra kwargs that get passed eventually to the session.get .

You might also want to consider creating two filesystem instances, one with the extra header information and one without (like two s3 instances at different endpoints).

          async with session.get(self.encode_url(rpath), **self.kwargs) as r: 

looks like a bug, should be **kw

@yuvipanda yuvipanda reopened this Dec 22, 2022
@yuvipanda
Copy link
Author

Should we close this issue, @martindurant? Not sure what exactly you mean by 'looks like a bug, should be **kw'

@martindurant
Copy link
Member

I mean that on this line, I think it should be **kw rather than **self.kwargs to take account of extra kwargs passed to _get_file (and get()).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants