Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing s3 accesspoint url #1743

Closed
wants to merge 2 commits into from

Conversation

wang2bo2
Copy link

Parsing S3 access point URI, like s3://arn:aws:s3:us-west-2:1234:accesspoint/abc/123.jpg, causes problems in infer_storage_options for the :accesspoint was parsed as port number. This PR addresses that.

@martindurant
Copy link
Member

Whilst this appears to do the right thing, I wonder if you see any way that this can be isolated in s3fs rather than here?

@martindurant
Copy link
Member

ping @wang2bo2

@wang2bo2
Copy link
Author

wang2bo2 commented Nov 5, 2024

Yes, we could remove the use of infer_storage_options and duplicate the extraction of url_query here
https://github.com/fsspec/s3fs/blob/f3f63cbfbfe71a4355abd63cafd8c678c4a5a0af/s3fs/core.py#L394

In the current infer_storage_options implementation, anything after the last : would be excluded from getting into options["host"]

options["host"] = parsed_path.netloc.rsplit("@", 1)[-1].rsplit(":", 1)[0]
even if the non-int port error is caught and handled. So I thought it's good to fix that too.

@martindurant
Copy link
Member

I think it would make sense to look for "accesspoint" in s3fs, but still use to infer_storage_options in the common case. Perhaps the accesspoint branch could simply modify the URL into something that makes sense. An ARN description isn't very much like a URL anyway...

@martindurant
Copy link
Member

Closing in favour of fsspec/s3fs#912

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants