Add check if path is vsi #1612
Why not check for all files containing |
I forgot to push the isfile in
I don't think it does. But it is a valid vsi scheme.
Should be sufficient, but then we only support the apache/rasterio syntax.
Edit: Also, I think it is good practice to verify the vsi syntax against the ones supported by rasterio. Then we can yield meaningful error messages if there are typos. But I guess rasterio would complain that the file is not found. The |
Yes, I usually try to avoid using these. I wonder if there's a different way we can access that info... |
Could create a feature request in rasterio for non-private methods/classes for doing this verification? |
This could be useful, but it will be deprecated and removed in rasterio 1.4. I cannot find any information pointing to why, or what will replace it. I'll ask on the rasterio discussion board.

from rasterio.path import ParsedPath
p = ParsedPath.from_uri("zip://*.tif::files.zip")
p.scheme # 'zip'
p.is_local # True
p.is_remote # False

I see that rasterio is working on API changes for opening vsi. Pasting here for reference. I do not yet know if this means we will need to change anything when upgrading rasterio. docs: |
FYI, I started this 'Q&A' over at rasterio |
Rasterio doesn't want to expose the path validation, and says this API is prone to change (as we expected). They suggest copying in the relevant parts of this validation, as I do in this PR. What do we think? Are we on the right track here? I do think it would be practical to give the users the possibility to implement their own logic for files that are not visible to |
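For reference, the "copy in the relevant parts" approach discussed above could look roughly like the sketch below. The scheme set is an illustrative assumption, not rasterio's actual table, and has_known_vsi_scheme is a hypothetical name rather than a function from this PR. Chained schemes such as zip+s3 are split on "+" and every part is checked, not only the last one.

```python
# Illustrative subset of URI schemes (an assumption, not rasterio's real table).
KNOWN_SCHEMES = {"file", "ftp", "gzip", "http", "https", "s3", "tar", "zip"}


def has_known_vsi_scheme(path: str) -> bool:
    """Return True if path looks like 'scheme://...' using only known schemes.

    Hypothetical helper: chained schemes ('zip+s3') are split on '+'
    and every part must be in KNOWN_SCHEMES.
    """
    scheme, sep, _ = path.partition("://")
    if not sep or not scheme:
        # No '://' separator, or nothing in front of it: not a scheme path.
        return False
    return all(part in KNOWN_SCHEMES for part in scheme.split("+"))


print(has_known_vsi_scheme("zip+s3://bucket/archive.zip!image.tif"))
print(has_known_vsi_scheme("zpi://archive.zip!image.tif"))  # typo in scheme
print(has_known_vsi_scheme("/data/image.tif"))
```

The trade-off raised in this thread applies here: an allow-list can report typos early, but it has to be kept in sync with whatever schemes the installed rasterio version supports.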
I'm fine with keeping the check on our side as the changes should be "someone wants a new VSI in GeoDataset" vs. an update to |
My concern is that if the list of supported schemes changes according to rasterio version, it will be annoying to validate in torchgeo. What's the downside of treating all paths with a |
I agree. If the user inputs invalid paths, then this line will complain.
Should we somehow inform users about this difference? This is outside the scope of this bug fix and is something we can address when the use of vsi grows. In the meantime, users can implement the listing of files themselves, e.g. by overriding the files property. When rasterio settles on how they treat vsi, we can take a new look at the possibilities it may bring. |
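To illustrate the workaround of overriding the files property, here is a minimal sketch. BaseDataset is a hypothetical stand-in written for this example, not torchgeo's GeoDataset, and ExplicitFilesDataset is likewise an invented name; a real subclass would inherit from the actual dataset class.

```python
import os


class BaseDataset:
    """Hypothetical stand-in for a dataset class exposing a `files` property."""

    def __init__(self, paths):
        # Accept a single path or an iterable of paths.
        self.paths = [paths] if isinstance(paths, str) else list(paths)

    @property
    def files(self):
        # Default behaviour in this sketch: keep only paths that exist locally.
        return sorted(p for p in self.paths if os.path.isfile(p))


class ExplicitFilesDataset(BaseDataset):
    """Subclass that trusts the user-supplied paths without checking them."""

    @property
    def files(self):
        # No filesystem check: remote or virtual paths are passed through as-is.
        return sorted(self.paths)


remote = ["s3://bucket/b.tif", "s3://bucket/a.tif"]
print(BaseDataset(remote).files)           # local check filters these out
print(ExplicitFilesDataset(remote).files)  # paths are kept
```

With this pattern, any error for an unreachable path surfaces later, when the file is actually opened.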
When I try to download AbovegroundLiveWoodyBiomassDensity, it complains that my directory doesn't exist and will be ignored. This is because we check |
* Add check if path is vsi
* Add url to reference for apache vsi syntax
* Add missing check to if
* Copy rasterio SCHEMES definition into torchgeo
* Check all schemes, not only last
* Simplify method path_is_vsi
* Add tests
* Remove print
* Update test names
* Add missing comma in list
* Update torchgeo/datasets/utils.py Co-authored-by: Adam J. Stewart <[email protected]>
* Update torchgeo/datasets/utils.py Co-authored-by: Adam J. Stewart <[email protected]>
* Use pytest tmp_path for test
* Warn if some of input paths are invalid
* Update docstring for mocked class
* Handle tests failing due to UserWarning
* Remove unnecessary filterwarning
* Test CustomGeoDataset instead of MockRasterDataset
* Merge two similar tests
* str instead of as_posix Wait with pathlib syntax Co-authored-by: Adam J. Stewart <[email protected]>

---------

Co-authored-by: Adrian Tofting <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
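The behaviour summarised in this commit list (a simplified path_is_vsi check plus a warning for invalid input paths) can be sketched as below. This is a minimal illustration under stated assumptions, not the merged code: the exact rule inside path_is_vsi, the list_files helper, and the warning text are all assumptions made for this example.

```python
import glob
import os
import warnings


def path_is_vsi(path: str) -> bool:
    """Assumed rule: a scheme separator or a GDAL-style /vsi prefix."""
    return "://" in path or path.startswith("/vsi")


def list_files(paths, filename_glob="*"):
    """Hypothetical helper: collect files from directories, files and vsi paths."""
    files = set()
    for path in paths:
        if os.path.isdir(path):
            # Directories are searched recursively for matching file names.
            pattern = os.path.join(path, "**", filename_glob)
            files |= set(glob.iglob(pattern, recursive=True))
        elif os.path.isfile(path) or path_is_vsi(path):
            # Existing files and vsi-looking paths are accepted as-is.
            files.add(path)
        else:
            # Anything else is ignored, with a warning instead of an error.
            warnings.warn(
                f"Could not find any relevant files for provided path '{path}'. "
                "Path was ignored.",
                UserWarning,
            )
    return sorted(files)
```

Note that the vsi check only inspects the string, so a mistyped remote path is accepted here and only fails once the file is opened.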
Right. Would it make sense to create a common verify method and ignore the warning there? This way the ignore is only "required" in one place in the code. |
I would love a common |
Fix #1605
In 0.5 we included the property files in GeoDatasets. files tries to list all files matching the filename_glob. It also tries to support virtual file systems. In doing so, it accidentally accepts non-directories as valid files.
The proposed solution will do a basic check of the path prefix to see if it looks like a vsi. Paths not matching these criteria will be ignored (to match behaviour pre version 0.5). Could raise FileNotFound instead. Then the _verify methods of Datasets could try-except to decide if files need downloading.
Current limitations: