Sentinel2 Dataset filename_glob not fully extensible #290

Closed
KennSmithDS opened this issue Dec 15, 2021 · 3 comments
Labels
datasets Geospatial or benchmark datasets

Comments

@KennSmithDS
Contributor

KennSmithDS commented Dec 15, 2021

Description of Issue
When attempting to use the Sentinel2 Dataset class, I receive an error that no Sentinel2 data was found in the root directory provided. This happens both with the parent data directory, which contains the features and the labels, and with the sub-directories for each type.

Upon digging into the source code, I found that the Sentinel2 Dataset class expects the raster filenames to comply with the Sentinel2 naming convention:

filename_glob = "T*_*_B02_*m.*"

The raster files in the dataset I was provided through the DrivenData.org competition are named with just {band}.tif, e.g. B02.tif, B03.tif. Therefore no .tif data is found in the root directory and the program throws the FileNotFoundError below:

Traceback (most recent call last):
  File "/home/ying/cloud_cover_competition/library_experiment.py", line 15, in <module>
    load_dataset(ROOT_PATH, CLOUD_BANDS)
  File "/home/ying/cloud_cover_competition/library_experiment.py", line 10, in load_dataset
    cloud_dataset = Sentinel2(root=path, bands=bands)
  File "/home/ying/anaconda3/envs/torch/lib/python3.9/site-packages/torchgeo/datasets/sentinel.py", line 100, in __init__
    super().__init__(root, crs, res, transforms, cache)
  File "/home/ying/anaconda3/envs/torch/lib/python3.9/site-packages/torchgeo/datasets/geo.py", line 251, in __init__
    raise FileNotFoundError(
FileNotFoundError: No Sentinel2 data was found in '/home/ying/cloud_cover_competition/data/'
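
The mismatch can be checked without torchgeo. The sketch below assumes that files are discovered by matching the glob quoted above against each filename; the conventionally named file is a made-up example, and only Python's standard `fnmatch` module is used.

```python
from fnmatch import fnmatchcase

# Glob quoted in this issue from the Sentinel2 dataset class.
DEFAULT_GLOB = "T*_*_B02_*m.*"

# Hypothetical filename that follows the Sentinel-2 naming convention.
conventional = "T26EMU_20190414T110751_B02_10m.jp2"

# Band-only filenames as described for the competition data.
competition = ["B02.tif", "B03.tif", "B04.tif", "B08.tif"]

# The conventional name matches the glob.
conventional_matches = fnmatchcase(conventional, DEFAULT_GLOB)

# None of the band-only names start with "T", so none of them match.
competition_matches = [n for n in competition if fnmatchcase(n, DEFAULT_GLOB)]

print(conventional_matches)
print(competition_matches)
```

Under that assumption, an empty match list for the band-only names is consistent with the FileNotFoundError in the traceback.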

Steps to Reproduce

  1. Download the dataset using the instructions on the DrivenData.org portal, or directly from Azure Blob Storage.
  2. Run a Python script like the one in the attached image (torchgeo_dataloader_error).

Expected Behavior
The Sentinel2 Dataset loader should be extensible to unconventional filenames, so that it finds GeoTIFF files whenever the band number is in the filename:

  • B02.tif
  • B03.tif
  • B04.tif
  • B08.tif
  • etc.

@KennSmithDS
Contributor Author

As an interim solution, I made a class that extends the base Sentinel2 class, and overrode the filename_glob and filename_regex properties:

custom_sentinel2_class
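
The screenshot is not available as text, so here is a minimal sketch of the override pattern described above. It does not import torchgeo: `Sentinel2Like` is a hypothetical stand-in whose `find` method only imitates glob-plus-regex file discovery, and both regular expressions are illustrative assumptions rather than the library's actual patterns. With torchgeo installed, the same two class attributes would be overridden on a subclass of `Sentinel2` instead.

```python
import re
from fnmatch import fnmatchcase


class Sentinel2Like:
    """Hypothetical stand-in for a raster dataset class (NOT torchgeo code)."""

    # Glob quoted in this issue; the regex is an illustrative assumption.
    filename_glob = "T*_*_B02_*m.*"
    filename_regex = r"^T\w+_\w+_(?P<band>B[018][\dA])_\d{2}m\..*$"

    def find(self, names):
        """Return the names accepted by both the glob and the regex."""
        pattern = re.compile(self.filename_regex)
        return [
            n for n in names
            if fnmatchcase(n, self.filename_glob) and pattern.match(n)
        ]


class CloudCoverSentinel2(Sentinel2Like):
    """Subclass for band-only filenames such as B02.tif (hypothetical name)."""

    # Like the original glob, this targets a single band (B02).
    filename_glob = "B02.*"
    filename_regex = r"^(?P<band>B[018][\dA])\..*$"


names = ["B02.tif", "B03.tif", "B04.tif", "B08.tif"]
found_default = Sentinel2Like().find(names)
found_custom = CloudCoverSentinel2().find(names)
print(found_default)
print(found_custom)
```

Only the two class attributes change in the subclass; the discovery logic is inherited, which is what makes this approach less invasive than adding filename options to the base class.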

@calebrob6
Member

calebrob6 commented Dec 15, 2021

Thanks for the thorough issue @KennSmithDS!

I think letting users specify unconventional filenames directly in the Sentinel2 dataset would be messy. Instead, I think the best way to do this is to create a new Dataset that extends Sentinel2 or RasterDataset as illustrated in a new example notebook that @RitwikGupta is creating in #283.

Edit: You are doing exactly that :). If you are feeling extremely motivated / have the time, you could directly add this Dataset to torchgeo as a "benchmark dataset" so that competitors can get it for free with torchgeo.

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Dec 16, 2021
@adamjstewart
Collaborator

It sounds like this confusion has been resolved, but if you have any other questions let me know!
