Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update VHR-10 dataset plotting #2092

Merged
merged 11 commits into from
May 29, 2024
Merged

Conversation

robmarkcole
Copy link
Contributor

@robmarkcole robmarkcole commented May 28, 2024

Fix plotting using percentiles

Note: I have uploaded the dataset at https://huggingface.co/datasets/satellite-image-deep-learning/VHR-10 - can transfer to torchgeo org if you want to use it here

@github-actions github-actions bot added documentation Improvements or additions to documentation datasets Geospatial or benchmark datasets labels May 28, 2024
@ashnair1 ashnair1 changed the title Update VHR-10 dataset plotting and README into, address #2069 Update VHR-10 dataset plotting and README code snippet May 28, 2024
@ashnair1
Copy link
Collaborator

ashnair1 commented May 28, 2024

I get the normalization fix but why is the README code snippet being changed? Once 0.6.0 is released, the snippet should work. Also as I understand it, the goal of the snippet is to show how the dataloader works not how to train using datamodules

@robmarkcole
Copy link
Contributor Author

I just assumed the docs predate the datamodules but we would like to showcase this feature - personally I always use datamodules now but perhaps am in the minority there. Suggest a second opinion before I change the PR

@robmarkcole
Copy link
Contributor Author

There is some fishy behaviour still - in the load_target the image id has one subtracted here - why not use as below?

annot = self.coco.loadAnns(self.coco.getAnnIds(imgIds=id_))

@robmarkcole
Copy link
Contributor Author

Applied black as per the guide, and this has changed ALL of the strings

@ashnair1
Copy link
Collaborator

Applied black as per the guide, and this has changed ALL of the strings

torchgeo no longer uses black, isort etc and has recently switched to ruff for linting and code formatting.

@ashnair1
Copy link
Collaborator

There is some fishy behaviour still - in the load_target the image id has one subtracted here - why not use as below?

annot = self.coco.loadAnns(self.coco.getAnnIds(imgIds=id_))

Refer to #847 (comment)

@robmarkcole
Copy link
Contributor Author

robmarkcole commented May 29, 2024

@ashnair1 I checked the approach I suggested (imgIds=id_)) and this results in erroneous masks - strange as the API appears to support this approach, suggesting the implementation has some quirks

image

The approach I am using elsewhere:

  1. Get list of image ids: self.ids = list(sorted(self.coco.imgs.keys())) and use in getitem: id_ = self.ids[index]
  2. Access filename using id_: self.coco.imgs[id_]['file_name']
  3. Access annotations using API: annot = self.coco.loadAnns(self.coco.getAnnIds(imgIds=id_))

@adamjstewart
Copy link
Collaborator

Regarding the README

The README has a few different sections. The section you're editing is designed to show off our benchmark datasets. There is a separate section below specifically for PyTorch Lightning, including data modules. I would prefer to keep using a dataset instead of a data module here.

Actually, I think we're better off choosing a different dataset (maybe EuroSAT?). VHR-10 is too complicated and makes it look like TorchGeo is hard to use. Maybe we could move VHR-10 to the Lightning section of the README since it has a pretty visualization.

Regarding plotting

Yes, this is a common issue with plotting via dataset vs. plotting via data module. I agree with your solution, and proposed basically the same thing for all datasets in #1263. This issue is not specific to VHR-10, but affects almost all datasets/data modules. I haven't had the time/energy to fix this for all datasets, but contributions are definitely welcome here, even if it's only 1 dataset at a time.

Can we separate the README and plotting changes into two separate PRs?

@adamjstewart adamjstewart modified the milestones: 0.5.3, 0.6.0 May 29, 2024
@adamjstewart
Copy link
Collaborator

Note: I have uploaded the dataset at https://huggingface.co/datasets/satellite-image-deep-learning/VHR-10 - can transfer to torchgeo org if you want to use it here

It's easier to download tar/zip/rar files instead of raw data. But I'm happy to change the download to your org or ours. Not sure if we should rehost all rar datasets as tar/zip to make them easier to extract?

@github-actions github-actions bot removed the documentation Improvements or additions to documentation label May 29, 2024
@robmarkcole robmarkcole changed the title Update VHR-10 dataset plotting and README code snippet Update VHR-10 dataset plotting May 29, 2024
@robmarkcole
Copy link
Contributor Author

@adamjstewart this PR now just addresses the plotting.

RE dataset choice, I agree EuroSAT would be a nice choice - it also supports selecting subsets of bands so shows immediately what is possible with torchgeo

@adamjstewart adamjstewart merged commit 6a9cd53 into microsoft:main May 29, 2024
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants