Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding the CVPR Chesapeake dataset #100

Merged
merged 16 commits into from
Sep 3, 2021
Merged

Conversation

calebrob6
Copy link
Member

@calebrob6 calebrob6 commented Sep 1, 2021

Dataset from https://lila.science/datasets/chesapeakelandcover

Some notes:

  • _check_integrity behaves differently from other places, see Dataset downloading expected behavior pt. 2 #99
  • The dataset is made up of non-overlapping geotiffs
  • The dataset's index is in EPSG:3857 and we index into it with EPSG:3857 boxes, but the actual data is in EPSG:26917 and EPSG:26918.
  • In __getitem__ we grab the single geotiff that the query intersects with, warp the query geom to the CRS of that file, then return the data corresponding to the warped query.

To do:

  • upload the updated version of the dataset to lila.science
  • add a mechanism by which users can pick which layers they want, e.g. ["naip-new", "lc"] should return only those layers
  • pyproj dependency

@calebrob6
Copy link
Member Author

Ready for review -- just need to upload the new version of the dataset to lila.science and fill in the links.

requirements.txt Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
torchgeo/datasets/cvpr_chesapeake.py Outdated Show resolved Hide resolved
@adamjstewart
Copy link
Collaborator

I don't love having to add 2 more deps (seems every PR adds at least 1 dep) but these 2 seem unavoidable in the long run anyway.

@calebrob6
Copy link
Member Author

calebrob6 commented Sep 3, 2021

pyproj 3.1 doesn't support python 3.6 @adamjstewart - thoughts on dropping 3.6

https://pyproj4.github.io/pyproj/stable/history.html

see pyproj4/pyproj#782 for the threading problem that was fixed in 3.1 (I'm imagining we will run into problems if we try to do projections in __getitem__ when running a Dataloader with multiple workers)

@adamjstewart
Copy link
Collaborator

I would prefer not to drop 3.6 just because of a dependency. Numpy also doesn't support 3.6. Can we just install an older version on 3.6?

@calebrob6
Copy link
Member Author

sure how do you do that

@adamjstewart
Copy link
Collaborator

I would like to test with older versions of pyproj and shapely. Both of these are almost the latest version available. Having the minimum required version so high makes it difficult for people to install.

When I was testing the minimum supported version I basically created a new conda environment with Python 3.6, then pip installed all the deps, then pip installed successively older versions of each package until the tests no longer passed.

@calebrob6
Copy link
Member Author

Can you do this if you then? I do not think it is hard for users to install new versions of packages, pip install ... covers it.

@adamjstewart
Copy link
Collaborator

I can once you add unit tests that actually run this code.

No minimum version at all is better than setting the latest version to be the minimum version. Many packages aren't necessarily compatible with the latest version of everything. For example, rasterio does not support the latest version of GDAL. If we decided to require the latest version of GDAL, we wouldn't be able to build rasterio at all.

@calebrob6
Copy link
Member Author

Generally true, but that's not the case here. There exists a valid set of our required packages that work with pyproj>=3.0 and shapely>=1.7.0 so I do not see what the problem is.

In any case, I made a test for the shape warping functionality that we're using from pyproj and shapely, I'll go ahead and test some major versions of the two libraries so that this can be merged.

@adamjstewart
Copy link
Collaborator

Generally true, but that's not the case here. There exists a valid set of our required packages that work with pyproj>=3.0 and shapely>=1.7.0 so I do not see what the problem is.

If users only installed torchgeo and its dependencies that would be true, but users tend to write their own code and install things like torchgeo into much larger environments where conflicts may be possible.

@codecov-commenter
Copy link

codecov-commenter commented Sep 3, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@13be614). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #100   +/-   ##
=======================================
  Coverage        ?   83.65%           
=======================================
  Files           ?       31           
  Lines           ?     1927           
  Branches        ?        0           
=======================================
  Hits            ?     1612           
  Misses          ?      315           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13be614...1cb219b. Read the comment docs.

environment.yml Outdated Show resolved Hide resolved
tests/test_crs_conversions.py Outdated Show resolved Hide resolved
torchgeo/datasets/__init__.py Outdated Show resolved Hide resolved
torchgeo/datasets/cvpr_chesapeake.py Outdated Show resolved Hide resolved
Comment on lines 42 to 43
crs = CRS.from_epsg(3857)
res = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these actually used for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defining the properties from GeoDataset

torchgeo/datasets/cvpr_chesapeake.py Outdated Show resolved Hide resolved
environment.yml Outdated Show resolved Hide resolved
@adamjstewart
Copy link
Collaborator

Might also need to add deps to docs/requirements.txt (added in main). I'm hoping we don't have to keep this file forever. This can either be done in this PR if you rebase or after its merged and the docs start failing to build.

@adamjstewart adamjstewart enabled auto-merge (rebase) September 3, 2021 22:38
@adamjstewart adamjstewart merged commit 04355ec into main Sep 3, 2021
@adamjstewart adamjstewart deleted the dataset/chesapeake_cvpr branch September 3, 2021 22:43
@adamjstewart
Copy link
Collaborator

Just realized we forgot to add this dataset to the docs: 8ab4211

@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants