Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GridGeoSampler: change stride of last patch to sample entire ROI #630

Merged
merged 23 commits into from
Sep 3, 2022

Conversation

remtav
Copy link
Contributor

@remtav remtav commented Jun 28, 2022

This PR changes the way in which GridGeoSampler samples patches from each tile if the tile size is not a multiple of the stride. We want to cover the entire tile, requiring us to adjust the stride of the final row/column.

This also changes the number of patches returned from each tile. Let $i$ be the size of the input tile. Let $k$ be the requested size of the output patch. Let $s$ be the requested stride. Let $o$ be the number of output rows/columns sampled from each tile.

Before

$$ o = \left\lfloor \frac{i - k}{s} \right\rfloor + 1 $$

After

$$ o = \left\lceil \frac{i - k}{s} \right\rceil + 1 $$

Reboot of #448
Fixes #431

@github-actions github-actions bot added samplers Samplers for indexing datasets testing Continuous integration testing labels Jun 28, 2022
@remtav
Copy link
Contributor Author

remtav commented Jun 28, 2022

@adamjstewart this is ready for review. I added more tests than necessary. I can remove some if you'd prefer.
Also, I'm not sure how to reproduce the problem you mentionned when two tiles that have a common edge face create a sample with empty area. Tips to implement this tests are welcome.
Currently, see TODO

@remtav
Copy link
Contributor Author

remtav commented Jun 28, 2022

It seems like a test with the NAIP dataset is failing, but on Windows, the tests passes. Can you try on Ubuntu @adamjstewart ? I'll be back home thursday to test on Ubuntu locally if needed.

@remtav
Copy link
Contributor Author

remtav commented Jun 29, 2022

@adamjstewart I added you as collaborator on my fork if you'd like to push some modifications. I'll be in and out until next week

@calebrob6
Copy link
Member

@remtav, can you add me as well?

@calebrob6 calebrob6 closed this Jul 2, 2022
@calebrob6 calebrob6 reopened this Jul 2, 2022
@calebrob6
Copy link
Member

Okay so I just played with this for awhile, the NAIPChesapeakeDataModule has stride hardcoded as 128 which I think makes the 2nd box generated by the sampler jump out of the dataset bounds. The test data is 218x128 and strides of < 64 seem to work.

@adamjstewart adamjstewart added this to the 0.3.0 milestone Jul 9, 2022
@adamjstewart
Copy link
Collaborator

Wait a minute. Do we want to sample beyond the ROI?? What if I have a geospatial train/val/test split and the sampler goes beyond the ROI of train into val/test? Wouldn't that be bad?

@calebrob6
Copy link
Member

An alternative behavior to make sure we sample all space within the ROI without sampling beyond is to force an additional overlapping box along the right and bottom sides of the ROI.

@adamjstewart
Copy link
Collaborator

@calebrob6 which of these possible solutions does that correspond to? #448 (comment)

@calebrob6
Copy link
Member

calebrob6 commented Jul 9, 2022

Adjust the stride of the last patch only -- e.g. of the logic is here https://github.com/microsoft/poultry-cafos/blob/main/cafo/data/TileDatasets.py#L65

@adamjstewart adamjstewart modified the milestones: 0.3.0, 0.3.1 Jul 9, 2022
@remtav
Copy link
Contributor Author

remtav commented Jul 10, 2022

Adjust the stride of the last patch only -- e.g. of the logic is here https://github.com/microsoft/poultry-cafos/blob/main/cafo/data/TileDatasets.py#L65

That's also the solution I had implemented in my first solution (draft):
cb554c6

@adamjstewart
Copy link
Collaborator

Trying to release 0.3.1 in approximately 1 week. @remtav can you revert back to your original solution and we can iterate from there? If you're busy, I can take over from here. I think we've agreed on what the correct behavior should be, which was the hardest part. Note that #736 may also affect the same lines of code.

@remtav
Copy link
Contributor Author

remtav commented Aug 25, 2022

Trying to release 0.3.1 in approximately 1 week. @remtav can you revert back to your original solution and we can iterate from there? If you're busy, I can take over from here. I think we've agreed on what the correct behavior should be, which was the hardest part. Note that #736 may also affect the same lines of code.

Sure, I can revert and build on top of initial solution (adjust stride of last patch) with unit tests and formatting clean up. I'll push at the latest on monday.

…r row/col and issue warning"

This reverts commit cb554c6
@remtav remtav dismissed a stale review via af5a3d1 August 29, 2022 16:12
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to check to make sure the rounding is right, will compare to #736 later.

torchgeo/samplers/single.py Outdated Show resolved Hide resolved
torchgeo/samplers/single.py Outdated Show resolved Hide resolved
@remtav
Copy link
Contributor Author

remtav commented Aug 29, 2022

Ready for review @adamjstewart
As for rounding issues, I've favored using /, then math.ceil, rather than the original // and +1 approach (which doesn't seem right when ROI and grid sampling area are equal anyways):

Old:
int((bounds.maxy - bounds.miny - self.size[0]) // self.stride[0]) + 1
New:
math.ceil((bounds.maxy - bounds.miny - self.size[0] + self.stride[0]) / self.stride[0]

Hopefully that does it, but please double check. There might also be small conflicts with #736.

@remtav remtav requested a review from adamjstewart August 30, 2022 20:16
@adamjstewart adamjstewart changed the title GridGeoSampler: Sample beyond roi limit if needed GridGeoSampler: change stride of last patch to sample entire ROI Sep 1, 2022
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks correct for all situations I can think of. Just a few remaining questions/suggestions before this is ready to merge.

tests/samplers/test_single.py Outdated Show resolved Hide resolved
torchgeo/samplers/single.py Outdated Show resolved Hide resolved
torchgeo/samplers/single.py Outdated Show resolved Hide resolved
tests/samplers/test_single.py Outdated Show resolved Hide resolved
tests/samplers/test_single.py Outdated Show resolved Hide resolved
tests/samplers/test_single.py Outdated Show resolved Hide resolved
@adamjstewart adamjstewart enabled auto-merge (squash) September 3, 2022 04:06
@adamjstewart
Copy link
Collaborator

Thanks for the fix @remtav! Sorry this took so long to merge.

@adamjstewart adamjstewart merged commit 7fa0fd4 into microsoft:main Sep 3, 2022
@adamjstewart adamjstewart deleted the samplers/gridgeosampler_bounds branch September 3, 2022 04:11
Modexus pushed a commit to Modexus/torchgeo that referenced this pull request Sep 3, 2022
…rosoft#630)

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* single.py: adapt gridgeosampler to sample beyond limit of ROI for a partial patch (to be padded)
test_single.py: add tests for multiple limit cases (see issue microsoft#448)

* format for black and flake8

* format for black and flake8

* once again, format for black and flake8

* Revert "Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning"

This reverts commit cb554c6

* adapt unit tests, remove warnings

* flake8: remove warnings import

* Address some comments

* Simplify computation of # rows/cols

* Document this new feature

* Fix size of ceiling symbol

* Simplify tests

Co-authored-by: Adam J. Stewart <[email protected]>
adamjstewart added a commit that referenced this pull request Sep 3, 2022
* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* single.py: adapt gridgeosampler to sample beyond limit of ROI for a partial patch (to be padded)
test_single.py: add tests for multiple limit cases (see issue #448)

* format for black and flake8

* format for black and flake8

* once again, format for black and flake8

* Revert "Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning"

This reverts commit cb554c6

* adapt unit tests, remove warnings

* flake8: remove warnings import

* Address some comments

* Simplify computation of # rows/cols

* Document this new feature

* Fix size of ceiling symbol

* Simplify tests

Co-authored-by: Adam J. Stewart <[email protected]>
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
…rosoft#630)

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* single.py: adapt gridgeosampler to sample beyond limit of ROI for a partial patch (to be padded)
test_single.py: add tests for multiple limit cases (see issue microsoft#448)

* format for black and flake8

* format for black and flake8

* once again, format for black and flake8

* Revert "Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning"

This reverts commit cb554c6

* adapt unit tests, remove warnings

* flake8: remove warnings import

* Address some comments

* Simplify computation of # rows/cols

* Document this new feature

* Fix size of ceiling symbol

* Simplify tests

Co-authored-by: Adam J. Stewart <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
samplers Samplers for indexing datasets testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GridGeoSampler may never reach end of image
3 participants