weighted geo sampler #757
Adam J. Stewart (Mon, Sep 5, 2022, 10:02 AM):

I like the idea, but how would you implement it? Unlike NonGeoDatasets, GeoDatasets will recursively search for files on disk, so you can't just pass in a list of weights. You could compute those weights, but how would you make a single class that is generic enough to allow users to do this?

You could get a list of filenames from RasterDataset's index, compute weights, then pass those to the sampler. I'll note this is a good reason why RasterDatasets should be able to be instantiated from a list of filenames.
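A minimal, standard-library sketch of that suggestion. It assumes the (filename, bounds) pairs have already been pulled out of the dataset's spatial index; the `Tile` record, `weight_fn` callback and `weighted_windows` generator are hypothetical names, not TorchGeo API. Files are drawn in proportion to their weights, then a fixed-size window is placed uniformly inside the chosen file's bounds.

```python
from __future__ import annotations

import random
from typing import Callable, Iterator, NamedTuple


class Tile(NamedTuple):
    """Hypothetical record for one file taken from the dataset index."""
    path: str
    minx: float
    maxx: float
    miny: float
    maxy: float


def weighted_windows(
    tiles: list[Tile],
    weight_fn: Callable[[Tile], float],
    size: float,
    num_samples: int,
    seed: int | None = None,
) -> Iterator[tuple[str, float, float, float, float]]:
    """Yield (path, minx, maxx, miny, maxy) windows, choosing files by weight.

    Assumes every tile is at least `size` wide and tall.
    """
    rng = random.Random(seed)
    weights = [weight_fn(t) for t in tiles]
    if sum(weights) <= 0:
        raise ValueError("at least one tile needs a positive weight")
    for _ in range(num_samples):
        # Pick a file with probability proportional to its weight.
        tile = rng.choices(tiles, weights=weights, k=1)[0]
        # Place a size x size window uniformly, fully inside the tile.
        minx = rng.uniform(tile.minx, tile.maxx - size)
        miny = rng.uniform(tile.miny, tile.maxy - size)
        yield tile.path, minx, minx + size, miny, miny + size
```

For the height-regression case in this issue, `weight_fn` could, for example, return the fraction of non-zero pixels in each label file, computed once up front.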
Adam J. Stewart (Mon, Sep 5, 2022, 11:44 AM):

> You could get a list of filenames from RasterDataset's index, compute weights, then pass those to the sampler.

This feels a bit fragile. For example, if your dataset is an IntersectionDataset or UnionDataset, you now need to be more careful because each "hit" could be both image and label, or come from a different dataset entirely. But yes, this could work.

> I'll note this is a good reason why RasterDatasets should be able to be instantiated from a list of filenames.

Should be easier to support a list of filenames for instantiation when we move to TorchData.

> Should be easier to support a list of filenames for instantiation when we move to TorchData.

I don't understand why this is particularly hard now; I guess I need to try it.
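To make the fragility concrete: if a weight is computed per index hit, a region indexed twice (once as image and once as label, or once per member of a union) is counted twice. The sketch below keys weights on the region instead of the hit and lets only label hits contribute. The `Hit` type and its `role` field are hypothetical; how TorchGeo actually represents hits in combined datasets is not covered in this thread.

```python
from __future__ import annotations

from typing import NamedTuple, Tuple

Bounds = Tuple[float, float, float, float]  # (minx, maxx, miny, maxy)


class Hit(NamedTuple):
    """Hypothetical index entry: which file, what role it plays, and where."""
    path: str
    role: str  # "image" or "label"
    bounds: Bounds


def region_weights(hits: list[Hit], label_weight: dict[str, float]) -> dict[Bounds, float]:
    """One weight per region, taken from the label files that cover it.

    Image hits only register the region (weight 0.0 if no label covers it),
    so a region seen as both image and label is counted once, not twice.
    """
    weights: dict[Bounds, float] = {}
    for hit in hits:
        current = weights.get(hit.bounds, 0.0)
        if hit.role == "label":
            # If several label files cover the same region, keep the largest weight.
            current = max(current, label_weight.get(hit.path, 0.0))
        weights[hit.bounds] = current
    return weights
```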
It's not hard to support without TorchData, but it becomes easier with TorchData because users can construct their own data loading pipeline from a set of common operations. They can choose whether to specify a list of files, recursively search a directory, use a STAC API, or something else. I also still need to investigate TorchData; I'm hoping it doesn't put all of the work on the user.
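A plain-Python sketch of that idea (TorchData's own operators are not shown or assumed here): the file source is a swappable first stage, and every later step only sees an iterable of paths. `from_list`, `from_directory` and `keep` are hypothetical names; a STAC-backed source would be another generator with the same shape.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable, Iterator


def from_list(paths: Iterable[str | Path]) -> Iterator[Path]:
    """Source: an explicit list of files, used exactly as given."""
    for p in paths:
        yield Path(p)


def from_directory(root: str | Path, pattern: str = "*.tif") -> Iterator[Path]:
    """Source: recursive directory search, in a deterministic (sorted) order."""
    yield from sorted(Path(root).rglob(pattern))


def keep(source: Iterable[Path], predicate: Callable[[Path], bool]) -> Iterator[Path]:
    """A reusable step that works after any source."""
    return (p for p in source if predicate(p))
```

Because `keep` (or a weighting step) never knows where the paths came from, switching between a file list and a directory search does not change the rest of the pipeline.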
This seems like two separate problems.
This is the broader problem. One way I would approach it is to generate a grid from user-specified criteria (pixel width, pixel height, and nSamples), compute the percentage cover of each label value per grid cell (patch), and finally filter out any patches that do not meet the user's weight criteria. For example, in my case, any cell with less than or equal to 50% cover of zero is allowed. I could do this quickly and easily in Earth Engine but have no idea how to go about it in Python, so for now I will implement it in GEE to preprocess my data. In a regression problem like mine, only the zero value is problematic, so the problem is somewhat simpler than in multi-class classification.
It is beneficial to have some zero labels to learn from. Also, I do not think torchgeo supports irregular polygons, only bounding boxes, for intersection datasets.
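The grid-and-filter approach described above can be sketched in NumPy, assuming the label raster fits in memory as a 2-D array and the grid is non-overlapping and aligned to pixel (0, 0). The function names are illustrative, and the 0.5 default mirrors the "at most 50% zero cover" rule; nSamples would then be drawn from the returned offsets.

```python
import numpy as np


def patch_zero_fraction(label: np.ndarray, patch_h: int, patch_w: int) -> np.ndarray:
    """Fraction of zero-valued pixels in each non-overlapping patch.

    Returns an array of shape (rows, cols). Edge pixels that do not fill a
    whole patch are dropped.
    """
    rows, cols = label.shape[0] // patch_h, label.shape[1] // patch_w
    trimmed = label[: rows * patch_h, : cols * patch_w]
    # Reshape to (rows, patch_h, cols, patch_w) so each patch is one block.
    blocks = trimmed.reshape(rows, patch_h, cols, patch_w)
    return (blocks == 0).mean(axis=(1, 3))


def keep_patches(label: np.ndarray, patch_h: int, patch_w: int,
                 max_zero_fraction: float = 0.5) -> list:
    """(row, col) pixel offsets of patches whose zero cover is <= the threshold."""
    frac = patch_zero_fraction(label, patch_h, patch_w)
    r, c = np.nonzero(frac <= max_zero_fraction)
    return [(int(i) * patch_h, int(j) * patch_w) for i, j in zip(r, c)]
```

Returning offsets rather than cropped arrays keeps the filter cheap and lets a sampler decide later how many patches to draw.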
FYI, we are planning to work on this as part of our time series efforts. All samplers will allow users to pass in weights, not just a single …
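As a rough illustration of a sampler that takes per-patch weights (not the planned TorchGeo API), here is a dependency-free class that loosely follows the `torch.utils.data.Sampler` protocol (`__iter__` and `__len__`) and samples with replacement, as `WeightedRandomSampler` does by default. `soft_weights` is a hypothetical helper that down-weights mostly-zero patches instead of dropping them, so the model still sees some zero labels.

```python
from __future__ import annotations

import random
from typing import Iterator, Sequence


class WeightedPatchSampler:
    """Hypothetical sampler: yields patch offsets with probability proportional to weight."""

    def __init__(self, offsets: Sequence[tuple[int, int]], weights: Sequence[float],
                 length: int, seed: int | None = None) -> None:
        if len(offsets) != len(weights):
            raise ValueError("need exactly one weight per patch")
        if any(w < 0 for w in weights) or sum(weights) <= 0:
            raise ValueError("weights must be non-negative and not all zero")
        self.offsets = list(offsets)
        self.weights = list(weights)
        self.length = length
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[tuple[int, int]]:
        # Draw `length` offsets with replacement.
        yield from self.rng.choices(self.offsets, weights=self.weights, k=self.length)

    def __len__(self) -> int:
        return self.length


def soft_weights(zero_fractions: Sequence[float], floor: float = 0.05) -> list[float]:
    """Weight = non-zero cover, but never below `floor`, so all-zero patches stay possible."""
    return [max(1.0 - z, floor) for z in zero_fractions]
```

The `floor` value is an arbitrary illustrative choice; it controls how often all-water patches still appear relative to fully labelled ones.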
Summary
With imbalanced datasets, the current samplers may not help with the imbalance.
I am currently trying to get rid of samples whose mask contains only 0 m heights (water regions).
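One simple way to skip all-zero masks, sketched here as plain rejection sampling (a swapped-in technique, not something TorchGeo is stated to provide): draw a random window, and redraw if every mask pixel in it is zero. It assumes the mask is an in-memory 2-D NumPy array at least `patch` pixels on each side; `sample_non_empty` is a hypothetical name.

```python
from __future__ import annotations

import random

import numpy as np


def sample_non_empty(mask: np.ndarray, patch: int, max_tries: int = 100,
                     seed: int | None = None) -> tuple[int, int] | None:
    """Return a (row, col) offset whose patch x patch window has a non-zero pixel."""
    rng = random.Random(seed)
    height, width = mask.shape
    for _ in range(max_tries):
        row = rng.randrange(height - patch + 1)
        col = rng.randrange(width - patch + 1)
        if np.any(mask[row:row + patch, col:col + patch] != 0):
            return row, col
    # Give up instead of looping forever on a scene that is entirely water.
    return None
```

Rejection sampling wastes draws when non-zero pixels are rare, which is where precomputed per-patch weights become the better option.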
Rationale
No response
Implementation
No response
Alternatives
No response
Additional information
No response