RAM optimizations, additional parameters (discussion) #33
Motivation
The purpose of this PR is to discuss a few possible optimizations (RAM usage, coverage) and additional parameters.
@foobarbecue feel free to cherry-pick what you deem useful ;)
RAM optimizations
Parsing NAC index
RAM usage when parsing the NAC index is kept low thanks to the chunksize parameter provided by pandas.read_csv.
The join operation is then performed on each chunk of lines instead of on the whole file at once.
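A minimal sketch of a chunked join, assuming a comma-separated index with a header row (the real INDEX.TAB layout is described by INDEX.LBL and may need explicit column names). The helper join_in_chunks, the small "wanted" table and the sample rows are hypothetical and only illustrate the pattern:

```python
import io

import pandas as pd

# Hypothetical stand-in for INDEX.TAB; the real columns come from INDEX.LBL.
INDEX_CSV = """PRODUCT_ID,CENTER_LAT,CENTER_LON
M100L,10.0,20.0
M101L,-5.0,30.0
M102L,45.0,60.0
M103L,80.0,90.0
"""


def join_in_chunks(source, other, on, chunksize):
    """Read `source` in chunks and inner-join each chunk with `other`.

    Only one chunk of the large file is held in memory at a time; the
    matching rows of every chunk are collected and concatenated at the end.
    """
    parts = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        parts.append(chunk.merge(other, on=on, how="inner"))
    return pd.concat(parts, ignore_index=True)


# Hypothetical small table to join against (e.g. products of interest).
wanted = pd.DataFrame({"PRODUCT_ID": ["M101L", "M103L"], "TAG": ["a", "b"]})
result = join_in_chunks(io.StringIO(INDEX_CSV), wanted, on="PRODUCT_ID", chunksize=2)
print(result)
```

Peak memory is then bounded by the chunk size plus the (usually small) joined output, rather than by the size of the full index.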
Computing pairs (overlay)
geopandas.overlay, and to some extent geopandas in general, has performance issues. When many image candidates for pairs are found, geopandas.overlay blows up in both RAM usage and computation time. It wastes a lot of time computing a huge number of pairs, most of which are then discarded by the filters (sun geometry and area).
This PR adds a generator that yields chunks of pairs and filters them as they are generated.
That way RAM usage stays low, and the search can be aborted early once enough pairs have been found.
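The generate-filter-abort pattern can be sketched with the standard library alone. This is a simplified illustration, not the PR's code: footprints are hypothetical axis-aligned boxes (xmin, ymin, xmax, ymax) instead of geopandas polygons, and the image records, field names and helper functions are assumptions. The sun-geometry filter mirrors the incidence_range_low/incidence_range_high idea, and max_pairs gives the early abort:

```python
from itertools import combinations, islice


def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def pair_chunks(images, chunk_size):
    """Yield lists of at most chunk_size candidate pairs, generated lazily."""
    candidates = combinations(images, 2)
    while True:
        chunk = list(islice(candidates, chunk_size))
        if not chunk:
            return
        yield chunk


def find_pairs(images, min_area, incidence_low, incidence_high, max_pairs, chunk_size=1000):
    found = []
    for chunk in pair_chunks(images, chunk_size):
        for a, b in chunk:
            # Sun-geometry filter: both images must lie inside the incidence range.
            if not all(incidence_low <= img["incidence"] <= incidence_high for img in (a, b)):
                continue
            # Area filter: footprints must overlap by at least min_area.
            if overlap_area(a["box"], b["box"]) < min_area:
                continue
            found.append((a["id"], b["id"]))
        # Early abort: no further chunks are generated once enough pairs exist.
        if len(found) >= max_pairs:
            break
    return found


images = [
    {"id": "A", "box": (0, 0, 2, 2), "incidence": 40},
    {"id": "B", "box": (1, 1, 3, 3), "incidence": 45},
    {"id": "C", "box": (10, 10, 11, 11), "incidence": 42},
    {"id": "D", "box": (0, 0, 1, 1), "incidence": 80},
]
pairs = find_pairs(images, min_area=0.5, incidence_low=30, incidence_high=60,
                   max_pairs=10, chunk_size=2)
print(pairs)
```

In a geopandas-based version each chunk would more likely be a small GeoDataFrame processed with vectorized operations; the point is that only one chunk of candidate pairs exists in memory at any time.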
And then the generator is used here:
Improved coverage
When looking for a minimal set of pairs that covers an area, the way the code chooses the next point to cover seems to be flawed.
Instead of:
This PR uses:
This change improves coverage close to the equator, where finding pairs is harder.
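For context, a greedy covering loop of this kind can be sketched as follows. This is a hypothetical illustration, not the PR's code: the point-selection rule below (smallest uncovered point) is only a placeholder and is not the rule the PR introduces, footprints are simplified to axis-aligned boxes, and the miss counter only mimics the role of the miss_limit parameter:

```python
def covers(box, pt):
    """True if point pt = (x, y) lies inside box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax


def greedy_cover(points, pairs, miss_limit):
    """Greedily choose pair footprints until every coverable point is covered.

    points: list of (x, y); pairs: dict mapping pair name -> footprint box.
    Stops once more than miss_limit points could not be covered.
    """
    uncovered = sorted(points)
    chosen = []
    misses = 0
    while uncovered and misses <= miss_limit:
        target = uncovered[0]  # placeholder selection rule
        candidates = [name for name, box in pairs.items() if covers(box, target)]
        if not candidates:
            misses += 1
            uncovered.pop(0)  # give up on this point
            continue
        # Prefer the pair that covers the most still-uncovered points.
        best = max(candidates, key=lambda n: sum(covers(pairs[n], p) for p in uncovered))
        chosen.append(best)
        uncovered = [p for p in uncovered if not covers(pairs[best], p)]
    return chosen


points = [(0, 0), (1, 0), (2, 0), (5, 5)]
pairs = {"P1": (0, -1, 1, 1), "P2": (0, -1, 2, 1), "P3": (4, 4, 6, 6)}
print(greedy_cover(points, pairs, miss_limit=0))
```

The choice of target point drives which pairs are tried first, which is why a poor selection rule can leave gaps in areas where few pairs exist.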
New parameters
- indfilepath and lblfilepath: paths for INDEX.TAB and INDEX.LBL
- max_pairs: stop looking for pairs as soon as at least max_pairs pairs have been found
- miss_limit: how many times the search may fail to cover a point when --find-covering=True is provided
- incidence_range_low, incidence_range_high: filter out pairs whose images have a sun incidence angle outside this range. This helps find better pairs close to the north/south poles.
- json_output: path for dumping the JSON containing the pairs
Misc
This PR also modifies download_NAC.py so that it can read pairs from the JSON written by find_stereo_pairs.py and download the corresponding images in parallel.
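A minimal sketch of reading the pairs JSON and downloading in parallel with a thread pool. The JSON schema (a list of objects with "left" and "right" image IDs), the helper names and the injected fetch callable are all hypothetical; a real fetch would download one NAC image and return its local path:

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def image_ids_from_pairs(pairs):
    """Collect unique image IDs from the pairs, preserving first-seen order."""
    ids, seen = [], set()
    for pair in pairs:
        for pid in (pair["left"], pair["right"]):
            if pid not in seen:
                seen.add(pid)
                ids.append(pid)
    return ids


def download_pairs(json_path, fetch, max_workers=4):
    """Load pairs from json_path and call fetch(image_id) concurrently.

    Each image is fetched once even if it appears in several pairs.
    Returns a dict mapping image ID -> whatever fetch returned.
    """
    with open(json_path) as f:
        pairs = json.load(f)
    ids = image_ids_from_pairs(pairs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, ids))
    return dict(zip(ids, results))


# Usage with a dummy fetch (no network access).
pairs = [{"left": "M1L", "right": "M2L"}, {"left": "M2L", "right": "M3L"}]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(pairs, tmp)
paths = download_pairs(tmp.name, fetch=lambda pid: pid + ".IMG")
os.remove(tmp.name)
print(paths)
```

Threads fit here because downloads are I/O-bound; deduplicating IDs first avoids fetching an image that participates in several pairs more than once.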