-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streamlined site assessment process #50
Conversation
Applied KD-tree clustering to filter out pixels which are close to each other and not worthwhile re-assessing. I have also adjusted requirements to be a bit tighter: previously, assessment of pixels by rotating a polygon around it was abandoned if a score was < 33%, but have raised that to 70%. We were also rotating the polygon twice in 15 degree steps (in either direction). We are now only doing this once. Doing this drastically reduces by orders of magnitude:
I have also re-enabled multi-threading so each potential pixel is assessed in parallel. The effect of these changes is that site suitability assessment takes ~70 seconds (compared to ~12mins previously) for Cairns-Cooktown. The worst case seems to be the Far Northern region, where assessment took 3mins. From previously discussions with @arlowhite , this region previously could take upwards of 45-50mins. Note that the criteria ranges used for demonstration (shown below) purposely did not specify ranges for all criteria so somewhat represents a "worst case". |
Previously the buffering distance wasn't applied to max_count. Additionally have moved to max_offset to allow for full rotation of the search box without going outside of the rel_pix bounds. And adjusted options if there are no rel_pix for a search box (returns standard geometry to output valid DataFrame, but will be filtered out at later steps).
Add max_count documentation and capping at 1
Argument names updated for readability Reduced number of arguments by moving dependent work inside function Moved repeated calculations outside for loop Use preallocated array (`loc_constraint`) to avoid reallocation of potentially 10s of thousands of values Assign results directly to result arrays Update raster-based method as well
This approach is faster and uses ~5x less memory
Fix typos in docstring
Standardize column names to `lons` and `lats` (although I actually prefer the singular over the plural, but this is how it is in the lookup tables)
Idea is to remove unnecessary buffer areas around valid pixels. However this needs to be done in the pre-processing step, not on data load. But I am much too tired to do it right now, so committing this so the idea isn't lost.
Previously a "search geometry" would be created and moved/rotated about each pixel location to be assessed. i.e., [n pixels]^2 * [r rotations] operations We now instead pre-generate the rotated polygons and move these to each pixel location to be assessed. i.e., [r rotations] + [n pixels] operations
Apply KD-tree clustering to identify "groups" of suitable pixels and only assess the "center" pixel of the group. Further constrain search by adopting a higher requirement (polygons must have suitable pixels in >= 70% of its area compared to 33% previously). This could be further constrained by defaulting from 95% suitability scores (proportion of area around target pixel meeting suitability criteria) to 99%.
510a171
to
ddaf69f
Compare
@@ -211,6 +225,9 @@ function setup_region_routes(config, auth) | |||
) | |||
) | |||
|
|||
# Specifically clear from memory to invoke garbage collector | |||
assessed = nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't be necessary, assessed
is local to this function. Did you actually see this make a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, yes it does make a difference.
Triggers the GC to run a bit earlier, making space for the next big dataset. Otherwise you're more likely to hit an out-of-memory error.
src/criteria_assessment/criteria.jl
Outdated
@@ -111,6 +111,19 @@ function _write_cog(file_path::String, data::Raster, config::Dict)::Nothing | |||
return nothing | |||
end | |||
|
|||
function _write_tiff(file_path::String, data::Raster, config::Dict)::Nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config
not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think because we went from writing out a COG (which used the defined options) to writing out a geotiff...
I'll fix it up, thanks for catching.
end | ||
|
||
return min(score[argmax(score)], 1), | ||
argmax(score) - (n_per_side + 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would read better with an indent, but maybe formatter didn't like it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto-format issue - it'll unindent the lines underneath
Supersedes changes in the
perf-changes
branch.Changes to assessment process for computational efficiency.
Settings:
Example endpoint: http://127.0.0.1:8000/suitability/site-suitability/Cairns-Cooktown/slopes?Depth=-4.0:-2.0&Slope=0.0:40.0&Rugosity=0.0:6.0&SuitabilityThreshold=95&xdist=450&ydist=50
Originally took 8-12 seconds for the site suitability assessment step alone (1406 potential locations).
Now down to ~6.5 seconds.
A different strategy for very large numbers of potential locations still needed.