Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamlined site assessment process #50

Merged
merged 39 commits into from
Nov 28, 2024
Merged

Conversation

ConnectedSystems
Copy link
Collaborator

@ConnectedSystems ConnectedSystems commented Nov 7, 2024

Supersedes changes in the perf-changes branch.

Changes to assessment process for computational efficiency.

Settings:

  • Region: Cairns-Cooktown
  • type: Slope
  • Depth: -4.0:-2.0
  • Slope: 0.0:40.0
  • Rugosity: 0.0:6.0
  • SuitabilityThreshold: 95
  • xdist: 450
  • ydist: 50

Example endpoint: http://127.0.0.1:8000/suitability/site-suitability/Cairns-Cooktown/slopes?Depth=-4.0:-2.0&Slope=0.0:40.0&Rugosity=0.0:6.0&SuitabilityThreshold=95&xdist=450&ydist=50

Originally took 8-12 seconds for the site suitability assessment step alone (1406 potential locations).
Now down to ~6.5 seconds.

A different strategy for very large numbers of potential locations still needed.

@ConnectedSystems
Copy link
Collaborator Author

Applied KD-tree clustering to filter out pixels which are close to each other and not worthwhile re-assessing.

I have also adjusted requirements to be a bit tighter: previously, assessment of pixels by rotating a polygon around it was abandoned if a score was < 33%, but have raised that to 70%.

We were also rotating the polygon twice in 15 degree steps (in either direction). We are now only doing this once.

Doing this drastically reduces by orders of magnitude:

  • The number of pixels to assess
  • The number of total rotations

I have also re-enabled multi-threading so each potential pixel is assessed in parallel.
However, this increases memory use to close to 9GB. There are some things that could be done to reduce overall memory use, but will have to be investigated later.

The effect of these changes is that site suitability assessment takes ~70 seconds (compared to ~12mins previously) for Cairns-Cooktown.

The worst case seems to be the Far Northern region, where assessment took 3mins. From previously discussions with @arlowhite , this region previously could take upwards of 45-50mins.

Note that the criteria ranges used for demonstration (shown below) purposely did not specify ranges for all criteria so somewhat represents a "worst case".
On the other hand, the timings were taken on the laptop with 12 threads - could naively assume there's a 4x increase when run in the cloud...

http://127.0.0.1:8000/suitability/site-suitability/FarNorthern/slopes?Depth=-10.0:-2.0&Slope=0.0:40.0&Rugosity=0.0:6.0&SuitabilityThreshold=95&xdist=450&ydist=50

BG-AIMS and others added 28 commits November 23, 2024 18:14
Previously the buffering distance wasn't applied to max_count. Additionally have moved to max_offset to allow for full rotation of the search box without going outside of the rel_pix bounds. And adjusted options if there are no rel_pix for a search box (returns standard geometry to output valid DataFrame, but will be filtered out at later steps).
Add max_count documentation and capping at 1
Argument names updated for readability

Reduced number of arguments by moving dependent work inside function

Moved repeated calculations outside for loop

Use preallocated array (`loc_constraint`) to avoid reallocation of potentially 10s of thousands of values

Assign results directly to result arrays

Update raster-based method as well
This approach is faster and uses ~5x less memory
Fix typos in docstring
Standardize column names to `lons` and `lats` (although I actually prefer the singular over the plural, but this is how it is in the lookup tables)
Idea is to remove unnecessary buffer areas around valid pixels.
However this needs to be done in the pre-processing step, not on data load. But I am much too tired to do it right now, so committing this so the idea isn't lost.
Previously a "search geometry" would be created and moved/rotated about each pixel location to be assessed.

i.e., [n pixels]^2 * [r rotations] operations

We now instead pre-generate the rotated polygons and move these to each pixel location to be assessed.

i.e., [r rotations] + [n pixels] operations
Apply KD-tree clustering to identify "groups" of suitable pixels and only assess the "center" pixel of the group.

Further constrain search by adopting a higher requirement (polygons must have suitable pixels in >= 70% of its area compared to 33% previously).

This could be further constrained by defaulting from 95% suitability scores (proportion of area around target pixel meeting suitability criteria) to 99%.
@@ -211,6 +225,9 @@ function setup_region_routes(config, auth)
)
)

# Specifically clear from memory to invoke garbage collector
assessed = nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be necessary, assessed is local to this function. Did you actually see this make a difference?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, yes it does make a difference.

Triggers the GC to run a bit earlier, making space for the next big dataset. Otherwise you're more likely to hit an out-of-memory error.

@@ -111,6 +111,19 @@ function _write_cog(file_path::String, data::Raster, config::Dict)::Nothing
return nothing
end

function _write_tiff(file_path::String, data::Raster, config::Dict)::Nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config not used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think because we went from writing out a COG (which used the defined options) to writing out a geotiff...

I'll fix it up, thanks for catching.

end

return min(score[argmax(score)], 1),
argmax(score) - (n_per_side + 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would read better with an indent, but maybe formatter didn't like it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-format issue - it'll unindent the lines underneath ☹️

@ConnectedSystems ConnectedSystems merged commit 94a4d63 into main Nov 28, 2024
1 check passed
@ConnectedSystems ConnectedSystems deleted the streamlined-process branch November 28, 2024 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants