Bugfix/refactor area calculation #1699

Merged 2 commits from bugfix/refactor-area-calculation into master on Aug 9, 2017

Conversation

@alukach (Contributor) commented Aug 3, 2017

Proposed changes in this pull request

This PR resolves the area calculation issues brought up in #1689.

As mentioned by @oliverroick, there are two changes that need to be made:

  1. Recalculate the area field for all currently existing location instances in our DB.
  2. Update how the area field will be calculated for any location instances saved in the future.

To address the first item, a new migration has been added to instruct the DB to correctly calculate the area value for all rows with a geometry type of POLYGON or MULTIPOLYGON.

        migrations.RunSQL(
            "UPDATE spatial_spatialunit SET area = ST_Area(geography(geometry)) "
            "WHERE geometrytype(geometry) LIKE '%POLYGON';",
            reverse_sql=migrations.RunSQL.noop
        )

This will overwrite the currently existing area values. It should be reasonably performant, and at least faster than calculating these values in Python.

To address the second item (calculating the area for new SpatialUnit rows), I deleted the existing pre_save signal and moved the logic into the DB as a trigger. This was done because calculating the area via Python produces slightly different values than via PostGIS:

In [1]: from functools import partial
   ...:
   ...: from django.db import connection
   ...: from shapely import ops
   ...: from shapely.wkt import loads
   ...: import pyproj
   ...:
   ...:
   ...: WKT = 'Polygon ((24.92122528665211334 60.14935271120948812, 24.92222637615321901 60.14908991700298202, 24.92152698485792683 60.14840732483580865, 24.92051903857941753 60.14867353746650736, 24.92122528665211334 60.14935271120948812
   ...: ))'
   ...:
   ...: geom = loads(WKT)
   ...: project = partial(
   ...:     pyproj.transform,
   ...:     pyproj.Proj(init='EPSG:4326'),
   ...:     pyproj.Proj(proj='aea', lat1=geom.bounds[1], lat2=geom.bounds[3])
   ...: )
   ...: geom = ops.transform(project, geom)
   ...: print('pyproj ', geom.area)
   ...:
   ...: with connection.cursor() as cursor:
   ...:     sql = "SELECT ST_Area(geography(ST_GeomFromText(%s)))"
   ...:     cursor.execute(sql, [WKT])
   ...:     results = cursor.fetchone()
   ...:     print('postgis', results[0])
   ...:
pyproj  5383.540188133147
postgis 5383.5162784872

This means that the values calculated via the DB migration would shift slightly the next time a SpatialUnit instance was saved. The advantage of using a trigger is that we ensure the area is calculated even when using the bulk_create and update methods. The downside is that the DB-calculated value is not set on the object when the object is saved. To get around this, I wired up a post_save signal to retrieve this value and set it on the instance's area field if it should have been generated.
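
For reference, here's a minimal sketch of what the trigger plus post_save hook could look like. The function and trigger names match the calculate_area_trigger output shown later in this thread, but the SQL body, migration dependency, and signal wiring below are illustrative sketches rather than the literal code in this PR:

    # migrations/000X_add_area_trigger.py (sketch; the migration name and
    # dependency below are hypothetical)
    from django.db import migrations

    CREATE_TRIGGER_SQL = """
    CREATE OR REPLACE FUNCTION calculate_area() RETURNS trigger AS $$
    BEGIN
        IF geometrytype(NEW.geometry) LIKE '%POLYGON' THEN
            NEW.area := ST_Area(geography(NEW.geometry));
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER calculate_area_trigger
    BEFORE INSERT OR UPDATE ON spatial_spatialunit
    FOR EACH ROW EXECUTE PROCEDURE calculate_area();
    """

    DROP_TRIGGER_SQL = """
    DROP TRIGGER IF EXISTS calculate_area_trigger ON spatial_spatialunit;
    DROP FUNCTION IF EXISTS calculate_area();
    """


    class Migration(migrations.Migration):

        dependencies = [
            ('spatial', '0003_recalculate_area'),  # hypothetical dependency
        ]

        operations = [
            migrations.RunSQL(CREATE_TRIGGER_SQL, reverse_sql=DROP_TRIGGER_SQL),
        ]


    # signals.py (sketch): the trigger has already written `area` on the DB
    # side, so pull the generated value back onto the in-memory instance.
    from django.db.models.signals import post_save
    from django.dispatch import receiver

    from .models import SpatialUnit  # assumed import path


    @receiver(post_save, sender=SpatialUnit)
    def refresh_area(sender, instance, created, **kwargs):
        if instance.geometry is not None:
            instance.refresh_from_db(fields=['area'])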

Other changes: For some reason, MultiPolygons didn't have their area calculated. I've updated the code to support that geometry type.

The SpatialRelationshipManager was checking if one SpatialUnit contained another. This was done via a DB query. Using geography=True on the SpatialUnit.geometry field removed the ability to query via contains. I did a quick refactor to do the containment check in Python via GEOS.
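
A minimal sketch of that containment check, assuming both geometries are GEOSGeometry objects (as they are on GeoDjango model instances); the actual manager code may be structured differently:

    def geometry_contains(parent_geometry, child_geometry):
        """Containment check done in Python via GEOS rather than a DB query."""
        if parent_geometry is None or child_geometry is None:
            return False
        # GEOSGeometry.contains() runs locally, so it keeps working even
        # though the column is now stored as a geography.
        return parent_geometry.contains(child_geometry)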

When should this PR be merged

No dependencies.

Risks

None to mention.

Follow-up actions

[List any possible follow-up actions here; for instance, testing data
migrations, software that we need to install on staging and production
environments.]

Checklist (for reviewing)

General

  • Is this PR explained thoroughly? All code changes must be accounted for in the PR description.
    • Review 1
    • Review 2
  • Is the PR labeled correctly? It should have the migration label if a new migration is added.
    • Review 1
    • Review 2
  • Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.
    • Review 1
    • Review 2

Functionality

  • Are all requirements met? Compare implemented functionality with the requirements specification.
    • Review 1
    • Review 2
  • Does the UI work as expected? There should be no Javascript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.
    • Review 1
    • Review 2

Code

  • Do you fully understand the introduced changes to the code? If not ask for clarification, it might uncover ways to solve a problem in a more elegant and efficient way.
    • Review 1
    • Review 2
  • Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.
    • Review 1
    • Review 2
  • Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.
    • Review 1
    • Review 2
  • Is the code documented sufficiently? Large and complex classes, functions or methods must be annotated with comments following our code-style guidelines.
    • Review 1
    • Review 2
  • Has the scalability of this change been evaluated?
    • Review 1
    • Review 2
  • Is there a maintenance plan in place?
    • Review 1
    • Review 2

Tests

  • Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.
    • Review 1
    • Review 2
  • If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.
    • Review 1
    • Review 2
  • If this is a new feature or a significant change to an existing feature, has the manual testing spreadsheet been updated with instructions for manual testing?
    • Review 1
    • Review 2

Security

  • Confirm this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets.
    • Review 1
    • Review 2
  • Are all UI and API inputs run through forms or serializers?
    • Review 1
    • Review 2
  • Are all external inputs validated and sanitized appropriately?
    • Review 1
    • Review 2
  • Does all branching logic have a default case?
    • Review 1
    • Review 2
  • Does this solution handle outliers and edge cases gracefully?
    • Review 1
    • Review 2
  • Are all external communications secured and restricted to SSL?
    • Review 1
    • Review 2

Documentation

  • Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.
    • Review 1
    • Review 2
  • Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.
    • Review 1
    • Review 2
  • Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.
    • Review 1
    • Review 2

@alukach (Contributor, Author) commented Aug 3, 2017

I've been testing this PR against a polygon drawn around this rectangular building: https://www.openstreetmap.org/#map=17/60.14876/24.92242&layers=TN

[Screenshot: the test polygon drawn over the building]

Polygon ((24.92122528665211334 60.14935271120948812, 24.92222637615321901 60.14908991700298202, 24.92152698485792683 60.14840732483580865, 24.92051903857941753 60.14867353746650736, 24.92122528665211334 60.14935271120948812))

@amplifi (Contributor) commented Aug 4, 2017

This looks OK so far; we should test in isolation on staging (migration, trigger changes, calculation).

I'll leave it to @oliverroick to decide whether we revert the original area commit, test the whole feature in staging, then do a patch release next week, or push the sprint release date by a day or two to accommodate the additional testing.

@oliverroick (Member) commented

I'll leave it to @oliverroick to decide whether we revert the original area commit, test the whole feature in staging, then do a patch release next week, or push the sprint release date by a day or two to accommodate the additional testing.

Let's push the release by a few days and test it next week.

I also added another commit that fixes the failing tests. Since we're using geographies now, some spatial lookups are no longer supported. You can cast the geography back to a geometry and run the spatial lookups, which is what I did:


from django.contrib.gis.db.models import GeometryField
from django.db.models.functions import Cast

location = duplicate.spatial_units.annotate(
    geom=Cast('geometry', GeometryField())
).get(geom=geom, **attrs)
@alukach (Contributor, Author) replied:

👍 Thanks for the fixup! I was struggling to understand where the ~= operator was being used. It makes sense that it was this .get call.

@alukach (Contributor, Author) commented Aug 4, 2017

Btw, here's a little helper to run from the command line to verify the triggers were installed on your db:

sudo su postgres -c "
    psql -c \"
        SELECT event_object_table
              ,trigger_name
              ,event_manipulation
              ,action_statement
              ,action_timing
        FROM  information_schema.triggers
        WHERE event_object_table = 'spatial_spatialunit'
        ORDER BY event_object_table
             ,event_manipulation;
    \" cadasta "
 event_object_table  |      trigger_name      | event_manipulation |          action_statement          | action_timing
---------------------+------------------------+--------------------+------------------------------------+---------------
 spatial_spatialunit | calculate_area_trigger | INSERT             | EXECUTE PROCEDURE calculate_area() | BEFORE
 spatial_spatialunit | calculate_area_trigger | UPDATE             | EXECUTE PROCEDURE calculate_area() | BEFORE
(2 rows)

@alukach (Contributor, Author) commented Aug 8, 2017

@oliverroick @amplifi any opposition if I simplify that signal a bit?

@seav (Contributor) commented Aug 9, 2017

I've done some area testing in Helsinki and Singapore, and the areas are comparable to those computed using QGIS. I get a difference in area of −1.3% in Singapore (~1°N) to +0.6% in Helsinki (~60°N). The difference is probably caused by using the spherical version of ST_Area instead of the more accurate but slower spheroidal version (see the PostGIS docs). I wanted to try the spheroidal version, but it requires PROJ.4 version 4.9 or later, and Ubuntu Trusty only ships version 4.8.

I guess this is OK as it is. We just need to be clear to our partners that the area values should be considered only approximate and not to be taken as gospel truth.

@oliverroick (Member) left a comment

I tested this on staging and it should be good to go from my end.

@oliverroick (Member) commented

@alukach

any opposition if I simplify that signal a bit?

Let's do that after the code is shipped.

@alukach (Contributor, Author) commented Aug 9, 2017

@seav Thanks for the testing. Reading through the link you've added:

For geography, by default area is determined on a spheroid with units in square meters. To measure around the faster but less accurate sphere, use ST_Area(geog,false).

We are currently doing calculations on a geography and not turning off the spheroidal calculation, so we are returning the most accurate calculation (via spheroid, not sphere) that we can get from PostGIS tooling at this time, right? Would you agree with this statement?

The only way we could improve it would rely on the next bit of docs:

Enhanced: 2.2.0 - measurement on spheroid performed with GeographicLib for improved accuracy and robustness. Requires Proj >= 4.9.0 to take advantage of the new feature.

We should make sure that we're using the latest/greatest version of PostGIS (and thus Proj) to ensure that the calculation is as accurate as possible.
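
If it helps, one quick way to check which Proj (and GEOS/GDAL) build PostGIS links against is to ask the database itself; a small sketch run through Django's connection:

    from django.db import connection

    with connection.cursor() as cursor:
        # PostGIS_Full_Version() reports the PostGIS build along with the
        # GEOS, GDAL and Proj versions it was compiled against.
        cursor.execute("SELECT PostGIS_Full_Version();")
        print(cursor.fetchone()[0])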

@alukach (Contributor, Author) commented Aug 9, 2017

And regarding the slight discrepancies, I would note that I never got a consistent result for area calculation throughout my experimentation and development of this feature. As shown in the first comment, PostGIS and Python had slightly different results. QGIS also differed slightly. The differences were definitely not as bad as ~1%, but all of the tools varied a little. My guess is that each tool employs a slightly different algorithm and the results show this.

I am not sure which one is more correct than the others; however, if I were to place a bet, I'd put it on the PostGIS/Proj solution.

@seav (Contributor) commented Aug 9, 2017

We are currently doing calculations on a geography and not turning off the spheroidal calculation, so we are returning the most accurate calculation via spheroid that we can get from PostGIS tooling at this time, right? Would you agree with this statement?

Based on the documentation, I guess PostGIS should be using the spheroidal calculation. Anyway, I did some more testing: I took your branch, added false to the ST_Area() function call, and got different values, which does indicate that the existing code uses the spheroidal method. So I guess this PR should be OK. I also agree that we should upgrade dependencies when we have the chance.
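
For anyone wanting to reproduce that comparison, here's a sketch using the test polygon from the first comment; the boolean second argument to ST_Area switches from the default spheroidal calculation to the spherical one:

    from django.db import connection

    WKT = (
        "Polygon ((24.92122528665211334 60.14935271120948812, "
        "24.92222637615321901 60.14908991700298202, "
        "24.92152698485792683 60.14840732483580865, "
        "24.92051903857941753 60.14867353746650736, "
        "24.92122528665211334 60.14935271120948812))"
    )

    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT ST_Area(geography(ST_GeomFromText(%s))), "
            "ST_Area(geography(ST_GeomFromText(%s)), false)",
            [WKT, WKT],
        )
        spheroid_area, sphere_area = cursor.fetchone()
        print('spheroid:', spheroid_area)
        print('sphere:  ', sphere_area)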

@amplifi force-pushed the bugfix/refactor-area-calculation branch from 2f299e7 to 511c4dc on August 9, 2017 at 18:01
@alukach (Contributor, Author) commented Aug 9, 2017

Okay, #1709 is available if we want to simplify the signal slightly.

@amplifi merged commit eed4943 into master on Aug 9, 2017
@oliverroick mentioned this pull request on Aug 16, 2017 (46 tasks)
@oliverroick deleted the bugfix/refactor-area-calculation branch on September 27, 2017 at 07:21