Bugfix/refactor area calculation #1699

Merged 2 commits from bugfix/refactor-area-calculation into master on Aug 9, 2017

Conversation

@alukach (Contributor) commented Aug 3, 2017

Proposed changes in this pull request

This PR resolves the area calculation issues brought up in #1689.

As mentioned by @oliverroick, there are two changes that need to be made:

  1. Recalculate the area field for all currently existing location instances in our DB.
  2. Update how the area field will be calculated for any location instances saved in the future.

To address the first item, a new migration has been added to instruct the DB to correctly calculate the area value for all rows with a geometry type of POLYGON or MULTIPOLYGON.

        migrations.RunSQL(
            "UPDATE spatial_spatialunit SET area = ST_Area(geography(geometry)) "
            "WHERE geometrytype(geometry) LIKE '%POLYGON';",
            reverse_sql=migrations.RunSQL.noop
        )

This will overwrite the currently existing area values. It should be reasonably performant, and at least faster than calculating these values in Python.

To address the second item (calculating the area for new SpatialUnit rows), I deleted the existing pre_save signal and moved the logic into the DB as a trigger. This was done because calculating the area via Python produces slightly different values than via PostGIS:

In [1]: from functools import partial
   ...:
   ...: from django.db import connection
   ...: from shapely import ops
   ...: from shapely.wkt import loads
   ...: import pyproj
   ...:
   ...:
   ...: WKT = 'Polygon ((24.92122528665211334 60.14935271120948812, 24.92222637615321901 60.14908991700298202, 24.92152698485792683 60.14840732483580865, 24.92051903857941753 60.14867353746650736, 24.92122528665211334 60.14935271120948812
   ...: ))'
   ...:
   ...: geom = loads(WKT)
   ...: project = partial(
   ...:     pyproj.transform,
   ...:     pyproj.Proj(init='EPSG:4326'),
   ...:     pyproj.Proj(proj='aea', lat1=geom.bounds[1], lat2=geom.bounds[3])
   ...: )
   ...: geom = ops.transform(project, geom)
   ...: print('pyproj ', geom.area)
   ...:
   ...: with connection.cursor() as cursor:
   ...:     sql = "SELECT ST_Area(geography(ST_GeomFromText(%s)))"
   ...:     cursor.execute(sql, [WKT])
   ...:     results = cursor.fetchone()
   ...:     print('postgis', results[0])
   ...:
pyproj  5383.540188133147
postgis 5383.5162784872

This means that the values calculated via the DB migration would shift slightly the next time a SpatialUnit instance was saved. The advantage of using a trigger is that we ensure the area is calculated even when using the bulk_create and update methods. The downside is that the DB-calculated value is not set on the object when the object is saved. To get around this, I wired up a post_save signal to retrieve this value and set it on the instance's area field if it should have been generated.
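
For reference, here's a minimal sketch of what the trigger plus post_save hook could look like. The function and trigger names match the calculate_area_trigger output shown later in this thread, but the SQL body, migration dependency, and signal wiring below are illustrative sketches rather than the literal code in this PR:

    # migrations/000X_add_area_trigger.py (sketch; the migration name and
    # dependency below are hypothetical)
    from django.db import migrations

    CREATE_TRIGGER_SQL = """
    CREATE OR REPLACE FUNCTION calculate_area() RETURNS trigger AS $$
    BEGIN
        IF geometrytype(NEW.geometry) LIKE '%POLYGON' THEN
            NEW.area := ST_Area(geography(NEW.geometry));
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER calculate_area_trigger
    BEFORE INSERT OR UPDATE ON spatial_spatialunit
    FOR EACH ROW EXECUTE PROCEDURE calculate_area();
    """

    DROP_TRIGGER_SQL = """
    DROP TRIGGER IF EXISTS calculate_area_trigger ON spatial_spatialunit;
    DROP FUNCTION IF EXISTS calculate_area();
    """


    class Migration(migrations.Migration):

        dependencies = [
            ('spatial', '0003_recalculate_area'),  # hypothetical dependency
        ]

        operations = [
            migrations.RunSQL(CREATE_TRIGGER_SQL, reverse_sql=DROP_TRIGGER_SQL),
        ]


    # signals.py (sketch): the trigger has already written `area` on the DB
    # side, so pull the generated value back onto the in-memory instance.
    from django.db.models.signals import post_save
    from django.dispatch import receiver

    from .models import SpatialUnit  # assumed import path


    @receiver(post_save, sender=SpatialUnit)
    def refresh_area(sender, instance, created, **kwargs):
        if instance.geometry is not None:
            instance.refresh_from_db(fields=['area'])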

Other changes: For some reason, MultiPolygons didn't have their area calculated. I've updated the code to support that geometry type.

The SpatialRelationshipManager was checking if one SpatialUnit contained another. This was done via a DB query. Using geography=True on the SpatialUnit.geometry field removed the ability to query via contains. I did a quick refactor to do the containment check in Python via GEOS.
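
A minimal sketch of that containment check, assuming both geometries are GEOSGeometry objects (as they are on GeoDjango model instances); the actual manager code may be structured differently:

    def geometry_contains(parent_geometry, child_geometry):
        """Containment check done in Python via GEOS rather than a DB query."""
        if parent_geometry is None or child_geometry is None:
            return False
        # GEOSGeometry.contains() runs locally, so it keeps working even
        # though the column is now stored as a geography.
        return parent_geometry.contains(child_geometry)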

When should this PR be merged

No dependencies.

Risks

None to mention.

Follow-up actions

[List any possible follow-up actions here; for instance, testing data
migrations, software that we need to install on staging and production
environments.]

Checklist (for reviewing)

General

  • Is this PR explained thoroughly? All code changes must be accounted for in the PR description.
    • Review 1
    • Review 2
  • Is the PR labeled correctly? It should have the migration label if a new migration is added.
    • Review 1
    • Review 2
  • Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.
    • Review 1
    • Review 2

Functionality

  • Are all requirements met? Compare implemented functionality with the requirements specification.
    • Review 1
    • Review 2
  • Does the UI work as expected? There should be no Javascript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.
    • Review 1
    • Review 2

Code

  • Do you fully understand the introduced changes to the code? If not ask for clarification, it might uncover ways to solve a problem in a more elegant and efficient way.
    • Review 1
    • Review 2
  • Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.
    • Review 1
    • Review 2
  • Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.
    • Review 1
    • Review 2
  • Is the code documented sufficiently? Large and complex classes, functions or methods must be annotated with comments following our code-style guidelines.
    • Review 1
    • Review 2
  • Has the scalability of this change been evaluated?
    • Review 1
    • Review 2
  • Is there a maintenance plan in place?
    • Review 1
    • Review 2

Tests

  • Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.
    • Review 1
    • Review 2
  • If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.
    • Review 1
    • Review 2
  • If this is a new feature or a significant change to an existing feature, has the manual testing spreadsheet been updated with instructions for manual testing?
    • Review 1
    • Review 2

Security

  • Confirm this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets.
    • Review 1
    • Review 2
  • Are all UI and API inputs run through forms or serializers?
    • Review 1
    • Review 2
  • Are all external inputs validated and sanitized appropriately?
    • Review 1
    • Review 2
  • Does all branching logic have a default case?
    • Review 1
    • Review 2
  • Does this solution handle outliers and edge cases gracefully?
    • Review 1
    • Review 2
  • Are all external communications secured and restricted to SSL?
    • Review 1
    • Review 2

Documentation

  • Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.
    • Review 1
    • Review 2
  • Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.
    • Review 1
    • Review 2
  • Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.
    • Review 1
    • Review 2

@alukach (Contributor, Author) commented Aug 3, 2017

I've been testing this PR against a polygon drawn around this rectangular building: https://www.openstreetmap.org/#map=17/60.14876/24.92242&layers=TN

[Screenshot: the test polygon drawn over the building]

Polygon ((24.92122528665211334 60.14935271120948812, 24.92222637615321901 60.14908991700298202, 24.92152698485792683 60.14840732483580865, 24.92051903857941753 60.14867353746650736, 24.92122528665211334 60.14935271120948812))

@amplifi (Contributor) commented Aug 4, 2017

This looks OK so far; we should test in isolation on staging (migration, trigger changes, calculation).

I'll leave it to @oliverroick to decide whether we revert the original area commit, test the whole feature in staging, then do a patch release next week, or push the sprint release date by a day or two to accommodate the additional testing.

@oliverroick (Member) commented

I'll leave it to @oliverroick to decide whether we revert the original area commit, test the whole feature in staging, then do a patch release next week, or push the sprint release date by a day or two to accommodate the additional testing.

Let's push the release by a few days and test it next week.

I also added another commit that fixes the failing tests. Since we're using geographies now, some spatial lookups are no longer supported. You can cast the geography back to a geometry and run the spatial lookups, which is what I did:


from django.contrib.gis.db.models import GeometryField
from django.db.models.functions import Cast

location = duplicate.spatial_units.annotate(
    geom=Cast('geometry', GeometryField())
).get(geom=geom, **attrs)
@alukach (Contributor, Author) replied:

👍 Thanks for the fixup! I was struggling to understand where the ~= operator was being used. It makes sense that it was this .get call.

@alukach (Contributor, Author) commented Aug 4, 2017

Btw, here's a little helper to run from the command line to verify the triggers were installed on your db:

sudo su postgres -c "
    psql -c \"
        SELECT event_object_table
              ,trigger_name
              ,event_manipulation
              ,action_statement
              ,action_timing
        FROM  information_schema.triggers
        WHERE event_object_table = 'spatial_spatialunit'
        ORDER BY event_object_table
             ,event_manipulation;
    \" cadasta "
 event_object_table  |      trigger_name      | event_manipulation |          action_statement          | action_timing
---------------------+------------------------+--------------------+------------------------------------+---------------
 spatial_spatialunit | calculate_area_trigger | INSERT             | EXECUTE PROCEDURE calculate_area() | BEFORE
 spatial_spatialunit | calculate_area_trigger | UPDATE             | EXECUTE PROCEDURE calculate_area() | BEFORE
(2 rows)

@alukach (Contributor, Author) commented Aug 8, 2017

@oliverroick @amplifi any opposition if I simplify that signal a bit?

@seav (Contributor) commented Aug 9, 2017

I've done some area testing in Helsinki and Singapore, and the areas are comparable to those computed using QGIS. I get a difference in area of −1.3% in Singapore (~1°N) to +0.6% in Helsinki (~60°N). The difference is probably caused by using the spherical version of ST_Area instead of the more accurate but slower spheroidal version (see the PostGIS docs). I wanted to try the spheroidal version, but it requires PROJ.4 version 4.9 or later, and Ubuntu Trusty only ships version 4.8.

I guess this is OK as it is. We just need to be clear to our partners that the area values should be considered only approximate and not to be taken as gospel truth.

@oliverroick (Member) left a comment

I tested this on staging and it should be good to go from my end.

@oliverroick (Member) commented

@alukach

any opposition if I simplify that signal a bit?

Let's do that after the code is shipped.

@alukach (Contributor, Author) commented Aug 9, 2017

@seav Thanks for the testing. Reading through the link you've added:

For geography, by default area is determined on a spheroid with units in square meters. To measure around the faster but less accurate sphere, use ST_Area(geog,false).

We are currently doing calculations on a geography and not turning off the spheroidal calculation, so we are returning the most accurate calculation (via spheroid, not sphere) that we can get from PostGIS tooling at this time, right? Would you agree with this statement?

The only way we could improve it would rely on the next bit of docs:

Enhanced: 2.2.0 - measurement on spheroid performed with GeographicLib for improved accuracy and robustness. Requires Proj >= 4.9.0 to take advantage of the new feature.

We should make sure that we're using the latest/greatest version of PostGIS (and thus Proj) to ensure that the calculation is as accurate as possible.
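
If it helps, one quick way to check which Proj (and GEOS/GDAL) build PostGIS links against is to ask the database itself; a small sketch run through Django's connection:

    from django.db import connection

    with connection.cursor() as cursor:
        # PostGIS_Full_Version() reports the PostGIS build along with the
        # GEOS, GDAL and Proj versions it was compiled against.
        cursor.execute("SELECT PostGIS_Full_Version();")
        print(cursor.fetchone()[0])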

@alukach (Contributor, Author) commented Aug 9, 2017

And regarding the slight discrepancies, I would note that I never got a consistent result for area calculation throughout my experimentation and development of this feature. As shown in the first comment, PostGIS and Python had slightly different results. QGIS also differed slightly. The differences were definitely not as bad as ~1%, but all of the tools varied a little. My guess is that each tool employs a slightly different algorithm and the results show this.

I am not sure which one is more correct than the others; however, if I were to place a bet, I'd put it on the PostGIS/Proj solution.

@seav (Contributor) commented Aug 9, 2017

We are currently doing calculations on a geography and not turning off the spheroidal calculation, so we are returning the most accurate calculation via spheroid that we can get from PostGIS tooling at this time, right? Would you agree with this statement?

Based on the documentation, I guess PostGIS should be using the spheroidal calculation. Anyway, I did some more testing: I took your branch, added false to the ST_Area() function call, and got different values, which does indicate that the existing code uses the spheroidal method. So I guess this PR should be OK. I also agree that we should upgrade dependencies when we have the chance.
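
For anyone wanting to reproduce that comparison, here's a sketch using the test polygon from the first comment; the boolean second argument to ST_Area switches from the default spheroidal calculation to the spherical one:

    from django.db import connection

    WKT = (
        "Polygon ((24.92122528665211334 60.14935271120948812, "
        "24.92222637615321901 60.14908991700298202, "
        "24.92152698485792683 60.14840732483580865, "
        "24.92051903857941753 60.14867353746650736, "
        "24.92122528665211334 60.14935271120948812))"
    )

    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT ST_Area(geography(ST_GeomFromText(%s))), "
            "ST_Area(geography(ST_GeomFromText(%s)), false)",
            [WKT, WKT],
        )
        spheroid_area, sphere_area = cursor.fetchone()
        print('spheroid:', spheroid_area)
        print('sphere:  ', sphere_area)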

@amplifi force-pushed the bugfix/refactor-area-calculation branch from 2f299e7 to 511c4dc on August 9, 2017 at 18:01
@alukach (Contributor, Author) commented Aug 9, 2017

Okay, #1709 is available if we want to simplify the signal slightly.

@amplifi merged commit eed4943 into master on Aug 9, 2017
@oliverroick mentioned this pull request on Aug 16, 2017 (46 tasks)
@oliverroick deleted the bugfix/refactor-area-calculation branch on September 27, 2017 at 07:21