
check number of zones by country - POC #2

Merged (6 commits) on Apr 6, 2018


@nlehuby (Contributor) commented on Mar 1, 2018

This is a proof of concept (POC) of quality-assurance tools, as described in this issue, following up on the creation of this notebook.

The branch is based on PR #1.

It produces:

  • a py.test report
    [screenshot: py.test output]
  • an HTML dashboard with the results (any contribution to improve its usability or aesthetics would be very welcome ;) )
    [screenshot, 2018-03-01: volumetric data dashboard]
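For readers wondering how such a dashboard could be produced, here is a minimal standard-library sketch. It is not the PR's actual implementation: the result keys (`wikidata_id`, `zone_type`, `expected`, `actual`), the `render_dashboard` function and the exact-match pass/fail rule are all hypothetical.

```python
import html

# Hypothetical column order for one result row.
COLUMNS = ("wikidata_id", "zone_type", "expected", "actual")


def render_dashboard(results):
    """Render check results as a minimal HTML table.

    Each result is assumed to be a dict with the COLUMNS keys; a row is
    marked "ok" when the actual count equals the expected one, else "fail".
    """
    header = "".join(f"<th>{name}</th>" for name in COLUMNS + ("status",))
    rows = [f"<tr>{header}</tr>"]
    for result in results:
        status = "ok" if result["expected"] == result["actual"] else "fail"
        # Escape every cell so zone names cannot break the markup.
        cells = "".join(
            f"<td>{html.escape(str(result[name]))}</td>" for name in COLUMNS
        )
        rows.append(f'<tr class="{status}">{cells}<td>{status}</td></tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(render_dashboard([
        {"wikidata_id": "Q142", "zone_type": "city", "expected": 2, "actual": 2},
    ]))
```

The `ok`/`fail` CSS classes leave room for styling failing rows differently without changing the generator.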

--

For now, the results are not satisfactory:

  • if you test a subset of the planet cosmogony, you get many false positives (because the country is in your file, but not its descendants)
  • loading the whole cosmogony output into memory is inefficient and does not scale (around 20 min to process a France file)

@antoine-de (Contributor) left a comment:

I haven't tested it, but it seems OK.

What do we do if it's not enough? Do we move toward an SQL-based solution? Do we write those checks in Rust?

On `README.md` (outdated diff):

To compute the number of zones for each kind of zone (volumetric stats) and test them against reference values, just type:

`py.test`
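As an illustration of the volumetric-stats idea, here is a minimal sketch that counts zones per `zone_type` and reports the rows of a reference CSV that do not match. It assumes zones are dicts with a `zone_type` key and a reference file with hypothetical `zone_type` and `expected_count` columns, and it compares counts exactly. The real tests work per country (by `wikidata_id`), and under py.test each mismatch would be a failing test rather than a yielded tuple.

```python
import csv
import io
from collections import Counter


def count_zones_by_type(zones):
    """Volumetric stats: number of zones for each zone_type."""
    return Counter(zone["zone_type"] for zone in zones)


def find_mismatches(counts, reference_file):
    """Yield (zone_type, expected, actual) for every reference row whose
    expected count differs from the computed one (exact match assumed)."""
    for row in csv.DictReader(reference_file):
        expected = int(row["expected_count"])
        actual = counts.get(row["zone_type"], 0)
        if actual != expected:
            yield row["zone_type"], expected, actual


# Toy data standing in for a cosmogony output and a reference CSV.
zones = [{"zone_type": "country"}, {"zone_type": "state"}, {"zone_type": "state"}]
reference = io.StringIO("zone_type,expected_count\ncountry,1\nstate,3\n")
print(list(find_mismatches(count_zones_by_type(zones), reference)))
# [('state', 3, 2)]
```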
Comment (Contributor):

Isn't it `pipenv run py.test`?

@antoine-de (Contributor) left a comment:

On an old cosmogony planet I get:
`44 failed, 182 passed, 2 skipped in 2630.95 seconds`

But there seem to be lots of mistakes.

I feel the HTML could be more helpful, and I'd like a way to dig into the problems afterward (an interactive mode in Python? loading into PostgreSQL to query?), but I'm OK with doing this in another PR.

@@ -0,0 +1 @@
```python
from .index import ZonesIndex, UnknownWikidataId
```
Comment (Contributor):

Missing newline at end of file.
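The package exports `ZonesIndex` and `UnknownWikidataId`, and the next excerpt calls `zones_index.iter_children(wikidata_id, predicate)`. The sketch below shows one way such an index could work; it is a hypothetical reconstruction, not the code of this PR. It assumes each zone is a dict with `id`, `parent`, `zone_type` and `wikidata` keys, and that `iter_children` walks all descendants (the PR description talks about a country's descendants), keeping those accepted by an optional predicate. Because `iter_children` is a generator, `UnknownWikidataId` is raised on first iteration, which is why wrapping `list(...)` in a `try` catches it.

```python
from collections import defaultdict


class UnknownWikidataId(Exception):
    """Raised when no zone carries the requested Wikidata id."""


class ZonesIndex:
    """Hypothetical minimal index over cosmogony-like zones."""

    def __init__(self, zones):
        self.by_wikidata = {}
        self.children = defaultdict(list)
        for zone in zones:
            if zone.get("wikidata"):
                self.by_wikidata[zone["wikidata"]] = zone
            if zone.get("parent") is not None:
                self.children[zone["parent"]].append(zone)

    def iter_children(self, wikidata_id, predicate=None):
        """Yield every descendant of the zone with this Wikidata id,
        keeping only those for which predicate(zone) is true (if given)."""
        try:
            root = self.by_wikidata[wikidata_id]
        except KeyError:
            raise UnknownWikidataId(wikidata_id) from None
        # Iterative depth-first walk: the filter only decides what is
        # yielded, the traversal always continues into sub-zones.
        stack = list(self.children[root["id"]])
        while stack:
            zone = stack.pop()
            stack.extend(self.children[zone["id"]])
            if predicate is None or predicate(zone):
                yield zone


# Toy hierarchy: a country (Q142 is France on Wikidata), one state, two cities.
zones = [
    {"id": 1, "parent": None, "zone_type": "country", "wikidata": "Q142"},
    {"id": 2, "parent": 1, "zone_type": "state", "wikidata": None},
    {"id": 3, "parent": 2, "zone_type": "city", "wikidata": None},
    {"id": 4, "parent": 2, "zone_type": "city", "wikidata": None},
]
index = ZonesIndex(zones)
cities = list(index.iter_children("Q142", lambda z: z["zone_type"] == "city"))
print(len(cities))  # 2
```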

```python
try:
    matched_zones = list(zones_index.iter_children(
        line['wikidata_id'],
        lambda z: z['zone_type'] == line['zone_type']
```
Comment (Contributor):

Is it mandatory to filter on the `zone_type`?

Reply (Member):

`zone_type` is the only filter currently defined in the reference `.csv` file.
Do you suggest making it optional, so that all children of a zone are counted?
Or making this filter dynamic, depending on the columns in the CSV?
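The dynamic-filter option could look like the sketch below: every non-empty column of a reference line, except the ones describing the line itself, becomes an equality filter, and a line with no filter columns counts all children. The column names `wikidata_id` and `expected_count` and the function `build_predicate` are assumptions for illustration. CSV values are strings, so this only suits string attributes such as `zone_type`.

```python
import csv
import io

# Columns that describe the reference line itself rather than a filter
# (hypothetical names; the real reference file may differ).
NON_FILTER_COLUMNS = {"wikidata_id", "expected_count"}


def build_predicate(line):
    """Build a zone filter from every non-empty filter column of a line.

    A line with no filter columns yields a predicate that accepts all
    zones, i.e. all children are counted.
    """
    filters = {
        key: value
        for key, value in line.items()
        if key not in NON_FILTER_COLUMNS and value not in (None, "")
    }
    return lambda zone: all(zone.get(key) == value for key, value in filters.items())


reference = io.StringIO(
    "wikidata_id,zone_type,expected_count\n"
    "Q142,city,2\n"
    "Q142,,3\n"
)
lines = list(csv.DictReader(reference))
is_city = build_predicate(lines[0])
any_zone = build_predicate(lines[1])
print(is_city({"zone_type": "city"}), is_city({"zone_type": "state"}))  # True False
print(any_zone({"zone_type": "state"}))  # True
```

With this approach, adding a column to the reference CSV adds a filter without touching the test code.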

Reply (Contributor):

Nope, I was wondering whether it made sense to also check all the children without this filter, but I'm not sure we want that, and anyway it will be for another time 😉

On `README.md` (outdated diff):

### Compute and test against reference values

You will need `python3` and a few dependencies, which you can install with `pipenv install`.
Comment (Contributor):

`pipenv install --three` will ensure that Python 3 is used.

```text
.pytest_cache
data_volumetric.csv
data_volumetric.json
cosmogony.geojson
```
Comment (Contributor):

Missing newline at end of file.

@antoine-de merged commit 33cfaac into master on Apr 6, 2018
@nlehuby deleted the test_with_pandas branch on April 6, 2018, 14:54

3 participants