Quality Assurance #4
Comments
Maybe the tests could be split into three:
I’m not sure exactly why you want to have it in a separate repository. If someone wants to suggest a fix or have an alternative reality, it would still be simple, no? Or did you just mean that the configuration should be in a .yaml and not in a .rs, but still in the same repository?
Nice categories, they seem fine to me. I also think it's OK to put the tests in the same repository (and @nlehuby too 😉); we just want quality tests that are easily maintained (so no
Here is a proposal for a first step, dealing only with volumetric stats.
Todo:
In: for instance, a CSV file:
Out: this could be a single CSV file:
I like the general idea. Where the data will be hosted and what it will be tested against doesn’t matter much to me (but I have a slight preference for large mono-repos). What do you mean by the wikidata_id? The property of that level? That might become a problem, as cities can be of different types (think of the German Kreisfreie Stadt). However, we could maybe add extra tests, like having 4 state_district Q202216 (département d’outre-mer) in France, as those might break easily with bad country shapes. If we want no specific constraint, we can leave the wikidata_id empty. Is that clear?
For now, wikidata_id stands for a country Wikidata ID (Q142 is France). It may be extended to any zone Wikidata ID in the future. We could definitely use the Wikidata ontology to check the quality of our data. But I think your proposal adds a lot of complexity, and we may also need to map the Wikidata ontology to libpostal zone types, country by country, in the same way as was done for OSM...
This seems possible and would add very valuable quality tests, but I really think we should start with a smaller task with no dependency on a Wikidata dump ;)
Init of reference values for country stats: osm-without-borders/cosmogony-data-dashboard#1
Some ideas to test the quality of our dataset:
Non closed boundaries
We need to log the list of boundaries that could not be imported because they are not valid polygons / multipolygons.
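As a minimal sketch of what such logging could look like, the snippet below flags boundaries whose rings do not close. The input shape (a dict from a boundary id to a list of rings of `(lon, lat)` tuples), the ids, and the function names are all assumptions made for illustration, not the project's actual data model. Ring closure is also only one aspect of polygon validity, so a real check would likely need more.

```python
import logging

logger = logging.getLogger("boundary_qa")


def ring_is_closed(ring):
    """A ring counts as closed when it has at least 4 points and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def find_unclosed_boundaries(boundaries):
    """Return the ids of boundaries that have no rings or at least one unclosed ring.

    `boundaries` is a hypothetical mapping: boundary id -> list of rings,
    each ring being a list of (lon, lat) tuples.
    """
    rejected = []
    for boundary_id, rings in boundaries.items():
        if not rings or not all(ring_is_closed(r) for r in rings):
            # Log every rejected boundary so the full list can be reviewed later.
            logger.warning("boundary %s skipped: unclosed or empty ring", boundary_id)
            rejected.append(boundary_id)
    return rejected


# Illustrative data only: one closed triangle, one ring whose last point differs from the first.
sample = {
    "relation/1": [[(0, 0), (1, 0), (1, 1), (0, 0)]],
    "relation/2": [[(0, 0), (1, 0), (1, 1), (0, 1)]],
}
rejected = find_unclosed_boundaries(sample)
```

Returning the list in addition to logging it would let a CI job fail, or publish the list, when the number of rejected boundaries grows.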
Hierarchy coherence
Coverage stat and tests
By country statistics:
Compute the geographical coverage in states, cities, etc. (example: 88% city coverage, which means that 88% of the country territory is inside a city)
Persist expected values and test them in CI:
for example:
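One possible shape for such a check, as a rough sketch: coverage is computed as the summed area of the zones divided by the area of the country, then compared to a persisted expected value within a tolerance. The sketch assumes planar coordinates, zones that do not overlap and lie inside the country, and a 2-point tolerance. These assumptions, the function names, and the toy geometry are hypothetical; real data would need a proper geometry library and an equal-area projection.

```python
def polygon_area(ring):
    """Planar area of a closed ring (first point repeated at the end), shoelace formula."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def coverage(country_ring, zone_rings):
    """Share of the country area covered by the zones.

    Assumes the zones do not overlap and are fully inside the country.
    """
    covered = sum(polygon_area(ring) for ring in zone_rings)
    return covered / polygon_area(country_ring)


def check_coverage(actual, expected, tolerance=0.02):
    """True when the measured coverage is within `tolerance` of the expected value."""
    return abs(actual - expected) <= tolerance


# Toy geometry: a 10 x 10 country with two rectangular cities (areas 80 and 8).
country = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
cities = [
    [(0, 0), (8, 0), (8, 10), (0, 10), (0, 0)],
    [(8, 0), (10, 0), (10, 4), (8, 4), (8, 0)],
]
city_coverage = coverage(country, cities)
coverage_ok = check_coverage(city_coverage, expected=0.88)
```

With this toy geometry, 88 of the 100 area units fall inside a city, which matches the 88% city coverage example above.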
Volumetric stat and tests
Stat:
same as above, but only raw numbers, without geographical concerns (example: Australia has 17 states)
Test:
Expected values for each country must be in a config file (CSV, YAML?) and not inside the source code, so that anybody can update them if needed.
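To make the idea concrete, here is a small sketch of a volumetric test driven by a CSV config. The column names (`country`, `zone_type`, `expected_count`) and the function names are assumptions, not an agreed format. The expected value reuses the Australia example above, and the "actual" count of 15 is made up purely to show a failing case.

```python
import csv
import io

# Hypothetical config content; in practice this would live in a CSV file in the repository.
EXPECTED_CSV = """country,zone_type,expected_count
Australia,state,17
"""


def load_expected(csv_text):
    """Parse the config into {(country, zone_type): expected_count}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row["country"], row["zone_type"]): int(row["expected_count"])
        for row in reader
    }


def compare_counts(expected, actual):
    """Return (country, zone_type, expected, actual) for every mismatch.

    A zone type missing from `actual` is treated as a count of 0.
    """
    failures = []
    for (country, zone_type), want in expected.items():
        got = actual.get((country, zone_type), 0)
        if got != want:
            failures.append((country, zone_type, want, got))
    return failures


expected = load_expected(EXPECTED_CSV)
actual = {("Australia", "state"): 15}  # made-up measurement, for illustration only
failures = compare_counts(expected, actual)
```

Keeping the expected values in a plain CSV means a contributor can correct a count by editing one line, without touching the test code.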