News about the polygon fixing effort #15

Open
joto opened this issue Mar 2, 2017 · 48 comments

@joto
Collaborator

joto commented Mar 2, 2017

I will be (mis-)using this issue to occasionally write about the current state of the effort to fix the (multi)polygons in OSM.

After a lot of preparation over the last half year or so (and frequent non-activity in between when I was busy doing other things), the real kickoff for this project was on February 14, 2017, when I posted the first challenges on Maproulette and made them public. Those challenges contained about 6500 self-intersecting building ways around the world split up into seven continent-sized areas. The community answered the call and got to work. Eleven days later, all of them were fixed.

While this was going on I worked on getting more challenges out the door. Next were about 1600 small landuse polygons, each from a single way with a self-intersection, which I posted on February 21. They were all fixed in five days. I am amazed at how well and quickly the community responded, but there is much more to do and not all tasks will be so simple to fix. I deliberately started with the easy tasks to get things going and give me and everybody else a chance to learn how this can work, before we start working on the bigger problems.

You can see the result of the effort in the massive drop in this graph (source):

intersection-stats-2017-03-02

The place where the number of errors was going up (February 23) was a massive import of broken data that got reverted a few days later. This shows another reason why it is good to fix those old problems: If the number of errors is much lower, anomalies such as broken imports will be seen more easily and we can fix them quickly.

On February 21 I also started another challenge: 1300 open rings from all around the world. As I write this, more than half of them have been fixed already. As these problems are (on average) much harder to fix, a slower pace than with the self-intersections was to be expected. But you can still see the results of the effort in the graph (source):

open-rings-2017-03-02

I also started another challenge, to fix wrong roles on multipolygon relations. It is not doing as well as the others are, but it has only been going for a week now. And it has a lower priority, because those errors don't show up on the map directly. So it is good that mappers are working on the other tasks first. Ideally I'd still want the community to go through all those cases, because they show where the data is bad and problems often come in clusters. One error showing up in the challenge often means there are more around.

You can find all challenges and more information about how to help fix things here.

@joto
Collaborator Author

joto commented Mar 4, 2017

By now the "Open rings" challenge is also fixed and the "Wrong role" challenge is well on its way. Some people have been very industrious indeed!

Today I am posting four new challenges. Really, it is only one, but I have split it up again into four areas: Africa, the Americas, Asia + Australia, and Europe. Together these are about 2700 closed ways tagged building, landuse, or natural with self-intersections. So this is similar to some of the previous challenges, but it also includes larger polygons and some new tags. You can find the challenges here.
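For readers unfamiliar with the term, here is a minimal sketch of what "self-intersection" means for a closed way. It is not the code used to build the challenges (that is done by osm-area-tools in C++); it assumes the way is given as a plain list of (x, y) coordinates with the first point repeated at the end, and it only reports proper crossings, not touching or overlapping segments. All function names are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: > 0 if c is left of a->b, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 and segment q1-q2 cross in their interiors."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def has_self_intersection(ring: List[Point]) -> bool:
    """Check a closed ring (first point == last point) for crossing segments.

    Simple O(n^2) comparison of all segment pairs. Neighbouring segments
    only share an endpoint, so the strict test above never reports them.
    """
    segments = list(zip(ring, ring[1:]))
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if _segments_cross(*segments[i], *segments[j]):
                return True
    return False


# A "bow tie" crosses itself, a plain square does not.
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
```

The quadratic loop is fine for a single building outline; anything working on planet-sized data would need a sweep-line or spatial index instead.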

@joto
Collaborator Author

joto commented Mar 9, 2017

The stats show that, after a very busy weekend, the work has slowed down some, but there is still progress. I don't know what's causing this, maybe it is just that the mappers do less work during the week. Or maybe it is due to new problems being introduced. Over the last weeks when diving into the data I have noticed several current botched imports, some of them have already been reverted. They are not always easy to see before you know where to look. But this is a nice side-benefit of cleaning up the data: You see new problems better. Once we have fixed all the old problems, new problems will show up immediately in the graph.

I am finding those problems while digging into the data and preparing the Maproulette challenges. I am using a mixture of software: C++ programs I have written (osm-area-tools) create a list of all problems. I am using osmium-tool and other programs to do more ad-hoc filtering of OSM data (for instance to get all problems affecting specifically tagged objects). Then everything gets imported into PostgreSQL and then I look at the data in QGIS. This allows me to inspect the data "from all sides". And I can easily generate images like this:

heatmap

This is a part of Europe with duplicate nodes in red and a heatmap of the same duplicate nodes in blue. You can easily see the hotspots. A few days ago, one of those hotspots was in Prague. I contacted the local community and they have fixed the problems in a few days.

I am reaching out to other local communities as well, when I see particular problems there or just to get them involved. I'd love to get your help, too, contacting especially non-English speaking communities. I am happy to create special Maproulette challenges geared towards area problems in specific communities or do special data extracts or so.

@joto
Collaborator Author

joto commented Mar 11, 2017

I have just rolled out another batch of challenges, this time with self-intersections in multipolygon relations. I have marked these challenges as "difficult", because some of them concern rather complex multipolygons. There are about 3300 multipolygons here and I have split up everything again, with about 200 to 600 tasks in each challenge.

@stoecker

Let's see if the JOSM start page has some impact :-)

@joto
Collaborator Author

joto commented Mar 12, 2017

For those of you who don't know what @stoecker is talking about: This is what the JOSM start page has shown since yesterday:

josm-startup

Thanks Dirk!

@joto
Collaborator Author

joto commented Mar 12, 2017

All the challenges I have created so far are about broken (multi)polygons. I haven't even started with old-style multipolygons yet. But that doesn't mean that others aren't busy working on that. I noticed a marked dip in the number of old-style multipolygons in the last days:

old-style-dip

That is several thousand multipolygons fixed! And if you look at the old-style multipolygon comparison map you can see where a lot of that happened. There are almost no old-style multipolygons left in Austria:

old-style-austria

Looking through the changesets, I found this is the work of nebulon42. Thanks @nebulon42.

If you want to do the same, just pick an area from the comparison map and get going!

@tyrasd
Member

tyrasd commented Mar 12, 2017

If you want to do the same, just pick an area from the comparison map and get going!

Yes! And Michael even posted a handy overpass query to find such multipolygons on osm talk: http://overpass-turbo.eu/s/nrg (or use the bbox-version: http://overpass-turbo.eu/s/nri)

Using this query and loading the result into JOSM/Level0/… makes it much quicker to find and fix such multipolygons.

@joto maybe you can put a link to this query somewhere on the site?

@danfos

danfos commented Mar 13, 2017

In JOSM you can use File --> "Download from Overpass API..." with the Overpass query:

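// Find multipolygon relations that carry no tag except type=multipolygon
[out:xml][timeout:60][bbox:{{bbox}}];
(
   relation["type"="multipolygon"](if:count_tags()==1);
);
// Add all member ways and their nodes so the download contains the full geometry
(._;>;);
out meta;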

Then select a Bounding box and "Download".

After downloading, run the JOSM validator and you will see warnings for all the problems in the selected area.
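Outside of JOSM or overpass turbo the `{{bbox}}` placeholder is not expanded, so the bounding box has to be written out. The following is a minimal sketch that builds the same query for an explicit bounding box and turns it into a request URL. The endpoint shown is the public overpass-api.de instance, which is an assumption here (any Overpass instance should do), and the function names are made up for illustration. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: the public Overpass instance; replace with your preferred server.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_query(south: float, west: float, north: float, east: float,
                timeout: int = 60) -> str:
    """Return the query from above with an explicit bounding box.

    Overpass expects the global bbox setting as south,west,north,east.
    """
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:xml][timeout:{timeout}][bbox:{bbox}];\n"
        "(\n"
        '  relation["type"="multipolygon"](if:count_tags()==1);\n'
        ");\n"
        "(._;>;);\n"
        "out meta;\n"
    )


def build_request_url(query: str) -> str:
    """Encode the query into the 'data' parameter of a GET request."""
    return OVERPASS_URL + "?" + urlencode({"data": query})


# Example bounding box (roughly around Vienna, chosen only for illustration).
query = build_query(48.0, 16.0, 48.5, 16.7)
url = build_request_url(query)
```

The resulting URL can be fetched with any HTTP client and the response saved as an .osm file for loading into an editor.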

@joto
Collaborator Author

joto commented Mar 13, 2017

@tyrasd @danfos Please keep this issue for news reports and open separate issues for other things. Thanks.

@joto
Collaborator Author

joto commented Mar 14, 2017

Several people have already started fixing old-style multipolygons and posting some tips on how to approach this using Overpass queries and JOSM. I have assembled this information, added some of my own, and put it into a manual.

@joto
Collaborator Author

joto commented Mar 14, 2017

I noticed that in the last days the number of segments with the wrong role is going up. I looked into this and I think this is a side-effect of the fixing effort. Wrong roles will only be detected for multipolygons that don't have any other problems, so fixing those other problems will lead to more wrong roles being detected. Even if you are very diligent when fixing something as complex as those multipolygon relations, new errors will be made and some problems will slip through. That's not really a big deal, we'll detect them and fix them later.

But still, here is a reminder: If you are fixing multipolygons, always also check the roles and correct them.

@joto joto added the news label Mar 15, 2017
@joto
Collaborator Author

joto commented Mar 15, 2017

A month ago I launched this effort to get the (multi)polygons fixed. Over 150 mappers have contributed so far, some with thousands and thousands of edits! Different fixes have been done, but the focus was on the self-intersections and, after this month, more than half of them have been fixed:

intersections-half-point

This is an awesome achievement!

@joto
Collaborator Author

joto commented Mar 15, 2017

Some of the challenges posted have been quite difficult to fix. But here is an easy one that also allows mappers to help who are not that familiar with relations: Ways that contain only a single node. Sometimes the same node is in the way multiple times. Some of these will be detected as polygons, because the first and last node are the same, so they are closed ways. They contain neither a proper line geometry nor a proper polygon geometry, so they need to be removed or fixed. Look at the details.
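A minimal sketch of the check described above, assuming a way is given simply as its list of node IDs. The function names are made up for illustration and this is not the code that generated the challenge.

```python
from typing import List


def is_single_node_way(node_refs: List[int]) -> bool:
    """True if the way references only one distinct node, however often."""
    return len(node_refs) > 0 and len(set(node_refs)) == 1


def looks_closed(node_refs: List[int]) -> bool:
    """True if first and last node are the same, which is why software
    may mistake a repeated single node for a (degenerate) polygon."""
    return len(node_refs) >= 2 and node_refs[0] == node_refs[-1]


# Example node lists (the IDs are invented).
one_node = [42]
same_node_three_times = [42, 42, 42]
proper_triangle = [1, 2, 3, 1]
```

Both `one_node` and `same_node_three_times` are flagged; only the second also counts as a closed way, which is the case that ends up being treated as a polygon.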

@joto
Collaborator Author

joto commented Mar 17, 2017

The old-style multipolygon comparison map just got iD and JOSM buttons in the upper right corner that make editing the data a snap. (The buttons will turn red for a second if you are not at least in zoom level 15 or if JOSM isn't started or the remote control not available.)

And the challenges are almost all done again. I'll create some more for you soon...

@joto
Collaborator Author

joto commented Mar 18, 2017

All challenges were done, so here is the next one. This is a bit more challenging to describe, but often pretty easy to fix thanks to the magic of JOSM. Get started at Duplicate segments in closed ways.

@joto
Collaborator Author

joto commented Mar 18, 2017

I'll be at the FOSSGIS conference in Passau, Germany, next week. Catch me there to talk about the area fixing effort (or anything else). On Saturday there will be an OSM unconference where I am planning a session on the area fixing effort.

@joto
Collaborator Author

joto commented Mar 21, 2017

Now that the fixing effort is making progress, I have been focussing some more on getting the word out.

I have contacted key software developers for OSM, especially the editor developers, usually through their ticketing systems. I am tracking this on our issue #23. I have also contacted several local communities and started talking with the HOT community to tell everybody about what's happening and make sure as many community members as possible are informed and involved in this process.

If you know about anybody else who should know about this effort, tell them, or tell me to tell them.

The feedback I got so far has been really positive. The only criticism I really heard was that the documentation is too technical. So that is something we need to work on. I appreciate any help on that!

@joto
Collaborator Author

joto commented Mar 29, 2017

While I was at the FOSSGIS conference you worked through all the challenges. Time to add some more. I have just added two new challenges. One is a Maproulette challenge (split up into 4 continent-sized bits) with another batch of Open Rings. This time there is no limit on the size of the multipolygons involved. Some are huge!

The other challenge is a bit different: There are many, many multipolygon problems in South Korea. I have not shown them in previous challenges, because there are so many. I think it is better to fix these by going through them using the OSM Inspector. So I posted a challenge called Fixing multipolygons in South Korea.

@stoecker

I know that's no news, but it probably saves much time: I fixed some polygons with 100-1000 members that were missing roles completely. Here is a short guide on how to do this quickly in JOSM:

  • Open relation in editor - set all roles to outer (select all entries and enter "outer" in the box at the bottom) - close relation editor
  • Start validator - it will complain about roles which should be inner
  • Select all the related warnings and click "select" button in Validator
  • Open relation and apply "inner" role (all the elements are preselected)
  • Rerun validator to check if all is ok.

First fix all geometry related issues!

@joto
Collaborator Author

joto commented Mar 31, 2017

There is some amazing work going on switching old-style multipolygons to new-style tagging. I have created this little movie showing the vanishing old-style multipolygons around the world. Many countries are already done!

old-style-map-animation

This movie was created from the same data you see on the comparison map overlay. It shows all nodes in all relations tagged type=multipolygon which have no other tags. So it doesn't show old-style multipolygons that have, for instance, a created_by tag or so. This was an oversight by me that I'll fix eventually. So, sorry folks, there are some more old-style multipolygons in those areas that we have to fix. But there are only about 6,000 of them or so. So nothing compared to the about 80,000 that were already fixed. (The statistics have the correct number of all old-style multipolygons.)
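The filter described above, and the oversight with created_by, can be sketched as follows. This only mirrors the "relation has type=multipolygon and nothing else" rule used for the map; it does not look at the tags on the member ways. The function name is made up, and the set of ignored keys contains only created_by because that is the only example named in the text.

```python
from typing import Dict, FrozenSet

# Only created_by is mentioned in the text; any further keys would be guesses.
IGNORED_KEYS: FrozenSet[str] = frozenset({"created_by"})


def has_only_type_tag(tags: Dict[str, str],
                      ignored_keys: FrozenSet[str] = frozenset()) -> bool:
    """True if the relation is tagged type=multipolygon and carries no other
    tags, apart from keys listed in ignored_keys."""
    if tags.get("type") != "multipolygon":
        return False
    remaining = set(tags) - {"type"} - set(ignored_keys)
    return not remaining


bare = {"type": "multipolygon"}
with_created_by = {"type": "multipolygon", "created_by": "JOSM"}
new_style = {"type": "multipolygon", "landuse": "forest"}
```

With the default empty `ignored_keys` the relation carrying created_by is missed, which is the gap in the map described above; passing `IGNORED_KEYS` catches it.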

@wolfbert

Nice! I'd also suggest to evaluate the number of multipoly relations without an area tag (irrespective of which other tags they carry). A rough tag list is in #17 (last entry at the moment), which should cover the vast majority of cases (more details would be in the style sheets).

@osmlab osmlab locked and limited conversation to collaborators Apr 1, 2017
@joto
Collaborator Author

joto commented Apr 10, 2017

Sometime in the last few days we marked the half-way point of fixing the old-style multipolygons. Fixing around 120,000 multipolygons took us about a month. Let's see how fast we can do the other half!

Here is how the map looked this morning:

old-style-mps-2017-04-10

Africa is done, Australia is done, huge parts of the other continents. Thanks to everybody who is helping out here! As I mentioned before the map was only showing old-style multipolygons that have no other tags except the type tag. This isn't quite all of them. Some have a created_by tag for instance. I have now corrected the map, so it shows some more multipolygons including some huge ones, mostly in Russia. So there are some more angry red dots on the map again, sorry. But the statistic was correct, so there isn't actually more work, it only appears so. :-)

@joto
Collaborator Author

joto commented Apr 13, 2017

The currently running "Open Rings" challenges have been going slower than other challenges before. That was to be expected because there are some tasks in there that are really hard to fix. And many that can't be fixed at all without local knowledge.

In Maproulette users can mark tasks as "Too hard" or just skip them. Unfortunately those tasks will just show up again and again (for the same user or other users) which makes it difficult to get to the new tasks. I have now deleted all tasks marked as "Skipped" or "Too hard" from the challenges. This should make it easier to get to the other tasks that nobody has seen before.

@joto
Collaborator Author

joto commented Apr 18, 2017

By now the "Open Rings" challenge for Europe is done, but there are a few hundred more to look at in the rest of the world. Would be great if we can get through this. Just don't spend too much time on any task, if it is not immediately obvious what the solution is, mark it as "Too hard" and move on.

If you don't like the "Open rings", I have added a new challenge called Duplicate Ways. This contains all cases where the same way is in a multipolygon relation twice or more times. That's always wrong and, in JOSM at least, easy to fix. The JOSM relation editor shows these ways with a reddish background.
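A minimal sketch of how such duplicate members could be found, assuming the relation members are available as (type, ref, role) tuples. The representation and the function name are assumptions for illustration only.

```python
from collections import Counter
from typing import List, Tuple

Member = Tuple[str, int, str]  # (member type, object ID, role)


def duplicate_way_members(members: List[Member]) -> List[int]:
    """Return the IDs of ways that appear more than once in the member list,
    regardless of the role they appear with."""
    counts = Counter(ref for mtype, ref, _role in members if mtype == "way")
    return sorted(ref for ref, n in counts.items() if n > 1)


# Invented example: way 101 is listed twice, once without a role.
members = [
    ("way", 100, "outer"),
    ("way", 101, "outer"),
    ("way", 102, "inner"),
    ("way", 101, ""),
]
```

Here `duplicate_way_members(members)` reports way 101, the one the JOSM relation editor would highlight.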

@joto
Collaborator Author

joto commented Apr 22, 2017

At the recent FOSSGIS conference we had an "OSM Saturday". I hosted a workshop about the area fixing effort. A video of my talk and the following discussion (all in German) is available for download and on YouTube.

@joto
Collaborator Author

joto commented Apr 27, 2017

Here is a new challenge for you. About 3000 building ways with spikes. Those are a subset of those cases that show up as duplicate segments in the statistics. Some of them are really easy to fix, just delete one node. But some of them are more tricky.

Again, I have split up this challenge into 5 sub-challenges for different areas: Africa, Americas, Asia + Australia, Europe, and, for the first time, one extra challenge for HOT activation areas.

(I just removed all the challenges and created new ones. The new ones should not have zero-length spikes in them (which happened if there were duplicated nodes) which was confusing.)
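One plausible reading of "spike" is a way that runs out to a node and immediately back along the same segment (A, B, A). The sketch below finds the tip nodes of such spikes in a list of node IDs. This definition, the function name and the example are assumptions; the zero-length case mentioned above is not modelled, and a spike across the closing node of the way is not checked.

```python
from typing import List


def find_spike_tips(node_refs: List[int]) -> List[int]:
    """Return the nodes where the way turns straight back onto the node it
    just came from, i.e. the pattern A, B, A with B as the tip."""
    tips = []
    for prev, cur, nxt in zip(node_refs, node_refs[1:], node_refs[2:]):
        if prev == nxt and prev != cur:
            tips.append(cur)
    return tips


# Invented example: the outline runs 3 -> 4 -> 3, so node 4 is a spike tip.
with_spike = [1, 2, 3, 4, 3, 5, 1]
clean = [1, 2, 3, 4, 1]
```

In the simple cases the fix matches what is said above: deleting the tip node (node 4 here) from the way removes the spike.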

@joto
Collaborator Author

joto commented Apr 28, 2017

Looking at the stats today you might have noticed a huge jump in the number of duplicated segments.

duplicate_segments_stats_2017-04-28

This jump is due to me being conservative before and not counting some duplicated segments that I was not sure were actual problems. I have changed this now to better show the number of problems at the price of some overreporting.

Oh, and before anybody freaks out about the huge numbers. The numbers reported here are segments (the connection between two nodes). Because most ways contain many segments, the number of closed ways or multipolygon relations affected is much lower. About 7000 ways and 21000 relations are affected.
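To make the unit of counting concrete, here is a minimal sketch that lists the duplicated segments within a single way, where a segment is the connection between two consecutive nodes and direction is ignored. The function name and the example are made up; the real statistics also cover segments duplicated across the ways of a relation, which this sketch does not.

```python
from collections import Counter
from typing import List, Tuple


def duplicate_segments(node_refs: List[int]) -> List[Tuple[int, int]]:
    """Return every segment that occurs more than once in the way.

    A segment is stored with its node IDs sorted, so (3, 4) and (4, 3)
    count as the same segment.
    """
    segments = Counter(
        tuple(sorted(pair)) for pair in zip(node_refs, node_refs[1:])
    )
    return sorted(seg for seg, n in segments.items() if n > 1)


# Invented example: the way uses the segment between nodes 3 and 4 twice.
way = [1, 2, 3, 4, 3, 5, 1]
```

This way has six segments in total, one of which is duplicated, so a single object can contribute several entries to a segment-based statistic.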

@joto
Collaborator Author

joto commented Apr 30, 2017

After exactly a month the "open rings" challenge is finally finished. This has been the most difficult challenge so far and there were many cases where a fix wasn't possible because the data needed to fix it is just not there. Boundaries can't be seen on satellite images for instance. But still we got nearly half of the about 10,000 cases fixed, so I count this as a success. I think we can't do much more here at the moment, but will keep thinking about how to address this in the future, possibly by involving the local communities more.
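For readers wondering what makes a ring "open": the member ways of a multipolygon have to join end to end into closed rings. A minimal sketch of a necessary condition, assuming each member way is given as its list of node IDs (the function name is made up): every node at which a way ends must be shared by an even number of way ends, otherwise the ring cannot close there.

```python
from collections import Counter
from typing import List


def open_ring_endpoints(ways: List[List[int]]) -> List[int]:
    """Return the nodes where an odd number of way ends meet.

    An empty result is necessary for all rings to be closed, but it is not
    a full validity check (it says nothing about intersections or roles).
    """
    ends = Counter()
    for refs in ways:
        if len(refs) < 2:
            continue
        # A way that is closed by itself adds 2 to the same node.
        ends[refs[0]] += 1
        ends[refs[-1]] += 1
    return sorted(node for node, n in ends.items() if n % 2 == 1)


# Invented examples: two ways forming a closed ring, and two that do not.
closed_ring = [[1, 2, 3], [3, 4, 1]]
open_ring = [[1, 2, 3], [3, 4, 5]]
```

For `open_ring` the gap is between nodes 1 and 5. Whether a way is missing there or the boundary simply was never mapped is the kind of question that needs local knowledge.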

@joto
Collaborator Author

joto commented May 4, 2017

The old-style multipolygon relations are history! In not even two months the OSM community cleaned up all of the nearly a quarter million relations:

old-style-stats-2017-05-04

This is much faster than I (and probably everybody else) had anticipated. There are a few old-style multipolygons around, some of them have no members at all, some only relation members (which isn't allowed for multipolygon relations) and some have been created in the last days. I expect that we will get new ones occasionally from editors and/or mappers who don't know yet that they shouldn't do that, but that's not a big problem.

Here is an animation showing how the old-style multipolygons vanish bit by bit:

old-style-map-animation

So that part of the great (multi)polygon fixing effort is done. Huge thanks to everybody involved! But there are still geometry errors to fix.

@joto
Collaborator Author

joto commented May 7, 2017

The "building ways with spikes" challenge is finished. Now on to the rest of the closed ways with spikes. Another about 2000 closed ways that have the same kind of spikes in them.

@joto
Collaborator Author

joto commented May 10, 2017

The old-style multipolygons are cleaned up. New ones appear occasionally, but with the release of the new iD version I expect this to drop even further soon. The OSM Inspector now shows old-style multipolygons, so you can keep cleaning those up.

I'll be retiring the comparison map shortly giving me back some much needed space on my server.

@joto
Collaborator Author

joto commented May 13, 2017

Another challenge finished (the "spikes" challenge), so here is the next one: Inner ways with same tags as multipolygon relation. There are nearly 2000 of those each in Italy and in the Netherlands for some reason. And another 2000 or so in the rest of the world.

@joto
Collaborator Author

joto commented May 27, 2017

The "inner ways with same tags" challenge is done. Nearly 10,000 fixes in about two weeks. Good work! The roughly 700 cases still shown in the statistics are mostly multipolygons that have multiple errors. We'll probably catch them in a future challenge.

Now it is time to circle back to the very first challenges we did, self-intersections, and fix the remaining problems there. First up are 1200 building ways with self-intersections, split into two challenges for Africa and the rest of the world.

@joto
Collaborator Author

joto commented May 29, 2017

This is going quickly... The last challenge was finished today, so here is the next one: Multipolygon relations with self-intersections. Instead of splitting them up geographically, I have split them in the simpler and more difficult cases based on the number of ways involved.

@joto
Collaborator Author

joto commented Jun 10, 2017

Fixing those pesky relations took a bit longer. Here are some more simple ways with self-intersections.

@joto
Collaborator Author

joto commented Jun 18, 2017

The self-intersections from the challenges are done. There are still 8000 or so left, those are mostly either closed ways that aren't polygons (railway loops or so) or that are part of a relation that has multiple problems and so is very difficult to fix. I have decided to leave those alone for the time being. This needs some more work finding out what the different cases are and which are real problems and which are false-positives.

Instead we are coming to a new problem that I didn't have on my radar as a big problem, but it turns out it is. There are about 50,000 multipolygon relations where at least one way (inner or outer) has the same tags as the relation. I had expected this only to be a problem with inner ways and we have already worked on challenges to fix those inner ways. But it turns out that the real problem is the outer ways. A large part of this is probably due to mappers putting a closed way into a multipolygon relation to add inner ways and forgetting to remove the tags from the outer ways. And there are some obvious huge import problems in Canada (15,000 cases), New Zealand (nearly 8,000 cases), and other places. Note that this is not a new problem. Even with the old code supporting old-style multipolygons, these cases were often rendered in the wrong way.

To detect these I compare the tags on the relation with the tags on all the ways. Some tags (like source) are ignored in this comparison, but apart from that the tags have to be the same. There are probably also cases where the tags have some overlap but also some differences. This might be okay (different types of forest) or might not be (one has a name and the other doesn't). But I don't check for this, because it would make this much more difficult. We'll leave this for future work.
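The comparison can be sketched roughly like this. It is not the actual detection code: the set of ignored keys contains only source because that is the only key named above, and dropping the relation's own type tag before comparing is an assumption made here so that the two tag sets can be equal at all. The function names are made up.

```python
from typing import Dict, FrozenSet, List

Tags = Dict[str, str]

# Only "source" is named in the text; further entries would be guesses.
IGNORED_KEYS: FrozenSet[str] = frozenset({"source"})


def _without(tags: Tags, keys: FrozenSet[str]) -> Tags:
    """Copy of tags with the given keys removed."""
    return {k: v for k, v in tags.items() if k not in keys}


def ways_with_same_tags(relation_tags: Tags,
                        way_tags_by_id: Dict[int, Tags],
                        ignored: FrozenSet[str] = IGNORED_KEYS) -> List[int]:
    """Return the IDs of member ways whose tags equal the relation's tags,
    after dropping ignored keys (and, as an assumption, the type tag)."""
    rel = _without(relation_tags, ignored | {"type"})
    if not rel:
        return []  # an untagged relation is a different problem
    return sorted(
        way_id for way_id, tags in way_tags_by_id.items()
        if _without(tags, ignored) == rel
    )


# Invented example: way 1 repeats the relation's tags, way 2 only overlaps.
relation = {"type": "multipolygon", "landuse": "forest", "source": "survey"}
ways = {
    1: {"landuse": "forest", "source": "imagery"},
    2: {"landuse": "forest", "name": "Big Wood"},
    3: {},
}
```

Way 2 is the partial-overlap case mentioned above: it shares a tag with the relation but also has a name, so an exact comparison does not report it.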

Fixing those 50,000 relations is a huge task and I am not sure whether we can break it down into more sensible sub-cases. So I have decided to start this challenge with two subsets, Switzerland and South America, first. Both have between 600 and 700 ways in them, but of course Switzerland is much denser. Please try out these challenges and open a new ticket if you see some patterns or notice something useful that we can use to better classify this problem.

@joto
Collaborator Author

joto commented Jun 20, 2017

This is more complicated than expected. We have to take the tags into account. For instance highway tags on the ways themselves and on the relation might be okay, because highway tags are not area tags.

I have removed the already running challenges and replaced them with a new one. This only contains multipolygon relations with a single way member with the same tags. I would expect that in most cases the relation can simply be removed.

Once this is done I'll post more challenges for different types of tags.

@joto
Collaborator Author

joto commented Jun 23, 2017

So, the single way member cases are fixed. On to buildings where the relation tags and way member tags are the same. This job is split up into cases in France, in Italy, rest of Europe and rest of the world.

@joto
Collaborator Author

joto commented Jun 26, 2017

That was quick. Buildings are done. New job with relations tagged landuse=forest split up into France, Germany, Italy, and the rest of the world. Nothing from Canada or New Zealand though until we figure out what to do with those imports.

@joto
Collaborator Author

joto commented Jul 2, 2017

Forests are done, on to scrub.

@joto
Collaborator Author

joto commented Jul 4, 2017

Scrub was only a small challenge and it is already done. Next one is woods. Split into Russia and the rest of the world (without Canada and NZ again).

@joto
Collaborator Author

joto commented Jul 10, 2017

The woods have been fixed over the weekend, so here is the next one: Everything tagged as water. Split up into Russia, Europe, and the rest of the world (again without Canada and NZ).

@joto
Collaborator Author

joto commented Jul 16, 2017

You have all been pretty busy and finished the last challenge in two days, but I have been lazy and not created a new challenge since then. But here is the next one now: Same tags on member way in multipolygon relation (Riverbank).

This one is a bit different, because the tagging as waterway=riverbank is rather old and doesn't fit into our current tagging scheme for water areas any more. The more modern scheme tags river areas as natural=water and water=river and I think it makes more sense. But if you work on this challenge you have to decide for yourself whether you also want to update those tags while you are touching all those objects anyway.
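If you do decide to update the tags, the change amounts to the following mapping, shown here as a minimal sketch on a plain tag dictionary. The function name is made up, and the sketch deliberately leaves objects alone that already carry a conflicting natural tag, which is a cautious assumption rather than an agreed rule.

```python
from typing import Dict

Tags = Dict[str, str]


def modernize_riverbank(tags: Tags) -> Tags:
    """Return a copy of tags with waterway=riverbank replaced by
    natural=water + water=river. Other tags are kept unchanged."""
    if tags.get("waterway") != "riverbank":
        return dict(tags)
    if tags.get("natural", "water") != "water":
        return dict(tags)  # conflicting natural=* tag: needs a human decision
    new_tags = {k: v for k, v in tags.items() if k != "waterway"}
    new_tags["natural"] = "water"
    new_tags["water"] = "river"
    return new_tags


# Invented example object.
old = {"type": "multipolygon", "waterway": "riverbank", "name": "Elbe"}
new = modernize_riverbank(old)
```

The function returns a new dictionary and leaves the input untouched, so old and new tagging can be compared before anything is uploaded.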

@joto
Collaborator Author

joto commented Jul 17, 2017

The ways with the same tags as the relations are now available on the OSM Inspector as a new layer called "Same tags on outer ring". You have to zoom in to at least zoom level 8 to see something in this layer. If you zoom in far enough you see the ways that have the same tags as the relation are a bit thicker, the ways that have different tags from the same relations are a bit thinner.

Thanks to @Nakaner for adding this.

@joto
Collaborator Author

joto commented Jul 24, 2017

The riverbanks are done, so here is the next challenge: meadows.

And congratulations to the Canadians. Some diligent workers have fixed most of the broken multipolygons from CanVec imports!

@joto
Collaborator Author

joto commented Jul 27, 2017

Meadows are done, so here is the farmland.

@joto
Collaborator Author

joto commented Aug 8, 2017

Farmland was also done two days ago, so here is the last challenge: about 2500 ways with the same tags as the relation with various tags.

@joto
Collaborator Author

joto commented Aug 28, 2017

We are done! Well, we are never really done. But I am declaring this effort to be done anyway. I have written up some more words in my personal blog.
