Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup after the self-serve city import GUI #523

Open
matkoniecz opened this issue Feb 23, 2021 · 33 comments
Open

Cleanup after the self-serve city import GUI #523

matkoniecz opened this issue Feb 23, 2021 · 33 comments

Comments

@matkoniecz
Copy link
Contributor

https://a-b-street.github.io/docs/howto/new_city.html is a bit tricky and I have seen some people from OSM interested by it, but scared away by command line stuff.

I think that JOSM-like download interface (that would also reject too large areas) would be great.

@matkoniecz matkoniecz changed the title Make possible to select rectangle/area on a screen on map of the world and add this area as a custome one. Make possible to select rectangle/area on a screen on map of the world and add this area as a custom city. Feb 23, 2021
@dabreegster
Copy link
Collaborator

I've heard some requests for something like this. I think it's pretty feasible to add an in-game UI for it.

  1. First open the browser to geojson.io and tell them to draw there
  2. Somehow copy back the results (the tricky step)
  3. Use https://github.com/a-b-street/abstreet/blob/master/importer/src/bin/pick_geofabrik.rs to find the osm.pbf best matching the boundary
  4. Run osmconvert to clip
  5. Run the one-shot importer automatically

Step 2 is a bit awkward because we'd need to support copy/paste (across 4 platforms) -- there are crates for it and I think we used to use one of them, so maybe not bad. The single-line text entry box in abst today sucks and I really don't want to make a multi-line one, but we could maybe just have a button saying "grab from system clipboard" and display the result and not allow editing in the UI.

Step 3 is more what we would do for a real permanent import; it might be overkill or too large a download for a quick demo. I'm not aware of any OSM download service that'll extract the city-sized areas people will likely draw. https://protomaps.com/bundles is the closest, but we need a .osm ultimately; I don't think the bundle has enough.

Step 4 means bundling the osmconvert binary for the correct platform. Or... if that's a hassle, I don't actually think it's so hard to implement the equivalent of osmconvert large_map.osm -B=clipping.poly --complete-ways -o=smaller_map.osm in Rust with the OSM parser we've already got. Argh, I guess the complication is that we don't handle .pbf. There are crates like https://crates.io/crates/osmpbf we could use.

I think this is sounding like an awesome hackathon project. Tentative plan is to do this in two weeks at https://democracylab.org/index/?section=AboutEvent&id=sthacktricksday2021

@matkoniecz
Copy link
Contributor Author

matkoniecz commented Feb 23, 2021

Step 2 is a bit awkward because we'd need to support copy/paste (across 4 platforms) -- there are crates for it and I think we used to use one of them, so maybe not bad. The single-line text entry box in abst today sucks and I really don't want to make a multi-line one, but we could maybe just have a button saying "grab from system clipboard" and display the result and not allow editing in the UI.

What about showing map within abstreet with bbox selection done there? Exactly like it is done by JOSM?

@matkoniecz
Copy link
Contributor Author

I'm not aware of any OSM download service that'll extract the city-sized areas people will likely draw

Limit it to a part of the city (say, like Kraków city center bundled in AB Street by default) and use Overpass API? May be not feasible if used on a large scale, but I am not expecting it done on massive scale.

@michaelkirk
Copy link
Collaborator

To make sure I'm understanding, it sounds like what you are describing is predicated on having a zoomable slippy map of the world in abstreet?

@dabreegster
Copy link
Collaborator

What about showing map within abstreet with bbox selection done there? Exactly like it is done by JOSM?
it sounds like what you are describing is predicated on having a zoomable slippy map of the world in abstreet?

This sounds hard to implement, so that's why I think we can do step 1 and 2 and just open up the browser to an existing interface that can select boundaries, and tell people to copy-paste the GeoJSON blob back. It'd be cool to embed it directly and make this even easier, but I think even with a browser step, this'd be a huge step in usability.

use Overpass API?

Maybe worth a try, but this is probably not what it's intended for. We need to select all nodes, ways, relations intersecting that bbox. We could try to get clever and filter just for certain types, but that list is kind of huge and probably hard to maintain.

@matkoniecz
Copy link
Contributor Author

matkoniecz commented Feb 23, 2021

To make sure I'm understanding, it sounds like what you are describing is predicated on having a zoomable slippy map of the world in abstreet?

Yes. No idea how hard it would be and is it reasonable. Main factor is, I guess, "can we use OSMF tile servers or other free tier of tile hosting".

Maybe worth a try, but this is probably not what it's intended for. We need to select all nodes, ways, relations intersecting that bbox.

I do it for some projects, downloading entire cities is within Overpass API range. My guess is that size limitation would be AB performance, not Overpass download capability. And for now any area runnable in AB would be downloadable with Overpass.

(no promises here, and natural=coastline is a trap here - Overpass data needs augmenting with https://osmdata.openstreetmap.de/data/water-polygons.html or some tricky processing to get seas/oceans rendered)

But if you are worried that it may be misuse of Overpass - look at their usage policy (I remember reading it - but maybe it changed since that time)

We could try to get clever and filter just for certain types

For some cases I exclude buildings, but AB is using them anyway. I would not try filtering.

AB is using most of data, throwing out unused scraps would be asking for trouble with minor benefits.

@dabreegster
Copy link
Collaborator

Sounds like the Overpass idea is worth trying, then. Closer to the hackathon, I'll break this down into parallelizable tasks, one of which could be a little tool that tries to effectively assemble a .osm from Overpass.

@matkoniecz
Copy link
Contributor Author

matkoniecz commented Feb 23, 2021

little tool that tries to effectively assemble a .osm from Overpass.

If it can be downloaded from Overpass (I am betting that any area that can be loaded and run sanely will be) you can simply download .osm file ([out:xml]).

@dabreegster
Copy link
Collaborator

Dumping more related tasks for the hackathon:

  • cache the geofabrik index file
  • rearrange data/input and carve out a single place for all the geofabrik pbfs, since many places will share. Simplify config a bit.

@matkoniecz
Copy link
Contributor Author

matkoniecz commented Feb 23, 2021

I just tested whatever what I was saying matches reality. I just downloaded 600MB+ .osm file representing city of Vienna with an Overpass query.

I used following script that used (very unimpressive) library code from https://github.com/matkoniecz/osm_bot_abstraction_layer/blob/master/osm_bot_abstraction_layer/overpass_downloader.py (basically, logging and retrying + inefficient file saving).

from osm_bot_abstraction_layer.overpass_downloader import download_overpass_query

vienna_download = """
  [timeout:250];
  (
    nwr(48.10,16.26,48.26,16.51);
  );
out body;
>;
out skel qt;
"""
download_overpass_query(vienna_download, '/tmp/vienna.osm')

Disclaimer: I am not maintainer of Overpass service, not sure how significant use AB Street would generate, not sure how significant use is OK.

View of data in JOSM (note inclusion of also object partially included in queried area, though ones without even node there can be missed):

(note: JOSM needed also metadata like version of object, so this was fetched with

[timeout:250];
(
    nwr(48.10,16.26,48.26,16.51);
);
out meta
>;
out meta qt;

)

screen06

@dabreegster
Copy link
Collaborator

Thank you for confirming Overpass is a viable option! It looks extremely straightforward for us to use it too.

I am not maintainer of Overpass service, not sure how significant use AB Street would generate, not sure how significant use is OK.

I'll double-check their policy before publicizing this tool, and check with the owner. I suspect they have rate limiting on their end for individual users, but we could also send an "A/B Street" user agent if they want to throttle across all users.

@matkoniecz
Copy link
Contributor Author

matkoniecz commented Feb 24, 2021

I suspect they have rate limiting on their end for individual users

Yes. See http://overpass-api.de/api/status Though I am not sure if they have user-agent limiting or just request to self-limit. But I expect that AB Street:

  • will not have this feature used extremely heavily
  • big chunk of that will be OSM mappers, including use for OSM data validation and other mapping help

@Alfred-Mountfield
Copy link

Another possibility for OSM data is GeoFabrik. A relatively new but really quite powerful Python library uses it for automatic data acquisition https://pypi.org/project/pyrosm/

It downloads .osm files in Protocol Buffer Binary Format and I've been using it in an automated pipeline to grab data for the whole of London (which surely is one the upper-end of areas you'd be able to handle).

The library is open-source so it might provide a nice example of how to do it:
https://github.com/HTenkanen/pyrosm/blob/master/pyrosm/data/geofabrik.py

Not sure if you could directly call the library to handle it or if you'd have to reimplement the logic in Rust but it seems like it'd be a good alternative and it might have less throttling as they're direct file downloads rather than custom queries

@dabreegster dabreegster changed the title Make possible to select rectangle/area on a screen on map of the world and add this area as a custom city. Easy self-serve city import GUI Feb 27, 2021
@matkoniecz
Copy link
Contributor Author

One more thing to consider for usecase of use for OSM mapping validation: how long mappers needs to wait after update to rerun import and see changes.

For Overpass it is enough to wait 5-10 minutes (right now replication lag is 2 minutes).

@dabreegster
Copy link
Collaborator

dabreegster commented Mar 9, 2021

Alright, the hackathon is this Sauturday. Anybody reading this issue, feel free to join for the day! Rust experience useful, but we can also spend the time mentoring -- it's more important at the hackathons to grow the community than to finish the tasks.

Design decisions

  1. Only support this running natively, not on the web. We can revisit this later, but it adds lots of complications.
  2. Execute the importer and all other tools as subprocesses from the game crate. Don't link in the code as a library. It would really bloat the build, which we've been trying to pare down.
  3. In parallel, try using Overpass and Geofabrik to get our .osm input, as discussed in the rest of this issue. We can pivot to just focus on one based on how things are going.

Tasks

These can all be done simultaneously and in any order, except for the last one.

  • Create the UI, mocking out the steps to grab from the clipboard and actually run the import. Checkboxes for driving on the left/right. Button to open browser to geojson.io, with instructions what to do.
  • Add clipboard support. We used to use clipboard, but removed it. Make sure it doesn't break on web.
  • Given a GeoJSON boundary string, run geojson_to_osmosis, then pick_geofabrik, then download the .pbf.
  • Make a new tool to clip .pbf with an osmosis boundary and produce an XML .osm. Aka, reimplement a small part of osmconvert so we can ship the binary on 3 platforms sanely. I haven't researched pbf reading crates yet.
  • Call the Overpass HTTP API, extract all raw data in a boundary, and save as XML .osm

Then glue everything together in the UI, and modify the release scripts to include whatever new binaries are needed.

Bonus tasks

If we have time/people:

  • Add a button to open a new Github issue, copying over the GeoJSON boundary the user drew and asking to permanently import the map
  • Cache the geofabrik index file in pick_geofabrik in data/input/shared
  • Reorganize all .pbf's downloaded in the import pipeline to live under data/input/shared, in a directory structure mirroring what geofabrik does. Change importer config to just refer to these relative filenames, and always assume geofabrik is our main source. The updater may have to get smarter to figure out what pbf to download, based on input DataPacks.
  • Brainstorm / prototype a tool to "update this map", for the use case of OSM validation wanting to grab their upstreamed changes

dabreegster added a commit that referenced this issue Mar 13, 2021
process. #523

Tested really quickly, might not be working right yet. The .xml output
seems to have a duplicate '</osm>' for some reason.
dabreegster added a commit that referenced this issue Mar 13, 2021
… map, in one step, by gluing together a bunch of other tools. #523

Breaks in a few places along the way, but the basic idea is there.
@dabreegster
Copy link
Collaborator

Just got cargo run --bin one_step_import -- bedfordshire.geojson to work! Lots of smoothing out needed, but that wasn't hard so far. Quick lunch break, then I'll wire up a UI.

dabreegster added a commit that referenced this issue Mar 13, 2021
screen yet. #523

Not working yet; one_step assumes paths to binaries, and it's now wrong.
dabreegster added a commit that referenced this issue Mar 13, 2021
Improved the target dir finding, and hacking around some of the command
output not rendering.
@dabreegster
Copy link
Collaborator

End-to-end import from the UI now "works". Lots of hacks along the way. First up, going to work on a sane way to run a command and display live STDOUT/STDERR in widgetry.

dabreegster added a commit that referenced this issue Mar 13, 2021
dabreegster added a commit that referenced this issue Mar 13, 2021
dabreegster added a commit that referenced this issue Mar 15, 2021
pbf. #523

But now we don't know which map to switch to in the UI!
dabreegster added a commit that referenced this issue Mar 15, 2021
…ther tool. #523

For this to work, RunCommand has to be much more careful about getting
all output when the command is done.
dabreegster added a commit that referenced this issue Mar 16, 2021
… simple progress bar. #523

Use it in the updater and all the importer tools. Only place it's not
used yet is parking_mapper and the in-game updater.  [rebuild]

Also make the RunCommand state understand the control code for erasing
the current line. It... sort of works.
dabreegster added a commit that referenced this issue Mar 17, 2021
…m the importer. #523

This forces the main importer to include tokio and propagate async a
bit. That seems worth it.

Also removed the quiet param from the download helpers; we always want
progress.
dabreegster added a commit that referenced this issue Mar 18, 2021
@dabreegster dabreegster changed the title Easy self-serve city import GUI Cleanup after the self-serve city import GUI Mar 22, 2021
@dabreegster
Copy link
Collaborator

dabreegster commented Mar 22, 2021

This is now done/released. Repurposing this bug to track cleanup tasks that came up along the way.

  • Stop calling reqwest directly in parking_mapper.
  • Stop calling reqwest directly in map_gui's updater. Maybe just need a FutureLoader.
  • Consider removing the dependency on osmconvert entirely. Need to improve performance of clip_osm first, then rerun and make sure it's equivalent.
  • Make the command runner cancellable. Need to rethink make_loading_screen to allow extra stuff.
  • Figure out web support for importing
  • Try using Overpass, which might be much faster than downloading a huge geofabrik extract
  • Maybe we don't need to handle the Osmosis boundary format at all; just natively use GeoJSON everywhere?
  • Reorganize all .pbf's downloaded in the import pipeline to live under data/input/shared, in a directory structure mirroring what geofabrik does. Change importer config to just refer to these relative filenames, and always assume geofabrik is our main source. The updater may have to get smarter to figure out what pbf to download, based on input DataPacks.

@michaelkirk
Copy link
Collaborator

Finally giving the new import screen a wirl.

I'm proposing some minor UI changes here: https://github.com/a-b-street/abstreet/pull/584/files

When running from my development build, I initially saw this screen:
Screen Shot 2021-03-23 at 10 55 44 AM

From the application log I was able to deduce I need to do a cargo build --release to compile the import binary, then the import process started to work. Probably fine for dev builds, but wanted to check - is this binary included in the release so that everything "just works"?

The import process is still running - I've swapped out because clip_osm is 16.5GB and climbing. It'd be neat to make that less of a hog.

@dabreegster
Copy link
Collaborator

From the application log I was able to deduce I need to do

Good point, this error could be more clear what you need to do. I guess I was assuming most people aren't building from source...

is this binary included in the release so that everything "just works"?

Yes, it is! I tested with the Linux .zip and it seems fine.

I've swapped out because clip_osm is 16.5GB and climbing

Turns out reimplementing this little piece of osmconvert wasn't as straightforward as I expected. OSM files (including pbf, afaict) always list all nodes, then ways, then relations. The clip tool currently works like this:

  1. Read and cache ALL nodes, no matter how far outside of the boundary they are
  2. For each way, check if ANY of its nodes are inside the boundary. Keep it if so
  3. For each relation, check if ANY of its children (nodes, ways, or other relations) have been saved. Save it if so.
  4. Crawl all ways (whoops just realized this should handle relations too) and build a list of nodes we actually use
  5. Write all of the saved nodes, ways, and relations

That first step is the memory hog. Filtering the nodes inside the boundary in step 1 doesn't work, because then a way crossing the boundary refers to nodes we haven't saved, so the final output doesn't have the node just outside the boundary, which messes up clipping and removes all border intersections.

From skimming the osmconvert source, they do lots of file rewinding. I'm still not sure all of the performance tricks they use, but I think we could adapt our approach to avoid the memory issue.

  1. Read all nodes, only caching the ones in bounds
  2. Read all ways, only caching the ones with at least one saved node
  3. Same for relations
  4. Build a set of all nodes that all of our saved ways and relations refer to
  5. Go back and re-read the file from the start. Save all nodes in the above set
  6. Write all 3

@matkoniecz
Copy link
Contributor Author

Same for relations

Note that you should have to all ways referred to by relations, what may include ways not cached in step 2.

So you will need to cache additional ways, what may trigger need to cache additional nodes (see for example multipolygon on edge of area).

You may also skip some relations, for example load only multipolygons and public transport routes (anything else?) to avoid loading entire boundary of country or entire waterway relations.

@michaelkirk
Copy link
Collaborator

So you will need to cache additional ways, what may trigger need to cache additional nodes (see for example multipolygon on edge of area).

That's a good point. I think that's a bug with the current implementation in abstreet too. Not sure how often it bites us.

@matkoniecz
Copy link
Contributor Author

matkoniecz commented Mar 24, 2021

Likely not a big issue, as it will affect only complex buildings and eyecandy that is both on map edge and placed in an unfortunate way.

Roads, footway should not be affected as pedestrian areas are not supported anyway from what I remember.

BTW, nested relations are possible - but unlikely to be used for anything AB related. Maybe some bus routes?

dabreegster added a commit that referenced this issue Apr 4, 2021
… spend more time drawing the progress than downloading, otherwise. #523
@matkoniecz
Copy link
Contributor Author

Is it possible to add it also to http://abstreet.s3-website.us-east-2.amazonaws.com/0.2.50/osm_viewer.html ?

@dabreegster
Copy link
Collaborator

Is it possible to add it also to http://abstreet.s3-website.us-east-2.amazonaws.com/0.2.50/osm_viewer.html ?

It's already there -- all of the tools share the same map loading UI. Click the map name in the top-left, then scroll down. This map loading UI is becoming overwhelming now that we have lots of countries, and Yuwen had a design somewhere for reorganizing it and making the search function more prominent -- I'll see if I can dig it up and prototype it.

Screenshot from 2021-07-12 12-38-37
Screenshot from 2021-07-12 12-38-47

@matkoniecz
Copy link
Contributor Author

On reread it seems that I missed

This only works in the downloaded version, not on web.

part

@dabreegster
Copy link
Collaborator

Ah right. It would technically be possible to make all of this work on the web, but to avoid exploding the initial .wasm size to download with the extra importer dependencies, we'd have to carve out a separate WASM app to separately fetch and execute. And maybe trying to import all in a browser is too intense, and at this point, it'd be simpler to just run a small API server somewhere with reasonable rate-limiting.

Not planning on tackling that soon.

But since you've reminded me of this issue, I'm actually going to take a shot at using Overpass instead of geofabrik + clipping, since I never actually tried that.

dabreegster added a commit that referenced this issue Jul 12, 2021
The returned XML seems to be missing lots of stuff, but it's much faster
and wasn't hard to wire up. I'll look into fixing the query...
dabreegster added a commit that referenced this issue Jul 12, 2021
…te. #523

Because it's so much faster, make it the default!
@dabreegster
Copy link
Collaborator

I switched the default to Overpass -- it was super easy to wire up, and is way faster than downloading and clipping a potentially massive PBF. This also maybe opens the door for some kind of "refresh my current map with recent data" feature.

https://twitter.com/CarlinoDustin/status/1414699376887287833

@matkoniecz
Copy link
Contributor Author

matkoniecz commented Jul 13, 2021

Ooo, thanks!

One thing that seems missing is a clear handling of Overpass refusal to serve data (say someone trying load entire USA in one go or repeated downloads resulting in exhausting quota)

429, 503, 504 error codes indicate rate limiting - so probably a clear message should be shown on them explaining that Overpass has refused to serve a query rather than generic one

@matkoniecz
Copy link
Contributor Author

BTW, has Overpass got exposed to web? From looking at diff it seems that it is not changed but it is likely that I missed relevant diff.

@dabreegster
Copy link
Collaborator

One thing that seems missing is a clear handling of Overpass refusal to serve data

Indeed. I can try testing this later to see how the error shows up. I think

pub async fn download_bytes<I: AsRef<str>>(
should plumb back the HTTP response code, and
abstio::download_to_file("https://overpass-api.de/api/interpreter", Some(query), &osm)
could check the error code and surface a better message. Might have to rearrange the error handling to plumb back the response code properly.

BTW, has Overpass got exposed to web?

No, nothing has changed here. https://github.com/a-b-street/abstreet/blob/master/importer/src/bin/one_step_import.rs isn't web-friendly. Everything could be done in memory in the browser -- downloading the osm.xml, running the importer, producing the binary map file, feeding it back to abstreet. But bundling all of the code to do that in the abstreet.wasm file would really bloat it; I'd want to keep it separate. And if it was separate, we'd have to asynchronously load and execute the extra WASM, then find some way (maybe the browser's user session storage) to plumb back the binary map file. All of that feels like it's getting complicated. The alternative is running a little server somewhere that does all of this and returns the URL of a hosted binary map.

dabreegster added a commit that referenced this issue Jul 25, 2021
…g the polygon. Without this, the map has no borders. #523, #717

I don't pretend to understand the Overpass query language, but I tried https://wiki.openstreetmap.org/wiki/Overpass_API/Advanced_examples#Completed_ways.2C_but_not_relations and it seems to work!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants