Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance idea: squeezing down file size #746

Open
dabreegster opened this issue Sep 2, 2021 · 8 comments
Open

Performance idea: squeezing down file size #746

dabreegster opened this issue Sep 2, 2021 · 8 comments

Comments

@dabreegster
Copy link
Collaborator

For the bike network tool, ultimately I'd like to work on and present some vision at a large scale -- like all of North or South Seattle (50-60MB right now, or 20 gzipped) or the whole area (270MB, or 100 compressed). The Map files are gigantic, but I have some ideas for paring this down and creating something smaller, containing less detail than what's needed for traffic simulation...

@dabreegster
Copy link
Collaborator Author

Current stats on huge_seattle:

[2021-09-02T01:41:13Z INFO  map_model::map] Total map size: 280,097,936 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 1 pathfinder: 111,774,757 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 217,197 buildings: 62,902,080 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 36,170 intersections: 48,813,308 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 50,299 roads: 28,934,342 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 245,033 lanes: 19,128,378 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 3,095 parking lots: 2,627,031 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 3,946 areas: 1,852,995 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 436 zones: 17,520 bytes

and north_seattle:

[2021-09-02T01:42:00Z INFO  map_model::map] Total map size: 54,725,302 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 1 pathfinder: 20,412,835 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 52,069 buildings: 15,263,206 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 6,457 intersections: 8,918,260 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 9,047 roads: 5,251,766 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 45,872 lanes: 3,482,380 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 603 parking lots: 545,807 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 436 areas: 204,548 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 68 zones: 2,900 bytes

Some fruit practically scrapes the ground. The contraction hierarchies are huge, but they're also probably not needed at all for this tool. We're not calculating lots of routes all the time. In fact, in the routing tool, I want to add a bunch of customization parameters for avoiding hills and busy roads -- and those're incompatible with CHs anyway, due to how slow rebuilding them would be. And large CHs slow down map edits. Dijkstra will serve us just fine. (We may want the CHs back for the travel demand model + mode shift analysis, but now sure how that'll work yet, and I don't think it's a main part of this tool.)

@dabreegster
Copy link
Collaborator Author

I spoke quickly. It'll take more work to squish down pathfinding. Just switching to CreateEngine::Dijkstra drops from 111MB to 44MB, which is substantial, but the result is still large. We don't need 6 graphs for the bike tool -- we just need the bike one.

Making edits will takes 10.5s in huge_seattle; even without updating the CH node ordering, calculating the graphs and edge weights is quite slow in such a huge map.

fn make_input_graph(
looks simple, but death by a thousand paper-clips is catching up... overall this method fetches and re-fetches lanes, intersections, turns, etc many times and winds up being nontrivial.

dabreegster added a commit that referenced this issue Sep 2, 2021
…y to the bike network tool. #746

Just a modest start, switching from CHs to Dijkstra's for pathfinding. Not uploading the minified result yet.
@dabreegster
Copy link
Collaborator Author

Better... with just the bike Dijkstra graph, huge_seattle is down from 270MB originally to 170. The pathfinder is now only about 5MB, not 111MB. make_input_graph could still use performance tuning -- that would help editing in the main game too, not just this tool.

@dabreegster
Copy link
Collaborator Author

With just the bike graph, recalculating pathfinding down from 10s to 3ish. But that's still absurd!

dabreegster added a commit that referenced this issue Sep 2, 2021
dabreegster added a commit that referenced this issue Sep 2, 2021
dabreegster added a commit that referenced this issue Sep 3, 2021
dabreegster added a commit that referenced this issue Sep 6, 2021
dabreegster added a commit that referenced this issue Sep 6, 2021
dabreegster added a commit that referenced this issue Sep 6, 2021
…ds. #746

Somewhat invasive API change internal to map_model, but not much impact
elsewhere.

Not regenerating anything yet.
dabreegster added a commit that referenced this issue Sep 6, 2021
dabreegster added a commit that referenced this issue Sep 6, 2021
…ds. #746

Somewhat invasive API change internal to map_model, but not much impact
elsewhere.

Not regenerating anything yet.
dabreegster added a commit that referenced this issue Sep 6, 2021
dabreegster added a commit that referenced this issue Sep 6, 2021
dabreegster added a commit that referenced this issue Sep 6, 2021
dabreegster added a commit that referenced this issue Sep 6, 2021
@easbar
Copy link
Contributor

easbar commented Sep 12, 2021

overall this method fetches and re-fetches lanes, intersections, turns, etc many times and winds up being nontrivial.

Not sure if you already decided against using CHs here, but maybe here it would help to be able to just update the weights of an input_graph instead of 'building' the entire graph again. Not sure though, because this would just mean we replace a list of (from, to, weight)s with a list of just weights.

@dabreegster
Copy link
Collaborator Author

Not sure if you already decided against using CHs here, but maybe here it would help to be able to just update the weights of an input_graph instead of 'building' the entire graph again

Luckily, #748 pretty much resolved this problem -- there was lots of redundant work happening when calculating the vehicle input graphs. Now this process is fast enough. I'm not sure updating the graph vs building from scratch would make much of a difference; the expensive thing is calculating each edge's weight.

Also, this issue is about a newer side-project to create a bike network planning tool (previewable at abstreet.s3-website.us-east-2.amazonaws.com/0.2.58/abstreet.html?--ungap&system/us/seattle/maps/udistrict.bin). There's no traffic simulation, so the usage patterns for pathfinding are very different. I'm still exploring the idea, but possibly the user will have a few sliders where they can tune parameters that make bike routes avoid hills or avoid stressful roads. Only one or two routes would be calculated using these new weights. So probably, just Dijkstra's makes much more sense here. But that's not known for sure -- I might discretize the routing preferences into just a few options, after doing some of the parameter tuning myself and finding values that produce subjectively good routes in some example areas I know well.

@dabreegster
Copy link
Collaborator Author

Pathfinding, including when editing the map, is now quite fast, even on the huge map. (One of the last perf bugs was actually unrelated to pathfinding, 9803558).

Back to the original goal -- how much can we squeeze down huge_seattle? Some measurements running natively (it'll be slower on the web).

As a baseline, 256MB filesize and about 9.5s on my fast machine to initially load.

Buildings

After stripping out some of the CHs that aren't used for the bike tool and removing all buildings, 100MB and only 4.6s! Only 34MB when gzipped! There really are lots of buildings in the huge map. What do we really need them for in the bike net tool? Display (to give context to the map), routing (right now start and end points are buildings, but we could maybe relax that to just snap to intersections), and mode shift (the input path requests are in terms of buildings, which then map over to road positions). One idea I've been toying with is to "summarize" buildings in one block. Originally we have
Screenshot from 2021-10-07 09-35-32
But we could "compress" this to one building
Screenshot from 2021-10-07 09-35-32
and just record the total number of commercial and residential units here. This would be particularly relevant for super-dense places like Sao Paulo:
Screenshot from 2021-10-07 09-38-19
Routing might be odd with one mega-building per block, because the building will snap to one of the 4 surrounding roads kind of by chance, and this'll force all the routes to look odd. So maybe routing by intersection endpoints would need to happen anyway.

Not sure this is doable in the short-term, but it's an idea to hold onto.

Lanes

Of the 4.5s to load the map in the previous section, 1.5s is building the quadtree! The number of lanes dominates easily -- about 5x the number of roads. We reorganized lanes to exist and be identified just as children of roads kind of recently, and it's much nicer code. But that change hasn't been done in the DrawMap layer yet -- we have separate DrawRoad and DrawLane objects. It would make way more sense to nest lanes under roads here too. When the quadtree sees a road in view, we can lazily expand to the few lanes it contains.

A quick test of what omitting the lanes from the quadtree would be like... yeah, 3.1s! The slow steps are then just deserializing the 100MB file and building the massive GeomBatch of unzoomed roads and intersections.

This is totally worth doing in the short term.

@dabreegster
Copy link
Collaborator Author

Turns

Let's get even crazier. Of the 100MB from the last section, intersections take half that space. Turns are nested underneath intersections now. Do we really need to save them? We recalculate them when we edit roads anyway, so... what if we didn't stash them in the file and just lazily calculated them?

60MB file, 2.5s to load. 20MB gzipped. Wow!

Optimizing for initial loading experience is important because of first impressions, but of course that has to be balanced with actually using the tool. How quickly would we need to fill out turns, and would it all happen at once (thus incurring a slow step shortly after initial load)? Really need to just try it out to know for sure, but I think the first time the user tries to route or quick-sketch, we'll need everything.

Lanes

There's similar redundancy with roads and lanes. We have the road center-line; we could lazily fill in lane center-lines from that, instead of serializing.

dabreegster added a commit that referenced this issue Oct 8, 2021
by squeezing some start-up performance of huge maps. #746

Shaving at least 1 second off quadtree initialization in huge_seattle
from this, no noticeable effects otherwise.
dabreegster added a commit that referenced this issue Oct 11, 2021
network tool reasonably on the web. #743, #746

I'm declaring the budget to be 20MB gzipped map files.

- north and south seattle boundaries extended a bit
- central seattle added
- stripping out unused pathfinding data for walking and transit to
  squeeze down the size. avoiding crashes for empty pathfinding -- if
  you try to simulate a minified map, most trips will just fail
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants