De-duplicate edge-based-node data in RTree #3446

danpat · 2016-12-13T17:44:15Z

Issue

The StaticRTree contains lots of duplicated data. Every road segment contains a complete copy of the edge-based-node data that it's related to.

While this is good for cache efficiency, it's quite wasteful space-wise.

This PR refactors all the common data into a standalone vector. This adds an additional lookup layer for every RTree operation, but reduces the total RTree data size by more than 50% (in testing so far, Texas-sized RTree data went from 42MB to 17MB).
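To make the trade-off concrete, here is a minimal C++ sketch of the two layouts. The struct names `FatRoadSegment` and `SlimRoadSegment`, the field lists, `node_index` and the `node_data` helper are all hypothetical and only approximate what an edge-based node carries. The point is that per-node attributes move into one shared vector and each leaf keeps a 4-byte index, at the cost of one extra indirection per lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types and fields for illustration; not OSRM's actual structs.
using NodeID = std::uint32_t;

// Attributes shared by every road segment of one edge-based node.
struct EdgeBasedNodeData
{
    std::uint32_t name_id;
    std::uint32_t geometry_id;
    std::uint32_t component_id;
    std::int32_t forward_weight;
    std::int32_t reverse_weight;
    std::uint8_t travel_mode;
};

// Before: every RTree leaf entry embeds a full copy of the node data.
struct FatRoadSegment
{
    NodeID u;
    NodeID v;
    std::uint16_t fwd_segment_position;
    EdgeBasedNodeData data; // duplicated once per segment of the node
};

// After: the leaf entry keeps only an index into one shared vector.
struct SlimRoadSegment
{
    NodeID u;
    NodeID v;
    std::uint16_t fwd_segment_position;
    std::uint32_t node_index; // position in std::vector<EdgeBasedNodeData>
};

// The extra lookup layer that every RTree operation now pays for.
inline const EdgeBasedNodeData &node_data(const std::vector<EdgeBasedNodeData> &nodes,
                                          const SlimRoadSegment &segment)
{
    assert(segment.node_index < nodes.size());
    return nodes[segment.node_index];
}
```

Segments belonging to the same edge-based node now resolve to the same shared record instead of each carrying a copy.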

This PR is a WIP - I'm not 100% clear on the performance impact of this change yet. I'm hoping that it's relatively minor, and the significant space savings are worth the cost.

Tasklist

add regression / cucumber cases (see docs/testing.md)
review
adjust for comments

TheMarex · 2016-12-13T18:08:13Z

include/extractor/edge_based_node.hpp

@@ -15,43 +15,48 @@ namespace osrm
 namespace extractor
 {

+struct RoadSegment


We could spend a little of the space savings to inline the coordinates in the struct. That would cost 16 bytes more, but would solve the following ugliness:

Speed: it will probably be much faster, with no double indirection

No need for a separate std::vector<std::pair<NodeID, NodeID>> on construction

No cyclic dependency on datafacade

No dependency on the coordinates array
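A rough sketch of the inlined-coordinate variant, assuming fixed-point coordinates stored as two `int32_t` values (8 bytes each, so two endpoints add the 16 bytes mentioned above). The layout, the `bounding_box` helper and the field names are illustrative assumptions, not the code in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration; not OSRM's actual definitions.
using NodeID = std::uint32_t;

// Fixed-point coordinate (degrees * 1e6 as int32): 8 bytes.
struct FixedCoordinate
{
    std::int32_t lon;
    std::int32_t lat;
};

struct BoundingBox
{
    std::int32_t min_lon;
    std::int32_t min_lat;
    std::int32_t max_lon;
    std::int32_t max_lat;
};

// RTree leaf entry with both endpoint coordinates inlined (2 * 8 = 16 bytes),
// so queries no longer resolve u/v through a separate coordinate array.
struct RoadSegment
{
    NodeID u;
    NodeID v;
    FixedCoordinate u_coordinate;
    FixedCoordinate v_coordinate;
    std::uint32_t node_index; // index into the shared edge-based-node data
    std::uint16_t fwd_segment_position;
};

// The leaf bounding box is computed from the entry alone,
// without any lookup into a node-ID-indexed coordinate vector.
inline BoundingBox bounding_box(const RoadSegment &segment)
{
    const FixedCoordinate &a = segment.u_coordinate;
    const FixedCoordinate &b = segment.v_coordinate;
    return BoundingBox{std::min(a.lon, b.lon), std::min(a.lat, b.lat),
                       std::max(a.lon, b.lon), std::max(a.lat, b.lat)};
}
```

Because everything needed for the geometric test lives in the leaf entry, the RTree no longer depends on the coordinate array or the datafacade during queries.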

danpat · 2016-12-13

@TheMarex yeah, good idea. The first round here was to see just how much space saving there was to be had.

In theory we can cram ±1 m precision lon/lat values into 22-23 bits each; we may be able to pack lon/lat/fwd_segment_position into 64 bits and only cost an additional 4 bytes.
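A minimal sketch of the packing idea, in C++. The bit split (23 bits each for longitude and latitude, 18 bits for `fwd_segment_position`) and the names `pack`/`unpack` are assumptions for illustration. One caveat worth checking: quantized over the full range, 23 bits gives a step of 360/2^23 ≈ 4.3e-5° of longitude (about 4.8 m at the equator) and 180/2^23 ≈ 2.1e-5° of latitude (about 2.4 m), so decoding to the cell centre bounds the error at roughly ±2.4 m and ±1.2 m. Reaching ±1 m everywhere would take about 25 bits of longitude and 24 of latitude, or coordinates stored relative to a smaller region.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of one 64-bit word (bit widths are assumptions):
//   bits  0..22  quantized longitude (23 bits)
//   bits 23..45  quantized latitude  (23 bits)
//   bits 46..63  fwd_segment_position (18 bits)
constexpr unsigned kCoordBits = 23;
constexpr unsigned kPositionBits = 64 - 2 * kCoordBits; // 18 bits left over
constexpr std::uint64_t kCoordMask = (std::uint64_t{1} << kCoordBits) - 1;
constexpr std::uint64_t kPositionMask = (std::uint64_t{1} << kPositionBits) - 1;

// Map [min, max] degrees linearly onto the cells [0, 2^kCoordBits).
inline std::uint64_t quantize(double degrees, double min, double max)
{
    assert(degrees >= min && degrees <= max);
    const double cells = static_cast<double>(kCoordMask) + 1.0;
    const auto q = static_cast<std::uint64_t>((degrees - min) / (max - min) * cells);
    return q > kCoordMask ? kCoordMask : q; // clamp degrees == max into the last cell
}

// Decode to the centre of the cell, halving the worst-case error.
inline double dequantize(std::uint64_t q, double min, double max)
{
    const double cells = static_cast<double>(kCoordMask) + 1.0;
    return min + (static_cast<double>(q) + 0.5) / cells * (max - min);
}

inline std::uint64_t pack(double lon, double lat, std::uint32_t fwd_segment_position)
{
    assert(fwd_segment_position <= kPositionMask);
    return quantize(lon, -180.0, 180.0) | (quantize(lat, -90.0, 90.0) << kCoordBits) |
           (std::uint64_t{fwd_segment_position} << (2 * kCoordBits));
}

struct Unpacked
{
    double lon;
    double lat;
    std::uint32_t fwd_segment_position;
};

inline Unpacked unpack(std::uint64_t word)
{
    return Unpacked{dequantize(word & kCoordMask, -180.0, 180.0),
                    dequantize((word >> kCoordBits) & kCoordMask, -90.0, 90.0),
                    static_cast<std::uint32_t>((word >> (2 * kCoordBits)) & kPositionMask)};
}
```

The same shift-and-mask pattern works for any split; only the three width constants need to change if more bits go to the coordinates.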

TheMarex · 2016-12-13T18:11:30Z

If you get the indexing for the EdgeBasedNode array right, we can also use it to get rid of information that we currently store per turn edge but that should be stored on the EdgeBasedNode, potentially unlocking even more memory/space savings.

For that, we would need to change path unpacking so that, instead of creating PathData objects, it only returns a list of EdgeIDs and a list of NodeIDs, which we can then translate back to PathData.
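One way the proposed split could look, as a hedged C++ sketch: unpacking yields plain node and edge ID lists, and a separate pass rebuilds `PathData` from a per-node array and a per-turn-edge array. The fields chosen for each array (`name_id` and `travel_mode` per node, `turn_duration` per turn), the `UnpackedPath` type and `to_path_data` are assumptions for illustration; OSRM's real `PathData` has a different shape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types; not OSRM's actual PathData or unpacking code.
using NodeID = std::uint32_t;
using EdgeID = std::uint32_t;

// Per edge-based-node attributes, stored once instead of per turn edge.
struct NodeData
{
    std::uint32_t name_id;
    std::uint8_t travel_mode;
};

// Attributes that genuinely belong to the turn edge.
struct TurnData
{
    std::int32_t turn_duration;
};

struct PathData
{
    NodeID node;
    std::uint32_t name_id;
    std::uint8_t travel_mode;
    std::int32_t turn_duration;
};

// What unpacking would return: the nodes visited and the turn edges between them.
struct UnpackedPath
{
    std::vector<NodeID> nodes; // n nodes
    std::vector<EdgeID> edges; // n - 1 turn edges
};

// Rebuild one PathData per turn by looking IDs up in the two arrays.
inline std::vector<PathData> to_path_data(const UnpackedPath &path,
                                          const std::vector<NodeData> &node_data,
                                          const std::vector<TurnData> &turn_data)
{
    assert(path.edges.size() + 1 == path.nodes.size());
    std::vector<PathData> result;
    result.reserve(path.edges.size());
    for (std::size_t i = 0; i < path.edges.size(); ++i)
    {
        const NodeData &n = node_data[path.nodes[i]];
        const TurnData &t = turn_data[path.edges[i]];
        result.push_back(PathData{path.nodes[i], n.name_id, n.travel_mode, t.turn_duration});
    }
    return result;
}
```

Anything looked up through `node_data` is stored once per edge-based node rather than once per turn edge, which is where the additional savings would come from.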

TheMarex · 2017-05-15T09:36:03Z

Closing this as it's being replaced by #4036

First pass at moving edge-based-node data out of the rtree.

9d20e97

TheMarex changed the title ~~[WIP] De-duplicate edge-based-node data in RTree~~ De-duplicate edge-based-node data in RTree Dec 13, 2016

danpat added the Work In Progress label Dec 13, 2016

TheMarex reviewed Dec 13, 2016


oxidase mentioned this pull request Jan 9, 2017

Contractor loading edges refactoring #3545 (Merged)

TheMarex added this to the 5.8.0 milestone Apr 10, 2017

This was referenced Apr 18, 2017

Change some .edges data to be indexed by node #3954 (Closed)

Reduce file sizes #3955 (Closed)

oxidase mentioned this pull request May 2, 2017

Move geometry ids, name_ids and travel_modes to EdgeBasedNodeData #3994 (Merged)

danpat self-assigned this May 10, 2017

oxidase mentioned this pull request May 12, 2017

Refactor nodes file #4036 (Merged)

TheMarex closed this May 15, 2017

DennisOSRM deleted the refactor_edge_based_nodes branch November 6, 2022 14:09
