Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

De-duplicate edge-based-node data in RTree #3446

Closed
wants to merge 1 commit into from

Conversation

danpat
Copy link
Member

@danpat danpat commented Dec 13, 2016

Issue

The StaticRTree contains lots of duplicated data. Every road segment contains a complete copy of the edge-based-node data that it's related to.

While this is good for cache efficiency, it's quite wasteful space wise.

This PR refactors all the common data into a standalone vector. This adds an additional lookup layer for every RTree operation, but reduces the total RTree datasize by >50% (in testing so far, Texas-sized RTree data went from 42MB to 17MB).

This PR is a WIP - I'm not 100% clear on the performance impact of this change yet. I'm hoping that it's relatively minor, and the significant space savings are worth the cost.

Tasklist

  • add regression / cucumber cases (see docs/testing.md)
  • review
  • adjust for comments

@TheMarex TheMarex changed the title [WIP] De-duplicate edge-based-node data in RTree De-duplicate edge-based-node data in RTree Dec 13, 2016
@@ -15,43 +15,48 @@ namespace osrm
namespace extractor
{

struct RoadSegment
Copy link
Member

@TheMarex TheMarex Dec 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the space savings and use a little bit of that to inline the coordinates in the struct. That would be 16 bytes more, but would solve the following ugliness:

  • Speed: It will probably be much faster, no double-indirection
  • No need for a separate std::vector<std::pair<NodeID, NodeID>> on construction
  • No cyclic dependency on datafacade
  • No dependency on the coordinates array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheMarex yeah, good idea. The first round here was to see just how much space saving there was to be had.

In theory we can cram +/- 1m precision lon/lat values into 22-23 bits each - we may be able to pack lon/lat/fwd_segment_position into 64 bits and only cost an additional 4 bytes.

@TheMarex
Copy link
Member

If you get the indexing for the EdgeBasedNode array right, we can also use it to get rid of information that we currently save per turn edge but should be saved by EdgeBasedNode, potentially unlocking even more memory/space savings.

What we would need for that is to modify the path unpacking to not create PathData objects but only return a list of EdgeIDs and a list of NodeIDs that we can then translate back to PathData.

@oxidase oxidase mentioned this pull request Jan 9, 2017
4 tasks
@TheMarex TheMarex added this to the 5.8.0 milestone Apr 10, 2017
@danpat danpat self-assigned this May 10, 2017
@oxidase oxidase mentioned this pull request May 12, 2017
5 tasks
@TheMarex
Copy link
Member

Closing this as it's being replaced by #4036

@TheMarex TheMarex closed this May 15, 2017
@DennisOSRM DennisOSRM deleted the refactor_edge_based_nodes branch November 6, 2022 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants