#5325 Store and show way ids #5968
base: master
Conversation
@nnseva Thank you for your contribution :) Let's wait for more opinions before merging. I'm also curious: have you estimated what the dataset size change would be?
The unpacked dataset size should increase by 8*N bytes, where N is the number of indexes in the segments file. The segments file will contain an additional section. I didn't estimate the real packed dataset file size.
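For a rough sense of scale (the segment count here is purely illustrative): an extract with about 30 million segment indexes would add roughly 8 × 30,000,000 bytes ≈ 240 MB of unpacked data.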
LGTM, some minor code comments.
Not sure if it matters for your use-case, but you'll currently get duplicates returned in the annotation if the way can't be fully compressed (e.g. one of the nodes is an intersection).
For example:
Scenario: way annotations
Given the query options
| annotations | true |
Given the node map
"""
a--b--c
|
d
"""
And the ways
| nodes |
| abc |
| bd |
When I route I should get
| from | to | route |
| a | c | abc,abc |
The response will contain:
"ways": [ 5, 5 ],
Deduplication could be performed client-side as an alternative.
With regards to data size, you can get an idea of the segment count by looking at the R-tree segs column from this spreadsheet, as it should be almost identical.
For some of the bigger map extracts, unpacked size will be:
- Germany ~250 MB
- North America ~2.3 GB
- Europe ~2.5 GB
I don't have numbers on hand for how much RAM is currently needed for routing on those extracts; it's probably adding a few percent?
In fact, we'll get this even if the way is fully compressed. If a way contains N edges, we'll get that way returned N times in a row in the annotation.
Sure, every pair of sequentially neighboring nodes will have a separate member in the ways array of the annotations:
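For illustration, a sketch with made-up IDs (the same ones discussed below), not taken from a real response:

```
// sketch only: node IDs 1, 2, 3 and way ID 5 are illustrative
{
  "nodes": [1, 2, 3],
  "ways": [5, 5]
}
```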
(meaning that nodes 1, 2, 3 are all in way 5 and connected sequentially). Such duplication allows recognizing the path structure without additional analysis of the OSM data. The sign of the returned way ID plays a similar role: positive means that the path segment direction corresponds to the way direction, while negative means that the path direction is opposite to the way direction. I.e. the response for the opposite-direction request over the same OSM data might look like:
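A corresponding sketch for the reverse direction, with the same illustrative IDs:

```
// sketch only: the negative sign marks travel against the way direction
{
  "nodes": [3, 2, 1],
  "ways": [-5, -5]
}
```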
I should also note that the current code always returns annotations containing all nodes along a found path, without any compression. I've only added way IDs (with a direction sign) between every sequential node pair in the returned node sequence.
I just did an extract/contract pass on the latest Germany PBF using this branch.

Given the impact here and the fact that RAM consumption has always been a touchy subject (often a bottleneck when using OSRM), maybe we could further investigate the impact? I can report some figures on a bigger extract such as Europe when the preprocessing is done.

Maybe a silly question (I did not look up the implementation): would it be possible to make this change optional? There are definitely scenarios where this is super handy, but in other situations it would just be a pain to require extra RAM for an unused feature.

UPDATE 2: RAM consumption usage for …
Given this, I'm inclined to agree that this should be stored as a separate file that can be optionally loaded.
I should note that the memory used for the additional segment data (the Way IDs vector) is actually included in the mem-mapped storage (see https://github.com/Project-OSRM/osrm-backend/pull/5968/files#diff-ac92f7a34eb9d5b8b41e6a930db0f2e7356af695724842b390032f1e10842dc2R138). I'm not sure we can rely on memory consumption measured this way.
Hi @nnseva, I tried to create a Docker image with your PR, but when it compiles I always get the same error: c++: fatal error: Killed signal terminated program cc1plus. How can I fix this compilation error? Regards
Hi @eveyeti,
I don't know exactly, but usually such a problem means there is not enough memory to compile. Try increasing the memory and/or reducing the number of processor cores available to Docker in your Dockerfile. Check your Dockerfile against the one in the OSRM master branch.
I've implemented skipping OSM way IDs in memory. The extractor can now skip storing OSM Way IDs. If the OSM Way IDs section is skipped, the router will skip showing them in annotations. The corresponding data section is not filled (a 0-size vector is created instead) and doesn't consume memory.
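As a sketch of the visible effect (node IDs illustrative, surrounding structure simplified), an annotation built from a dataset where the OSM Way IDs section was skipped would simply lack the ways key:

```
// sketch only: same route, but the "ways" key is absent
{
  "nodes": [1, 2, 3]
}
```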
Since this feature requires more memory than before and isn't required in the main use case, wouldn't it be preferable to have an option to enable it rather than one to disable it?
I agree that this should be opt-in rather than opt-out (that was the initial proposal in #5325).
That's a fair point. However, there is interest in an alternative file loading mechanism that reduces memory footprint without a performance penalty (#5974, #5879, #5838). In that case, I can see OSM nodes also becoming an optionally loaded dataset in the not-too-distant future.
That's a very interesting prospect indeed, which goes beyond the case of this PR. @mjjbell, as you pointed out, there are already a few tickets about various specific aspects of this question; maybe we should open a meta-ticket on the core principle of a new loading mechanism?
Hi @nnseva, thanks a lot, your advice worked! Regards
I would like to note that there is an open-ended future requirement to annotate nodes and segments along the found route. Looking toward that future, it might be appropriate to have a customizable annotations feature, which could have an influence on memory consumption.

This feature should allow customizing not only the values (as is available now) but also the keys of annotations. Such a customizable annotation feature should allow configuring the set of available annotation keys, as well as customizable procedures to obtain values for those annotations. Such a procedure could be a scripted procedure called at the extraction phase, or just a special open data file format associating source OSM IDs (for nodes) and OSM ID pairs (for segments) with annotated values.

With such a feature, deployment staff would have options to control memory consumption by allowing only the annotation keys necessary for the deployed OSRM instance.
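Purely as a hypothetical illustration (the key name and file layout below are invented and do not exist in OSRM), such an open data file could associate OSM node IDs and node-ID pairs with values for a custom annotation key:

```
// hypothetical format, nothing like this exists in OSRM today
{
  "key": "surface_quality",
  "nodes": { "1234567": 0.9, "1234568": 0.8 },
  "segments": { "1234567,1234568": 0.85 }
}
```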
Please note that the annotation key set is defined at compile time:

enum class AnnotationsType
{
None = 0,
Duration = 0x01,
Nodes = 0x02,
Distance = 0x04,
Weight = 0x08,
Datasources = 0x10,
Speed = 0x20,
Ways = 0x40,
All = Duration | Nodes | Distance | Weight | Datasources | Speed | Ways
};

An unsophisticated user relying on the default option values will probably expect that all available annotation keys, including ways, will be shown when all annotations are requested.
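In other words, with annotations=true (the query option already used in the scenario above), such a user would probably expect ways to appear alongside the existing annotation keys. A sketch of that expectation, with illustrative values and a simplified response shape:

```
// sketch only: values are made up and the surrounding response structure is simplified
{
  "annotation": {
    "duration": [3.2, 4.1],
    "distance": [25.0, 30.5],
    "nodes": [1, 2, 3],
    "ways": [5, 5]
  }
}
```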
That would indeed be somewhat inconsistent. The remarks about opt-in are rather a status-quo position resulting from a concern about RAM requirements for standard usage. Of course you're right that the same could (should?) apply to other datasets, but we have to deal with things as they are now, and none of the rest is optional. I think the core question here is whether we have the ability to add a generic mechanism to avoid loading a range of datasets. If this can be done realistically mid-term, we can probably merge this and live with the temporary increase until we provide opt-out mechanisms. Perhaps others with more knowledge of the codebase and the implications can comment on that last part?
@nnseva I see your point. I was thinking it would have to return an error indicating the annotation was disabled. But if this means we're building a quasi-optional data loader, then maybe it's better to design it properly to work for all optional datasets.
There is an existing ticket (#5218) that outlines the steps required. It was a while ago, but @danpat, maybe you have a sense of the effort required? I don't think the data loading code has changed much since then.
@@ -494,7 +494,7 @@ class RouteAPI : public BaseAPI
                    return anno.datasource;
                });
            }
-           std::vector<uint32_t> nodes;
+           std::vector<uint64_t> nodes;
Should this really be uint64 here? In Flatbuffers they are defined as uint32.
Edit:
I have to correct myself, it was redefined to uint64. Why?
This PR seems to be stale. Is it still relevant?
Issue #5325
- ways subsection in the annotations section containing way IDs along the found path, with the sign meaning direction [Store way_id values for all edges #5325]
- nodes section of the annotations flatbuffers output format