Any Advice to Shorten Traffic Update Interval #5503
Based on previous issues, #4449 (comment) might be a way to achieve real-time updates in some cases, e.g. when the route is not extremely long. |
One more question: what's the traffic update interval of the demo site (http://map.project-osrm.org)? |
Most of the numbers you've posted look reasonably normal, except this one:
The cell customization step is highly parallelizable, you can speed this bit up by providing more CPUs (e.g. 16 or 32 cores, if you have them available). You can also speed up the boot of Other than that - this is about how fast OSRM is. It's certainly possible to design a system that will ingest traffic faster, but the price you will pay is in query performance. If you need faster turnaround times, I'd suggest you split your dataset into smaller pieces (regions), and run them all in parallel. This assumes you don't need support for long-distance queries that may cross a partition. |
@danpat Thanks very much for the very quick reply. For the The |
@danpat and @daniel-j-h Do you think it's reasonable to do customization based on delta traffic from the previous version? The current osrm-customize strategy is:
For each cell, all boundary nodes' costs must be re-calculated as long as any cost inside the cell changes, but cells with no cost change theoretically don't need to be touched. Based on traffic data from professional sources, about 30% of the segments in the graph have traffic coverage, but only a very small part of them changes from minute to minute. If we could add a command line option that deals with delta traffic, it could dramatically decrease customization time. We will do some profiling first to see what ratio of cells is affected by delta traffic. |
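The profiling step proposed above could be sketched roughly as follows. This is a minimal illustration, not OSRM code: `segment_to_cell` and `parent_of` are hypothetical lookup tables that would have to be derived from the partition data, and the function only reports which cells a delta would touch at each level.

```python
from collections import defaultdict


def dirty_cells_per_level(changed_segments, segment_to_cell, parent_of):
    """Return {level: set(cell_id)} for cells touched by changed segments.

    changed_segments: iterable of segment ids whose speed changed
    segment_to_cell:  dict, segment id -> level-1 cell id (hypothetical lookup)
    parent_of:        list of dicts; parent_of[i] maps a cell on level i+1
                      to its parent cell on level i+2 (hypothetical lookup)
    """
    dirty = defaultdict(set)
    for segment in changed_segments:
        cell = segment_to_cell[segment]
        dirty[1].add(cell)
        # A change inside a cell also invalidates every enclosing cell.
        for i, parents in enumerate(parent_of):
            cell = parents[cell]
            dirty[i + 2].add(cell)
    return dict(dirty)


# Toy partition: three level-1 cells grouped into two level-2 cells.
segment_to_cell = {"s1": 10, "s2": 10, "s3": 11, "s4": 12}
parent_of = [{10: 100, 11: 100, 12: 101}]

touched = dirty_cells_per_level(["s1", "s2"], segment_to_cell, parent_of)
level1_ratio = len(touched[1]) / len(set(segment_to_cell.values()))
```

In this toy case two changed segments fall into the same level-1 cell, so one of three level-1 cells is touched. On real data the interesting number is how quickly that ratio grows with the share of changed segments.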
The logs below were generated at startup:
[2019-08-02T21:28:59 UTC] + /osrm-build/osrm-customize map.osrm --segment-speed-file traffic.csv -l DEBUG
[2019-08-02T21:29:49 UTC] [info] Loaded traffic.csv with 76208175values
[2019-08-02T21:29:54 UTC] [info] In total loaded 1 file(s) with a total of 76208175 unique values
[2019-08-02T21:30:12 UTC] [info] Used 535602493 speeds from LUA profile or input map
[2019-08-02T21:30:12 UTC] [info] Used 76202107 speeds from traffic.csv
[2019-08-02T21:30:12 UTC] [warn] Speed values were used to update 76202107 segments for 'routability' profile
[2019-08-02T21:30:15 UTC] [info] Updating segment data took 18170.4ms.
[2019-08-02T21:30:15 UTC] [info] In total loaded 0 file(s) with a total of 0 unique values
[2019-08-02T21:30:33 UTC] [info] Done reading edges in 94133.6ms.
[2019-08-02T21:31:35 UTC] [info] Loaded edge based graph: 314601014 edges, 77352770 nodes
[2019-08-02T21:31:36 UTC] [info] Loading partition data took 156.536 seconds
[2019-08-02T21:37:54 UTC] [info] Cells customization took 377.949 seconds
[2019-08-02T21:37:54 UTC] [info] Cells statistics per level
[2019-08-02T21:37:55 UTC] [info] Level 1 #cells 365014 #boundary nodes 8660415, sources: avg. 15, destinations: avg. 22, entries: 187754785 (1502038280 bytes)
[2019-08-02T21:37:55 UTC] [info] Level 2 #cells 28909 #boundary nodes 1547423, sources: avg. 36, destinations: avg. 50, entries: 73984124 (591872992 bytes)
[2019-08-02T21:37:55 UTC] [info] Level 3 #cells 1822 #boundary nodes 228172, sources: avg. 84, destinations: avg. 115, entries: 25856388 (206851104 bytes)
[2019-08-02T21:37:55 UTC] [info] Level 4 #cells 60 #boundary nodes 15566, sources: avg. 173, destinations: avg. 230, entries: 3726667 (29813336 bytes)
[2019-08-02T21:37:55 UTC] [info] Unreachable nodes statistics per level
[2019-08-02T21:37:55 UTC] [warn] Level 1 unreachable boundary nodes per cell: 0.00179445 sources, 0.00194787 destinations
[2019-08-02T21:37:55 UTC] [warn] Level 2 unreachable boundary nodes per cell: 0.00861323 sources, 0.00664153 destinations
[2019-08-02T21:37:55 UTC] [warn] Level 3 unreachable boundary nodes per cell: 0.0444566 sources, 0.0290889 destinations
[2019-08-02T21:37:55 UTC] [warn] Level 4 unreachable boundary nodes per cell: 0.316667 sources, 0.283333 destinations
[2019-08-02T21:37:55 UTC] [info] Unreachable nodes statistics per level
[2019-08-02T21:37:56 UTC] [warn] Level 1 unreachable boundary nodes per cell: 0.0157199 sources, 0.0167172 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 2 unreachable boundary nodes per cell: 0.110104 sources, 0.109481 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 3 unreachable boundary nodes per cell: 0.586718 sources, 0.572997 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 4 unreachable boundary nodes per cell: 1.51667 sources, 1.5 destinations
[2019-08-02T21:37:56 UTC] [info] Unreachable nodes statistics per level
[2019-08-02T21:37:56 UTC] [warn] Level 1 unreachable boundary nodes per cell: 0.121524 sources, 0.11242 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 2 unreachable boundary nodes per cell: 0.777924 sources, 0.701685 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 3 unreachable boundary nodes per cell: 3.21515 sources, 2.90999 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 4 unreachable boundary nodes per cell: 10.8667 sources, 9.8 destinations
[2019-08-02T21:37:56 UTC] [info] Unreachable nodes statistics per level
[2019-08-02T21:37:56 UTC] [warn] Level 1 unreachable boundary nodes per cell: 0.00245744 sources, 0.00307385 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 2 unreachable boundary nodes per cell: 0.0143208 sources, 0.0148051 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 3 unreachable boundary nodes per cell: 0.0938529 sources, 0.0944018 destinations
[2019-08-02T21:37:56 UTC] [warn] Level 4 unreachable boundary nodes per cell: 0.7 sources, 0.733333 destinations
[2019-08-02T21:38:32 UTC] [info] MLD customization writing took 35.1301 seconds
[2019-08-02T21:38:46 UTC] [info] Graph writing took 14.6054 seconds
[2019-08-02T21:38:47 UTC] [info] RAM: peak bytes used: 37251137536
[2019-08-02T21:38:47 UTC] + child=133
[2019-08-02T21:38:47 UTC] + wait 133
[2019-08-02T21:38:47 UTC] + /osrm-build/osrm-routed map.osrm --mmap -a MLD --max-table-size 8000
[2019-08-02T21:38:47 UTC] [info] starting up engines, v5.22.0
[2019-08-02T21:38:47 UTC] [info] Threads: 8
[2019-08-02T21:38:47 UTC] [info] IP address: 0.0.0.0
[2019-08-02T21:38:47 UTC] [info] IP port: 5000
[2019-08-02T21:39:44 UTC] [info] http 1.1 compression handled by zlib version 1.2.8
[2019-08-02T21:39:44 UTC] [info] Listening on: 0.0.0.0:5000
[2019-08-02T21:39:44 UTC] [info] running and waiting for requests |
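To turn a log like the one above into per-phase durations, the timestamps can be subtracted directly. The following is a small sketch that assumes the `[YYYY-MM-DDTHH:MM:SS UTC]` prefix seen in this log; the marker strings are simply substrings picked from the lines above, and the helper names are made up for illustration.

```python
import re
from datetime import datetime

STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) UTC\] (.*)$")


def parse(line):
    """Split a log line into (timestamp, message); None if it has no prefix."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S"), match.group(2)


def elapsed_seconds(lines, start_marker, end_marker):
    """Seconds between the first line containing start_marker and the
    next later line containing end_marker."""
    start = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if start is None:
            if start_marker in message:
                start = stamp
        elif end_marker in message:
            return (stamp - start).total_seconds()
    raise ValueError("markers not found in order")


log = [
    "[2019-08-02T21:28:59 UTC] + /osrm-build/osrm-customize map.osrm --segment-speed-file traffic.csv -l DEBUG",
    "[2019-08-02T21:38:47 UTC] [info] RAM: peak bytes used: 37251137536",
    "[2019-08-02T21:38:47 UTC] + /osrm-build/osrm-routed map.osrm --mmap -a MLD --max-table-size 8000",
    "[2019-08-02T21:39:44 UTC] [info] running and waiting for requests",
]

customize_s = elapsed_seconds(log, "osrm-customize", "RAM: peak")
routed_s = elapsed_seconds(log, "osrm-routed", "running and waiting")
```

For the four lines quoted here this gives 588 seconds for the customize run and 57 seconds for the routed startup, which matches the roughly 10 minutes and 1 minute discussed in the thread.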
Just be aware - using If the files are already in the operating system's page cache, then performance should be very close to having the data in RAM already. If the files are not in the page cache, or you have other processes on the system that are causing your routing data to be evicted from the page cache, then query performance can be affected. The ideal setup is one where you: a) have no swap enabled If you do all three of the above, then all file contents should be in the page cache when you start osrm-routed, and startup should be fast, and query performance should be good. One other thing to watch out for, especially in a potentially shared hosting environment, is filesystem cache eviction - if you're using Docker, other containers on the same host may evict critical pages from the cache, leading to poor query performance when those pages are needed for a routing query. |
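One generic way to get files into the page cache before starting the server is simply to read them once. The sketch below is an assumption about how that could be done, not something prescribed in this thread; the `map.osrm*` pattern is only an example, and the kernel remains free to evict the pages again under memory pressure.

```python
import glob


def warm_page_cache(pattern, chunk_size=1 << 20):
    """Read every file matching pattern once, front to back.

    Reading pulls the file contents through the OS page cache; it does not
    pin them there. Returns the total number of bytes read.
    """
    total = 0
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


if __name__ == "__main__":
    # Example pattern only; adjust to wherever the processed files live.
    print(warm_page_cache("map.osrm*"), "bytes read")
```

Whether this helps depends on the host having enough free memory to hold all the files and on nothing else competing for the cache.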
@CodeBear801 Yes - you should be able to skip re-customization of the lowest-level cells if no data has changed (these are the most numerous, and avoiding recalc would save the most time). It's been a while since I've looked at this part of the code, so I'm not sure how much change would be needed to re-read the existing cells into this part of the code. As a minor segue - depending on how ambitious you are, another approach to save a bunch of time would be to make |
@danpat Agree. Making osrm-customize a standalone daemon process could save loading time for frequent customization. Based on the profile, each round of osrm-customize takes about 10 minutes: 1 minute to load speed.csv, 3 minutes to load the OSRM data and 6 minutes to do the customization. We plan to use OSRM to enable both live traffic and personalized information, and I think quick turnaround is needed for traffic accidents and personalized metrics. Our target is to make live traffic available as soon as possible, within seconds or around one minute.
|
A simpler alternative might be to use a smaller map - have you considered slicing up North America into 1/4-sized chunks with decent overlap? Realtime traffic is generally irrelevant for long-distance routing, so if you have a fallback for long routes that doesn't update frequently, it likely doesn't matter too much. |
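The slicing idea implies a small dispatch step in front of the routing services: send a query to a regional instance when both endpoints fall inside it, and to the slowly updated fallback otherwise. A minimal sketch, with entirely hypothetical region names and bounding boxes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon, lat):
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def pick_backend(regions, origin, destination, fallback="continent"):
    """Name of the first region containing both (lon, lat) points,
    or the fallback backend when no single region covers the query."""
    for region in regions:
        if region.contains(*origin) and region.contains(*destination):
            return region.name
    return fallback


# Hypothetical boxes; the two regions overlap between -100 and -95 longitude.
REGIONS = [
    Region("west", -125.0, 24.0, -95.0, 50.0),
    Region("east", -100.0, 24.0, -66.0, 50.0),
]

short_trip = pick_backend(REGIONS, (-122.4, 37.8), (-118.2, 34.1))
cross_country = pick_backend(REGIONS, (-122.4, 37.8), (-74.0, 40.7))
```

The overlap matters because a query whose endpoints sit near a boundary still needs one region that contains both of them. How wide it should be depends on the longest route that is expected to use live traffic.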
@danpat For #5503 (comment) and #5503 (comment), my understanding is that whether or not the data is in cache, the [2019-08-02T21:38:47 UTC] [info] IP port: 5000
[2019-08-02T21:39:44 UTC] [info] http 1.1 compression handled by zlib version 1.2.8 I agree that a clean server is necessary for ideally One more question: what do you mean by "c) download the already-processed files from a remote server"? |
@danpat For #5503 (comment), a long-distance route can mostly be split into 3 parts: Only part b) may be irrelevant to realtime traffic. Both a) and c) require realtime traffic for a better route. Even for part b), it is still possible to get a better highway entrance and exit with realtime traffic. |
Here, I meant don't run As for the 1-minute startup time - it's possible not all of the datafiles are being |
Remember that by the time the traveller arrives at part (c), the traffic data used to calculate the route is probably very stale, particularly for long routes. Generally, what you want is to consider realtime traffic for the first, say, 15-30 minutes of a route, then fade back to historical average conditions - errors in travel times mean it becomes very uncertain when the driver will actually be performing the more distant parts of the route. Currently, OSRM does not support doing this, although we have plenty of historical discussion about it: |
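The fade-back idea can be written as a weighted blend of the live and historical speed for a segment, with the weight depending on how far into the trip the segment is expected to be reached. The linear ramp and the 15 and 30 minute thresholds below are illustrative assumptions taken from the rough numbers above; as noted, OSRM itself does not do this.

```python
def blended_speed(live_kmh, historical_kmh, eta_minutes,
                  full_live_until=15.0, historical_after=30.0):
    """Blend live and historical speed by expected arrival time at a segment.

    Up to full_live_until minutes the live speed is used as-is, after
    historical_after minutes only the historical speed counts, and in
    between the weight falls linearly (an assumed shape, not an OSRM feature).
    """
    if eta_minutes <= full_live_until:
        weight = 1.0
    elif eta_minutes >= historical_after:
        weight = 0.0
    else:
        weight = (historical_after - eta_minutes) / (historical_after - full_live_until)
    return weight * live_kmh + (1.0 - weight) * historical_kmh


# A jammed segment (20 km/h live, 60 km/h historical) at different ETAs.
near = blended_speed(20.0, 60.0, eta_minutes=10.0)
middle = blended_speed(20.0, 60.0, eta_minutes=22.5)
far = blended_speed(20.0, 60.0, eta_minutes=45.0)
```

The catch is that the expected arrival time at a segment depends on the route, which in turn depends on the speeds, so a real implementation would need either a time-dependent search or an approximation such as distance from the origin.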
@danpat, thanks for your suggestion, let me confirm my understanding.
|
@danpat You're right, part c) is also irrelevant to real-time traffic. Real-time traffic is important only for part a), i.e. mostly the first 15 ~ 30 minutes, or maybe 2 ~ 3 hours at most. The proposal of "slicing up North America into 1/4-sized chunks with decent overlap" should work for most short and medium routes. The problem is that we may have to set up perhaps a dozen services based on these slices; it costs resources but should work! Another problem is that we still hope to improve part a) of a long route with real-time traffic. As I mentioned here (#4449 (comment)), is it possible to let OSRM support queries on a subgraph restricted by some segments? Moreover, thanks very much for sharing; the paper (https://arxiv.org/pdf/1606.06636.pdf) is really interesting and looks good for rush hours. In OSRM there might be a simple way to inject it for MLD, i.e. import the historical speeds by |
Thanks for https://github.com/Project-OSRM/osrm-backend/wiki/Traffic; we have succeeded in integrating our traffic data into OSRM.
Current Situation:
osrm-customize costs about 10 minutes and osrm-routed costs about 2.5 minutes (AWS r5.2xlarge, i.e. 8 CPUs, 64GB memory). It makes sense since both of them process the whole North America data.
Improve Target:
Could osrm-customize only process the necessary part instead of the whole data, e.g. several cells? We can provide delta traffic data (maybe only 1% per 2-3 minutes); if that is possible, we think osrm-customize can achieve a significant improvement.
Many thanks in advance!
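To check how small the delta really is, two consecutive traffic snapshots can be compared row by row. The sketch below assumes the `from_osm_id,to_osm_id,speed` column order described on the Traffic wiki page and ignores any further columns; the helper names are made up. It only measures the change, since the thread does not describe a delta mode in osrm-customize.

```python
import csv
import io


def load_speeds(text):
    """Parse CSV text into {(from_node, to_node): speed_kmh}."""
    speeds = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        speeds[(int(row[0]), int(row[1]))] = float(row[2])
    return speeds


def diff_snapshots(previous, current):
    """Return (changed, removed): segments that are new or have a different
    speed in current, and segments that dropped out of the feed."""
    changed = {key: speed for key, speed in current.items()
               if previous.get(key) != speed}
    removed = set(previous) - set(current)
    return changed, removed


previous = load_speeds("1,2,50\n2,3,30\n3,4,80\n")
current = load_speeds("1,2,50\n2,3,25\n4,5,60\n")
changed, removed = diff_snapshots(previous, current)
changed_share = len(changed) / len(current)
```

Removed segments need attention too: a segment that drops out of the feed would have to fall back to its profile speed, so it also counts as a change for the cell it belongs to.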