Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use boost d-ary heap for MLD routing #3816

Merged
merged 4 commits into from
May 2, 2017
Merged

Use boost d-ary heap for MLD routing #3816

merged 4 commits into from
May 2, 2017

Conversation

oxidase
Copy link
Contributor

@oxidase oxidase commented Mar 13, 2017

Issue

For #3764 can be used boost::heap::d_ary_heap as a 4-ary heap. The PR adds SearchHeap that internally uses boost::heap::d_ary_heap with boost::heap::arity<4>. In general on a small dataset like bayern-latest the difference between 2-ary and 4-ary heaps is not statistically relevant, but the difference wrt to BinaryHeap is for 1000 queries http://localhost:5000/route/v1/driving/11.575395,48.137132;9.932966,49.792450

	Welch Two Sample t-test

data:  before$V1 and after$V1
t = 13.672, df = 1274.7, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.001313073 0.001753019
sample estimates:
  mean of x   mean of y 
0.010305615 0.008772569 

Data summary of MLD routing with BinaryHeap (red on histogram) is

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.007715 0.008152 0.008602 0.010310 0.011940 0.039360 

and for the new SearchHeap (blue on histogram) is

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.007814 0.008083 0.008232 0.008773 0.008688 0.014980 

heaps

Tasklist

  • add regression / cucumber cases (see docs/testing.md)
  • review
  • adjust for comments

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

BOOST_ASSERT(WasInserted(node));
auto reference = index.find(node)->second;
reference->weight = weight;
heap.increase(reference->handle, std::make_pair(weight, reference));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it implements a max heap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, greater instead of less and increase instead of decrease

@daniel-j-h
Copy link
Member

👍

What about our existing heap usages e.g. in the CH code?
I think we have some Map and Array storage for heaps in use somewhere.

@oxidase
Copy link
Contributor Author

oxidase commented Mar 13, 2017

It is possible to use boost heaps, but i don't want to make a performance regression for CH algorithms, as performance depends on memory access patterns. Basically for CH-routing only UnorderedMapStorage is used, but in contractor also XORFastHashStorage is used.

@oxidase oxidase force-pushed the mld/use-boost-heap branch 2 times, most recently from d9f3322 to 66b4a40 Compare March 14, 2017 08:22
@oxidase oxidase mentioned this pull request Mar 14, 2017
5 tasks
@oxidase oxidase mentioned this pull request Mar 26, 2017
5 tasks
@oxidase oxidase force-pushed the mld/use-boost-heap branch from 66b4a40 to ca3a034 Compare April 24, 2017 13:43
@oxidase
Copy link
Contributor Author

oxidase commented Apr 24, 2017

Rebased to the current master and checked performance with #3857 The current implementation of the binary heap is nearly the same as boost::heap::d_ary_heap
compare_heaps

@TheMarex as the 4-ary heap does not bring significant performance improvements, is it still needed? I see only benefit in using generalized library code instead of a custom version

I will continue with checking variants for DataIndex

@TheMarex
Copy link
Member

TheMarex commented Apr 24, 2017

@oxidase I suspect the current runtime is dominated by access to the DataIndex hence we don't see any improvements.

A comparison with ArrayIndex might be worthwhile. However if we already pay the price for a full array of size |V| for each heap we might actually go a step further and remove the indirection altogether.

Instead of having a separate index for translating global NodeID to a heap internal (continuous) ID and then use that for indexing the data, we could have a fixed pre-alloced array for that which is indexed by NodeID directly.

That said this would be a bit of work again and would yield an implementation with more memory usage. I'm fine with tabling again, since this is definitely to the category quick win anymore.

@oxidase oxidase force-pushed the mld/use-boost-heap branch from ca3a034 to 1f58fbe Compare April 24, 2017 18:30
@oxidase
Copy link
Contributor Author

oxidase commented Apr 25, 2017

@TheMarex ArrayIndex is somewhat not suitable already on Germany-size graphs, because almost all time spent in memset_avx2_erms by clearing heaps. I have reused generation arrays and here some results
compare_hashes

for the MLD algorithm and germany-latest OSM file

I think it is ready for review, but it brings almost no value, so it is ok to reject the PR as not needed.

Copy link
Member

@TheMarex TheMarex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm for merging this, because at least we don't have our own heap implementation anymore.

But we should only use one heap implementation in the code base. So we should remove query_heap.hpp and port the CH code to use this implementation.

This of course requires a benchmark of CH on old/new just to make sure nothing changed.

@oxidase
Copy link
Contributor Author

oxidase commented Apr 28, 2017

@TheMarex after lengthy performance investigation i think it is better to close PR without merging, because it is brings no performance improvement but increases memory consumption in places like in master const Key key = static_cast<Key>(heap.size()); with Key = uint32_t and in branch reference->handle = heap.push(std::make_pair(weight, reference)); with handle = void *

What i can do:

  • remove heap[0] element
  • make CheckHeap optional

@oxidase oxidase force-pushed the mld/use-boost-heap branch from 054c058 to 87b6e19 Compare May 2, 2017 08:42
@oxidase oxidase force-pushed the mld/use-boost-heap branch from 6d63a7d to 3b8742c Compare May 2, 2017 11:20
@oxidase
Copy link
Contributor Author

oxidase commented May 2, 2017

@TheMarex please could you check the PR? Summary:

  • removed heap[0] element
  • removed CheckHeap
  • heap changed to boost::heap::d_ary_heap, but any mutable heap structure can be used
  • renamed BinaryHeap to QueryHeap

I did not change the index storage as it is performance critical and could not find a better solution than the current one with XORFastHashStorage. The performance with 4-ary heap is the same as with BinaryHeap compare_hashes3

Copy link
Member

@TheMarex TheMarex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@TheMarex TheMarex merged commit 07c7cb3 into master May 2, 2017
@TheMarex TheMarex deleted the mld/use-boost-heap branch May 2, 2017 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants