-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
R-Tree tuning #4056
R-Tree tuning #4056
Conversation
f3c9d10
to
6e167ce
Compare
b665d14
to
4a7d015
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just some minor cleanup comments
include/util/rectangle.hpp
Outdated
<< toFloating(rect.max_lat) << "],[" << toFloating(rect.max_lon) << "," | ||
<< toFloating(rect.max_lat) << "],[" << toFloating(rect.max_lon) << "," | ||
<< toFloating(rect.min_lat) << "],[" << toFloating(rect.min_lon) << "," | ||
<< toFloating(rect.min_lat) << "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a closing square bracket is missing or an opening one is not needed
template <class EdgeDataT, | ||
storage::Ownership Ownership = storage::Ownership::Container, | ||
std::uint32_t BRANCHING_FACTOR = 128, | ||
std::uint32_t BRANCHING_FACTOR = 64, | ||
std::uint32_t LEAF_PAGE_SIZE = 4096> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unused and can be removed, also L37-45 can be removed
static_assert(((LEAF_PAGE_SIZE - 1) & LEAF_PAGE_SIZE) == 0, "page size is not a power of 2"); | ||
static constexpr std::uint32_t LEAF_NODE_SIZE = | ||
(LEAF_PAGE_SIZE - sizeof(uint32_t) - sizeof(Rectangle)) / sizeof(EdgeDataT); | ||
static constexpr std::uint32_t LEAF_NODE_SIZE = (LEAF_PAGE_SIZE / sizeof(EdgeDataT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LEAF_NODE_SIZE
can be a template parameter similar to BRANCHING_FACTOR
without computing from LEAF_PAGE_SIZE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I left this in is so that the leaf-node groups of EdgeDataT objects are still clustered in page-sized blocks, to minimize disk reads. I guess we could make your suggested change and then just do some empirical tests to tune, but to be honest, the existing value seemed to perform best in my local tests, so I just left it alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, I think there are some minor things that could improve performance.
include/util/static_rtree.hpp
Outdated
std::uint32_t is_leaf : 1; | ||
TreeIndex() : level(0), offset(0) {} | ||
TreeIndex(std::size_t level_, std::size_t offset_) : level(level_), offset(offset_) {} | ||
std::size_t level; // Which level of the tree is this node in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going smaller here would help, the size of TreeIndex
is important for the performance of the queue: The smaller this struct is the less data will be shuffled around. Going size_t
-> uint32
should work for both. (maybe even 16 bits?)
include/util/static_rtree.hpp
Outdated
{ | ||
} | ||
|
||
QueryCandidate(std::uint64_t squared_min_dist, | ||
TreeIndex tree_index, | ||
std::uint32_t segment_index, | ||
std::uint64_t segment_index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 64bit?
include/util/static_rtree.hpp
Outdated
Vector<TreeNode> m_search_tree; | ||
// We use a const view type when we don't own the data, otherwise | ||
// we use a mutable type (probably becase we're building the tree) | ||
using treeviewtype = typename std::conditional<Ownership == storage::Ownership::View, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CamelCase
for our types.
include/util/static_rtree.hpp
Outdated
std::uint32_t nodes_in_current_level = | ||
std::ceil(static_cast<double>(nodes_in_previous_level) / BRANCHING_FACTOR); | ||
|
||
for (std::uint32_t current_node_idx = 0; current_node_idx < nodes_in_current_level; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be refactored using util::irange
include/util/static_rtree.hpp
Outdated
|
||
// Calculate the bounding box for BRANCHING_FACTOR nodes in the previous | ||
// level, then save that box as a new TreeNode in the new level. | ||
for (auto child_node_idx = first_child_index; child_node_idx < last_child_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
util::irange
include/util/static_rtree.hpp
Outdated
// 0 12 345 6789 | ||
// This ordering keeps the position math easy to understand during later | ||
// searches | ||
for (std::size_t i = 0; i < m_tree_level_sizes.size(); ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
util::irange
732a367
to
75b18ae
Compare
…tree structure is in the .ramIndex file. Also tunes the BRANCHING_FACTOR a bit to speed up access with this new layout.
@TheMarex I've addressed most of the review points. I'm out for the next few days, please feel free to merge this as-is, or make whatever remaining modifications are necessary to get it into |
This PR refactors the R-Tree data layout to improve performance, reduce data sizes (a little bit) and simplify file structures.
TL;DR - 20% faster queries at the expense of 3.8x more
.ramIndex
data. Simpler file structures. 1% less R-Tree data overall (data moved from disk to RAM).The main changes here are:
LeafNode
structure is removed. The.fileIndex
file now contains a flat array ofEdgeDataT
objects, (still) sorted by Hilbert Code on their centroid. We still choose the number ofEdgeDataT
objects to pack into each leaf node based on a power-of-2 block size.TreeNode
structure now only contains the bounding rectangle for each node in the R-Tree.Tree navigation is now achieved by calculation, rather than storing pointers within the tree structure. A small new data structure
m_tree_level_sizes
is stored in the.ramIndex
file. It contains the number ofTreeNodes
in each level of the tree, starting from the root (which is always size 1).This is sufficient information, along with knowledge of BRANCHING_FACTOR and LEAF_NODE_SIZE, to calculate the position of the child of any node in the tree. The
ExploreTreeNode
,ExploreLeafNode
andSearchInBox
functions have been updated to use this approach (the calculation logic resides in thechild_indexes(TreeIndex parent)
member function of theStaticRTree
class.The net result of these changes is:
.ramIndex
is now approximately 3.8x largerHere is a spreadsheet that summarizes the size/performance differences after these changes are applied:
Tasklist