-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Flatbush RTree for Spatial Joins #13079
Conversation
cc @jwass |
5475e37
to
57bcd97
Compare
fd4d8db
to
823a83b
Compare
87e0eac
to
9123371
Compare
@jagill James, thanks for replacing JTS R-Tree with the one that doesn't create so many objects. Where can I read about the Flatbush you are using? Is https://github.com/mourner/flatbush/ the right place? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore order of rows in spatial join tests
Looks good, but I would split it into two commits: (1) reformat the code; (2) switch to assertOperatorEqualsIgnoreOrder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add HasExtent interface
Looks good.
presto-geospatial-toolkit/src/main/java/com/facebook/presto/geospatial/Rectangle.java
Show resolved
Hide resolved
return this; | ||
} | ||
|
||
public long getSizeInBytes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public interface HasExtent | ||
{ | ||
Rectangle getExtent(); | ||
long getSizeInBytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I prefer getEstimatedSize
or getEstimatedSizeInBytes
because often we can't know exactly how much memory an object is using and return an estimate or approximation.
presto-geospatial-toolkit/src/main/java/com/facebook/presto/geospatial/rtree/Flatbush.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing comments as I ran out of time.
Implement Flatbush Hilbert RTree
Commit message is indented (most likely unintentionally) - would you fix that?
*/ | ||
public class Flatbush<T extends HasExtent> | ||
{ | ||
private static final int FLATBUSH_INSTANCE_SIZE = ClassLayout.parseClass(Flatbush.class).instanceSize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FLATBUSH_INSTANCE_SIZE -> INSTANCE_SIZE by convention
checkArgument(degree > 0, "degree must be positive"); | ||
this.degree = degree; | ||
this.items = requireNonNull(items, "items is null"); | ||
this.levelOffsets = Flatbush.calculateLevelIndices(items.length, degree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Flatbush.calculateLevelIndices -> calculateLevelIndices
- calculateLevelIndices -> calculateLevelOffsets to avoid using multiple terms for the same thing
this.degree = degree; | ||
this.items = requireNonNull(items, "items is null"); | ||
this.levelOffsets = Flatbush.calculateLevelIndices(items.length, degree); | ||
this.tree = this.buildTree(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this.buildTree() -> buildTree()
*/ | ||
private static int[] calculateLevelIndices(int numItems, int degree) | ||
{ | ||
List<Integer> levelIndices = new ArrayList<Integer>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- levelIndices -> offsets
- new ArrayList() -> new ArrayList<>()
List<Integer> offsets = new ArrayList<>();
{ | ||
List<Integer> levelIndices = new ArrayList<Integer>(); | ||
// Leaf nodes start at 0, root is the last element. | ||
levelIndices.add(Integer.valueOf(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer.valueOf(0) -> 0
offsets.add(0);
return items; | ||
} | ||
|
||
public int getHeight() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method is not used
DoubleBigArray tree = new DoubleBigArray(Double.NaN); | ||
tree.ensureCapacity(levelOffsets[levelOffsets.length - 1] + ENVELOPE_SIZE); | ||
|
||
sortByHilberIndex(this.items); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.items -> items
sortByHilberIndex(this.items); | ||
|
||
int writeOffset = 0; | ||
int readOffset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this variable is not used until next loop; move it there
} | ||
|
||
int numChildren = items.length; | ||
for (int childLevel = 0; childLevel < levelOffsets.length - 1; childLevel++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
childLevel -> level
presto-geospatial-toolkit/src/main/java/com/facebook/presto/geospatial/rtree/Flatbush.java
Show resolved
Hide resolved
@mbasmanova For further reading on the flatbush, the mourner/flatbush repo is a good start; it also links to the wikipedia page which describes the general class/algorith: https://en.wikipedia.org/wiki/Hilbert_R-tree#Packed_Hilbert_R-trees Note in this case we only sort the leaf notes, and rely on the hierarchy of the Hilbert curve to keep all parent nodes roughly in the right place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jagill Finished reading Implement Flatbush Hilbert RTree
commit. Overall looks good. Here are some comments.
yMax = max(yMax, tree.get(readOffset++)); | ||
|
||
// Every `degree` children, make a parent | ||
if (readOffset % (degree * ENVELOPE_SIZE) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (child % degree == 0) {
might be clearer and make the comment unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts re:
if (child % degree == 0) {
might be clearer and make the comment unnecessary
?
} | ||
} | ||
// Add any "dangling" parent not added above | ||
if (readOffset % (degree * ENVELOPE_SIZE) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (numChildren % degree > 0) {
might be clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts re:
if (readOffset % degree > 0) {
might be clearer and make the comment unnecessary
?
int level = 0; | ||
while (numItems > 1) { | ||
// The level capacity will be the smallest multiple of degree gte numItems | ||
int capacity = ((numItems + degree - 1) / degree) * degree; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capacity -> numChildren to match buildTree?
import com.facebook.presto.array.DoubleBigArray; | ||
import com.facebook.presto.geospatial.Rectangle; | ||
import it.unimi.dsi.fastutil.ints.IntArrayList; | ||
import org.locationtech.jts.index.ItemVisitor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not add this dependency. This interface is trivial. Let's just make our own.
public interface ItemVisitor<T>
{
void visitItem(T item);
}
} | ||
} | ||
|
||
public List<T> findIntersections(Rectangle query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is used only in TestFlatbush. I'd move it there and use results::add
:
private static List<Rectangle> findIntersections(Flatbush<Rectangle> flatbush, Rectangle rectangle)
{
List<Rectangle> results = new ArrayList<>();
flatbush.findIntersections(rectangle, results::add);
return results;
}
} | ||
|
||
@Test | ||
public void testRandomRects100Probe1000Build() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Don't abbreviate.
- I think a better structure would to be have helper method take two lists of rectangles and have the tests compose these lists as they see fit.
@Test
public void testRandomRectanglesSmall()
{
Random random = new Random(42);
assertRectangles(makeRectangles(random, 1_000), makeRectangles(random, 100));
}
@Test
public void testRandomRectanglesMedium()
{
Random random = new Random(123);
assertRectangles(makeRectangles(random, 10_000), makeRectangles(random, 1_000));
}
@Test
public void testRandomRectanglesLarge()
{
Random random = new Random(321);
assertRectangles(makeRectangles(random, 50_000), makeRectangles(random, 5_000));
}
private static void assertRectangles(List<Rectangle> buildRectanges, List<Rectangle> probeRectanges)
{
Flatbush<Rectangle> rtree = new Flatbush<>(buildRectanges.toArray(new Rectangle[] {}));
for (Rectangle probe : probeRectanges) {
List<Rectangle> expected = buildRectanges.stream()
.filter(rectangle -> rectangle.intersects(probe))
.collect(toImmutableList());
assertEqualsNoOrder(findIntersections(rtree, probe), expected);
}
}
private static void assertEqualsNoOrder(List<Rectangle> actual, List<Rectangle> expected)
{
List<Rectangle> actualSorted = actual.stream().sorted(RECTANGLE_COMPARATOR).collect(toImmutableList());
List<Rectangle> expectedSorted = expected.stream().sorted(RECTANGLE_COMPARATOR).collect(toImmutableList());
assertEquals(actualSorted, expectedSorted);
}
OGCGeometryWrapper pointW = new OGCGeometryWrapper(POINT_W); | ||
|
||
Flatbush rtree = new Flatbush<OGCGeometryWrapper>(new OGCGeometryWrapper[] { | ||
pointX, pointY, pointZ, pointW |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to the previous line:
Flatbush<OGCGeometryWrapper> rtree = new Flatbush<>(new OGCGeometryWrapper[] {pointX, pointY, pointZ, pointW});
|
||
List<OGCGeometryWrapper> resultsA = rtree.findIntersections(octagonA.getExtent()); | ||
resultsA.sort(Comparator.naturalOrder()); | ||
assertEquals(resultsA, Arrays.asList(pointX, pointY)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use assertEqualsNoOrder
helper function
assertEqualsNoOrder(findIntersections(rtree, octagonA.getExtent()), ImmutableList.of(pointX, pointY), Comparator.naturalOrder());
assertEqualsNoOrder(findIntersections(rtree, octagonB.getExtent()), ImmutableList.of(pointY, pointZ), Comparator.naturalOrder());
private static <T> void assertEqualsNoOrder(List<T> actual, List<T> expected, Comparator<T> comparator)
{
List<T> actualSorted = actual.stream().sorted(comparator).collect(toImmutableList());
List<T> expectedSorted = expected.stream().sorted(comparator).collect(toImmutableList());
assertEquals(actualSorted, expectedSorted);
}
return createFromEsriGeometry(new Point(x, y), null); | ||
} | ||
|
||
public static double randomDouble(Random rng, double minVal, double maxVal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Don't abbreviate
- rng -> random, minVal -> min, maxVal -> max
- makeRandomRectangle -> makeRectangle; make it private
- makeRandomRectangles -> makeRectanges
public static List<Rectangle> makeRectangles(Random random, int count)
{
return IntStream.range(0, count)
.mapToObj(i -> makeRectangle(random))
.collect(toImmutableList());
}
private static double randomDouble(Random random, double min, double max)
{
return min + random.nextDouble() * (max - min);
}
/*
* Make a random rectangle "near" origin of size "about" size.
*/
private static Rectangle makeRectangle(Random random)
{
double minX = randomDouble(random, -100, 100);
double minY = randomDouble(random, -100, 100);
double sizeX = randomDouble(random, 0.0, 10);
double sizeY = randomDouble(random, 0.0, 10);
return new Rectangle(minX, minY, minX + sizeX, minY + sizeY);
}
|
||
import static com.esri.core.geometry.ogc.OGCGeometry.createFromEsriGeometry; | ||
|
||
public final class TestUtils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very generic name. Perhaps, RectangleTestUtils?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Flatbush instead of JTS STR RTree for spatial joins
This is awesome!!!
I'd change commit title to Use Flatbush R-Tree in spatial join
@@ -155,11 +112,11 @@ private static Envelope getEnvelope(OGCGeometry ogcGeometry) | |||
|
|||
IntArrayList matchingPositions = new IntArrayList(); | |||
|
|||
Envelope envelope = getEnvelope(probeGeometry); | |||
rtree.query(envelope, item -> { | |||
Rectangle queryRect = GeometryUtils.getExtent(probeGeometry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static import + inline: rtree.findIntersections(getExtent(probeGeometry), item -> {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't inline, because it's used in the closure (for testReferencePoint
).
@@ -221,4 +178,60 @@ public void appendTo(int joinPosition, PageBuilder pageBuilder, int outputChanne | |||
outputChannelOffset++; | |||
} | |||
} | |||
|
|||
public static final class GeometryWithPosition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you move this class in a separate commit?
3bff965
to
05cff9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Flatbush instead of JTS STR RTree for spatial joins
Very nice. Thank you!
Envelope envelope = getEnvelope(probeGeometry); | ||
rtree.query(envelope, item -> { | ||
GeometryWithPosition geometryWithPosition = (GeometryWithPosition) item; | ||
Rectangle queryRect = getExtent(probeGeometry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not abbreviate: queryRect -> query or rectangle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding benchmarks. Some of them show very high error bounds. Would you adjust the settings on the benchmarks to bring errors down?
293.261 ± 41.055
444.502 ± 86.900
{ | ||
private static final int SEED = 613; | ||
|
||
public static void main(String[] args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move main method to the end of the class; ditto other benchmarks
@Param({"8", "16", "32"}) | ||
private int rtreeDegree; | ||
@Param({"1000", "3000", "10000", "30000", "100000", "300000", "1000000"}) | ||
private int numBuildRects; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't abbreviate: numBuildRects -> numRectangles and buildRects -> rectangles
public static class BenchmarkData | ||
{ | ||
@Param({"1000", "3000", "10000", "30000", "100000", "300000", "1000000"}) | ||
private int numBuildEnvs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't abbreviate
@Param({"1000", "3000", "10000", "30000", "100000", "300000", "1000000"}) | ||
private int numBuildEnvs; | ||
|
||
private List<Envelope> buildEnvs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't abbreviate throughout this commit
CC: @pettyjamesm |
803aa45
to
c79c59a
Compare
cc @mourner: This PR ports your Flatbush to Java for use with Presto. Just thought you should know, and happy to hear any feedback you have! |
cc @rawrunprotected We used your log(n) 2D hilbert curve algorithm here. Just wanted to let you know, and to see if you had any feedback! We had to port to Java signed ints (wrapped in a long), because Java doesn't have unsigned ints :(. |
Thanks for letting me know, awesome to see my stuff being used :) I haven't touched Java in a decade, but I think the code should work fine with just plain ints and without explicit masking to 16bit intermediates. Bitwise operations don't care about signedness, and the code already uses logical right shifts (>>>), so unsigned vs signed doesn't really matter at all. The only caveat is that you'll want to zero-extend the result at the very end to long in order to allow using it as a non-negative offset. That can be done with a quick pair of shifts. See attached, but I only did some very superficial testing ;) |
@jagill this is awesome, thanks a lot for letting me know! Can you share the benchmark results here to get an idea for the improvement? @rawrunprotected thanks so much for chiming in here and for the amazing original implementation! In the JS version (which this PR ports), I use it like this, although it could probably be simplified further because I limit the input to 16-bit uints (plenty for spatial indexing needs) so that the resulting hilbert values can be stored in a uint32 array. |
Spatial joins don't have a defined output order, so don't test for that.
RTrees can take anything that exposes a bounding-box extent. Create an interface for this (that the RTree will use later), and implement it for Rectangle. Also, add a containment function in Rectangle, and implement helper functions for HasExtent in GeometryUtils.
There are memory problems with spatial joins, potentially due to the large number of objects created by the JTS RTree. The JTS tree used for the build side of spatial joins creates many objects and may not report its memory precisely. A Flatbush is a flattened, bulk-loaded immutable RTree that stores the tree as a single array of doubles. Its internal fields are just a few primitive fields, a shared pointer to the input geometries, a small (~5-6 size) array of ints, and a DoubleBigArray of a size approximately the same as the input geometries. Construction involves sorting the geometries by their Hilbert curve index, which has good hierarchical properties and makes construction of the rest of the tree straightforward.
The flatbush provides performance benefits over the JTS tree, with over 4x query speed at 1M build geometries, and 2.5x faster builds. Benchmarks (included in change) Query Benchmark (numBuildRects) Mode Cnt Score Error Units JTS STR RTree BenchmarkJtsStrTreeQuery.rtreeQuery 1000 avgt 20 0.439 ± 0.006 us/op BenchmarkJtsStrTreeQuery.rtreeQuery 3000 avgt 20 0.778 ± 0.010 us/op BenchmarkJtsStrTreeQuery.rtreeQuery 10000 avgt 20 1.675 ± 0.017 us/op BenchmarkJtsStrTreeQuery.rtreeQuery 30000 avgt 20 4.208 ± 0.565 us/op BenchmarkJtsStrTreeQuery.rtreeQuery 100000 avgt 20 18.533 ± 0.786 us/op BenchmarkJtsStrTreeQuery.rtreeQuery 300000 avgt 20 59.184 ± 1.175 us/op BenchmarkJtsStrTreeQuery.rtreeQuery 1000000 avgt 20 250.904 ± 10.420 us/op Flatbush (degree 16) BenchmarkFlatbushQuery.rtreeQuery 1000 avgt 20 0.578 ± 0.019 us/op BenchmarkFlatbushQuery.rtreeQuery 3000 avgt 20 0.902 ± 0.013 us/op BenchmarkFlatbushQuery.rtreeQuery 10000 avgt 20 1.667 ± 0.018 us/op BenchmarkFlatbushQuery.rtreeQuery 30000 avgt 20 3.329 ± 0.033 us/op BenchmarkFlatbushQuery.rtreeQuery 100000 avgt 20 9.435 ± 0.599 us/op BenchmarkFlatbushQuery.rtreeQuery 300000 avgt 20 27.564 ± 0.851 us/op BenchmarkFlatbushQuery.rtreeQuery 1000000 avgt 20 72.725 ± 0.582 us/op Build Benchmark (numBuildRects) Mode Cnt Score Error Units JTS STR RTree BenchmarkJtsStrTreeBuild.buildRtree 1000 avgt 20 0.306 ± 0.020 ms/op BenchmarkJtsStrTreeBuild.buildRtree 3000 avgt 20 1.077 ± 0.023 ms/op BenchmarkJtsStrTreeBuild.buildRtree 10000 avgt 20 4.542 ± 0.055 ms/op BenchmarkJtsStrTreeBuild.buildRtree 30000 avgt 20 15.837 ± 0.414 ms/op BenchmarkJtsStrTreeBuild.buildRtree 100000 avgt 20 77.911 ± 2.883 ms/op BenchmarkJtsStrTreeBuild.buildRtree 300000 avgt 20 309.898 ± 7.493 ms/op BenchmarkJtsStrTreeBuild.buildRtree 1000000 avgt 20 1325.381 ± 67.566 ms/op Flatbush (degree 16) BenchmarkFlatbushBuild.buildRtree 1000 avgt 20 0.503 ± 0.007 ms/op BenchmarkFlatbushBuild.buildRtree 3000 avgt 20 1.786 ± 0.017 ms/op BenchmarkFlatbushBuild.buildRtree 10000 avgt 20 3.213 ± 0.154 ms/op BenchmarkFlatbushBuild.buildRtree 30000 avgt 20 9.667 ± 0.389 ms/op BenchmarkFlatbushBuild.buildRtree 100000 avgt 20 30.333 ± 0.559 ms/op BenchmarkFlatbushBuild.buildRtree 300000 avgt 20 98.058 ± 2.014 ms/op BenchmarkFlatbushBuild.buildRtree 1000000 avgt 20 367.947 ± 5.481 ms/op
There are memory problems with spatial joins, potentially due to the large number of objects created by the JTS RTree. The JTS tree used for the build side of spatial joins creates many objects and may not report its memory precisely. A Flatbush creates very few objects, with a low memory footprint, as well as faster build/query times.
@rawrunprotected Thanks! That little change sped up build times about 10%. Prior to using (the first version of) your code, the building bottleneck was the Hilbert index calculation, using the bit twiddling massively sped up the sorting. @mourner I've included some benchmarks in the description of the commit that adds them: c1d6b4d . In general, both querying and building are about 4x faster when we get to 1M geometries. Critically, it has a better scaling factor than the JTS STR RTree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add HasExtent interface
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement Flatbush Hilbert RTree
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Flatbush instead of JTS STR RTree for spatial joins
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore order of rows in spatial join tests
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Flatbush RTree for Spatial Joins
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jagill Looks good to me. This is a very nice and desirable improvement for the spatial joins. Thanks for researching and implementing such an elegant fix.
} | ||
|
||
HilbertIndex hilbert = new HilbertIndex(totalExtent); | ||
Arrays.parallelSort(items, Comparator.comparing(item -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jagill is it inefficient to compute the hilbert index dynamically like this? Won't there be many computations of the same index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index is calculated on average O(ln N)
times per entry, as you said. I originally used davidmoten/hilbert-curve, and (via profiling) index calculation did indeed dominate construction CPU. However, this new calculation is very efficient (all bit twiddling), and so now it's only a few % of construction CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, makes sense, and certainly simpler than maintaining a pre-computed value.
|
||
private boolean doesIntersect(Rectangle query, int nodeIndex) | ||
{ | ||
return query.getXMax() >= tree.get(nodeIndex) // xMin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this logic compute "query contains node", or "query intersects node"?
There are memory problems with spatial joins, potentially due to the
large number of objects created by the JTS RTree. A Flatbush is an
RTree with a low memory footprint, minimal heap allocation, and fast
lookup. As the name would suggest, it is very flat -- the tree itself
is just a DoubleBigArray, with a couple other small fields. Lookup is
also pretty fast: the main difficulty is some index ninjitsu to handle
parent/child relationships.
This PR is branched off of #13235, which contains formatting and
other changes that are good independent of this one.