Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bounds inflation unit tests #92901

Merged
merged 6 commits into from
Jan 24, 2023
Merged

Conversation

craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented Jan 13, 2023

Originally this PR added an alternative algorithm to BoundedGeoHexGridTiler, but performance tests showed that it was not actually better than the original, so now the PR only adds a number of unit tests.

@craigtaverner craigtaverner added >non-issue :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Jan 13, 2023
@craigtaverner craigtaverner requested a review from iverase January 13, 2023 09:49
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.7.0 labels Jan 13, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment, you need to be very careful with bounding boxes crossing the dateline.

final double maxY = Math.min(wAvg(bbox.top(), boundsOfCells.getMaxY(), factor), 90d);
final double left = GeoUtils.normalizeLon(wAvg(bbox.left(), boundsOfCells.getMinX(), factor));
final double right = GeoUtils.normalizeLon(wAvg(bbox.right(), boundsOfCells.getMaxX(), factor));
if (right - left >= 360d) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right as we are normalizing the latitudes, it will always be false?

This is one of the tricky parts here. In the current approach we compute the width taking into account that the bounding box might cross the dateline.

Then we can compute if inflating the bounding box will actually exceed 360d:

2 * factor * width + width(bbox) >= 360d

In your approach, detecting this case does not seem trivial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've since written some unit tests, and they do indeed fail. I'll see if the fix is easy enough.

The BoundedGeoHexGridTiler inflates the bounding box for parent
cell searches since the parent cells bounds do not match the
descendents.

This algorithm does a slightly more refined calculation of the
inflation factor by considering all four corners and inflating in
the four directions independently.
The new algorithm was faster than the old for the same inflation factor
but had more failures, and a longer tail of failures (which needed
higher inflation factors). So it would take a lot more work (probably)
to find and fix these edge cases. An easier option is to keep the
original algorithm.
This increases performance by about 10% (a factor of 0.22 increased
performance by about 13%).

We found the error rate for `testGeoShapeGeoHex` was about 0.5% for
inflation factor 0.19 and 0.2% for inflation factor 0.22.
These last 2/1000 failed tests required inflation factors of 0.27
0.30 respectively, so we'll investigate them individually.
@craigtaverner craigtaverner changed the title Alternative algorithm for inflated bounds Bounds inflation performance and unit tests Jan 19, 2023
Since there were three failures discovered for 0.25, all of which passed
at 0.30, we decided to set the inflation factor to 0.35 initially for
the sake of caution, and consider revising it down again later with
more careful testing.
* This is done by taking the H3 cells at two corners, and expanding the filter width
* by 50% of the max width of those cells, and filter height by 50% of the max height of those cells.
*
* The inflation factor of 50% has been verified using test GeoHexTilerTests#testLargeShapeWithBounds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not correct any longer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are talking about the last line, I changed that from tuned to verified, as I thought that was correct. But would you prefer we remove the line entirely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more that the inflation factor is not 50% any more but 35%?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course. I fixed that now.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@craigtaverner craigtaverner merged commit 649a5d7 into elastic:main Jan 24, 2023
@craigtaverner craigtaverner deleted the refinedHexBounds branch January 24, 2023 15:14
@craigtaverner craigtaverner changed the title Bounds inflation performance and unit tests Bounds inflation unit tests Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants