This issue is about writing and indexing geo_shape values in a performant way, and then being able to query them correctly with a bounding box without getting false positive results.
We currently need to store and index geo_shape values using a quadtree with the default precision of 50m in order to be able to query/search them (their documents) with a bounding box without getting false positive results. With the 50m quadtree precision (tree depth) we are seeing a big performance hit when writing geo_shape values (e.g. polygons, polylines, etc.): a write (and index) takes anywhere from 20 minutes to an hour or more to complete. That said, with 50m precision, bounding box queries on the index return the correct results, with (almost) no false positives.
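For reference, the setup described above can be sketched as the following request bodies. This is a minimal sketch, not our exact configuration: the field name `geometry` is a placeholder, and the mapping fragment assumes the 6.x-style `tree` and `precision` options on a `geo_shape` field. The bounding box is expressed as an `envelope` shape, whose coordinates are given as upper-left then lower-right.

```python
FIELD = "geometry"  # hypothetical field name, not taken from the issue


def geo_shape_mapping(precision):
    """Build the `properties` fragment for a quadtree-indexed geo_shape field."""
    return {
        "properties": {
            FIELD: {
                "type": "geo_shape",
                "tree": "quadtree",
                "precision": precision,  # e.g. "50m" or "50km"
            }
        }
    }


def bbox_query(min_lon, min_lat, max_lon, max_lat):
    """Build a geo_shape query that matches shapes intersecting a bounding box."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        FIELD: {
                            "shape": {
                                "type": "envelope",
                                # envelope order: [upper-left, lower-right]
                                "coordinates": [
                                    [min_lon, max_lat],
                                    [max_lon, min_lat],
                                ],
                            },
                            "relation": "intersects",
                        }
                    }
                }
            }
        }
    }
```

Switching between the two cases in this issue is then only a matter of passing `"50m"` or `"50km"` to `geo_shape_mapping`; the query body stays the same.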
When writing the geo_shape values with a quadtree index of 50km (rather than 50m) precision, the write time is much improved and is usable, but bounding box queries then return too many false positive results.
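A rough back-of-the-envelope calculation shows why the two precisions behave so differently. The sketch below assumes a simplified model in which each quadtree level halves the cell width, starting from the Earth's equatorial circumference; it is an approximation for illustration only and is not the formula Elasticsearch uses internally.

```python
import math

# Approximate equatorial circumference of the Earth, in meters.
EARTH_EQUATOR_M = 40_075_017.0


def approx_quadtree_levels(precision_m):
    """Estimate how many quadtree levels are needed so that a cell is no
    wider than `precision_m`, assuming each level halves the cell width."""
    return max(1, math.ceil(math.log2(EARTH_EQUATOR_M / precision_m)))


levels_50m = approx_quadtree_levels(50)        # fine precision
levels_50km = approx_quadtree_levels(50_000)   # coarse precision

# Each extra level doubles the number of cells along a shape's boundary.
boundary_cell_factor = 2 ** (levels_50m - levels_50km)
```

Under this model the 50m tree is about ten levels deeper than the 50km tree, so a shape's boundary is covered by roughly a thousand times more cells. That is consistent with the much longer write times, and with the coarser tree matching shapes that only share a large cell with the bounding box.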
We discussed this issue with the support team, @nknize and other Elastic PMs, through the support site/app, over phone calls, and in meetings at the last two Elastic{ON} conferences - Elastic{ON} 17 and Elastic{ON} 18. We discussed several client-side workarounds, both for writing/indexing and for querying/searching, and we explained why these workarounds can't work for our use cases: we need to render a portion of the index using a specific bounding box after running an external analytic and storing the analytic results in an Elasticsearch index. Having the write take anywhere from many minutes to an hour is a showstopper for our use cases.
In our last meeting with @nknize, @zuketo, and others during Elastic{ON} 18 we came to a conclusion that this issue will be addressed by Elastic in three phases:
Phase One - implement Geo Post Filtering on the ES (DB) side, so that ES queries always return correct results with no false positives. This will be done by using new Lucene v7.4 capabilities. That will allow us to use any precision with the quadtree index, including 50km, which is performant enough for most of our use cases, while ensuring that queries with bounding boxes always return correct results. Possibly also add a post_filtering=true parameter to the query parameters, with a default value (true/false) TBD.
Phase Two - geo_shape BKD tree support phase 1 of 2 - implement BKD tree based geo_shape indexing. This index will still rasterize the geo_shape geometry value into multiple raster LODs, but will use the BKD tree approach, which is expected to be somewhat more performant than the existing quadtree indexing approach.
Phase Three - geo_shape BKD tree support phase 2 of 2 - switch to use some vector based indexing rather than rasterizing the geo_shape geometry value.
This GitHub issue is about implementing Phase One - Add the missing Geo post filtering ES (DB) side to the query/search implementation, using Lucene v7.4? capabilities, returning correct results when querying/searching with a bounding box, for any quadtree precision.
Hi @hanoch, I'll close this issue in favor of #32039. BKD based geo shapes are a preferred solution to phase 1 within your description, which means we can move directly to phase 2 and use the linked issue #32039.