Aggregate/query by geometry-type with geo_shape fields #49569
Comments
Pinging @elastic/es-analytics-geo (:Analytics/Geo) |
This enhancement would also be useful for vector tiling (elastic/kibana#58519). When a |
Also useful to determine if Maps can construct point-2-point layers elastic/kibana#68540 |
Let's make sure we capture the "ask". I don't know many customers that have specifically requested this capability (even though it is a standard function in Oracle and PostGIS), so let's make sure we document whether this is beneficial for either (a) a performance boost or (b) enabling "power play" functionality in maps. Technically we can split this into two issues:
This way we could nest other metric aggs and run interesting analysis (e.g., attribute stats by geometry type). Note that we won't be able to support
|
That is right, we currently do not store information in the index / doc values about how a shape was defined. Note that the following shapes consisting of two points are equivalent for us:
On the other hand, we do have information about the shape
It would be straightforward to use it to provide an aggregation by topological dimension. I would rename the agg accordingly.
As above, we can provide some filter capabilities with respect to the topological dimensionality. For a standalone query, we would have to implement the query on top of the doc values, as the BKD index is only efficient if we provide a spatial constraint. |
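To make "topological dimension" concrete, here is a minimal client-side sketch that groups GeoJSON geometries by dimension (0 for points, 1 for lines, 2 for polygons, following the usual Simple Features convention). It is not how Elasticsearch would compute this from doc values; the function names `topological_dimension` and `count_by_dimension` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Topological dimension per GeoJSON geometry type:
# points are 0-dimensional, lines 1-dimensional, polygons 2-dimensional.
DIMENSION_BY_TYPE = {
    "Point": 0,
    "MultiPoint": 0,
    "LineString": 1,
    "MultiLineString": 1,
    "Polygon": 2,
    "MultiPolygon": 2,
}


def topological_dimension(geometry):
    """Return the topological dimension (0, 1 or 2) of a GeoJSON geometry dict."""
    gtype = geometry["type"]
    if gtype == "GeometryCollection":
        members = geometry.get("geometries", [])
        if not members:
            raise ValueError("empty GeometryCollection has no dimension")
        # A collection takes the largest dimension of its members.
        return max(topological_dimension(g) for g in members)
    return DIMENSION_BY_TYPE[gtype]


def count_by_dimension(geometries):
    """Count geometries per topological dimension (hypothetical helper)."""
    return dict(Counter(topological_dimension(g) for g in geometries))


shapes = [
    {"type": "Point", "coordinates": [4.9, 52.4]},
    {"type": "MultiPoint", "coordinates": [[4.9, 52.4], [5.1, 52.1]]},
    {"type": "LineString", "coordinates": [[4.9, 52.4], [5.1, 52.1]]},
    {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
    {
        "type": "GeometryCollection",
        "geometries": [
            {"type": "Point", "coordinates": [0, 0]},
            {"type": "LineString", "coordinates": [[0, 0], [1, 1]]},
        ],
    },
]
counts = count_by_dimension(shapes)
```

Grouping by dimension rather than by declared type sidesteps the problem noted above: two shapes defined differently but covering the same points end up in the same bucket.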
I wonder if this is something we should resist doing. Getting this information would be very costly, and couldn't be cached since a polygon could be added to a field that only stored points so far at any time. |
thx @jpountz. The Maps app would not need to cache this information anywhere. It would request it when bootstrapping the UX for a layer. To give some context for this request: clients of the Elasticsearch API have no efficient way of determining the types of the shapes stored in a geo_shape field (points, lines, or polygons) without actually pulling all of them. This affects general-purpose visualization tools like Kibana Maps. Not knowing up front which geometries are actually stored in an index cascades into the UX and gives it a "grab-bag" look and feel. Consider: we are showing all 3 options because we don't know the geometry types of all the documents in that index (there could be millions of documents; the screenshot example has 600k building footprints, too many to pull out for web apps). For end users, this grab-bag UX is less than optimal, especially since most users will store their data "thematically", e.g. rivers (lines) in one index, building footprints (polygons) in another, points of interest (points) in another. This implicit knowledge can be used in the UX, e.g. Maps could simplify its UX by having knowledge of the geometry type:
As for (3), this is more hypothetical since Maps does not do this today (although we do want Kibana to handle display of large datasets better, elastic/kibana#58519). Not being able to filter on geometry type makes it harder to build maps where the display of documents is scale-dependent (i.e. based on the zoom level of the map, data gets filtered or simplified). Point data should be handled differently than lines and polygons. E.g. building footprints should be filtered out when zoomed out (since they are invisible at that scale), but points of interest should be retained (because points have no size). |
Could we get half-way there by looking at field caps to know whether the field is mapped as a I'd really like to avoid making the UI block waiting for the result of an aggregation to know how it should specialize for the type of geometries that are stored in the index. This is something that would work with small amounts of data but would start giving users a bad experience as they start having non-negligible amounts of data and using our slow features (e.g. schema-on-read, searchable snapshots). To be clear I'm not against adding this aggregation, which can be useful, I'm opposed to making UI loading depend on the result of this aggregation. |
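As a rough illustration of the field caps idea, the sketch below reads the mapped type(s) of a field out of a field capabilities response, which nests type entries under `fields` and then the field name. The sample response, the index names, the field name `geometry` and the helper `mapped_geo_types` are all made up for this example, and the choice of geo_point versus geo_shape as the two types of interest is an assumption.

```python
GEO_TYPES = ("geo_point", "geo_shape")


def mapped_geo_types(field_caps_response, field):
    """Return the geo mapping types reported for `field`, sorted by name.

    Expects the shape of a field capabilities response:
    {"fields": {<field name>: {<mapping type>: {...}}}}.
    """
    caps = field_caps_response.get("fields", {}).get(field, {})
    return sorted(t for t in caps if t in GEO_TYPES)


# Hand-written sample response (not real output); two hypothetical
# indices map the same field name with different geo types.
SAMPLE_RESPONSE = {
    "indices": ["buildings", "pois"],
    "fields": {
        "geometry": {
            "geo_shape": {
                "type": "geo_shape",
                "searchable": True,
                "aggregatable": True,
                "indices": ["buildings"],
            },
            "geo_point": {
                "type": "geo_point",
                "searchable": True,
                "aggregatable": True,
                "indices": ["pois"],
            },
        }
    },
}

types = mapped_geo_types(SAMPLE_RESPONSE, "geometry")
points_only = types == ["geo_point"]
```

This only reveals the mapping type. A field mapped as geo_shape can still hold any mix of points, lines and polygons, so the mapping alone cannot say which of those are present.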
Yes, Maps is already doing this right now. The "gap" is that for
I don't think Maps would "block" the UI. Rather, knowledge about geometry types would be used to fine-tune some of the presentation in an async operation. The potential that Kibana runs an agg on all the data is a generic issue in Kibana (e.g. date histograms in Discover). A couple of examples in the Maps application where this potentially occurs (absent any filter-context constraints, like time range etc.)
I do agree that for enormous data-sets, this would result in a poor experience. But then likely Kibana is not the right tool to build a map-visualization on top of that data. Just in general, it is really helpful for a web-app like Kibana to be able to determine relevant meta-data before actually having to query all the documents. Also, maybe the ask was worded the wrong way. Rather than asking for "can we add an agg that gives us geometry-types", maybe the ask should be more along the lines of "How would clients get useful meta-data about geometries stored in ES, without actually pulling the entire dataset?" (e.g. the bounds of the data, the geometry-type of the shapes, the size of the shapes, ...). |
I was seeing scale as a competitive advantage, so I would be disappointed if we dropped the objective of making Maps usable with large amounts of data.
So maybe we should recommend more strongly to use A middle ground that would be better than aggregating on the geometry type would be to enhance |
Me too. A lot of the focus in Maps is on working with ES data at any scale. Blended layers (merged), aggs on geo_shape (merged), and vector tiling (future) are all efforts to display ES geo-data on a map at any scale (in two senses: whether there are few or many documents, but also whether the user is zoomed out or zoomed in). Every once in a while, a feature request will trickle down to the ES level to help Kibana achieve that ;)
++ can do. I also understand that there is a performance benefit for using
Filtering-by-type would be very useful. Many other geo-tools allow querying geometries by type because the type impacts the styling. So it would be really useful for end-users to be able to structure their layers based on type. |
This is the corresponding issue on the Kibana-side, which is blocked by not being able to determine the geometry-type (or dimensionality) of the shapes. elastic/kibana#92672 (comment) |
So it seems like this already exists in SQL? https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-geo.html#sql-functions-geo-st-geometrytype |
not exactly, this SQL function operates on the source, it is not an aggregation |
Would it work in a |
@thomasneirynck you are correct, the function exists and it works with shapes. Unfortunately, as @talevy also correctly pointed out it can only extract the shape type from the shape source, which means it is available only in the contexts where source is available, which basically means we cannot use it for filtering ( |
With the introduction of Painless support for geo_shape fields in #72886, this can now be achieved by using runtime fields. For example:
would that fulfil the need? |
@iverase - yes I think using the runtime field satisfies the use-case. To confirm, this function is available starting 7.14? |
yes, 7.14. |
For geometries indexed into the geo_shape field, it would be helpful to be able to aggregate on the type of geometry.
Example use cases:
(1) for UX-applications that need to present a different UX based on the type-of geometries stored in the index.
geo_shape
(2) count/unique counts are especially relevant, but could be appropriate for all aggregations
(3) Similarly, it would be great to be able to specify filters on the data based on geometry-type
- e.g. only query for points for POI-type data.
This would be similar to the ST_GeometryType function in SQL.