Adding a geospatial heatmap aggregation #20666
My company had previously signed the company agreement, but I guess that didn't jibe with the CLA check. I just signed the other two individual agreements.
@karmi, I just added that email address. Thanks!
This looks pretty similar to the geo-hash grid aggregation. Can you comment on what it brings?
The key difference between the geohash grid agg and the heatmap is that the geohash grid can only deal with indexed points ( The output format is also different, and may be a little easier for clients to integrate with. Rather than buckets labelled with geohash prefixes and their corresponding counts, the output of the heatmap is an array of arrays of ints (the number of hits in each cell). That eliminates the need to map geohash prefixes to spatial areas.
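To illustrate why a plain grid of counts removes the geohash-decoding step, here is a minimal sketch of how a client might turn a cell's row and column index into a bounding box. The function name `cell_bounds`, the world-wide default bounding box, and the convention that row 0 is the northernmost row are all assumptions made for illustration; they are not taken from the PR.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def cell_bounds(row: int, col: int, rows: int, cols: int,
                bbox: BBox = (-180.0, -90.0, 180.0, 90.0)) -> BBox:
    """Return the bounding box of one heatmap cell.

    Assumption: row 0 is the northernmost row and column 0 the westernmost.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    cell_w = (max_lon - min_lon) / cols
    cell_h = (max_lat - min_lat) / rows
    west = min_lon + col * cell_w
    north = max_lat - row * cell_h
    return (west, north - cell_h, west + cell_w, north)


# A hypothetical 2 x 4 response: an array of arrays of hit counts.
counts = [[0, 3, 0, 0],
          [1, 0, 0, 7]]

# Pair every non-empty cell with its bounding box, using index arithmetic only.
hot_cells = [(cell_bounds(r, c, len(counts), len(row)), n)
             for r, row in enumerate(counts)
             for c, n in enumerate(row) if n]
```

With geohash buckets, the same step would require decoding each prefix into a latitude/longitude range first.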
Thanks, I get it better now. I need to think more about whether this should be the same aggregation, but at the very least it should have a similar response format. So I guess we'd either need to change the response format of the geo-hash grid agg to look more like this one, or the format of this one to look more like the geo-hash grid agg. I can see benefits to both, i.e. the geo-hash grid works better if there is some sparsity in the data, while this one is more compact if most cells have data. @epixa Any opinions about what response format would be best from a Kibana perspective? We could either have something like the geo-hash grid agg, i.e.
or
@nknize I think you've been working on different ways to encode geo shapes using points APIs, do you know how challenging it would be to have a similar aggregation for the new API? We might want to be careful about merging this kind of aggregation if the new format would make such aggregations hard.
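The sparsity trade-off raised in this comment can be made concrete with a rough size comparison. The sketch below serializes the same counts once as a dense array of arrays and once as a list of key/count buckets, then compares the JSON lengths. The `"x,y"` key label and the helper names are hypothetical, chosen only for illustration; the bucket shape loosely mirrors the `key`/`doc_count` pairs used by bucket aggregations.

```python
import json
from typing import Dict, List

Grid = List[List[int]]


def to_buckets(grid: Grid) -> List[Dict[str, object]]:
    """Keep only non-empty cells, labelled with a hypothetical "x,y" key."""
    return [{"key": f"{x},{y}", "doc_count": n}
            for y, row in enumerate(grid)
            for x, n in enumerate(row) if n > 0]


def payload_sizes(grid: Grid) -> Dict[str, int]:
    """Return the compact-JSON length of the dense and bucketed encodings."""
    dense = json.dumps(grid, separators=(",", ":"))
    sparse = json.dumps(to_buckets(grid), separators=(",", ":"))
    return {"dense": len(dense), "sparse": len(sparse)}


# One populated cell in a 16 x 16 grid: buckets are far smaller.
mostly_empty = [[0] * 16 for _ in range(16)]
mostly_empty[3][5] = 42

# Every cell populated: the dense grid is far smaller.
fully_populated = [[7] * 16 for _ in range(16)]
```

Comparing `payload_sizes(mostly_empty)` with `payload_sizes(fully_populated)` shows the crossover: buckets win when few cells have hits, the dense grid wins when most do.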
@jpountz Sorry, I've been pulled in a couple of directions here and haven't had a chance to give this some proper thought. @thomasneirynck Do you have any opinions on the returned data shape? #20666 (comment)
…earch into feature/heatmap-disterr
The last commit adds the ability to calculate the grid level implicitly by specifying |
In general, I like the proposal of the grid-like format. It is denser, and especially for contiguous data it allows for more efficient transport of the result, and it could allow for easier/faster implementations on the clients. That said, right now, I'd stick with the key-value pair format:
I am interested though. For a future enhancement, it would be nice to be able to specify the output format for such an aggregation. ES could optionally output a numerical array, and clients (such as Kibana) could leverage this for more efficient rendering (e.g. by loading the result straight into an image or texture).
I really like the idea of having multiple output formats. In fact the heatmaps could very easily be sent back as base-64 encoded PNGs. Geohashes have a human-readable character representation, whereas quadtrees only have a binary representation. I don't think the actual quadtree prefix is accessible from existing queries, so instead I could send back labels like:
The keys would be x,y coordinates on the heatmap. I couldn't find an algorithm that maps quadtree cells to geohashes, but if someone were to point me in the right direction then we could get output to exactly match geohash grid as well.
I'll wait for @nknize to comment since he might have ideas about the response format. On my end I don't like having multiple response formats since this would be a nightmare to maintain.
The geohash grid format allows us to have sub-aggs within each cell, e.g. the geo centroid agg, or terms, histos, whatever. The grid doesn't allow for this flexibility.
Yeah, I like the motivation for having a grid aggregation compatible with
@nknize, I agree. Closing the PR.
Closes #20665.
A few implementation notes:
First, the geom parameter contains a field name and so does the agg specification itself. That's a little repetitive but I couldn't see a clean way around that (feedback welcome of course).
Second, I would have liked to have implemented this as a plugin, but several of the core-tests classes aren't available yet to the test framework. I'll open a separate ticket for discussion about that.
Third, I would also have liked to pass docValues throughout the agg tree, but docValues for shapes seem to be very experimental at this stage.
Lastly, the core Lucene heatmap collector takes sort of a top-down approach to building the counts array. It starts from a top-level IndexReaderContext, but really what we'd want there is a LeafReaderContext. I've tried to work around that, but quite possibly a refactor of the Lucene class would make the heatmap agg more efficient.
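The per-leaf alternative described in the last note can be pictured as counting each segment on its own and then summing the resulting grids. The sketch below is plain Python and does not use or imitate any Lucene API: a "segment" here is just a hypothetical list of `(col, row)` cell indices, and the helper names are invented for illustration.

```python
from typing import Iterable, List, Tuple

Grid = List[List[int]]


def empty_grid(rows: int, cols: int) -> Grid:
    return [[0] * cols for _ in range(rows)]


def collect_segment(cells: Iterable[Tuple[int, int]], rows: int, cols: int) -> Grid:
    """Count hits for a single segment; each hit is a (col, row) cell index."""
    grid = empty_grid(rows, cols)
    for col, row in cells:
        grid[row][col] += 1
    return grid


def merge_grids(grids: Iterable[Grid], rows: int, cols: int) -> Grid:
    """Sum per-segment grids cell by cell into one combined grid."""
    total = empty_grid(rows, cols)
    for grid in grids:
        for r in range(rows):
            for c in range(cols):
                total[r][c] += grid[r][c]
    return total


# Two hypothetical segments, each holding the cell indices of its hits.
segments = [[(0, 0), (1, 0)],
            [(1, 0), (2, 1)]]
merged = merge_grids((collect_segment(s, 2, 3) for s in segments), 2, 3)
```

Because cell-wise addition is associative, the same merge step would also apply when combining grids from several shards.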