Doc-value-only fields #52728
Comments
Pinging @elastic/es-search (:Search/Mapping) |
I would like to clarify something. Users can currently disable inverted data structures for a field by using:
"mappings": {
  "properties": {
    "location": {
      "type": "geo_point",
      "doc_values": true,
      "index": false
    }
  }
}
But this will disable some queries. Is your proposal to make all queries work on doc values instead of inverted data structures? I am wondering if this is even possible for all field types? |
Yes. For instance all numeric fields create queries on doc values today, but only as a way to speed up query execution when there is another filter in the query that is more selective than this one (using
I think this would work for all fields that support doc values, but I think that this feature would be mostly useful for number fields, so I was thinking of focusing on those first. |
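To make the trade-off in the comment above concrete, here is a toy sketch in Python. It is not Lucene or Elasticsearch code: the data, the sorted list standing in for an index structure, and all function names are assumptions made up for illustration. It shows a range filter answered either from a sorted structure or by checking doc values only for candidates that a more selective filter has already produced.

```python
from bisect import bisect_left, bisect_right

# Toy segment: position = doc id, entry = numeric value (the "doc values" column).
doc_values = [17, 3, 42, 8, 25, 3, 99, 60]

# Toy stand-in for an index structure: (value, doc_id) pairs sorted by value.
sorted_index = sorted((value, doc) for doc, value in enumerate(doc_values))


def range_via_index(lo, hi):
    """Return every doc id with lo <= value <= hi using the sorted structure."""
    start = bisect_left(sorted_index, (lo, -1))
    end = bisect_right(sorted_index, (hi, len(doc_values)))
    return sorted(doc for _, doc in sorted_index[start:end])


def range_via_doc_values(candidates, lo, hi):
    """Check only the given candidate docs against the doc-values column."""
    return [doc for doc in candidates if lo <= doc_values[doc] <= hi]


# Docs already matched by some other, more selective filter (hypothetical).
selective_candidates = [3, 5, 6]

# Strategy 1: compute the full range match, then intersect with the candidates.
from_index = [d for d in range_via_index(5, 30) if d in set(selective_candidates)]

# Strategy 2: look up doc values for the three candidates only.
from_doc_values = range_via_doc_values(selective_candidates, 5, 30)
```

Both strategies return the same documents. The second one touches only as many values as there are candidates, which is why it pays off when the other filter is selective, and why it becomes a linear scan when there is no such filter.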
The issue #48665 could relate to this idea -- if an index is sorted on a numeric field, we could perform fast range queries that rely only on the field's doc values. |
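A minimal sketch of why index sorting helps here, assuming a segment whose documents are stored in the order of the sort field. The column contents and the `sorted_range` helper are hypothetical; the point is only that a non-decreasing doc-values column lets a range filter be resolved with two binary searches instead of a full scan.

```python
from bisect import bisect_left, bisect_right

# Doc values of the sort field in doc-id order. Because the (hypothetical)
# index is sorted on this field, the column is non-decreasing.
timestamps = [100, 105, 105, 120, 130, 130, 150, 200]


def sorted_range(values, lo, hi):
    """Return the contiguous doc-id range whose values fall in [lo, hi]."""
    first = bisect_left(values, lo)   # first doc with value >= lo
    last = bisect_right(values, hi)   # one past the last doc with value <= hi
    return range(first, last)


matching_docs = list(sorted_range(timestamps, 105, 130))
```

The matching documents form one contiguous block of doc ids, so the cost is logarithmic in the segment size rather than linear.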
This is being worked on as part of #82409 for numeric fields. @jpountz you mentioned above that you would like this type of slower query to be linked to some warning mechanism. Do you think that is a hard requirement, or can we add support for queries on doc_values even though we haven't yet figured out the details of the warning mechanism? |
Allows searching on number field types (long, short, int, float, double, byte, half_float) when those fields are not indexed (index: false) but just doc values are enabled. This enables searches on archive data, which has access to doc values but not index structures. When combined with searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set of documents. Note to reviewers: I have split isSearchable into two separate methods isIndexed and isSearchable on MappedFieldType. The former one is about whether actual indexing data structures have been used (postings or points), and the latter one on whether you can run queries on the given field (e.g. used by field caps). For number field types, queries are now allowed whenever points are available or when doc values are available (i.e. searchability is expanded). Relates #81210 and #52728
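The split described in the note to reviewers can be summarized with a small sketch. This is Python rather than the Java of the actual change, and `NumberFieldSketch` is a hypothetical stand-in, not the real `MappedFieldType`; it only mirrors the stated rule that a number field is indexed when index structures exist, and searchable when either index structures or doc values are available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberFieldSketch:
    """Hypothetical stand-in for a number field type; not the Elasticsearch class."""

    name: str
    index: bool = True       # mirrors the `index` mapping parameter
    doc_values: bool = True  # mirrors the `doc_values` mapping parameter

    def is_indexed(self) -> bool:
        # True only when actual index structures (points) were written.
        return self.index

    def is_searchable(self) -> bool:
        # Queries are allowed when points or doc values are available.
        return self.index or self.doc_values


doc_value_only = NumberFieldSketch("bytes_sent", index=False, doc_values=True)
stored_nowhere = NumberFieldSketch("ignored", index=False, doc_values=False)
```

Under this rule a doc-value-only field reports `is_indexed() == False` but `is_searchable() == True`, while a field with neither structure is not searchable.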
I think that the direction we're taking involves doing more and more costly stuff (think of runtime fields or identifying sequences of events with EQL) and we will want to reconsider whether we really want to warn on slow operations. So I wouldn't make this a hard requirement, and I wonder if this should still be a requirement at all. |
Similar to #82409, but for date fields. Allows searching on date field types (date, date_nanos) when those fields are not indexed (index: false) but just doc values are enabled. This enables searches on archive data, which has access to doc values but not index structures. When combined with searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set of documents. Relates #81210 and #52728
Allows searching on keyword fields when those fields are not indexed (index: false) but just doc values are enabled. This enables searches on archive data, which has access to doc values but not index structures. When combined with searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set of documents. Relates #81210 and #52728
Allows searching on boolean fields when those fields are not indexed (index: false) but just doc values are enabled. This enables searches on archive data, which has access to doc values but not index structures. When combined with searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set of documents. Relates #81210 and #52728
Allows searching on ip fields when those fields are not indexed (index: false) but just doc values are enabled. This enables searches on archive data, which has access to doc values but not index structures. When combined with searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set of documents. Relates #81210 and #52728
The following doc-value-only fields (term + range query support) have been implemented in 8.1.0:
For feature parity with runtime fields, |
Similar to #82409, but for geo_point fields. Allows searching on geo_point fields when those fields are not indexed (index: false) but just doc values are enabled. Also adds distance feature query support for date fields (bringing date fields to feature parity with runtime fields). This enables searches on archive data, which has access to doc values but not index structures. When combined with searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set of documents. Relates #81210 and #52728
These missing pieces for feature parity with runtime fields have been implemented as well now (ES 8.1.0):
I've used native doc-value based queries where possible, and have used their runtime field equivalents when not available. |
Implemented in 8.1. |
Just some basic benchmarks: I've run the nyc_taxis Rally benchmark once with all eligible fields set to
|
Some observations on the above:
|
Users who index time series typically care a lot about indexing rate and space efficiency. Disabling inverted structures like the inverted index and points would help on both fronts. Queries could still work using doc values, but more slowly, which is a trade-off that these users are often happy to make.
Default mappings would still create inverted structures, so users would have to opt in to trade search efficiency for disk space and indexing rate.
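As an illustration of what opting in could look like, the sketch below builds a mapping body in Python. The field names (`@timestamp`, `cpu_usage`) and the `doc_value_only` helper are assumptions for the example; the `index` and `doc_values` mapping parameters are the ones already discussed in this issue.

```python
import json


def doc_value_only(field_type):
    """Hypothetical helper: a field with doc values but no inverted structures."""
    return {"type": field_type, "index": False, "doc_values": True}


mappings = {
    "mappings": {
        "properties": {
            # Left at the defaults, so index structures are still built.
            "@timestamp": {"type": "date"},
            # Explicit opt-in: smaller on disk, slower to query.
            "cpu_usage": doc_value_only("double"),
        }
    }
}

# JSON body that could be sent when creating an index.
body = json.dumps(mappings, indent=2)
```

Only `cpu_usage` gives up its index structures here; the timestamp field keeps the default behaviour, which matches the opt-in model described above.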
I'd like to make this change depend on some feedback mechanism as outlined in #48058, so that slow queries caused by disabled inverted structures never come as a surprise to users.