-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Give doc-value-only mappings to numeric fields on metrics templates. #87100
Conversation
It doesn't make much sense to filter samples depending on the values of some metric fields, as these metrics mostly make sense in aggregate and over time. This change proposes to give doc-value-only mappings to numeric fields in the default metrics template. This helps reduce index-time CPU usage and disk usage while not impacting the search-time experience. Another improvement that this change brings is mapping floating-point numbers as doubles instead of floats by default, since double precision is more commonly used across other systems that are designed at handling metrics. I could revert that part and switch back to floats if it proved controversial. These doc-value-only dynamic templates might catch fields that are not metrics at times, which is probably ok.
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @jpountz, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from the ES side, but since these mappings came from the observability side I'd prefer someone from that team sign off before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine by me too.
Change LGTM. On the double change, will this have any storage or performance impact for numbers with only 1 digit for example? From an Observability perspective, by default this will not have an effect on all the data we load packages with templates for as these have higher priority. It will mainly affect the data without predefined mappings. If the above change is unexpected for a user, I assume our recommendation is to set a specific template in place for these data streams with higher priority to revert back to the old behaviour. @axw @felixbarny I assume this will not really have an affect on APM? @joshdover This change of defaults should also be applied to Fleet and the default mappings we load for each data stream. We should have again a discussion with @jpountz around reusing or not the "base" templates from Elasticsearch for these. |
+1 to index: false and rely only on doc values. This is how I used to map metrics indices in the past. Last time I checked, which was quite a while ago, doubles took up more space than floats but if the industry standard is to use doubles that that's what we should do, too. But maybe compression has improved in the meantime in ES?
Is there any capability to "inherit" from the base template or will a higher priority template always override and not merge with other templates? |
Yes. If we use component templates, I think the last (or first) component wins. |
APM has its own dynamic templates. We should probably follow suit though: elastic/apm-server#8261 |
It will increase storage requirements indeed. One special case that Elasticsearch/Lucene detect and that wouldn't become less space-efficient is when numbers are effectively all integers across a segment. In that case, the binary representation of the numbers likely has lots of trailing zeroes and Lucene manages to store them implicitly. However, if numbers have a decimal part then Lucene will likely spend 8 whole bytes on doc values for numbers mapped as doubles and 4 whole bytes for numbers mapped as floats. This is something that we discussed improving a bit for time series data, but that hasn't been done yet. I made this change because my understanding is that double precision is pretty standard for metrics, but like I said I can revert this part if it's not as obvious as I thought, we can do it later.
Yes. We have to be able to make improvements to the out-of-the-box experience. :)
Oh, I like references to old blog posts. :)
It's a different topic, but I have a strong preference for not reusing the Elasticsearch base templates as they have a different lifecycle from integrations templates, so creating dependencies between them has a high risk of introducing breaks. I'd rather duplicate the relevant bits. |
Pinging @elastic/clients-team (Team:Clients) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this change. The assumption is that at some point we likely will optimise double vs float usage with TSDB and on the reuse (or not) of component templates, we will have a separate discussion.
@ruflin Which defaults are you referring to? Today Fleet only has two base component templates whcih contain mappings that are installed for all data streams. Neither of these contain any numeric fields:
|
@joshdover These are the ones I was referring to and we should likely add some numeric values. But lets split the discussion from this PR as there is more to it. I'll ping you. |
I reverted the change from |
@elasticmachine test this please |
It doesn't make much sense to filter samples depending on the values of some
metric fields, as these metrics mostly make sense in aggregate and over time.
This change proposes to give doc-value-only mappings to numeric fields in the
default metrics template. This helps reduce index-time CPU usage and disk usage
while not impacting the search-time experience.
Another improvement that this change brings is mapping floating-point numbers
as doubles instead of floats by default, since double precision is more
commonly used across other systems that are designed at handling metrics. I
could revert that part and switch back to floats if it proved controversial.
These doc-value-only dynamic templates might catch fields that are not metrics
at times, which is probably ok.