Too many fields in template's default_fields #14262
I propose adding a new optional setting to the fields.yml file that will allow specifying that a field, or a group of fields, should not be included in the `default_field` list of the Elasticsearch index template.
Any suggestions on naming for the param? I'm not too keen on …
Good that you found this; I was not aware of the limit. And it seems it can only be changed at query time: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-top-level-params Does this mean that already today, if someone queries across …, they hit this error? I like the proposed solution as a short-term fix, but I think we also need a more scalable long-term solution to follow up with. Naming: what about just using …? @skh This is also relevant for you when moving the logic to Kibana.
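The query-time escape hatch referenced in the linked docs (naming fields explicitly in the query so Elasticsearch never has to expand the oversized `default_field` list) can be sketched as follows. This is a minimal illustration; the field names are invented, not taken from a real Beats template.

```python
# Sketch of a query_string request body that names its fields
# explicitly, so Elasticsearch does not fall back to expanding the
# (possibly oversized) default_field list from the index settings.
# Field names below are illustrative only.

def query_with_explicit_fields(query, fields):
    """Build a query_string body that avoids default_field expansion."""
    return {
        "query": {
            "query_string": {
                "query": query,
                "fields": fields,  # explicit list overrides default_field
            }
        }
    }

body = query_with_explicit_fields("error", ["message", "log.level"])
```

Passing this body to the `_search` endpoint sidesteps the 1024-field expansion limit because only the listed fields are searched.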
Filebeat is actually broken now in master. I thought it was my additions causing the problem, but in master we already have 1175 default fields. +1 on … Longer term, if we move to an index-per-module indexing strategy, this should not be a problem.
@andrewkroh Sounds like a good strategy / band-aid. We are on borrowed time here; maybe we should also go through the existing modules in Filebeat and see if any fields could be removed from the list. Also +1 on the naming.
Just adding a note here so we do not lose track: it's possible that 7.x is broken too, see #14298. Also, that error is probably testable by just doing a query against Elasticsearch, or by checking the number of default fields in the generated template. So we need to add a guard for these cases.
@ph We should definitely add something like this to our CI. It's kind of bad that we only realised it …
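A CI guard like the one proposed could parse each generated template and fail the build when the list grows too large. Below is a minimal Python sketch, assuming the template JSON keeps the list under `settings.index.query.default_field` and using a 1000-entry threshold to leave headroom under Elasticsearch's 1024 query-time limit (wildcards such as `fields.*` expand further at query time).

```python
# Minimal CI guard sketch: fail when a generated index template's
# default_field list exceeds a safety threshold. The JSON layout
# (settings.index.query.default_field) is an assumption here.
import json

LIMIT = 1000  # headroom below Elasticsearch's 1024 expansion limit

def count_default_fields(template: dict) -> int:
    """Return how many entries the template's default_field list has."""
    return len(template["settings"]["index"]["query"]["default_field"])

def check_template(raw: str) -> None:
    """Raise if the serialized template has too many default fields."""
    n = count_default_fields(json.loads(raw))
    if n > LIMIT:
        raise SystemExit(f"default_field has {n} entries, limit is {LIMIT}")

sample = json.dumps(
    {"settings": {"index": {"query": {"default_field": ["message", "host.name"]}}}}
)
check_template(sample)  # passes: only 2 entries
```

Running this over every Beat's generated template in CI would surface the problem at build time instead of query time.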
@ruflin Yes; what concerns me is that the error is only raised at query time. Maybe it should also be validated when we push the template? Maybe it's because users can change that limit on the fly that Elasticsearch does not validate it at insert time.
@ph I think the argument here is that it is a query-time parameter. So an index with 3000 default_fields is totally fine as long as the query param is adjusted. Also, 3000 fields could come from multiple indices with multiple templates. But I share your sentiment that perhaps there should be a more global option around how many …
The number of fields in the Elasticsearch index template's `settings.index.query.default_field` option has grown over time, and is now greater than 1024 in Filebeat (Elastic licensed version). This causes queries to Elasticsearch to fail when a list of fields is not specified because there is a default limit of 1024 in Elasticsearch. This adds a new setting to fields.yml called `default_field` whose value can be true/false (defaults to true). When true the text/keyword fields are added to the `default_field` list (as was the behavior before this change). And when set to false the field is omitted from the default_field list. This adds a test for every beat to check if the default_field list contains more than 1000 fields. The limit is a little less than 1024 because `fields.*` is in the default_field list already and at query time that wildcard will be expanded and count toward the limit. Fixes elastic#14262
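The commit message above describes the new `default_field` flag in fields.yml; a hypothetical excerpt (module and field names invented for illustration) might look like this:

```yaml
# Hypothetical fields.yml excerpt: `default_field: false` keeps a field
# out of the generated default_field list; it defaults to true,
# preserving the previous behavior for text/keyword fields.
- key: mymodule
  title: My Module
  fields:
    - name: mymodule.verbose_detail
      type: keyword
      default_field: false   # omitted from the default_field list
    - name: mymodule.message
      type: text             # included (default_field defaults to true)
```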
Re-opening this because we only solved it with a temporary solution that comes with quite a few maintainability problems. The current fix was to simply not add any new fields to the `default_field` list. This also has implications for ECS: when we want to update to a new fields.ecs.yml, we have to make sure any new ECS fields are not added to `default_field`. Are there any proposals for a more permanent fix? Will we move to a different indexing strategy where the number of fields is less of an issue?
elastic/ecs#687 now automatically sets `default_field: false` on newly added fields. This workaround in ECS should also be considered a temporary workaround, IMO. Otherwise that would mean any field added from here to the end of the 7.x line wouldn't be added to `default_field`. I think one of the non-breaking ways we could address this (but only wrt the ECS fields) is to consider culling some of the ECS fields that made it into `default_field`. Note that there's a broader point that these unrelated ECS field definitions could also be removed entirely from each Beat's template, but that's another issue.
This is so that Beats' default_fields don't go above 1024 field limit. See also elastic/beats#14262
@andrewkroh Indexing-wise, we will switch to having one index per dataset, so the total number of fields will be heavily reduced (as long as ECS doesn't have too many fields ;-)). But this is the long-term solution. As we keep adding modules on the Beats side, I think we also need a mid-term solution. Perhaps we should loop in someone from the Elasticsearch team like @jpountz to get some input?
I ran into this today while trying to figure out why one of my fields wasn't queryable. Eventually I resolved it by removing several hundred fields I wasn't using from the Filebeat index template, then updating … I was initially surprised to see the 1024 limit come into play, since I was only using ~50 fields from Filebeat and allowing ~100 or so of my own to be dynamically indexed. (Sorry if I'm using the wrong terminology; Elasticsearch is hard. :) )
@ruflin Sorry, I had missed your ping. Lucene and Elasticsearch are both not designed for the case where many fields exist but don't have any data, and changing that would be a lot of work. Improving the situation for this case doesn't feel like the right trade-off given that we're moving to one index per module. I wonder whether improving defaults could make maintenance easier. For instance, float/double/scaled_float fields are usually not useful for …
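The type-based selection described in these comments (text/keyword fields in, numeric types out, plus the per-field opt-out flag) can be sketched as follows. This is an illustration of the behavior described in the thread, not the actual Go code in `libbeat/template/processor.go`; the sample field definitions are invented.

```python
# Sketch of default_field selection: only text and keyword fields are
# candidates, numeric types (long, float, double, scaled_float, ...)
# are excluded, and a field can opt out via `default_field: false`.

DEFAULT_FIELD_TYPES = {"text", "keyword"}

def select_default_fields(fields):
    """Return names of fields eligible for the default_field list."""
    return [
        f["name"]
        for f in fields
        if f.get("type") in DEFAULT_FIELD_TYPES
        and f.get("default_field", True)  # opt-out flag from fields.yml
    ]

fields = [
    {"name": "message", "type": "text"},                          # included
    {"name": "bytes", "type": "long"},                            # numeric: excluded
    {"name": "host.name", "type": "keyword", "default_field": False},  # opted out
]
```

Here `select_default_fields(fields)` keeps only `message`, mirroring the exclusion of numeric fields and the opt-out behavior discussed above.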
@jpountz Having one dataset per index will solve the issue. Unfortunately we are not there yet, and at the moment we have the above issue with Filebeat and Metricbeat. We already have quite a bit of logic / magic around which fields get added to default_fields and which do not. We already exclude all the "number" fields. We once reverted back to only using …
How difficult would it be to add logic so that templates are only enabled for the Beats modules the user has selected, and does that present a potential long-term solution to this issue?
@MorrieAtElastic Unfortunately this is not an easy problem. But we are tackling exactly that with the new Elastic Package Manager, which means Beats won't have to do any setup anymore.
We are currently unable to do wildcard and regex Lucene searches in filebeat-* due to this problem. Elastic support tells me: "the developers say that is too many fields". But we are using the provided Elastic Filebeat modules and templates…?? What are you expecting from us? That we index Filebeat datasets to non-filebeat-* indices? The result is that built-in dashboards etc. will stop working. See case 00571318. Increasing … So just to be clear, is this issue only related to the default fields in the Filebeat template? If so, why not remove some from the Filebeat template until a better solution is implementable? Also, it contains some * fields, which makes the total number of default fields even more unpredictable.
We are adding many fields with name … From …
This may be introduced by the … I think this is a bug in the script that takes the field definitions and generates the default_fields.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue doesn't have a …
In Filebeat we are close to going over 1024 fields in the `default_field` setting in the Elasticsearch index template. This issue could affect other Beats too in the future (most likely Metricbeat). This will cause certain queries to the index to fail with an exception.

In Beats, when the index template is generated, all `text` and `keyword` fields are automatically added to the `default_field` list (see `beats/libbeat/template/processor.go`, line 106 at commit cbd7749).

We need a plan to deal with the growing number of fields in `default_field`. This issue is causing problems for me because I'm adding fields from CEF to the fields.yml.