Too many fields in template's default_fields #14262
I propose adding a new optional setting to the fields.yml file that will allow specifying that a field, or a group of fields, should not be included in the `default_field` list of the Elasticsearch index template.
Any suggestions on naming for the param? I'm not too keen on …
Good that you found this; I was not aware of the limit. And it seems it can only be changed at query time: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-top-level-params Does this mean that already today, if someone queries across …, they hit this error? I like the proposed solution as a short-term fix, but I think we also need a more scalable long-term solution to follow up with. Naming: what about just using …? @skh This is also relevant for you when moving the logic to Kibana.
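The query-time escape hatch referenced in the linked docs (naming fields explicitly in the query so Elasticsearch never has to expand the oversized `default_field` list) can be sketched as follows. This is a minimal illustration; the field names are invented, not taken from a real Beats template.

```python
# Sketch of a query_string request body that names its fields
# explicitly, so Elasticsearch does not fall back to expanding the
# (possibly oversized) default_field list from the index settings.
# Field names below are illustrative only.

def query_with_explicit_fields(query, fields):
    """Build a query_string body that avoids default_field expansion."""
    return {
        "query": {
            "query_string": {
                "query": query,
                "fields": fields,  # explicit list overrides default_field
            }
        }
    }

body = query_with_explicit_fields("error", ["message", "log.level"])
```

Passing this body to the `_search` endpoint sidesteps the 1024-field expansion limit because only the listed fields are searched.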
Filebeat is actually broken now in master. I thought it was my additions causing the problem, but in master we already have 1175 default fields. +1 on … Longer term, if we move to an index-per-module indexing strategy, this should not be a problem.
@andrewkroh Sounds like a good strategy / band-aid. We are on borrowed time here; maybe we should also go through the existing modules in Filebeat and see if any fields could be removed from the list. Also +1 on the naming.
Just adding a note here so we do not lose track: it's possible that 7.x is broken too, see #14298. Also, that error is probably testable by just doing a query against Elasticsearch, or by checking the number of default fields in the generated template. So we need to add a guard for these cases.
@ph We should definitely add something like this to our CI. It's kind of bad that we only realised it …
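A CI guard like the one proposed could parse each generated template and fail the build when the list grows too large. Below is a minimal Python sketch, assuming the template JSON keeps the list under `settings.index.query.default_field` and using a 1000-entry threshold to leave headroom under Elasticsearch's 1024 query-time limit (wildcards such as `fields.*` expand further at query time).

```python
# Minimal CI guard sketch: fail when a generated index template's
# default_field list exceeds a safety threshold. The JSON layout
# (settings.index.query.default_field) is an assumption here.
import json

LIMIT = 1000  # headroom below Elasticsearch's 1024 expansion limit

def count_default_fields(template: dict) -> int:
    """Return how many entries the template's default_field list has."""
    return len(template["settings"]["index"]["query"]["default_field"])

def check_template(raw: str) -> None:
    """Raise if the serialized template has too many default fields."""
    n = count_default_fields(json.loads(raw))
    if n > LIMIT:
        raise SystemExit(f"default_field has {n} entries, limit is {LIMIT}")

sample = json.dumps(
    {"settings": {"index": {"query": {"default_field": ["message", "host.name"]}}}}
)
check_template(sample)  # passes: only 2 entries
```

Running this over every Beat's generated template in CI would surface the problem at build time instead of query time.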
@ruflin Yes; what concerns me is that the error is only raised at query time. Maybe it should also be validated when we push the template? Maybe it's because users can change that limit on the fly that Elasticsearch does not validate it at insert time.
@ph I think the argument here is that it is a query-time parameter. So an index with 3000 default_fields is totally fine as long as the query param is adjusted. Also, 3000 fields could come from multiple indices with multiple templates. But I share your sentiment that perhaps there should be a more global option around how many …
The number of fields in the Elasticsearch index template's `settings.index.query.default_field` option has grown over time, and is now greater than 1024 in Filebeat (Elastic licensed version). This causes queries to Elasticsearch to fail when a list of fields is not specified because there is a default limit of 1024 in Elasticsearch. This adds a new setting to fields.yml called `default_field` whose value can be true/false (defaults to true). When true the text/keyword fields are added to the `default_field` list (as was the behavior before this change). And when set to false the field is omitted from the default_field list. This adds a test for every beat to check if the default_field list contains more than 1000 fields. The limit is a little less than 1024 because `fields.*` is in the default_field list already and at query time that wildcard will be expanded and count toward the limit. Fixes elastic#14262
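The commit message above describes the new `default_field` flag in fields.yml; a hypothetical excerpt (module and field names invented for illustration) might look like this:

```yaml
# Hypothetical fields.yml excerpt: `default_field: false` keeps a field
# out of the generated default_field list; it defaults to true,
# preserving the previous behavior for text/keyword fields.
- key: mymodule
  title: My Module
  fields:
    - name: mymodule.verbose_detail
      type: keyword
      default_field: false   # omitted from the default_field list
    - name: mymodule.message
      type: text             # included (default_field defaults to true)
```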
Re-opening this because we only solved it with a temporary solution that comes with quite a few maintainability problems. The current fix was to simply not add any new fields to the `default_field` list. This also has implications for ECS: when we want to update to a new fields.ecs.yml, we have to make sure any new ECS fields are not added to `default_field`. Are there any proposals for a more permanent fix? Will we move to a different indexing strategy where the number of fields is less of an issue?
elastic/ecs#687 now automatically sets `default_field: false` on newly added fields. This workaround in ECS should also be considered a temporary workaround, IMO. Otherwise that would mean any field added from here to the end of the 7.x line wouldn't be added to `default_field`. I think one of the non-breaking ways we could address this (but only wrt the ECS fields) is to consider culling some of the ECS fields that made it into `default_field`. Note that there's a broader point that these unrelated ECS field definitions could also be removed entirely from each Beat's template, but that's another issue.
This is so that Beats' default_fields don't go above 1024 field limit. See also elastic/beats#14262
@andrewkroh Indexing-wise, we will switch to having one index per dataset, so the total number of fields will be heavily reduced (as long as ECS doesn't have too many fields ;-)). But this is the long-term solution. As we keep adding modules on the Beats side, I think we also need a mid-term solution. Perhaps we should loop in someone from the Elasticsearch team like @jpountz to get some input?
I ran into this today while trying to figure out why one of my fields wasn't queryable. Eventually I resolved it by removing several hundred fields I wasn't using from the Filebeat index template, then updating … I was initially surprised to see the 1024 limit come into play, since I was only using ~50 fields from Filebeat and allowing ~100 or so of my own to be dynamically indexed. (Sorry if I'm using the wrong terminology; Elasticsearch is hard. :) )
@ruflin Sorry, I had missed your ping. Lucene and Elasticsearch are both not designed for the case where many fields exist but don't have any data, and changing that would be a lot of work. Improving the situation for this case doesn't feel like the right trade-off given that we're moving to one index per module. I wonder whether improving defaults could make maintenance easier. For instance, float/double/scaled_float fields are usually not useful for …
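The type-based selection described in these comments (text/keyword fields in, numeric types out, plus the per-field opt-out flag) can be sketched as follows. This is an illustration of the behavior described in the thread, not the actual Go code in `libbeat/template/processor.go`; the sample field definitions are invented.

```python
# Sketch of default_field selection: only text and keyword fields are
# candidates, numeric types (long, float, double, scaled_float, ...)
# are excluded, and a field can opt out via `default_field: false`.

DEFAULT_FIELD_TYPES = {"text", "keyword"}

def select_default_fields(fields):
    """Return names of fields eligible for the default_field list."""
    return [
        f["name"]
        for f in fields
        if f.get("type") in DEFAULT_FIELD_TYPES
        and f.get("default_field", True)  # opt-out flag from fields.yml
    ]

fields = [
    {"name": "message", "type": "text"},                          # included
    {"name": "bytes", "type": "long"},                            # numeric: excluded
    {"name": "host.name", "type": "keyword", "default_field": False},  # opted out
]
```

Here `select_default_fields(fields)` keeps only `message`, mirroring the exclusion of numeric fields and the opt-out behavior discussed above.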
@jpountz Having one dataset per index will solve the issue. Unfortunately we are not there yet, and at the moment we have the above issue with Filebeat and Metricbeat. We already have quite a bit of logic / magic around which fields get added to default_fields and which do not. We already exclude all the "number" fields. We once reverted back to only using …
How difficult would it be to add logic so that templates are only enabled for the Beats modules the user has selected, and does that present a potential long-term solution to this issue?
@MorrieAtElastic Unfortunately this is not an easy problem. But we are tackling exactly that with the new Elastic Package Manager, which means Beats won't have to do any setup anymore.
We are currently unable to do wildcard and regex Lucene searches in filebeat-* due to this problem. Elastic support tells me: "the developers say that is too many fields". But we are using the provided Elastic Filebeat modules and templates…?? What are you expecting from us? That we index Filebeat datasets to non-filebeat-* indices? The result is that built-in dashboards etc. will stop working. See case 00571318. Increasing … So just to be clear, is this issue only related to the default fields in the Filebeat template? If so, why not remove some from the Filebeat template until a better solution is implementable? Also, it contains some * fields, which makes the total number of default fields even more unpredictable.
We are adding many fields with name … From …
This may be introduced by the … I think this is a bug in the script that takes the field definitions and generates the default_fields.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue doesn't have a …
In Filebeat we are close to going over 1024 fields in the `default_field` setting in the Elasticsearch index template. This issue could affect other Beats too in the future (most likely Metricbeat). This will cause certain queries to the index to fail with an exception.

In Beats, when the index template is generated, all `text` and `keyword` fields are automatically added to the `default_field` list (see `beats/libbeat/template/processor.go`, line 106 at commit cbd7749).

We need a plan to deal with the growing number of fields in `default_field`. This issue is causing problems for me because I'm adding fields from CEF to the fields.yml.