Don't index so many saved object fields #43673

timroes · 2019-08-21T13:48:29Z

Update 29 June 2020

With 7.9 currently having ~960 fields, we're fast approaching the default limit of 1000 fields. Please audit your plugin's mappings and remove any unnecessary fields. Link from your PR back to this issue and mark your plugin's task as complete once the PR has been merged.

Removing fields

Setting index: false and doc_values: false removes some of the overhead of a field, but doesn't reduce the field count. To reduce the field count, fields need to be removed from the mappings completely. This can be done by specifying dynamic: false at any level of your mappings.

For example, the following diff removes three fields from the field count. The removed fields can still be stored in the saved object type, but searching and aggregation are only possible on the timestamp field. Note: this change also removes any validation in Elasticsearch, which allows saved objects with unknown attributes to be saved. Because of this, we recommend starting with low-risk saved object types such as telemetry data.

--- a/src/plugins/kibana_usage_collection/server/collectors/application_usage/saved_objects_types.ts
+++ b/src/plugins/kibana_usage_collection/server/collectors/application_usage/saved_objects_types.ts
@@ -47,11 +47,9 @@ export function registerMappings(registerType: SavedObjectsServiceSetup['registe
     hidden: false,
     namespaceType: 'agnostic',
     mappings: {
+      dynamic: false,
       properties: {
         timestamp: { type: 'date' },
-        appId: { type: 'keyword' },
-        minutesOnScreen: { type: 'float' },
-        numberOfClicks: { type: 'long' },
       },
     },
   });

You can use the following command to count the number of fields for a before/after comparison (requires jq, e.g. brew install jq):

curl -X GET "elastic:changeme@localhost:9200/.kibana/_field_caps?fields=*&pretty=true" |  jq '.fields|length'
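If you save the `_field_caps` output to a file before and after your change, a small script can show exactly which fields disappeared, not just the total. This is a minimal sketch that assumes only the `fields` object of the response matters; the response may also list metadata fields such as `_id` or `_index`, so treat the absolute number as approximate and rely on the delta.

```typescript
// Simplified shape of a `_field_caps` response: `fields` maps each field name
// to per-type capability info, whose contents are not needed here.
interface FieldCapsResponse {
  fields: Record<string, unknown>;
}

// Same as `jq '.fields|length'`: the number of field names in the response.
function countFields(response: FieldCapsResponse): number {
  return Object.keys(response.fields).length;
}

// Compare two responses captured before and after a mapping change.
function diffFields(before: FieldCapsResponse, after: FieldCapsResponse) {
  const beforeNames = new Set(Object.keys(before.fields));
  const afterNames = new Set(Object.keys(after.fields));
  return {
    delta: afterNames.size - beforeNames.size,
    removed: Array.from(beforeNames).filter((f) => !afterNames.has(f)).sort(),
    added: Array.from(afterNames).filter((f) => !beforeNames.has(f)).sort(),
  };
}
```

For the diff above, `removed` would be expected to list `appId`, `minutesOnScreen` and `numberOfClicks`, with a delta of -3.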

Plugins:

Original issue

Looking at the current mappings for a lot of our saved objects, we're indexing a large number of unnecessary fields, i.e. fields we know we'll never want to search through or filter on. Indexing them just wastes heap in Elasticsearch and, if the field is unnecessarily analyzed, costs a couple of milliseconds on every insert and thus on every migration. We even use text fields in many places where we store stringified JSON, which doesn't make any sense, since the analyzer won't produce anything meaningful from it.

This is not a huge problem, since the .kibana index is usually rather small, and many of those JSON fields are probably longer than the commonly used ignore_above value of 256 and thus not indexed in most documents. Even so, I discussed this with @joshdover, @tylersmalley and @rudolf, and we agreed that we should not waste heap and indexing performance on fields we know will never need to be indexed.

As the field count on .kibana is approaching the default limit of 1000 fields, we urgently need to evaluate whether all fields are really necessary for performing queries or filters.

Mapping recommendations

Here are a couple of general recommendations for how the mappings of a saved object should look:

type=text only for full text search on real text

A field with type text in the mapping will be analyzed and indexed. This only makes sense for fields we know we want to run full-text search on, e.g. the title or description of a saved object. If you don't need the field value analyzed for full-text search, either don't index the field (see below) or use type keyword with an appropriate ignore_above instead. Good examples of a proper keyword field are the visType or the language of a query.
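To audit a type against this rule, it can help to list every `type: 'text'` field, including nested ones, and check that each really holds human-readable text. The sketch below uses a simplified, hypothetical mapping model (not Kibana's real mapping types) and an illustrative mapping loosely modeled on a visualization.

```typescript
// Simplified, hypothetical model of a mapping tree (a small subset of what
// Elasticsearch accepts; multi-fields are not modeled).
interface FieldMapping {
  type?: string;
  index?: boolean;
  ignore_above?: number;
  properties?: Record<string, FieldMapping>;
}

// Walk a mapping tree and return the dotted paths of all `type: 'text'` fields.
function findTextFields(mapping: FieldMapping, prefix = ""): string[] {
  const result: string[] = [];
  for (const [name, child] of Object.entries(mapping.properties ?? {})) {
    const path = prefix ? `${prefix}.${name}` : name;
    if (child.type === "text") result.push(path);
    result.push(...findTextFields(child, path));
  }
  return result;
}

// Illustrative mapping: title/description are real text; visType is an
// exact-match keyword; the *JSON fields hold stringified JSON and should not be text.
const visualizationMappings: FieldMapping = {
  properties: {
    title: { type: "text" },
    description: { type: "text" },
    visType: { type: "keyword", ignore_above: 256 },
    uiStateJSON: { type: "text" },
    kibanaSavedObjectMeta: {
      properties: { searchSourceJSON: { type: "text" } },
    },
  },
};
```

Here `findTextFields(visualizationMappings)` flags `uiStateJSON` and `kibanaSavedObjectMeta.searchSourceJSON` alongside the legitimate `title` and `description`.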

Don't index if not needed

Especially with keyword fields, we very often index a field without thinking about it, because indexing is the default. If we know we'll never need to aggregate over or query that field, but just need it available when retrieving the saved object, set index: false and doc_values: false for that field in the mapping (text and annotated_text fields don't support doc_values, so only set index: false on those).
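The rule above can be captured in a small helper that turns a field into a "retrieve only" field, leaving `doc_values` off for text-like types. This is a sketch over a hypothetical, simplified field model; as noted earlier, such fields still count towards the field limit.

```typescript
// Simplified, hypothetical leaf-field mapping.
interface LeafMapping {
  type: string;
  index?: boolean;
  doc_values?: boolean;
  ignore_above?: number;
}

// Field types that don't support doc_values, so only `index: false` applies.
const TEXT_TYPES = new Set(["text", "annotated_text"]);

// Keep the field retrievable from the saved object, but stop indexing it.
function retrieveOnly(field: LeafMapping): LeafMapping {
  if (TEXT_TYPES.has(field.type)) {
    return { ...field, index: false };
  }
  return { ...field, index: false, doc_values: false };
}

// Apply retrieveOnly to every field except the ones we know we query or aggregate on.
function retrieveOnlyExcept(
  properties: Record<string, LeafMapping>,
  searchable: ReadonlySet<string>,
): Record<string, LeafMapping> {
  const out: Record<string, LeafMapping> = {};
  for (const [name, field] of Object.entries(properties)) {
    out[name] = searchable.has(name) ? field : retrieveOnly(field);
  }
  return out;
}
```

For example, `retrieveOnlyExcept({ visType: { type: "keyword" }, expression: { type: "keyword" } }, new Set(["visType"]))` keeps `visType` indexed and maps `expression` with both `index` and `doc_values` set to false.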

A couple of examples where it might make sense to have a (keyword) field indexed:

visType: we might want to filter on that later and thus need to be able to query by that field
language (of a query): even though we might never want to expose that in the UI, we might want to aggregate that field for telemetry data

A couple of examples where indexing doesn't make much sense:

expression (the "canvas" expression of a visualization): It makes no sense to filter on the complex expression as a whole, or to aggregate over it. If we wanted to build telemetry on it, we would need to look at each document individually anyway, e.g. parse the expression and count the functions it contains.
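To illustrate why indexing the expression wouldn't help: telemetry on it means loading each document and parsing the expression in application code. The sketch below is deliberately naive and assumes a simple `fn arg=... | fn2 | fn3` pipeline; it does not handle pipes inside quoted strings or `{...}` sub-expressions, which a real expression parser would.

```typescript
// Naive sketch: count function names in one pipe-separated expression.
// Assumes the first word of each pipe segment is the function name.
function countExpressionFunctions(expression: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const segment of expression.split("|")) {
    const fn = segment.trim().split(/\s+/)[0];
    if (!fn) continue; // skip empty segments
    counts.set(fn, (counts.get(fn) ?? 0) + 1);
  }
  return counts;
}

// Aggregate over all documents, which is what a telemetry collector would do.
function aggregateFunctionCounts(expressions: string[]): Map<string, number> {
  const total = new Map<string, number>();
  for (const expr of expressions) {
    for (const [fn, n] of countExpressionFunctions(expr)) {
      total.set(fn, (total.get(fn) ?? 0) + n);
    }
  }
  return total;
}
```

None of this uses the index: the work happens per document, so the expression field can stay unindexed.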

JSON fields

We have a couple of places where we use a keyword field (often even indexed) to store some JSON object, like the configuration of a visualization, or the state of a dashboard. As a first step, these fields should be set to index: false.

As a further optimization, this data can be saved as a field of type object with enabled: false. That way Elasticsearch simply ignores the content of that field: it won't be indexed or analyzed, but it is still returned as it was stored (as JSON) in the saved object. This removes an unnecessary JSON.stringify and JSON.parse when saving and loading those objects. Note: this requires writing a migration function for your saved object type and changing any consuming code, so it is not an immediate need, but rather something to work towards for 8.0.
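The two steps can be sketched as two mappings plus a migration that decodes the stored string. The document and migration types below are simplified, hypothetical stand-ins for Kibana's own saved object types, and `stateJSON`/`state` are illustrative attribute names.

```typescript
// Simplified stand-ins for a saved object document and a migration function.
interface SavedObjectDoc<A> {
  id: string;
  type: string;
  attributes: A;
}
type Migration<From, To> = (doc: SavedObjectDoc<From>) => SavedObjectDoc<To>;

// Step 1: keep the stringified JSON, but stop indexing it.
const mappingsStep1 = {
  properties: {
    title: { type: "text" },
    stateJSON: { type: "keyword", index: false, doc_values: false },
  },
};

// Step 2: store the object itself; Elasticsearch keeps it in _source but
// skips parsing and indexing its contents.
const mappingsStep2 = {
  properties: {
    title: { type: "text" },
    state: { type: "object", enabled: false },
  },
};

interface Before { title: string; stateJSON: string }
interface After { title: string; state: unknown }

// Migration from step 1 to step 2. JSON.parse throws on invalid input;
// a real migration would need to decide how to handle that case.
const decodeState: Migration<Before, After> = (doc) => {
  const { stateJSON, ...rest } = doc.attributes;
  return { ...doc, attributes: { ...rest, state: JSON.parse(stateJSON) } };
};
```

After this migration, consuming code reads `attributes.state` directly instead of calling `JSON.parse` on every load.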

Consider using `type: 'flattened'` (Basic license) if you need to search over many fields or an unknown number of fields

The flattened type maps an entire object as a single field. It comes with some limitations, but in many cases it can significantly reduce the field count while still allowing search and aggregation over the fields inside the object.

Keep in mind that the flattened field type still indexes all data within the field. If you only need one specific sub-field to be searchable or aggregatable, and not the rest, the dynamic: false approach described above is preferable: set dynamic: false on the parent key and give an explicit (indexed) mapping only to the sub-field you need to search or aggregate on. Flattened is mainly preferable if you may need to search or aggregate over a larger number of sub-fields.
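A rough way to compare the options is to count mapped entries. The sketch below uses a hypothetical, simplified mapping model and an illustrative `state` attribute; it approximates the field count only and does not capture indexing cost, which is where flattened differs: it indexes every leaf value (as a keyword), while the dynamic: false variant indexes only `status`.

```typescript
// Simplified, hypothetical mapping model.
interface Mapping {
  type?: string;
  dynamic?: boolean | "strict";
  properties?: Record<string, Mapping>;
}

// Rough field count: each entry under `properties` counts once, plus its own
// children. A flattened field is one entry; attributes left unmapped under
// dynamic: false are not counted at all.
function countMappedFields(mapping: Mapping): number {
  let count = 0;
  for (const child of Object.values(mapping.properties ?? {})) {
    count += 1 + countMappedFields(child);
  }
  return count;
}

// Every sub-field mapped explicitly.
const allMapped: Mapping = {
  properties: {
    state: {
      properties: {
        status: { type: "keyword" },
        owner: { type: "keyword" },
        lastRun: { type: "date" },
        retries: { type: "integer" },
      },
    },
  },
};

// Only `state.status` is searchable; other keys stay in _source, unmapped.
const dynamicFalse: Mapping = {
  properties: {
    state: { dynamic: false, properties: { status: { type: "keyword" } } },
  },
};

// All leaf values under `state` are searchable through one flattened field.
const flattened: Mapping = { properties: { state: { type: "flattened" } } };
```

Under this model `allMapped` counts 5, `dynamicFalse` counts 2, and `flattened` counts 1.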

What happens after I change my plugin's mappings?

If you switch a field from an indexed to a not-indexed state (e.g. with enabled: false or index: false), the migration system will automatically update the mappings when Kibana is upgraded, no further action is required. If your plugin has recently removed or renamed an entire Saved Object type, these old mappings might not have been cleaned up. Please reach out to @elastic/kibana-platform if you think this might be the case.


elasticmachine · 2019-08-21T13:48:31Z

Pinging @elastic/kibana-platform

ruflin · 2019-08-26T06:53:39Z

If we switch from keyword to object (which I hope we do), it would be great if we could introduce this as an option in the 7.x releases. During the 7.x cycle, saved objects could contain a field decoded: true to indicate whether Kibana has to encode them or not. It would be off by default, and in 8.0 the default would become true. This would help with the migration: assets from the last 7.x minors that contain decoded: true would also be compatible with 8.x.
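A hypothetical sketch of how such a flag could be handled on the Kibana side in 7.x: if an asset says `decoded: true`, its JSON fields are re-encoded before storing; otherwise they are assumed to already be strings. The asset shape and the helper are assumptions for illustration only, not an existing Kibana API.

```typescript
// Hypothetical asset shape carrying the proposed `decoded` flag.
interface Asset {
  decoded?: boolean; // off (undefined/false) by default, per the proposal
  attributes: Record<string, unknown>;
}

// Return the attributes in the form 7.x would store them.
function toStoredAttributes(asset: Asset, jsonFields: string[]): Record<string, unknown> {
  if (!asset.decoded) return asset.attributes; // already encoded strings
  const out: Record<string, unknown> = { ...asset.attributes };
  for (const field of jsonFields) {
    if (field in out) out[field] = JSON.stringify(out[field]);
  }
  return out;
}
```

For a map asset, `jsonFields` might be `["layerListJSON", "mapStateJSON"]`.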

In the packages, the saved objects are stored with decoded JSON fields. The reason is that this makes versioning of it much easier and the diffs are simpler. But inside Kibana some of the fields are stored as encoded json strings (this might change in the future elastic/kibana#43673). To not require special logic on the Package Manager to encode the strings, this is done directly during packaging. One thing not too nice about this PR is that it includes now a dependency on `common.MapStr` from Beats. Reason is that it makes the code much simpler. Part of elastic#42


ruflin · 2020-04-22T08:35:46Z

@elastic/kibana-platform Any update on this? We are using decoded JSON in all our packages (elastic/package-registry#354) at the moment for versioning purposes but then encode them again during the packaging. It would be nice to align here if possible.

rudolf · 2020-04-23T09:02:01Z

@ruflin

This would help with the migration, allowing the assets of the last minors of the 7.x releases assuming they have decoded: true inside, also to be compatible with 8.x.

Kibana migrations already take care of backwards compatibility. So in 8.0 the Maps team could write a migration to store layerListJSON as a decoded object. If a user imports a 7.last map, or a 7.last map is sent to the saved objects HTTP API, these objects will automatically be migrated (decoded) before being stored.

This doesn't solve the unnecessary encoding/decoding you currently have in package-registry (at least not until 8.0), but I assume the pain isn't bad enough to justify adding a JSON encoding/decoding option to saved objects just to remove this serialization?

ruflin · 2020-04-23T09:04:46Z

My question is more about long term alignment. Is the plan in 8.0 to have all SO content decoded or will it stay as is today?

rudolf · 2020-04-23T09:32:21Z

Yes, all JSON strings should be stored decoded as objects in 8.x. This will ultimately be up to different teams to implement and I'm not sure if we could enforce this for 8.0 but the effort should be minimal so I don't see a reason teams wouldn't be able to comply.

ruflin · 2020-04-23T10:26:42Z

This is great! I'm wondering whether the "owners" of each saved object know about this. If not, perhaps it's worth sending out a note or pinging them here?

rudolf · 2020-04-24T12:17:16Z

Yes, I agree, it's definitely worth coordinating this with all the teams. However, since 8.x is still a long way off, I think teams would benefit from not having the 7.x and master branches diverge until we're closer to 8.x. I will own giving teams an early heads-up during 8.0-alpha1.

TinaHeiligers · 2020-06-29T20:33:36Z

@timroes, @rudolf The telemetry saved object has all of 8 fields and we use all of them to determine a few things:

should usage data be reported?
can the current user change the opt in status?
is the current cluster owner aware of sending usage data to Elastic?
when was the last data reported? (we only send data at certain intervals to avoid network overload)
was the data sent successfully? If not, how many times did Kibana fail to report the data and what version was that?
If we can't make any changes (hence won't have a PR to link to), how should we proceed with acknowledging we've done what we can to help out?

rudolf · 2020-06-29T21:00:10Z

@TinaHeiligers if you've audited your plugin and no fields can be removed you can just tick off the task in the issue, thanks!

afharo · 2020-06-30T12:26:26Z

There are many plugins running taskManager tasks that calculate a telemetry object and store it as a saved object. So when the fetcher kicks in, they return the saved object's content. Maybe it's a good idea to use the type: object with enabled: false approach in these scenarios.

timroes · 2020-07-15T09:51:22Z

I made the flattened section a bit more detailed to explain when the dynamic: false approach is preferable, since I think there are not many cases where flattened would actually be the preferred way.

thomasneirynck · 2023-10-02T19:03:13Z

Struck out a few plugins that are deprecated.

rudolf · 2023-10-03T09:31:08Z

Closing this, as the enhancements we've made to scale saved object migrations (#144035) and serverless ZDT (zero-downtime) migrations prevent us from removing existing fields. To mitigate the field growth, we have split the .kibana saved objects into several smaller indices.

alexfrancoeur · 2023-10-04T03:47:38Z

gmmorris · 2023-10-04T08:38:40Z

timroes added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Saved Objects labels Aug 21, 2019

ruflin mentioned this issue Nov 20, 2019

Encode the Kibana saved objects during packaging elastic/package-registry#157

Merged

ruflin mentioned this issue Apr 22, 2020

import-beats: Decode fields layerListJSON, mapStateJSON elastic/package-registry#354

Merged

rudolf added the v8.0.0 label Apr 23, 2020

rudolf added the blocker label Apr 24, 2020

rudolf self-assigned this Apr 24, 2020

rudolf mentioned this issue May 20, 2020

Saved Object Migrations should clean up unused mappings #67086

Closed

rudolf added the v7.9.0 label Jun 29, 2020

cjcenizal self-assigned this Jun 29, 2020

cjcenizal mentioned this issue Jun 29, 2020

Remove fields from UA mappings which don't need to be searchable #64547

Closed

cjcenizal added the release_note:skip Skip the PR/issue when compiling release notes label Jun 29, 2020

flash1293 mentioned this issue Jun 30, 2020

Disable indexing for graph and timelion fields #70291

Closed

1 task

timroes self-assigned this Jun 30, 2020

joshdover mentioned this issue Jun 30, 2020

New Enterprise Search Kibana plugin #66922

Merged

7 tasks

rudolf mentioned this issue Jul 13, 2020

[Search] Add telemetry for data plugin search service #70677

Merged

1 task

rudolf removed the blocker label Jul 14, 2020

andrewvc mentioned this issue Jul 22, 2020

[Uptime] Stop indexing saved object fields #72782

Closed

jasonrhodes mentioned this issue Jul 30, 2020

[Logs App] [Metrics App] Remove unnecessarily mapped fields from saved object types #73848

Closed

mattkime mentioned this issue Aug 12, 2020

Reduce number of indexed fields in index pattern saved object #74817

Merged

timroes mentioned this issue Sep 8, 2021

Inspect view in saved objects management should show read-only JSON #59588

Closed

rudolf mentioned this issue Nov 4, 2021

Visualization collectors iterate over all visualizations 4 times #117367

Closed

cnasikas mentioned this issue Nov 16, 2021

[Cases] User actions enhancements #115730

Closed

5 tasks

timroes removed their assignment Dec 18, 2021

This was referenced Aug 12, 2022

Reduce field count for SIEM saved object types #138726

Open

Reduce field count for cases saved object types #138727

Closed

cjcenizal removed their assignment Sep 7, 2022

rudolf closed this as completed Oct 3, 2023

github-project-automation bot added this to kibana-core [DEPRECATED] Aug 28, 2024

github-project-automation bot moved this to Done (7.13) in kibana-core [DEPRECATED] Aug 28, 2024
