
Support large saved object indices consuming 10s of GBs #147852

Open
rudolf opened this issue Dec 20, 2022 · 4 comments
Labels
Feature:Migrations, Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)

Comments

@rudolf
Contributor

rudolf commented Dec 20, 2022

While #144035 will reduce the upgrade downtime of clusters with millions of saved objects, large indices holding GBs of data introduce challenges of their own. The general Elasticsearch guidance is to keep shards between 10GB and 50GB in size, whereas the saved object indices always use a single primary shard. We currently only have a handful of customers with .kibana indices larger than 10GB, but this number is likely to grow.

There are a few options to mitigate this problem:

  1. Config option to specify .kibana primary shards #156306
  2. Have all saved object indices use a high shard count by default.
    This consumes unnecessary shards for small clusters but improves scalability for larger clusters.
  3. Re-shard the indices of large clusters.
    Because this requires a reindex, it will cause downtime. Given that the reason for re-sharding is a large cluster, such downtime would be significant.
  4. Use a rollover index with an ILM size policy (see the sketch after this list).
    • This would require significant changes to the saved objects repository. Update operations would first need to search for the _id of the document being updated to locate the index in which it resides, and the update would then have to be made directly against that index.
    • Would updateByQuery operations continue to work?
    • We would need to change the migration algorithm to use a mappings template instead of creating indices with explicit mappings (or perform the "rollover" manually from inside Kibana).
    • Deletes might cause unbalanced shards.
  5. Ask Elasticsearch for a zero-downtime resharding API
  6. Perform a zero-downtime reindex during upgrade (powered by Elasticsearch or on the Kibana side)
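To make the repository-side cost of option (4) concrete, here is a minimal sketch, assuming an 8.x `@elastic/elasticsearch` client and a hypothetical `.kibana_objects` write alias (both the alias and the function are illustrative, not part of any existing implementation). Because writes through a rollover alias only reach the current write index, updating an existing document first requires locating the concrete backing index that holds it:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Hypothetical write alias for a rolled-over saved objects index.
const SAVED_OBJECTS_ALIAS = '.kibana_objects';

// Updating a document behind a rollover alias: the document may live in any
// backing index, so we first resolve its _index and then update it directly.
async function updateSavedObject(id: string, doc: Record<string, unknown>) {
  // 1. Search across all backing indices for the document's _id.
  const result = await client.search({
    index: SAVED_OBJECTS_ALIAS,
    query: { ids: { values: [id] } },
    _source: false,
  });

  const hit = result.hits.hits[0];
  if (!hit) {
    throw new Error(`Saved object ${id} not found`);
  }

  // 2. Issue the update directly against the backing index that holds it.
  return client.update({
    index: hit._index,
    id,
    doc,
  });
}
```

The extra search round-trip before every update (and the equivalent resolution step for bulk updates) is the kind of change to the saved objects repository this option implies.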
rudolf added the Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc) and Feature:Migrations labels on Dec 20, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet
Contributor

Have all saved object indices use a high shard count by default.

First, I'd like to understand what reasonable maximum size we would expect to cover 99.9% of our customers' usage. If we're talking about 100GB, my gut feeling is that increasing the shard count to 2 or 3 by default could be a very acceptable and pragmatic compromise?
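As a point of reference (a sketch, not something from the discussion above): the primary shard count can only be set when an index is created, so a higher default would have to be applied wherever the migration creates the target index. Using the 8.x `@elastic/elasticsearch` client, with the index name and counts as placeholders:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// number_of_shards is fixed at index creation time; changing it afterwards
// requires a split/shrink or a full reindex. Name and counts are placeholders.
await client.indices.create({
  index: '.kibana_example_001',
  settings: {
    number_of_shards: 3,
    number_of_replicas: 1,
  },
});
```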

Also, do we have any guess about the per-type distribution for large saved object usages? Because if it's not just one type taking 90% of the total size, splitting our indices per group of types (as we're currently discussing) would also help here.

Re-shard the indices of large clusters.

What about environments where downtime may not be acceptable? I feel like this single statement makes this option a no-go, wdyt?

Use a rollover index with an ILM size policy

I agree that if nothing else works, it may be something we would have to look at. The implications are so significant though, at various levels of the SOR and migration systems, that we would need a very strong reason to consider this option worth it imho.
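For context, the Elasticsearch side of the rollover option is comparatively small; the significant changes are the SOR and migration ones discussed above. A size-based rollover policy is only a few lines, sketched here with the 8.x `@elastic/elasticsearch` client (policy name and threshold are placeholders, not a proposal):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Roll the write index over once its primary shard exceeds a size threshold.
// Policy name and threshold are placeholders.
await client.ilm.putLifecycle({
  name: 'saved-objects-size-rollover',
  policy: {
    phases: {
      hot: {
        actions: {
          rollover: { max_primary_shard_size: '50gb' },
        },
      },
    },
  },
});
```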

@jasonrhodes
Member

what reasonable maximum size we would expect to cover 99.9% of our customers' usage

I'm nervous about this approach, mainly because it sounds so similar to the "100K saved objects migration takes 10 minutes" capacity-guessing problem that led us to want to restrict migrations. My question with this kind of thing would always be: "what happens if a customer has 10x the upper limit of our expectations?" Is there a simple workaround for that scenario?

@rudolf
Contributor Author

rudolf commented Jan 19, 2023

I agree with Pierre that (1) would at least buy us some time.

Manually resharding the index would always be a last-resort workaround (for 10x or 100x the expected data size), but the downtime might be unacceptable for some users.

But (1)-(3) aren't really good long-term options. I've added (4) and (5), which are options we're exploring with the Elasticsearch team.
