
migrationsv2: Dynamically adjust batch size to prevent exceeding ES HTTP payload size #108708

Closed
rudolf opened this issue Aug 16, 2021 · 3 comments
Labels
project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@rudolf
Contributor

rudolf commented Aug 16, 2021

As described in #107288, migrations can fail when they create batches of saved objects that are larger than the configured ES http.max_content_length (default 100MB).

Since saved objects vary greatly in size, from a few KB to several MB, it's very difficult to choose an appropriate batch size for a given deployment. Documents like siem-detection-engine-rule-status store debug logs that come from external systems, introducing size variability even within the same type.

To ensure a smooth upgrade experience without reducing the default batch size, we should dynamically adjust the size of each indexed batch to ensure that the default payload size is never exceeded.

@rudolf rudolf added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient labels Aug 16, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@rudolf
Contributor Author

rudolf commented Aug 16, 2021

Thoughts on implementation...

For simplicity, we should continue to read batches of migrations.batchSize documents. Theoretically, reading a large batch could exceed Kibana's available heap, but this is currently a low risk.

So the dynamic batch size would only be relevant when indexing documents (the bulkOverwriteTransformedDocuments action). I see two alternatives:

  1. Use a library like json-size to calculate the size of each document as we add it to the current batch (Elasticsearch uses the uncompressed payload size to enforce its limit). Once the batch is "full" or contains migrations.batchSize documents, index it and start building the next batch.
  2. Index using migrations.batchSize until we hit a 413 error, then halve the batch size before retrying.

Unless (1) adds significant CPU overhead and slows down migrations, it feels like the preferred approach.
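A minimal sketch of approach (1), assuming a standalone batching helper rather than Kibana's actual bulkOverwriteTransformedDocuments action. Buffer.byteLength on the serialized document stands in for a json-size style library; the buildBatches name, the SavedObjectDoc shape, and the parameters are illustrative only:

```typescript
// Sketch: cap each bulk batch by both document count (migrations.batchSize)
// and uncompressed payload size (http.max_content_length).
// Names and types here are hypothetical, not Kibana's real API.

interface SavedObjectDoc {
  _id: string;
  _source: Record<string, unknown>;
}

function buildBatches(
  docs: SavedObjectDoc[],
  maxBatchSize: number,    // e.g. migrations.batchSize
  maxPayloadBytes: number  // e.g. ES http.max_content_length
): SavedObjectDoc[][] {
  const batches: SavedObjectDoc[][] = [];
  let current: SavedObjectDoc[] = [];
  let currentBytes = 0;

  for (const doc of docs) {
    // Elasticsearch enforces its limit on the uncompressed request body,
    // so measure the serialized size of each document.
    const docBytes = Buffer.byteLength(JSON.stringify(doc), 'utf8');

    const batchFull =
      current.length >= maxBatchSize ||
      (current.length > 0 && currentBytes + docBytes > maxPayloadBytes);

    if (batchFull) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }

    current.push(doc);
    currentBytes += docBytes;
  }

  if (current.length > 0) batches.push(current);
  return batches;
}
```

Note that a single document larger than maxPayloadBytes is still emitted in its own batch (the size check only flushes when the current batch is non-empty), so an oversized document would still trigger a 413 rather than loop forever; such a document can only be reported as a migration failure.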

@rudolf
Contributor Author

rudolf commented Aug 16, 2021

Duplicate of #107641
