Migrations: dynamically adjust batchSize when reading #157494
Pinging @elastic/kibana-core (Team:Core)
@@ -60,6 +60,7 @@ describe('split .kibana index into multiple system indices', () => {
beforeAll(async () => {
esServer = await startElasticsearch({
dataArchive: Path.join(__dirname, '..', 'archives', '7.3.0_xpack_sample_saved_objects.zip'),
timeout: 60000,
CI was failing a few times on this test due to timeout
}, | ||
}); | ||
|
||
root = createRoot({ maxReadBatchSizeBytes: 50000 }); |
This test configures a really small `maxReadBatchSizeBytes` so that we hit the ES "response size too large" error. I tried actually creating really large documents, but this consumes a lot of memory on the ES side, both when reading batches and when the update_by_query runs.
@elasticmachine merge upstream
Note to any reviewers: replace the full path to the data archive with the full path to the archive on your machine when running the branch.
I had to reduce the batch size all the way down to < 100 to get around this error:
Error: Unable to complete saved object migrations for the [.kibana] index. RequestAbortedError: The content length (536936024) is bigger than the maximum allowed string (536870888)
Migrations ran fine after that! Being able to specify the migrations batch size is going to be a huge win for getting around the "too-many-saved-objects" issue.
LGTM on CI green.
packages/core/saved-objects/core-saved-objects-migration-server-internal/src/initial_state.ts
}); | ||
|
||
it.only('reduces the read batchSize in half if a batch exceeds maxReadBatchSizeBytes', async () => { | ||
const { startES } = createTestServers({ |
NIT: you can directly use the `startElasticsearch()` method of `kibana_migrator_test_kit.ts`.
packages/core/saved-objects/core-saved-objects-base-server-internal/src/saved_objects_config.ts
track_total_hits: typeof searchAfter === 'undefined',
query,
},
{ maxResponseSize: maxResponseSizeBytes }
TIL elastic-transport-js is handling this limit:
https://github.com/elastic/elastic-transport-js/blob/main/src/connection/HttpConnection.ts#L146,L159
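To make this concrete, here is a minimal, hypothetical sketch of the kind of guard the linked transport code applies: compare the response's `Content-Length` against `maxResponseSize` before buffering the body, and abort with the message quoted earlier in this thread. `ResponseTooLargeError` and `checkContentLength` are invented names for illustration; this is not the elastic-transport-js implementation, which surfaces the client's `RequestAbortedError` instead.

```typescript
// Hypothetical, simplified "check Content-Length before buffering" guard.
// Not the elastic-transport-js source; all names here are invented for the sketch.

class ResponseTooLargeError extends Error {
  readonly contentLength: number;
  readonly maxResponseSize: number;

  constructor(contentLength: number, maxResponseSize: number) {
    super(
      `The content length (${contentLength}) is bigger than the maximum allowed string (${maxResponseSize})`
    );
    this.name = 'ResponseTooLargeError';
    this.contentLength = contentLength;
    this.maxResponseSize = maxResponseSize;
    // Keep `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

/**
 * Inspects the Content-Length header before the body is read.
 * Throws if the body would exceed maxResponseSize; otherwise returns the parsed
 * length, or undefined when the header is missing or not a number.
 */
function checkContentLength(
  headers: Record<string, string | undefined>,
  maxResponseSize: number
): number | undefined {
  const raw = headers['content-length'];
  if (raw === undefined) return undefined; // size unknown until the body streams in
  const contentLength = Number.parseInt(raw, 10);
  if (Number.isNaN(contentLength)) return undefined;
  if (contentLength > maxResponseSize) {
    throw new ResponseTooLargeError(contentLength, maxResponseSize);
  }
  return contentLength;
}
```

Checking the header up front lets the client fail fast without first allocating a ~512 MB string. Responses without a `Content-Length` header would need a separate check while streaming, which this sketch does not cover.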
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a drive-by review, non blocker comments only
src/core/server/integration_tests/saved_objects/migrations/group3/read_batch_size.test.ts
packages/core/saved-objects/core-saved-objects-migration-server-internal/src/model/model.ts
packages/core/saved-objects/core-saved-objects-migration-server-internal/src/model/helpers.ts
if (isTypeof(left, 'es_response_too_large')) {
const batchSize = Math.floor(stateP.batchSize / 2);
NIT: probably unnecessary, but we may want to have an escape hatch to avoid potentially entering an infinite loop here? Should we check if `stateP.batchSize` is higher than 1/2 or something here?
I added better handling for when the response size is > `maxReadBatchSizeBytes` even if the `batchSize` is 1. This also addresses a similar comment from @jloleysens. Even though we can't create documents that exceed `MAX_STRING_LENGTH`, users could configure `maxReadBatchSizeBytes` to a low value, e.g. to avoid an OOM, so it felt worth explicitly handling this scenario.
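A minimal sketch of the escape hatch discussed above, assuming a hypothetical `ReadState` that carries only a `batchSize` and a hypothetical `onResponseTooLarge` transition (the actual handling in `model.ts` is not shown in this thread): halve the batch size on a too-large response, but stop with a fatal outcome once a single document is already too large.

```typescript
// Hypothetical slice of migrator state; the real state carries many more fields.
interface ReadState {
  batchSize: number;
}

type TooLargeOutcome =
  | { kind: 'retry'; state: ReadState } // re-read the same page with half as many documents
  | { kind: 'fatal'; reason: string }; // escape hatch: even one document is too large

function onResponseTooLarge(state: ReadState, maxReadBatchSizeBytes: number): TooLargeOutcome {
  if (state.batchSize <= 1) {
    // Halving 1 gives 0, which could loop forever or silently read nothing, so stop here.
    return {
      kind: 'fatal',
      reason: `A single document exceeded maxReadBatchSizeBytes (${maxReadBatchSizeBytes} bytes)`,
    };
  }
  // For batchSize >= 2 the halved value is always >= 1.
  return { kind: 'retry', state: { ...state, batchSize: Math.floor(state.batchSize / 2) } };
}
```

Starting from 1000, the sketch halves nine times (500, 250, 125, 62, 31, 15, 7, 3, 1) before returning the fatal outcome, so the retry loop always terminates.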
@@ -71,6 +71,7 @@ export const nextActionMap = (context: MigratorContext) => {
client,
index: state.currentIndex,
mappings: { properties: state.additiveMappingChanges },
batchSize: context.batchSize,
Do we want to also plug this logic into the zdt algorithm?
yeah, would definitely be worth adding there. I'll do that in a follow-up PR 👍
.catch((e) => {
if (
e instanceof EsErrors.RequestAbortedError &&
e.message.match(/The content length \(\d+\) is bigger than the maximum/) != null
nit: `.test` might be more performant because it only returns a boolean and doesn't need to build a match result. I don't have numbers that back this assumption though.
Suggested change:
`e.message.match(/The content length \(\d+\) is bigger than the maximum/) != null`
`/The content length \(\d+\) is bigger than the maximum/.test(e.message)`
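A runnable sketch of the suggested check. It uses a locally defined stand-in for `RequestAbortedError` (an assumption, so the example runs without `@elastic/elasticsearch` installed) and hypothetical `isResponseTooLargeError` / `toReadLeft` helpers that map the error to an `es_response_too_large` left, the tag the model code checks with `isTypeof`.

```typescript
// Local stand-in for the client's RequestAbortedError (assumption: lets the
// sketch run without @elastic/elasticsearch installed).
class RequestAbortedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'RequestAbortedError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// No `g` flag, so `.test` keeps no `lastIndex` state between calls.
const RESPONSE_TOO_LARGE = /The content length \(\d+\) is bigger than the maximum/;

function isResponseTooLargeError(e: unknown): e is RequestAbortedError {
  return e instanceof RequestAbortedError && RESPONSE_TOO_LARGE.test(e.message);
}

/** Hypothetical tagged "left" value matching the `es_response_too_large` type. */
interface EsResponseTooLarge {
  type: 'es_response_too_large';
}

/** Usable as a `.catch` handler: map the size error to a left, rethrow anything else. */
function toReadLeft(e: unknown): EsResponseTooLarge {
  if (isResponseTooLargeError(e)) {
    return { type: 'es_response_too_large' };
  }
  throw e;
}
```

Because the regex has no `g` flag, reusing it with `.test` is safe across calls; a global regex would carry `lastIndex` from one call to the next and could intermittently return `false`.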
💚 Build Succeeded
cc @rudolf
💔 All backports failed
Manual backport: To create the backport manually run:
Questions? Please refer to the Backport tool documentation.
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
## Summary

Migrations read 1000 documents by default, which works well for most deployments. But if any batch happens to be > ~512MB we hit NodeJS' max string length limit and cannot process that batch. This forces users to reduce the batch size to a smaller number, which could severely slow down migrations.

This PR reduces the impact of large batches by catching elasticsearch-js' `RequestAbortedError` and reducing the batch size in half. When subsequent batches are successful the batchSize increases by 20%. This means we'll have a sequence like:

1. Read 1000 docs ✅ (small batch)
2. Read 1000 docs 🔴 (too large batch)
3. Read 500 docs ✅
4. Read 600 docs ✅
5. Read 720 docs ✅
6. Read 864 docs ✅
7. Read 1000 docs ✅ (small batch)

This assumes that most clusters just have a few large batches exceeding the limit. If all batches exceed the limit, we'd have 1 failed read for every 4 successful reads (i.e. 1 in 5 reads fails), so we pay roughly a 20% throughput penalty. In such a case it would be better to configure a lower `migrations.batchSize`.

Tested this manually:

1. Start ES with more heap than the default, otherwise reading large batches will cause it to run out of memory:
   `ES_JAVA_OPTS=' -Xms6g -Xmx6g' yarn es snapshot --data-archive=/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/archives/8.4.0_with_sample_data_logs.zip`
2. Ingest lots of large documents of ~5MB each:
   ```
   curl -XPUT "elastic:changeme@localhost:9200/_security/role/grant_kibana_system_indices" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
   {
     "indices": [
       {
         "names": [".kibana*"],
         "privileges": ["all"],
         "allow_restricted_indices": true
       }
     ]
   }'

   curl -XPOST "elastic:changeme@localhost:9200/_security/user/superuser" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
   {
     "password": "changeme",
     "roles": ["superuser", "grant_kibana_system_indices"]
   }'

   curl -XPUT "superuser:changeme@localhost:9200/.kibana_8.4.0_001/_mappings" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
   {
     "dynamic": false,
     "properties": {}
   }'

   set -B # enable brace expansion
   for i in {1..400}; do
     curl -k --data-binary "@/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/group3/body.json" -X PUT "http://superuser:changeme@localhost:9200/.kibana_8.4.0_001/_doc/cases-comments:${i}?pretty=true" -H "Content-Type: application/json"
   done
   ```
3. Start Kibana with a modest batchSize, otherwise we could OOM ES:
   `node scripts/kibana --dev --migrations.batchSize=120`

<details><summary>Example logs. Note that "Processed x documents" is only logged when the next batch is successfully read, so the order seems wrong. To improve it we'd need to log progress after a batch is successfully written instead 🤷</summary>

```
[.kibana] Processed 120 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3667ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1740ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1376ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1402ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1311ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 900ms.
[.kibana] Read a batch that exceeded the NodeJS maximum string length, retrying by reducing the batch size in half to 60.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_READ. took: 1538ms.
[.kibana] Processed 240 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2054ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1042ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1130ms.
[.kibana] Processed 300 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2610ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1262ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1299ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1363ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1341ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 572ms.
[.kibana] Processed 372 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3330ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1488ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1349ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1312ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1380ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 139ms.
[.kibana] Processed 458 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3278ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1460ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1370ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1303ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1384ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1298ms.
[.kibana] Processed 542 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_CLOSE_PIT. took: 4ms.
```
</details>

### Checklist

Delete any items that are not applicable to this PR.
- [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: [FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/), [Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive layout. (You can test this [in your browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

### Risks

### For maintainers

- [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Gerard Soldevila <[email protected]>
(cherry picked from commit 094b62a)

# Conflicts:
# packages/core/saved-objects/core-saved-objects-migration-server-internal/src/zdt/test_helpers/context.ts
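The read-size sequence in the summary can be reproduced with a small simulation. This is a hypothetical model, not the migrator code: `simulateReads`, `growAfterSuccess`, and `shrinkAfterTooLarge` are invented names, the configured `migrations.batchSize` acts as the cap, and growth rounds up (an assumption; the summary does not say how fractional sizes are handled).

```typescript
// Hypothetical model of the adaptive read size: halve on a too-large response,
// grow by 20% after each successful read, never exceed the configured maximum.

interface ReadAttempt {
  batchSize: number;
  ok: boolean;
}

function growAfterSuccess(batchSize: number, maxBatchSize: number): number {
  // x * 6 / 5 is +20% without relying on the inexact 1.2 literal; round up to guarantee progress.
  return Math.min(maxBatchSize, Math.ceil((batchSize * 6) / 5));
}

function shrinkAfterTooLarge(batchSize: number): number {
  return Math.max(1, Math.floor(batchSize / 2));
}

/**
 * Replays `attempts` reads starting at `maxBatchSize`.
 * `isTooLarge(batchSize, attemptIndex)` stands in for Elasticsearch returning
 * a response bigger than the allowed size.
 */
function simulateReads(
  maxBatchSize: number,
  attempts: number,
  isTooLarge: (batchSize: number, attemptIndex: number) => boolean
): ReadAttempt[] {
  const log: ReadAttempt[] = [];
  let batchSize = maxBatchSize;
  for (let i = 0; i < attempts; i++) {
    const ok = !isTooLarge(batchSize, i);
    log.push({ batchSize, ok });
    batchSize = ok ? growAfterSuccess(batchSize, maxBatchSize) : shrinkAfterTooLarge(batchSize);
  }
  return log;
}
```

Replaying the summary's scenario (only the second read is too large) gives 1000, 1000, 500, 600, 720, 864, 1000. If every full-size batch is too large, the model fails 2 out of 10 reads, the 1-in-5 ratio behind the ~20% estimate. Rounding up matters at the low end: with rounding down, batch sizes of 1 to 4 would never grow again.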
…#158660) # Backport This will backport the following commits from `main` to `8.8`: - [Migrations: dynamically adjust batchSize when reading (#157494)](#157494) <!--- Backport version: 8.9.7 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Rudolf Meijering","email":"[email protected]"},"sourceCommit":{"committedDate":"2023-05-30T13:25:07Z","message":"Migrations: dynamically adjust batchSize when reading (#157494)\n\n## Summary\r\n\r\nMigrations read 1000 documents by default which works well for most\r\ndeployments. But if any batch happens to be > ~512MB we hit NodeJS' max\r\nstring length limit and cannot process that batch. This forces users to\r\nreduce the batch size to a smaller number which could severely slow down\r\nmigrations.\r\n\r\nThis PR reduces the impact of large batches by catching\r\nelasticsearch-js' `RequestAbortedError` and reducing the batch size in\r\nhalf. When subsequent batches are successful the batchSize increases by\r\n20%. This means we'll have a sequence like:\r\n\r\n1. Read 1000 docs ✅ (small batch)\r\n2. Read 1000 docs 🔴 (too large batch)\r\n3. Read 500 docs ✅ \r\n4. Read 600 docs ✅ \r\n5. Read 720 docs ✅\r\n6. Read 864 docs ✅\r\n7. Read 1000 docs ✅ (small batch)\r\n\r\nThis assumes that most clusters just have a few large batches exceeding\r\nthe limit. If all batches exceed the limit we'd have 1 failure for every\r\n4 successful reads so we pay a 20% throughput penalty. In such a case it\r\nwould be better to configure a lower `migrations.batchSize`.\r\n\r\nTested this manually:\r\n1. Start ES with more heap than the default, otherwise reading large\r\nbatches will cause it to run out of memory\r\n`ES_JAVA_OPTS=' -Xms6g -Xmx6g' yarn es snapshot\r\n--data-archive=/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/archives/8.4.0_with_sample_data_logs.zip`\r\n2. 
Ingest lots of large documents of ~5mb\r\n ```\r\ncurl -XPUT\r\n\"elastic:changeme@localhost:9200/_security/role/grant_kibana_system_indices\"\r\n-H \"kbn-xsrf: reporting\" -H \"Content-Type: application/json\" -d'\r\n { \r\n \"indices\": [ \r\n {\r\n \"names\": [\r\n \".kibana*\"\r\n ],\r\n \"privileges\": [\r\n \"all\"\r\n ],\r\n \"allow_restricted_indices\": true\r\n }\r\n ]\r\n }'\r\n\r\ncurl -XPOST \"elastic:changeme@localhost:9200/_security/user/superuser\"\r\n-H \"kbn-xsrf: reporting\" -H \"Content-Type: application/json\" -d'\r\n {\r\n \"password\" : \"changeme\", \r\n \"roles\" : [ \"superuser\", \"grant_kibana_system_indices\" ]\r\n }'\r\n\r\ncurl -XPUT\r\n\"superuser:changeme@localhost:9200/.kibana_8.4.0_001/_mappings\" -H\r\n\"kbn-xsrf: reporting\" -H \"Content-Type: application/json\" -d'\r\n {\r\n\"dynamic\": false,\r\n \"properties\": {\r\n\r\n }\r\n\r\n }'\r\n\r\n set -B # enable brace expansion\r\n for i in {1..400}; do\r\ncurl -k --data-binary\r\n\"@/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/group3/body.json\"\r\n-X PUT\r\n\"http://superuser:changeme@localhost:9200/.kibana_8.4.0_001/_doc/cases-comments:\"{$i}\"?&pretty=true\"\r\n-H \"Content-Type: application/json\"\r\n done\r\n ```\r\n3. Start Kibana with a modest batchSize otherwise we could OOM ES `node\r\nscripts/kibana --dev --migrations.batchSize=120`\r\n\r\n\r\n\r\n<details><summary>Example logs. Note the \"Processed x documents\" only\r\nlogs when the next batch is successfull read, so the order seems wrong.\r\nTo improve it we'd need to log progress after a batch is successfully\r\nwritten instead 🤷 </summary>\r\n```\r\n[.kibana] Processed 120 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3667ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1740ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. 
took: 1376ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1402ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1311ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 900ms.\r\n[.kibana] Read a batch that exceeded the NodeJS maximum string length, retrying by reducing the batch size in half to 60.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_READ. took: 1538ms.\r\n[.kibana] Processed 240 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2054ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1042ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1130ms.\r\n[.kibana] Processed 300 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2610ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1262ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1299ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1363ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1341ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 572ms.\r\n[.kibana] Processed 372 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3330ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. 
took: 1488ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1349ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1312ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1380ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 139ms.\r\n[.kibana] Processed 458 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3278ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1460ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1370ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1303ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1384ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1298ms.\r\n[.kibana] Processed 542 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_CLOSE_PIT. 
took: 4ms.\r\n```\r\n</details>\r\n### Checklist\r\n\r\nDelete any items that are not applicable to this PR.\r\n\r\n- [ ] Any text added follows [EUI's writing\r\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\r\nsentence case text and includes [i18n\r\nsupport](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)\r\n- [ ]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas added for features that require explanation or tutorials\r\n- [ ] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n- [ ] Any UI touched in this PR is usable by keyboard only (learn more\r\nabout [keyboard accessibility](https://webaim.org/techniques/keyboard/))\r\n- [ ] Any UI touched in this PR does not create any new axe failures\r\n(run axe in browser:\r\n[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),\r\n[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))\r\n- [ ] If a plugin configuration key changed, check if it needs to be\r\nallowlisted in the cloud and added to the [docker\r\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\r\n- [ ] This renders correctly on smaller devices using a responsive\r\nlayout. 
(You can test this [in your\r\nbrowser](https://www.browserstack.com/guide/responsive-testing-on-local-server))\r\n- [ ] This was checked for [cross-browser\r\ncompatibility](https://www.elastic.co/support/matrix#matrix_browsers)\r\n\r\n\r\n### Risks\r\n\r\n\r\n### For maintainers\r\n\r\n- [ ] This was checked for breaking API changes and was [labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <[email protected]>\r\nCo-authored-by: Gerard Soldevila <[email protected]>","sha":"094b62a6d6afd30914584e03bb6616e7c2eaec4a","branchLabelMapping":{"^v8.9.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["bug","Team:Core","release_note:fix","Feature:Migrations","backport:prev-minor","v8.9.0","v8.8.1"],"number":157494,"url":"https://github.com/elastic/kibana/pull/157494","mergeCommit":{"message":"Migrations: dynamically adjust batchSize when reading (#157494)\n\n## Summary\r\n\r\nMigrations read 1000 documents by default which works well for most\r\ndeployments. But if any batch happens to be > ~512MB we hit NodeJS' max\r\nstring length limit and cannot process that batch. This forces users to\r\nreduce the batch size to a smaller number which could severely slow down\r\nmigrations.\r\n\r\nThis PR reduces the impact of large batches by catching\r\nelasticsearch-js' `RequestAbortedError` and reducing the batch size in\r\nhalf. When subsequent batches are successful the batchSize increases by\r\n20%. This means we'll have a sequence like:\r\n\r\n1. Read 1000 docs ✅ (small batch)\r\n2. Read 1000 docs 🔴 (too large batch)\r\n3. Read 500 docs ✅ \r\n4. Read 600 docs ✅ \r\n5. Read 720 docs ✅\r\n6. Read 864 docs ✅\r\n7. Read 1000 docs ✅ (small batch)\r\n\r\nThis assumes that most clusters just have a few large batches exceeding\r\nthe limit. 
If all batches exceed the limit we'd have 1 failure for every\r\n4 successful reads so we pay a 20% throughput penalty. In such a case it\r\nwould be better to configure a lower `migrations.batchSize`.\r\n\r\nTested this manually:\r\n1. Start ES with more heap than the default, otherwise reading large\r\nbatches will cause it to run out of memory\r\n`ES_JAVA_OPTS=' -Xms6g -Xmx6g' yarn es snapshot\r\n--data-archive=/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/archives/8.4.0_with_sample_data_logs.zip`\r\n2. Ingest lots of large documents of ~5mb\r\n ```\r\ncurl -XPUT\r\n\"elastic:changeme@localhost:9200/_security/role/grant_kibana_system_indices\"\r\n-H \"kbn-xsrf: reporting\" -H \"Content-Type: application/json\" -d'\r\n { \r\n \"indices\": [ \r\n {\r\n \"names\": [\r\n \".kibana*\"\r\n ],\r\n \"privileges\": [\r\n \"all\"\r\n ],\r\n \"allow_restricted_indices\": true\r\n }\r\n ]\r\n }'\r\n\r\ncurl -XPOST \"elastic:changeme@localhost:9200/_security/user/superuser\"\r\n-H \"kbn-xsrf: reporting\" -H \"Content-Type: application/json\" -d'\r\n {\r\n \"password\" : \"changeme\", \r\n \"roles\" : [ \"superuser\", \"grant_kibana_system_indices\" ]\r\n }'\r\n\r\ncurl -XPUT\r\n\"superuser:changeme@localhost:9200/.kibana_8.4.0_001/_mappings\" -H\r\n\"kbn-xsrf: reporting\" -H \"Content-Type: application/json\" -d'\r\n {\r\n\"dynamic\": false,\r\n \"properties\": {\r\n\r\n }\r\n\r\n }'\r\n\r\n set -B # enable brace expansion\r\n for i in {1..400}; do\r\ncurl -k --data-binary\r\n\"@/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/group3/body.json\"\r\n-X PUT\r\n\"http://superuser:changeme@localhost:9200/.kibana_8.4.0_001/_doc/cases-comments:\"{$i}\"?&pretty=true\"\r\n-H \"Content-Type: application/json\"\r\n done\r\n ```\r\n3. Start Kibana with a modest batchSize otherwise we could OOM ES `node\r\nscripts/kibana --dev --migrations.batchSize=120`\r\n\r\n\r\n\r\n<details><summary>Example logs. 
Note the \"Processed x documents\" only\r\nlogs when the next batch is successfull read, so the order seems wrong.\r\nTo improve it we'd need to log progress after a batch is successfully\r\nwritten instead 🤷 </summary>\r\n```\r\n[.kibana] Processed 120 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3667ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1740ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1376ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1402ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1311ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 900ms.\r\n[.kibana] Read a batch that exceeded the NodeJS maximum string length, retrying by reducing the batch size in half to 60.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_READ. took: 1538ms.\r\n[.kibana] Processed 240 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2054ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1042ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1130ms.\r\n[.kibana] Processed 300 documents out of 542.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2610ms.\r\n[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. 
Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Gerard Soldevila <[email protected]>
BACKPORT-->
---------
Co-authored-by: Rudolf Meijering <[email protected]>
Release notes
Very large saved objects can cause migrations to fail when a batch of saved objects read from Elasticsearch exceeds the NodeJS maximum string length. This forces users to set a much smaller `migrations.batchSize` value, which in turn slows down migrations. With this fix, Kibana dynamically reduces the batch size when it encounters a batch that is too large to process, improving migration performance.
Summary
By default, migrations read 1000 documents per batch, which works well for most deployments. But if any batch happens to be larger than ~512MB, we hit the NodeJS maximum string length and cannot process that batch. This forces users to reduce the batch size, which can severely slow down migrations.
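To make the limit concrete, here is a minimal sketch that checks whether a response of a given size fits into one NodeJS string and estimates how many documents of a given average size fit into one batch. It assumes the limit is `536870888`, the "maximum allowed string" reported in the `RequestAbortedError` quoted in this PR; at runtime Node exposes the actual value as `buffer.constants.MAX_STRING_LENGTH`, which depends on the V8 version and platform. `fitsInOneString` and `maxDocsPerBatch` are hypothetical helpers, not Kibana code.

```typescript
// Assumption: 536870888 is the "maximum allowed string" from the RequestAbortedError
// quoted in this PR. At runtime, Node exposes the real limit as
// buffer.constants.MAX_STRING_LENGTH, which depends on the V8 version and platform.
const MAX_STRING_LENGTH = 536_870_888;

// Hypothetical helper: can a response of `contentLength` bytes be decoded into one string?
// Compares bytes against a limit on UTF-16 code units; for ASCII-only JSON they are equal.
function fitsInOneString(contentLength: number, limit: number = MAX_STRING_LENGTH): boolean {
  return contentLength <= limit;
}

// Hypothetical helper: rough upper bound on how many documents of `avgDocBytes`
// fit into one read batch. Ignores the JSON envelope Elasticsearch adds around
// each hit, so the practical limit is somewhat lower.
function maxDocsPerBatch(avgDocBytes: number, limit: number = MAX_STRING_LENGTH): number {
  if (avgDocBytes <= 0) {
    throw new RangeError('avgDocBytes must be positive');
  }
  return Math.floor(limit / avgDocBytes);
}

console.log(fitsInOneString(536_936_024)); // false: the content length reported in the same error
console.log(maxDocsPerBatch(5 * 1024 * 1024)); // 102 for ~5MB documents
```

For ~5MB documents this gives an upper bound of 102 documents per batch before any per-hit overhead, so a default batch of 1000 such documents is far over the limit.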
This PR reduces the impact of large batches by catching elasticsearch-js' `RequestAbortedError` and halving the batch size. After each subsequent successful read, the batch size increases by 20%, up to the configured `migrations.batchSize`. This means we'll have a sequence like:
1. Read 1000 docs ✅ (small batch)
2. Read 1000 docs 🔴 (too large batch)
3. Read 500 docs ✅
4. Read 600 docs ✅
5. Read 720 docs ✅
6. Read 864 docs ✅
7. Read 1000 docs ✅ (small batch)
This assumes that most clusters have only a few large batches exceeding the limit. If every batch exceeds the limit, we'd have 1 failed read for every 4 successful ones, so we pay roughly a 20% throughput penalty. In such a case, it would be better to configure a lower `migrations.batchSize`.
Tested this manually:
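1. Start ES with more heap than the default; otherwise, reading large batches will cause it to run out of memory: `ES_JAVA_OPTS=' -Xms6g -Xmx6g' yarn es snapshot --data-archive=/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/archives/8.4.0_with_sample_data_logs.zip`
2. Ingest lots of large documents of ~5MB each.
3. Start Kibana with a modest batch size; otherwise, we could OOM ES: `node scripts/kibana --dev --migrations.batchSize=120`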
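The halve-on-failure, grow-by-20% policy described in the summary can be sketched as a pure function plus a small simulation. This is not Kibana's migration state machine: `nextBatchSize`, `simulate`, and the `failAbove` threshold (standing in for a real `RequestAbortedError`) are hypothetical, and flooring to an integer with a minimum of 1 document is an assumption.

```typescript
type ReadOutcome = 'ok' | 'tooLarge';

// Hypothetical helper, not Kibana's actual state machine: picks the batch size
// for the next read based on how the previous read went.
function nextBatchSize(current: number, outcome: ReadOutcome, maxBatchSize: number): number {
  if (outcome === 'tooLarge') {
    // Response exceeded the string limit: halve the batch, but read at least 1 doc.
    return Math.max(Math.floor(current / 2), 1);
  }
  // Successful read: grow by 20%, never beyond the configured migrations.batchSize.
  return Math.min(Math.floor(current * 1.2), maxBatchSize);
}

// Hypothetical simulation: every batch larger than `failAbove` documents is
// "too large", standing in for a RequestAbortedError from the ES client.
function simulate(maxBatchSize: number, failAbove: number, reads: number): Array<[number, ReadOutcome]> {
  const attempts: Array<[number, ReadOutcome]> = [];
  let size = maxBatchSize;
  for (let i = 0; i < reads; i++) {
    const outcome: ReadOutcome = size > failAbove ? 'tooLarge' : 'ok';
    attempts.push([size, outcome]);
    size = nextBatchSize(size, outcome, maxBatchSize);
  }
  return attempts;
}

const attempts = simulate(1000, 900, 10);
console.log(attempts.map(([size]) => size).join(' -> '));
// 1000 -> 500 -> 600 -> 720 -> 864 -> 1000 -> 500 -> 600 -> 720 -> 864
console.log(attempts.filter(([, outcome]) => outcome === 'tooLarge').length); // 2
```

With `failAbove = 900`, every full-size batch of 1000 fails while 864 still fits, which reproduces the summary's pattern of one failed read for every four successful ones. The same function also yields the 120 -> 60 -> 72 -> 86 progression visible in the example logs below.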
Example logs. Note that the "Processed x documents" message is only logged once the next batch has been successfully read, so the order seems wrong. To improve this, we'd need to log progress after a batch is successfully written instead 🤷
```
[.kibana] Processed 120 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3667ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1740ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1376ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1402ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1311ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 900ms.
[.kibana] Read a batch that exceeded the NodeJS maximum string length, retrying by reducing the batch size in half to 60.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_READ. took: 1538ms.
[.kibana] Processed 240 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2054ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1042ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1130ms.
[.kibana] Processed 300 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2610ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1262ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1299ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1363ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1341ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 572ms.
[.kibana] Processed 372 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3330ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1488ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1349ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1312ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1380ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 139ms.
[.kibana] Processed 458 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3278ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1460ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1370ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1303ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1384ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1298ms.
[.kibana] Processed 542 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_CLOSE_PIT. took: 4ms.
```
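One way to see the policy in these logs is to take consecutive differences of the cumulative "Processed x documents out of y" counts. The sketch below does this with a regular expression over the progress lines of the excerpt; `batchSizesFromLogs` is a hypothetical helper, and because progress is logged one step late (as noted above), the differences show batch sizes rather than exactly when each read happened.

```typescript
// Hypothetical helper: recover per-batch document counts from the cumulative
// "Processed x documents out of y." progress lines of a migration log.
function batchSizesFromLogs(lines: string[]): number[] {
  const pattern = /Processed (\d+) documents out of \d+\./;
  const sizes: number[] = [];
  let previous = 0; // the first count is measured from 0
  for (const line of lines) {
    const match = pattern.exec(line);
    if (match === null) continue; // skip state-transition and retry lines
    const processed = Number(match[1]);
    sizes.push(processed - previous);
    previous = processed;
  }
  return sizes;
}

const excerpt = [
  '[.kibana] Processed 120 documents out of 542.',
  '[.kibana] Read a batch that exceeded the NodeJS maximum string length, retrying by reducing the batch size in half to 60.',
  '[.kibana] Processed 240 documents out of 542.',
  '[.kibana] Processed 300 documents out of 542.',
  '[.kibana] Processed 372 documents out of 542.',
  '[.kibana] Processed 458 documents out of 542.',
  '[.kibana] Processed 542 documents out of 542.',
];

console.log(batchSizesFromLogs(excerpt)); // [ 120, 120, 60, 72, 86, 84 ]
```

The result shows batches of 120, a halving to 60 after the oversized read, then growth of roughly 20% per successful read (72, then 86), with the last batch holding the remaining 84 documents.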