Migrations: dynamically adjust batchSize when reading #157494

rudolf · 2023-05-12T13:45:29Z

Release notes

Very large saved objects can cause migrations to fail when a batch of saved objects read from Elasticsearch exceeds the NodeJS max string length. This forces users to use a much smaller migrations.batchSize value, which in turn slows down migrations. With this fix, Kibana dynamically reduces the batch size when it encounters a batch that's too big to process, improving migration performance.

Summary

Migrations read 1000 documents per batch by default, which works well for most deployments. But if any batch happens to be > ~512MB we hit NodeJS' max string length limit and cannot process that batch. This forces users to reduce the batch size to a smaller number, which could severely slow down migrations.

This PR reduces the impact of large batches by catching elasticsearch-js' RequestAbortedError and halving the batch size. When subsequent batches are successful, the batchSize increases by 20% until it is back at the configured value. This means we'll have a sequence like:

Read 1000 docs ✅ (small batch)
Read 1000 docs 🔴 (too large batch)
Read 500 docs ✅
Read 600 docs ✅
Read 720 docs ✅
Read 864 docs ✅
Read 1000 docs ✅ (small batch)

This assumes that most clusters just have a few large batches exceeding the limit. If all batches exceed the limit we'd have 1 failure for every 4 successful reads so we pay a 20% throughput penalty. In such a case it would be better to configure a lower migrations.batchSize.
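The sequence above can be reproduced with a small simulation. This is only a sketch of the halve-on-failure, grow-by-20%-on-success rule described in this PR; the function names (`reduceBatchSize`, `growBatchSize`, `simulate`) are made up for illustration and are not the names used in the migration model.

```typescript
// Hypothetical helper names; only the halving and 20% growth rules come from the PR description.
function reduceBatchSize(current: number): number {
  // Halve after a too-large response, never going below one document.
  return Math.max(Math.floor(current / 2), 1);
}

function growBatchSize(current: number, max: number): number {
  // Grow by 20% using integer arithmetic (x * 6 / 5), capped at the configured maximum.
  return Math.min(Math.floor((current * 6) / 5), max);
}

// Simulate a run of reads: `true` means the read succeeded, `false` means the response was too large.
// Returns the batch size that was attempted for each read.
function simulate(max: number, outcomes: boolean[]): number[] {
  const attempted: number[] = [];
  let batchSize = max;
  for (const ok of outcomes) {
    attempted.push(batchSize);
    batchSize = ok ? growBatchSize(batchSize, max) : reduceBatchSize(batchSize);
  }
  return attempted;
}

const sizes = simulate(1000, [true, false, true, true, true, true, true]);
console.log(sizes.join(', ')); // 1000, 1000, 500, 600, 720, 864, 1000
```

In the worst case, where every full-size batch is too large, the same simulation shows the pattern of four successful reads (500, 600, 720, 864) followed by one failed read at 1000.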

Tested this manually:

Start ES with more heap than the default, otherwise reading large batches will cause it to run out of memory
ES_JAVA_OPTS=' -Xms6g -Xmx6g' yarn es snapshot --data-archive=/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/archives/8.4.0_with_sample_data_logs.zip

Ingest lots of large documents of ~5mb

   curl -XPUT "elastic:changeme@localhost:9200/_security/role/grant_kibana_system_indices" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
   {                           
     "indices": [                                            
       {
         "names": [
           ".kibana*"
         ],
         "privileges": [
           "all"
         ],
         "allow_restricted_indices": true
       }
     ]
   }'

   curl -XPOST "elastic:changeme@localhost:9200/_security/user/superuser" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'    
   {
     "password" : "changeme",  
     "roles" : [ "superuser", "grant_kibana_system_indices" ]
   }'

   curl -XPUT "superuser:changeme@localhost:9200/.kibana_8.4.0_001/_mappings" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
   {
         "dynamic": false,                                                                       
         "properties": {

         }

   }'

   set -B                  # enable brace expansion
   for i in {1..400}; do
     curl -k --data-binary "@/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/group3/body.json" -X PUT "http://superuser:changeme@localhost:9200/.kibana_8.4.0_001/_doc/cases-comments:"{$i}"?&pretty=true" -H "Content-Type: application/json"
   done

Start Kibana with a modest batchSize, otherwise we could OOM ES
node scripts/kibana --dev --migrations.batchSize=120

Example logs. Note that "Processed x documents" is only logged once the next batch has been read successfully, so the order seems wrong. To improve it we'd need to log progress after a batch is successfully written instead 🤷

```
[.kibana] Processed 120 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3667ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1740ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1376ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1402ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1311ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 900ms.
[.kibana] Read a batch that exceeded the NodeJS maximum string length, retrying by reducing the batch size in half to 60.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_READ. took: 1538ms.
[.kibana] Processed 240 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2054ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1042ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1130ms.
[.kibana] Processed 300 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2610ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1262ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1299ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1363ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1341ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 572ms.
[.kibana] Processed 372 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3330ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1488ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1349ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1312ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1380ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 139ms.
[.kibana] Processed 458 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3278ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1460ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1370ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1303ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1384ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1298ms.
[.kibana] Processed 542 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_CLOSE_PIT. took: 4ms.
```

### Checklist

Delete any items that are not applicable to this PR.

Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios
Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
This was checked for cross-browser compatibility

Risks

For maintainers

This was checked for breaking API changes and was labeled appropriately

elasticmachine · 2023-05-16T15:14:03Z

Pinging @elastic/kibana-core (Team:Core)

rudolf · 2023-05-16T15:19:19Z

src/core/server/integration_tests/saved_objects/migrations/group3/dot_kibana_split.test.ts

@@ -60,6 +60,7 @@ describe('split .kibana index into multiple system indices', () => {
    beforeAll(async () => {
      esServer = await startElasticsearch({
        dataArchive: Path.join(__dirname, '..', 'archives', '7.3.0_xpack_sample_saved_objects.zip'),
+        timeout: 60000,


CI was failing a few times on this test due to timeout

rudolf · 2023-05-16T15:21:44Z

src/core/server/integration_tests/saved_objects/migrations/group3/read_batch_size.test.ts

+      },
+    });
+
+    root = createRoot({ maxReadBatchSizeBytes: 50000 });


This test configures a really small maxReadBatchSizeBytes so that we hit the ES "response size too large" error. I tried actually creating really large documents, but this consumes a lot of memory on the ES side, both when reading batches and when the update_by_query runs.

rudolf · 2023-05-16T15:23:19Z

@elasticmachine merge upstream

TinaHeiligers · 2023-05-16T17:21:11Z

note to any reviewers: Replace the full path to the data archive with the full path to the archive on your machine when running the branch.
i.e.

ES_JAVA_OPTS=' -Xms6g -Xmx6g' yarn es snapshot --data-archive=<path-to-repo>/kibana/src/core/server/integration_tests/saved_objects/migrations/archives/8.4.0_with_sample_data_logs.zip

TinaHeiligers

I had to reduce the batch size all the way down to < 100 to get around
Error: Unable to complete saved object migrations for the [.kibana] index. RequestAbortedError: The content length (536936024) is bigger than the maximum allowed string (536870888)
Migrations ran fine after that! Being able to specify the migrations batch size is going to be a huge win for getting around the "too-many-saved-objects" issue.

LGTM on CI green.

packages/core/saved-objects/core-saved-objects-migration-server-internal/src/initial_state.ts

gsoldevila · 2023-05-17T09:01:03Z

src/core/server/integration_tests/saved_objects/migrations/group3/read_batch_size.test.ts

+  });
+
+  it.only('reduces the read batchSize in half if a batch exceeds maxReadBatchSizeBytes', async () => {
+    const { startES } = createTestServers({


NIT you can directly use the startElasticsearch() method of the kibana_migrator_test_kit.ts

packages/core/saved-objects/core-saved-objects-base-server-internal/src/saved_objects_config.ts

gsoldevila · 2023-05-17T09:24:38Z

...core/saved-objects/core-saved-objects-migration-server-internal/src/actions/read_with_pit.ts

+          track_total_hits: typeof searchAfter === 'undefined',
+          query,
+        },
+        { maxResponseSize: maxResponseSizeBytes }


TIL elastic-transport-js is handling this limit:
https://github.com/elastic/elastic-transport-js/blob/main/src/connection/HttpConnection.ts#L146,L159

jloleysens

Left a drive-by review, non blocker comments only

src/core/server/integration_tests/saved_objects/migrations/group3/read_batch_size.test.ts

packages/core/saved-objects/core-saved-objects-migration-server-internal/src/model/model.ts

packages/core/saved-objects/core-saved-objects-migration-server-internal/src/model/helpers.ts

pgayvallet · 2023-05-17T10:54:23Z

packages/core/saved-objects/core-saved-objects-migration-server-internal/src/model/model.ts

+      if (isTypeof(left, 'es_response_too_large')) {
+        const batchSize = Math.floor(stateP.batchSize / 2);


NIT: probably unnecessary, but we may want to have an escape hatch to avoid potentially entering an infinite loop here? Should we check if stateP.batchSize is higher than 1/2 or something here?

I added better handling for the case where the response size is > maxReadBatchSizeBytes even when the batchSize is 1. This also addresses a similar comment from @jloleysens. Even though we can't create documents that exceed MAX_STRING_LENGTH, users could configure maxReadBatchSizeBytes to a low value, e.g. to avoid an OOM, so it felt worth handling this scenario explicitly.
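A minimal sketch of the escape hatch discussed in this thread, assuming a simplified state object. The `ReadState` type, its field names and the `onResponseTooLarge` function are illustrative only and do not reflect the actual shape of the migration state or the real failure message.

```typescript
// All type and field names below are illustrative, not Kibana's actual state shape.
interface ReadState {
  controlState: 'READ' | 'FATAL';
  batchSize: number;
  reason?: string;
}

function onResponseTooLarge(state: ReadState, contentLength: number): ReadState {
  if (state.batchSize === 1) {
    // A single document is already too large: halving cannot help, so stop instead of looping forever.
    return {
      ...state,
      controlState: 'FATAL',
      reason: `A single-document read returned ${contentLength} bytes, which still exceeds the configured limit.`,
    };
  }
  // Otherwise retry the same read with half the batch size, never going below 1.
  return {
    ...state,
    controlState: 'READ',
    batchSize: Math.max(Math.floor(state.batchSize / 2), 1),
  };
}

// Repeated failures starting from batchSize 120 terminate in FATAL rather than retrying indefinitely.
let state: ReadState = { controlState: 'READ', batchSize: 120 };
const visited: number[] = [];
while (state.controlState !== 'FATAL') {
  visited.push(state.batchSize);
  state = onResponseTooLarge(state, 60000);
}
console.log(visited.join(' -> '), '->', state.controlState);
```

Clamping with `Math.max(..., 1)` keeps the batch size a positive integer, and the explicit `batchSize === 1` check is what guarantees the loop ends.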

pgayvallet · 2023-05-17T10:56:51Z

packages/core/saved-objects/core-saved-objects-migration-server-internal/src/zdt/next.ts

@@ -71,6 +71,7 @@ export const nextActionMap = (context: MigratorContext) => {
        client,
        index: state.currentIndex,
        mappings: { properties: state.additiveMappingChanges },
+        batchSize: context.batchSize,


Do we want to also plug this logic into the zdt algorithm?

yeah, would definitely be worth adding there. I'll do that in a follow-up PR 👍

afharo · 2023-05-25T10:40:58Z

...core/saved-objects/core-saved-objects-migration-server-internal/src/actions/read_with_pit.ts

+      .catch((e) => {
+        if (
+          e instanceof EsErrors.RequestAbortedError &&
+          e.message.match(/The content length \(\d+\) is bigger than the maximum/) != null


nit: .test might be more performant because it doesn't need to extract the groups. I don't have numbers that back this assumption though.

Suggested change

- e.message.match(/The content length \(\d+\) is bigger than the maximum/) != null
+ /The content length \(\d+\) is bigger than the maximum/.test(e.message)
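For illustration, the suggested `.test` form can be wrapped in a small predicate. This sketch only covers the message check; the `instanceof EsErrors.RequestAbortedError` check from the diff is left out so the snippet stays self-contained, and `isResponseTooLargeMessage` is a made-up name.

```typescript
// Hypothetical helper name; the regular expression is the one from the diff above.
// Without the `g` flag, RegExp.prototype.test keeps no state between calls, so the constant can be reused safely.
const RESPONSE_TOO_LARGE = /The content length \(\d+\) is bigger than the maximum/;

function isResponseTooLargeMessage(message: string): boolean {
  return RESPONSE_TOO_LARGE.test(message);
}

// Error message taken from the review comment earlier in this thread.
const sample =
  'The content length (536936024) is bigger than the maximum allowed string (536870888)';
console.log(isResponseTooLargeMessage(sample)); // true
console.log(isResponseTooLargeMessage('socket hang up')); // false
```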

kibana-ci · 2023-05-30T13:25:07Z

💚 Build Succeeded

Buildkite Build
Commit: eb35b1b

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`@kbn/core-saved-objects-migration-server-internal`	86	89	+3

Unknown metric groups

API count

id	before	after	diff
`@kbn/core-saved-objects-migration-server-internal`	120	123	+3

ESLint disabled line counts

id	before	after	diff
`enterpriseSearch`	19	21	+2
`securitySolution`	401	405	+4
total			+6

Total ESLint disabled count

id	before	after	diff
`enterpriseSearch`	20	22	+2
`securitySolution`	481	485	+4
total			+6

History

💚 Build #130944 succeeded 0efb67a
💔 Build #130320 failed b81deb1
💔 Build #130045 failed c3c272f
💔 Build #130039 failed 98826d4
💔 Build #128513 failed 532a41d
💔 Build #128272 failed 5202b28

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @rudolf

kibanamachine · 2023-05-30T13:29:42Z

💔 All backports failed

Status	Branch	Result
❌	8.8	Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 157494

Questions ?

Please refer to the Backport tool documentation

gsoldevila · 2023-05-30T14:18:00Z

💚 All backports created successfully

Status	Branch	Result
✅	8.8

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Gerard Soldevila <[email protected]>
(cherry picked from commit 094b62a)
# Conflicts:
# packages/core/saved-objects/core-saved-objects-migration-server-internal/src/zdt/test_helpers/context.ts

…#158660)

# Backport

This will backport the following commits from `main` to `8.8`:
- [Migrations: dynamically adjust batchSize when reading (#157494)](#157494)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Rudolf Meijering <[email protected]>

rudolf added 5 commits March 15, 2023 06:25

Use batchSize config for update_by_query in updateAndPickupMappings

019eb09

Merge branch 'main' into updateAndPickupMappings-batch-size

87bc176

Add batchSize to ZDT

ae9cdc9

Merge branch 'main' into updateAndPickupMappings-batch-size

3e5b0e5

Migrations: dynamically adjust batchSize when reading

e7d328e

rudolf added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Migrations labels May 12, 2023

rudolf added 8 commits May 12, 2023 16:20

Fixes and improve logging

a570e3a

model.test.ts unit tests

7e761ac

Unit tests

61cdf9c

Merge branch 'main' into updateAndPickupMappings-batch-size

d972600

Merge branch 'updateAndPickupMappings-batch-size' into migrations-dynamic-read-batchsize

156619f

E2E & integration tests

e3f11bc

Increase dot_kibana_split test timeout to reduce flakiness

701e53b

Fix tests

b71ab24

rudolf marked this pull request as ready for review May 16, 2023 15:14

rudolf requested a review from a team as a code owner May 16, 2023 15:14

Delete unecessary file

9c241b4

Merge branch 'main' into migrations-dynamic-read-batchsize

5202b28

rudolf added the bug Fixes for quality problems that affect the customer experience label May 16, 2023

TinaHeiligers approved these changes May 16, 2023

gsoldevila reviewed May 17, 2023

packages/core/saved-objects/core-saved-objects-migration-server-internal/src/initial_state.ts

gsoldevila reviewed May 17, 2023

packages/core/saved-objects/core-saved-objects-base-server-internal/src/saved_objects_config.ts

gsoldevila reviewed May 17, 2023

Retry when there's circuit breaker exceptions from Elasticsearch

532a41d

jloleysens reviewed May 17, 2023

src/core/server/integration_tests/saved_objects/migrations/group3/read_batch_size.test.ts

packages/core/saved-objects/core-saved-objects-migration-server-internal/src/model/model.ts

jloleysens reviewed May 17, 2023

pgayvallet reviewed May 17, 2023

rudolf self-assigned this May 23, 2023

rudolf added 3 commits May 24, 2023 22:27

Address reviews, better handling when batchSize: 1 still exceeds maxReadBatchSizeBytes

67562ec

Merge branch 'main' into migrations-dynamic-read-batchsize

98826d4

Review feedback: increase coverage of recovering up to maxBatchSize on success

c3c272f

gsoldevila mentioned this pull request May 25, 2023

Validate migration downtime improvements #152063

Closed

afharo approved these changes May 25, 2023

rudolf added 2 commits May 25, 2023 22:06

Review: why match when you can test

9a41d95

Merge branch 'main' into migrations-dynamic-read-batchsize

b81deb1

rudolf added the release_note:fix label May 26, 2023

Fix outdated integration test

0efb67a

rudolf enabled auto-merge (squash) May 30, 2023 11:35

gsoldevila added backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) v8.9.0 v8.8.1 labels May 30, 2023

Merge branch 'main' into migrations-dynamic-read-batchsize

eb35b1b

rudolf merged commit 094b62a into main May 30, 2023

rudolf deleted the migrations-dynamic-read-batchsize branch May 30, 2023 13:25

gsoldevila mentioned this pull request May 30, 2023

[8.8] Migrations: dynamically adjust batchSize when reading (#157494) #158660

Merged

gsoldevila mentioned this pull request Jun 12, 2023

[Saved Objects Migrations] The default migrations.batchSize might be too large in some scenarios #145753

Closed

nchaulet mentioned this pull request Jun 13, 2023

[Fleet]: Kibana upgrade failed from 8.7.1>8.8.0 BC8 when multiple agent policies with integrations exist. #158361

Closed

rudolf mentioned this pull request Jun 27, 2023

v2 migrations: Dynamically reduce batchSize before response size exceeds MAX_STRING_LENGTH #160626

Draft

9 tasks

rudolf added the Epic:ScaleMigrations Scale upgrade migrations to millions of saved objects label Sep 23, 2023
