
Migrations: dynamically adjust batchSize when reading #157494

Merged

merged 23 commits into main from migrations-dynamic-read-batchsize on May 30, 2023

Conversation

@rudolf (Contributor) commented May 12, 2023

Release notes

Very large saved objects can cause migrations to fail when reading batches of saved objects from Elasticsearch that exceed the NodeJS max string length. This forces users to use a much smaller migrations.batchSize value, which in turn slows down migration performance. With this fix, Kibana will dynamically reduce the batch size when it encounters a batch that's too big to process, improving migration performance.

Summary

Migrations read 1000 documents at a time by default, which works well for most deployments. But if any batch happens to be > ~512MB, we hit NodeJS' max string length limit and cannot process that batch. This forces users to reduce the batch size to a much smaller number, which could severely slow down migrations.
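For reference, the limit in question is Node's maximum string length, which can be checked at runtime. A quick illustration (the exact value depends on the Node/V8 build, but it matches the 536870888 figure in the error message quoted later in this thread):

```ts
import { constants } from 'node:buffer';

// A response body larger than this cannot be held in a single JS string.
// On 64-bit Node builds this is 536870888 bytes (2 ** 29 - 24), i.e. ~512MB.
console.log(constants.MAX_STRING_LENGTH);
```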

This PR reduces the impact of large batches by catching elasticsearch-js' RequestAbortedError and halving the batch size. When subsequent batches are successful, the batchSize increases by 20%. This means we'll have a sequence like:

  1. Read 1000 docs ✅ (small batch)
  2. Read 1000 docs 🔴 (too large batch)
  3. Read 500 docs ✅
  4. Read 600 docs ✅
  5. Read 720 docs ✅
  6. Read 864 docs ✅
  7. Read 1000 docs ✅ (small batch)

This assumes that most clusters just have a few large batches exceeding the limit. If all batches exceed the limit, we'd have 1 failure for every 4 successful reads, so we pay a ~20% throughput penalty. In such a case it would be better to configure a lower migrations.batchSize.
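A minimal sketch of the adjustment policy described above (hypothetical helpers for illustration, not the actual Kibana state-machine code):

```ts
interface BatchSizeState {
  batchSize: number; // number of documents to request in the next read
  maxBatchSize: number; // the configured migrations.batchSize ceiling
}

/** Halve the batch size after a read fails because the response was too large. */
const onReadTooLarge = (state: BatchSizeState): BatchSizeState => ({
  ...state,
  batchSize: Math.max(1, Math.floor(state.batchSize / 2)),
});

/** Grow the batch size by 20% after a successful read, capped at the configured maximum. */
const onReadSuccess = (state: BatchSizeState): BatchSizeState => ({
  ...state,
  batchSize: Math.min(state.maxBatchSize, Math.floor(state.batchSize * 1.2)),
});

// Starting at 1000 with one too-large batch:
// 1000 -> 500 -> 600 -> 720 -> 864 -> 1000 (capped), i.e. the sequence above.
```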

Tested this manually:

  1. Start ES with more heap than the default, otherwise reading large batches will cause it to run out of memory
    ES_JAVA_OPTS=' -Xms6g -Xmx6g' yarn es snapshot --data-archive=/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/archives/8.4.0_with_sample_data_logs.zip
  2. Ingest lots of large documents (~5 MB each)
       curl -XPUT "elastic:changeme@localhost:9200/_security/role/grant_kibana_system_indices" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
       {                           
         "indices": [                                            
           {
             "names": [
               ".kibana*"
             ],
             "privileges": [
               "all"
             ],
             "allow_restricted_indices": true
           }
         ]
       }'
    
       curl -XPOST "elastic:changeme@localhost:9200/_security/user/superuser" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'    
       {
         "password" : "changeme",  
         "roles" : [ "superuser", "grant_kibana_system_indices" ]
       }'
    
       curl -XPUT "superuser:changeme@localhost:9200/.kibana_8.4.0_001/_mappings" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
       {
             "dynamic": false,                                                                       
             "properties": {
    
             }
    
       }'
    
       set -B                  # enable brace expansion
       for i in {1..400}; do
         curl -k --data-binary "@/Users/rudolf/dev/kibana/src/core/server/integration_tests/saved_objects/migrations/group3/body.json" -X PUT "http://superuser:changeme@localhost:9200/.kibana_8.4.0_001/_doc/cases-comments:"{$i}"?&pretty=true" -H "Content-Type: application/json"
       done
    
  3. Start Kibana with a modest batchSize, otherwise we could OOM ES: node scripts/kibana --dev --migrations.batchSize=120
Example logs. Note the "Processed x documents" message is only logged when the next batch is successfully read, so the order seems wrong. To improve it we'd need to log progress after a batch is successfully written instead 🤷

```
[.kibana] Processed 120 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3667ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1740ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1376ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1402ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1311ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 900ms.
[.kibana] Read a batch that exceeded the NodeJS maximum string length, retrying by reducing the batch size in half to 60.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_READ. took: 1538ms.
[.kibana] Processed 240 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2054ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1042ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1388ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1130ms.
[.kibana] Processed 300 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 2610ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1262ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1299ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1363ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1341ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 572ms.
[.kibana] Processed 372 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3330ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1488ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1349ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1312ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1380ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1310ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 139ms.
[.kibana] Processed 458 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_TRANSFORM. took: 3278ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_TRANSFORM -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1460ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1370ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1303ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_INDEX_BULK. took: 1384ms.
[.kibana] REINDEX_SOURCE_TO_TEMP_INDEX_BULK -> REINDEX_SOURCE_TO_TEMP_READ. took: 1298ms.
[.kibana] Processed 542 documents out of 542.
[.kibana] REINDEX_SOURCE_TO_TEMP_READ -> REINDEX_SOURCE_TO_TEMP_CLOSE_PIT. took: 4ms.
```
Checklist

Delete any items that are not applicable to this PR.

Risks

For maintainers

@rudolf rudolf added the Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc) and Feature:Migrations labels May 12, 2023
@rudolf rudolf marked this pull request as ready for review May 16, 2023 15:14
@rudolf rudolf requested a review from a team as a code owner May 16, 2023 15:14
@elasticmachine commented:
Pinging @elastic/kibana-core (Team:Core)

@@ -60,6 +60,7 @@ describe('split .kibana index into multiple system indices', () => {
beforeAll(async () => {
esServer = await startElasticsearch({
dataArchive: Path.join(__dirname, '..', 'archives', '7.3.0_xpack_sample_saved_objects.zip'),
timeout: 60000,
rudolf (author) commented:
CI was failing a few times on this test due to timeout

},
});

root = createRoot({ maxReadBatchSizeBytes: 50000 });
rudolf (author) commented:
This test configures a really small maxReadBatchSizeBytes so that we hit the es response size too large error. I tried by actually creating really large documents but this consumes a lot of memory on the ES side which both when reading batches, but also when the update_by_query runs.
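A rough sketch of the shape such a test can take, reusing the createTestServers/createRoot helpers that appear in this diff (the setup options and assertion are assumptions, not the committed test):

```ts
it('reduces the read batchSize in half if a batch exceeds maxReadBatchSizeBytes', async () => {
  const { startES } = createTestServers({
    adjustTimeout: (t: number) => jest.setTimeout(t), // assumed option shape
  });
  esServer = await startES();

  // A deliberately tiny byte limit makes an ordinary batch hit the
  // es_response_too_large path without ingesting ~512MB of documents.
  root = createRoot({ maxReadBatchSizeBytes: 50000 });
  await root.preboot();
  await root.setup();
  await root.start();

  // Finally, assert that the migration logged the "reducing the batch size in half" warning.
});
```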

rudolf commented May 16, 2023

@elasticmachine merge upstream

@rudolf rudolf added the bug (Fixes for quality problems that affect the customer experience) label May 16, 2023
@TinaHeiligers (Contributor) commented:
note to any reviewers: Replace the full path to the data archive with the full path to the archive on your machine when running the branch.
i.e.

ES_JAVA_OPTS=' -Xms6g -Xmx6g' yarn es snapshot --data-archive=<path-to-repo>/kibana/src/core/server/integration_tests/saved_objects/migrations/archives/8.4.0_with_sample_data_logs.zip

@TinaHeiligers (Contributor) left a review comment:
I had to reduce the batch size all the way down to < 100 to get around
Error: Unable to complete saved object migrations for the [.kibana] index. RequestAbortedError: The content length (536936024) is bigger than the maximum allowed string (536870888)
Migrations ran fine after that! Being able to specify the migrations batch size is going to be a huge win for getting around the "too-many-saved-objects" issue.

LGTM on CI green.

});

it.only('reduces the read batchSize in half if a batch exceeds maxReadBatchSizeBytes', async () => {
const { startES } = createTestServers({
A reviewer (Contributor) commented:
NIT: you can directly use the startElasticsearch() method from kibana_migrator_test_kit.ts

track_total_hits: typeof searchAfter === 'undefined',
query,
},
{ maxResponseSize: maxResponseSizeBytes }
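The second argument here is elasticsearch-js's per-request options. A standalone sketch of how a response-size cap behaves (the node URL and the tiny limit are illustrative assumptions):

```ts
import { Client, errors as EsErrors } from '@elastic/elasticsearch';

async function readOneBatch() {
  const client = new Client({ node: 'http://localhost:9200' }); // assumed local cluster

  try {
    return await client.search(
      { index: '.kibana', size: 1000, query: { match_all: {} } },
      { maxResponseSize: 50_000 } // bytes; the transport aborts if the body would exceed this
    );
  } catch (e) {
    if (e instanceof EsErrors.RequestAbortedError) {
      // Message looks like: "The content length (...) is bigger than the maximum allowed string (...)",
      // which is the case the migration treats as a "response too large" result.
      return 'es_response_too_large' as const;
    }
    throw e;
  }
}
```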
@jloleysens (Contributor) left a review:

Left a drive-by review, non-blocker comments only.


Comment on lines 1205 to 1206
if (isTypeof(left, 'es_response_too_large')) {
const batchSize = Math.floor(stateP.batchSize / 2);
A reviewer (Contributor) commented:
NIT: probably unnecessary, but we may want to have an escape hatch to avoid potentially entering an infinite loop here? Should we check if stateP.batchSize is higher than 1/2 or something here?

rudolf (author) replied:
I added better handling for when the response size > maxReadBatchSizeBytes even if the batchSize is 1. This also addresses a similar comment from @jloleysens. Even though we can't create documents that exceed MAX_STRING_LENGTH, users could configure maxReadBatchSizeBytes to a low value to e.g. avoid an OOM, so it felt worth explicitly handling this scenario.
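As a hedged illustration of that handling, building on the snippet quoted above (field names such as controlState, reason, and logs are assumptions here, not the committed model code):

```ts
if (isTypeof(left, 'es_response_too_large')) {
  if (stateP.batchSize <= 1) {
    // Escape hatch: even a single document is bigger than maxReadBatchSizeBytes,
    // so halving again would loop forever. Fail with an actionable message instead.
    return {
      ...stateP,
      controlState: 'FATAL',
      reason:
        'A single saved object exceeds migrations.maxReadBatchSizeBytes; raise the limit or remove the offending document.',
    };
  }
  const batchSize = Math.max(1, Math.floor(stateP.batchSize / 2));
  return {
    ...stateP,
    batchSize,
    logs: [
      ...stateP.logs,
      {
        level: 'warning',
        message: `Read a batch that was too large, retrying with a smaller batch size of ${batchSize}.`,
      },
    ],
  };
}
```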

@@ -71,6 +71,7 @@ export const nextActionMap = (context: MigratorContext) => {
client,
index: state.currentIndex,
mappings: { properties: state.additiveMappingChanges },
batchSize: context.batchSize,
A reviewer (Contributor) commented:
Do we want to also plug this logic into the zdt algorithm?

rudolf (author) replied:
yeah, would definitely be worth adding there. I'll do that in a follow-up PR 👍

@rudolf rudolf self-assigned this May 23, 2023
.catch((e) => {
if (
e instanceof EsErrors.RequestAbortedError &&
e.message.match(/The content length \(\d+\) is bigger than the maximum/) != null
A reviewer (Member) commented:
nit: .test might be more performant because it doesn't need to extract the groups. I don't have numbers that back this assumption though.

Suggested change
e.message.match(/The content length \(\d+\) is bigger than the maximum/) != null
/The content length \(\d+\) is bigger than the maximum/.test(e.message)
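In that spirit, a small self-contained sketch of the predicate using .test (illustrative only, not necessarily the final committed form):

```ts
import { errors as EsErrors } from '@elastic/elasticsearch';

// True when the request was aborted because the response exceeded the
// configured maximum response size / Node's maximum string length.
const isResponseTooLargeError = (e: unknown): boolean =>
  e instanceof EsErrors.RequestAbortedError &&
  /The content length \(\d+\) is bigger than the maximum/.test(e.message);
```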

@rudolf rudolf enabled auto-merge (squash) May 30, 2023 11:35
@gsoldevila gsoldevila added the backport:prev-minor (Backport to (8.x) the previous minor version, i.e. one version back from main), v8.9.0, and v8.8.1 labels May 30, 2023
@kibana-ci (Collaborator) commented:
💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/core-saved-objects-migration-server-internal | 86 | 89 | +3 |
Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/core-saved-objects-migration-server-internal | 120 | 123 | +3 |

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 19 | 21 | +2 |
| securitySolution | 401 | 405 | +4 |
| total | | | +6 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 20 | 22 | +2 |
| securitySolution | 481 | 485 | +4 |
| total | | | +6 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @rudolf

@rudolf rudolf merged commit 094b62a into main May 30, 2023
@rudolf rudolf deleted the migrations-dynamic-read-batchsize branch May 30, 2023 13:25
@kibanamachine (Contributor) commented:
💔 All backports failed

Status Branch Result
8.8 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 157494

Questions?

Please refer to the Backport tool documentation

@gsoldevila (Contributor) commented:
💚 All backports created successfully

Status Branch Result
8.8

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

gsoldevila pushed a commit to gsoldevila/kibana that referenced this pull request May 30, 2023
gsoldevila added a commit that referenced this pull request May 31, 2023
…#158660)

# Backport

This will backport the following commits from `main` to `8.8`:
- Migrations: dynamically adjust batchSize when reading (#157494)
@rudolf rudolf added the Epic:ScaleMigrations (Scale upgrade migrations to millions of saved objects) label Sep 23, 2023