From 2397e3139d2a318d31c49b1af01db2142889df4b Mon Sep 17 00:00:00 2001 From: Elastic Machine Date: Fri, 17 Jan 2025 16:45:14 +0000 Subject: [PATCH] Update specification output --- output/openapi/elasticsearch-openapi.json | 12 ++++++------ output/openapi/elasticsearch-serverless-openapi.json | 12 ++++++------ output/schema/schema-serverless.json | 12 ++++++------ output/schema/schema.json | 12 ++++++------ 4 files changed, 24 insertions(+), 24 deletions(-) diff --git a/output/openapi/elasticsearch-openapi.json b/output/openapi/elasticsearch-openapi.json index 204fc13559..dc0a00296e 100644 --- a/output/openapi/elasticsearch-openapi.json +++ b/output/openapi/elasticsearch-openapi.json @@ -6688,7 +6688,7 @@ "document" ], "summary": "Create a new document in the index", - "description": "You can index a new JSON document with the `//_doc/` or `//_create/<_id>` APIs\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with a same ID already exists in the index.\nTo update an existing document, you must use the `//_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT //_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", +    "description": "You can index a new JSON document with the `/<target>/_doc/` or `/<target>/_create/<_id>` APIs.\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with the same ID already exists in the index.\nTo update an existing document, you must use the `/<target>/_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT /<target>/_create/<_id>` or `POST /<target>/_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index called `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", "externalDocs": { "url": "https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html" }, @@ -6737,7 +6737,7 @@ "document" ], "summary": "Create a new document in the index", -    "description": "You can index a new JSON document with the `//_doc/` or `//_create/<_id>` APIs\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with a same ID already exists in the index.\nTo update an existing document, you must use the `//_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT //_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", +    "description": "You can index a new JSON document with the `/<target>/_doc/` or `/<target>/_create/<_id>` APIs.\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with the same ID already exists in the index.\nTo update an existing document, you must use the `/<target>/_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT /<target>/_create/<_id>` or `POST /<target>/_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index called `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", "externalDocs": { "url": "https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html" }, @@ -7108,7 +7108,7 @@ "document" ], "summary": "Create or update a document in an index", -    "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", +    "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index called `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy 
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", "externalDocs": { "url": "https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html" }, @@ -7168,7 +7168,7 @@ "document" ], "summary": "Create or update a document in an index", -    "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", +    "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index called `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy 
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.",
        "externalDocs": {
          "url": "https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html"
        },
@@ -11168,7 +11168,7 @@
        "document"
      ],
      "summary": "Create or update a document in an index",
-      "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", + "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy 
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.",
        "externalDocs": {
          "url": "https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html"
        },
@@ -26560,7 +26560,7 @@
        "document"
      ],
      "summary": "Reindex documents",
-      "description": "Copy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the 
reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n** Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating 
sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*\"]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", + "description": "Copy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have the following security 
privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n 
\"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n**Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. 
Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
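For example, here is a minimal sketch of a reindex request whose script skips some documents instead of copying them (the index names `my-source` and `my-dest` and the `status` field are hypothetical, not part of this API):\n\n```\nPOST _reindex\n{\n \"source\": {\n \"index\": \"my-source\"\n },\n \"dest\": {\n \"index\": \"my-dest\"\n },\n \"script\": {\n \"lang\": \"painless\",\n \"source\": \"if (ctx._source.status == 'archived') { ctx.op = 'noop' } // 'status' is a hypothetical field\"\n }\n}\n```\n\n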
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and, when they are present, the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication, or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma-delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents, you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.",
      "operationId": "reindex",
      "parameters": [
        {
diff --git a/output/openapi/elasticsearch-serverless-openapi.json b/output/openapi/elasticsearch-serverless-openapi.json
index 318cc2f6c1..7746d1861f 100644
--- a/output/openapi/elasticsearch-serverless-openapi.json
+++ b/output/openapi/elasticsearch-serverless-openapi.json
@@ -3422,7 +3422,7 @@
        "document"
      ],
      "summary": "Create a new document in the index",
-      "description": "You can index a new JSON document with the `//_doc/` or `//_create/<_id>` APIs\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with a same ID already exists in the index.\nTo update an existing document, you must use the `//_doc/` API.\n\nIf 
the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT //_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is 
`1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.",
+      "description": "You can index a new JSON document with the `//_doc/` or `//_create/<_id>` APIs.\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with the same ID already exists in the index.\nTo update an existing document, you must use the `//_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT //_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.",
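To make the `_create` and `wait_for_active_shards` behavior described above concrete, here is a minimal Console sketch in the same style as the versioning example later in this file; the index name `my-index-000001`, the routing value `user1`, and the copy count `2` are illustrative assumptions, not values taken from the specification:

```
PUT my-index-000001/_create/1?routing=user1&wait_for_active_shards=2
{
  "user": {
    "id": "elkbee"
  }
}
```

Repeating this request with the same `<_id>` returns a 409 because the document already exists; sending the same body to `PUT my-index-000001/_doc/1` would update the document instead.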
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", + "description": "You can index a new JSON document with the `//_doc/` or `//_create/<_id>` APIs\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with a same ID already exists in the index.\nTo update an existing document, you must use the `//_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT //_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies is not available, then the write operation must wait and retry until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.",
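The `action.auto_create_index` pattern list described above can also be changed at runtime through the cluster settings API; a hedged sketch, where the patterns `my-index-*` (allowed) and `test-*` (blocked) are made up for the example:

```
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "+my-index-*,-test-*"
  }
}
```

With these patterns, indexing into a missing `my-index-2025` target creates the index automatically, while a missing `test-1` target is rejected; as the NOTE above says, this setting does not gate the creation of data streams.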
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", + "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies is not available, then the write operation must wait and retry until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy 
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.",
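The optimistic concurrency control section of this description can likewise be illustrated with a short sketch; the sequence number `10` and primary term `1` are placeholders that would normally be taken from an earlier read or write response:

```
PUT my-index-000001/_doc/1?if_seq_no=10&if_primary_term=1
{
  "user": {
    "id": "elkbee"
  }
}
```

If another write has advanced the document's sequence number in the meantime, this request fails with a `VersionConflictException` and a `409` status code, exactly as described above.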
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", + "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies is not available, then the write operation must wait and retry until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy 
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.",
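As a companion to the noop-updates paragraph in this description, here is a sketch of the `_update` alternative it recommends; the document body is illustrative, and `detect_noop` is spelled out even though it defaults to `true` on that API:

```
POST my-index-000001/_update/1
{
  "doc": {
    "user": {
      "id": "elkbee"
    }
  },
  "detect_noop": true
}
```

If the merged document is identical to the current source, the response reports `"result": "noop"` and no new version is created, unlike the index API documented here.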
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", + "description": "Add a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates.
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail.
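The optimistic concurrency control described above takes the same request shape. A minimal sketch of a conditional write (the index name `my-index-000001`, the document ID, and the `if_seq_no`/`if_primary_term` values are illustrative; in practice they come from a previous read or write of the document):

```
PUT my-index-000001/_doc/1?if_seq_no=362&if_primary_term=2
{
  "user": {
    "id": "elkbee"
  }
}
```

If another write has changed the document's sequence number in the meantime, this request returns a 409 `VersionConflictException` instead of silently overwriting the newer document. The `wait_for_active_shards` value discussed above can be supplied the same way, as a query parameter on the request.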
For example, with external versioning:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", "externalDocs": { "url": "https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html" }, @@ -15400,7 +15400,7 @@ "document" ], "summary": "Reindex documents", - "description": "Copy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the
reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n** Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating 
sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*\"]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", + "description": "Copy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have the following security 
privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n 
\"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n**Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. 
Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
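For instance, a script that skips documents instead of copying them can set `ctx.op` as described above. A minimal sketch (the index names and the `migrated` source field are illustrative):

```
POST _reindex
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  },
  "script": {
    "source": "if (ctx._source.migrated == true) { ctx.op = 'noop' }"
  }
}
```

Documents whose `migrated` field is `true` show up in the `noop` counter of the response rather than being written to the destination.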
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.",
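The reindex options described above compose naturally in a single request. A minimal sketch (the index names are illustrative) that continues past version conflicts, creates only missing documents, parallelizes with automatic slicing, and runs as a background task:

```
POST _reindex?slices=auto&wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001",
    "op_type": "create"
  }
}
```

The response returns a task ID that can be polled with the tasks API; `requests_per_second` could be added as a further query parameter to throttle the batches as described above.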
"operationId": "reindex", "parameters": [ { diff --git a/output/schema/schema-serverless.json b/output/schema/schema-serverless.json index 43b38be2c4..c456624ecb 100644 --- a/output/schema/schema-serverless.json +++ b/output/schema/schema-serverless.json @@ -1959,7 +1959,7 @@ "stability": "stable" } }, - "description": "Create a new document in the index.\n\nYou can index a new JSON document with the `//_doc/` or `//_create/<_id>` APIs\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with a same ID already exists in the index.\nTo update an existing document, you must use the `//_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT //_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings
dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", + "description": "Create a new document in the index.\n\nYou can index a new JSON document with the `/<target>/_doc/` or `/<target>/_create/<_id>` APIs.\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with the same ID already exists in the index.\nTo update an existing document, you must use the `/<target>/_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT /<target>/_create/<_id>` or `POST /<target>/_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates.
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", "docId": "docs-index", "docTag": "document", "docUrl": "https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-index_.html", @@ -2789,7 +2789,7 @@ "stability": "stable" } }, - "description": "Create or update a document in an index.\n\nAdd a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates.
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", + "description": "Create or update a document in an index.\n\nAdd a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates.
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy 
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", "docId": "docs-index", "docTag": "document", "docUrl": "https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-index_.html", @@ -7997,7 +7997,7 @@ "stability": "stable" } }, - "description": "Reindex documents.\n\nCopy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy 
default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/<task_id>`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex, it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n**Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default, the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just 
automates the manual process, creating sub-requests, which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
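For example, here is a minimal, hypothetical sketch (the index names and the `flag` field are illustrative, not part of the API) of a reindex script that deletes flagged documents from the destination:\n\n```\nPOST _reindex\n{\n \"source\": {\n \"index\": \"my-index-000001\"\n },\n \"dest\": {\n \"index\": \"my-new-index-000001\"\n },\n \"script\": {\n \"source\": \"if (ctx._source.flag == 'delete') { ctx.op = 'delete' }\",\n \"lang\": \"painless\"\n }\n}\n```\n\nIn this sketch, documents whose `flag` field equals `delete` are removed from the destination and reported in the `deleted` counter; all other documents are reindexed as usual.\n\n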
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional, and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma-delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", + "description": "Reindex documents.\n\nCopy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have 
the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/<task_id>`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex, it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n 
\"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n**Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. 
Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
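For example, here is a minimal, hypothetical sketch (the index names and the `flag` field are illustrative, not part of the API) of a reindex script that deletes flagged documents from the destination:\n\n```\nPOST _reindex\n{\n \"source\": {\n \"index\": \"my-index-000001\"\n },\n \"dest\": {\n \"index\": \"my-new-index-000001\"\n },\n \"script\": {\n \"source\": \"if (ctx._source.flag == 'delete') { ctx.op = 'delete' }\",\n \"lang\": \"painless\"\n }\n}\n```\n\nIn this sketch, documents whose `flag` field equals `delete` are removed from the destination and reported in the `deleted` counter; all other documents are reindexed as usual.\n\n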
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional, and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma-delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", "docId": "docs-reindex", "docTag": "document", "docUrl": "https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-reindex.html", @@ -16027,7 +16027,7 @@ } } }, - "description": "Create a new document in the index.\n\nYou can index a new JSON document with the `/<target>/_doc/` or `/<target>/_create/<_id>` APIs.\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with the same ID already exists in the index.\nTo update an existing document, you must use the `/<target>/_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT 
//_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the 
`wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", + "description": "Create a new document in the index.\n\nYou can index a new JSON document with the `/<target>/_doc/` or `/<target>/_create/<_id>` APIs.\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with the same ID already exists in the index.\nTo update an existing document, you must use the `/<target>/_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT /<target>/_create/<_id>` or `POST /<target>/_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", "generics": [ { "name": "TDocument", @@ -20030,7 +20030,7 @@ } } }, - "description": "Create or update a document in an index.\n\nAdd a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", + "description": "Create or update a document in an index.\n\nAdd a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy 
For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", "generics": [ { "name": "TDocument", @@ -36159,7 +36159,7 @@ } ] }, - "description": "Reindex documents.\n\nCopy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` 
request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n** Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks 
API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*\"]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", + "description": "Reindex documents.\n\nCopy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as you want it before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\n
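For example, a minimal reindex request copies one index into another (the index names are illustrative):\n\n```\nPOST _reindex\n{\n \"source\": {\n \"index\": \"my-index-000001\"\n },\n \"dest\": {\n \"index\": \"my-new-index-000001\"\n }\n}\n```\n\n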
If the Elasticsearch security features are enabled, you must have the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/<task_id>`.\n\n
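For example, this sketch launches a reindex as a task and then polls it (the index names are illustrative and the task ID in the second request is a placeholder):\n\n```\nPOST _reindex?wait_for_completion=false\n{\n \"source\": {\n \"index\": \"my-index-000001\"\n },\n \"dest\": {\n \"index\": \"my-new-index-000001\"\n }\n}\n\nGET _tasks/oTUltX4IQMOUUVeiohTt8A:12345\n```\n\n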
**Reindex from multiple sources**\n\nIf you have many sources to reindex, it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n**Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default, the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\n
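For example, this sketch (index names are illustrative) lets Elasticsearch choose the number of slices automatically:\n\n```\nPOST _reindex?slices=auto\n{\n \"source\": {\n \"index\": \"my-index-000001\"\n },\n \"dest\": {\n \"index\": \"my-new-index-000001\"\n }\n}\n```\n\n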
Adding `slices` to the reindex request just automates the manual process, creating sub-requests, which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with `slices`.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn't have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\n
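For example, this sketch (the `archived` flag and the index names are illustrative) deletes flagged documents from the destination instead of copying them:\n\n```\nPOST _reindex\n{\n \"source\": {\n \"index\": \"my-index-000001\"\n },\n \"dest\": {\n \"index\": \"my-new-index-000001\"\n },\n \"script\": {\n \"source\": \"if (ctx._source.archived == true) { ctx.op = 'delete' }\"\n }\n}\n```\n\n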
Think of the possibilities! Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma-delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", "inherits": { "type": { "name": "RequestBase", diff --git a/output/schema/schema.json b/output/schema/schema.json index 27d034d1ea..05e0da9b54 100644 --- a/output/schema/schema.json +++ b/output/schema/schema.json @@ -4206,7 +4206,7 @@ "stability": "stable" } }, - "description": "Create a new document in the index.\n\nYou can index a new JSON document with the `//_doc/` or `//_create/<_id>` APIs\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with a same ID already exists in the index.\nTo update an existing document, you must use the `//_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the 
target data stream, index, or index alias:\n\n* To add a document using the `PUT //_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting 
`index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", + "description": "Create a new document in the index.\n\nYou can index a new JSON document with the `/<target>/_doc/` or `/<target>/_create/<_id>` APIs.\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with the same ID already exists in the index.\nTo update an existing document, you must use the `/<target>/_doc/` API.\n\n
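For example (the index name, document ID, and body are illustrative):\n\n```\nPUT my-index-000001/_create/1\n{\n \"message\": \"GET /search HTTP/1.1 200 1070000\"\n}\n```\n\n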
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT /<target>/_create/<_id>` or `POST /<target>/_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the 
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", "docId": "docs-index", "docTag": "document", "docUrl": "https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-index_.html", @@ -6128,7 +6128,7 @@ "stability": "stable" } }, - "description": "Create or update a document in an index.\n\nAdd a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", + "description": "Create or update a document in an index.\n\nAdd a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. 
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy 
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", "docId": "docs-index", "docTag": "document", "docUrl": "https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-index_.html", @@ -14841,7 +14841,7 @@ "stability": "stable" } }, - "description": "Reindex documents.\n\nCopy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy
default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n** Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just 
automates the manual process, creating sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*\"]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", + "description": "Reindex documents.\n\nCopy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have 
the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\", but the order isn't usually predictable, so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/<task_id>`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex, it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n
\"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n**Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. 
Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
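For instance, here is a minimal sketch of a reindex request whose script skips some documents by setting `ctx.op` to `noop` (the index names `my-source` and `my-dest` and the `archived` source field are hypothetical):\n\n```\nPOST _reindex\n{\n \"source\": {\n \"index\": \"my-source\"\n },\n \"dest\": {\n \"index\": \"my-dest\"\n },\n \"script\": {\n \"lang\": \"painless\",\n \"source\": \"if (ctx._source.archived == true) { ctx.op = 'noop' }\"\n }\n}\n```\n\nDocuments skipped this way are counted in the `noop` field of the response rather than being written to the destination.\n\n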
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional, and when they are present, the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication, or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma-delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents, you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", "docId": "docs-reindex", "docTag": "document", "docUrl": "https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-reindex.html", @@ -23575,7 +23575,7 @@ } } }, - "description": "Create a new document in the index.\n\nYou can index a new JSON document with the `//_doc/` or `//_create/<_id>` APIs\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with a same ID already exists in the index.\nTo update an existing document, you must use the `//_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT
//_create/<_id>` or `POST //_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the 
`wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", + "description": "Create a new document in the index.\n\nYou can index a new JSON document with the `/<target>/_doc/` or `/<target>/_create/<_id>` APIs.\nUsing `_create` guarantees that the document is indexed only if it does not already exist.\nIt returns a 409 response when a document with the same ID already exists in the index.\nTo update an existing document, you must use the `/<target>/_doc/` API.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add a document using the `PUT /<target>/_create/<_id>` or `POST /<target>/_create/<_id>` request formats, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates.
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the
request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.", "generics": [ { "name": "TDocument", @@ -28465,7 +28465,7 @@ } } }, - "description": "Create or update a document in an index.\n\nAdd a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT //_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST //_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates.
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behaviour is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n * ** Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards request` parameter.\n\nValid values are all or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more 
copy than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. 
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", + "description": "Create or update a document in an index.\n\nAdd a JSON document to the specified data stream or index and make it searchable.\nIf the target is an index and the document already exists, the request updates the document and increments its version.\n\nNOTE: You cannot use this API to send update requests for existing documents in a data stream.\n\nIf the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:\n\n* To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.\n* To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.\n* To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.\n\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nNOTE: Replica shards might not all be started when an indexing operation returns successfully.\nBy default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.\n\n**Automatically create data streams and indices**\n\nIf the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.\n\nIf the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.\n\nNOTE: Elasticsearch includes several built-in index templates.
To avoid naming collisions with these templates, refer to index pattern documentation.\n\nIf no mapping exists, the index operation creates a dynamic mapping.\nBy default, new fields and objects are automatically added to the mapping if needed.\n\nAutomatic index creation is controlled by the `action.auto_create_index` setting.\nIf it is `true`, any index can be created automatically.\nYou can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to `false` to turn off automatic index creation entirely.\nSpecify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.\nWhen a list is specified, the default behavior is to disallow.\n\nNOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.\nIt does not affect the creation of data streams.\n\n**Optimistic concurrency control**\n\nIndex operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.\nIf a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.\n\n**Routing**\n\nBy default, shard placement — or routing — is controlled by using a hash of the document's ID value.\nFor more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.\n\nWhen setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.\nThis does come at the (very minimal) cost of an additional document parsing pass.\nIf the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.\n\nNOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.\n\n**Distributed**\n\nThe index operation is directed to the primary shard based on its route and performed on the actual node containing this shard.\nAfter the primary shard completes the operation, if needed, the update is distributed to applicable replicas.\n\n**Active shards**\n\nTo improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.\nIf the requisite number of active shard copies are not available, then the write operation must wait and retry until either the requisite shard copies have started or a timeout occurs.\nBy default, write operations only wait for the primary shards to be active before proceeding (that is to say `wait_for_active_shards` is `1`).\nThis default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.\nTo alter this behavior per operation, use the `wait_for_active_shards` request parameter.\n\nValid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas`+1).\nSpecifying a negative value or a number greater than the number of shard copies will throw an error.\n\nFor example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy
than there are nodes).\nIf you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.\nThis means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.\nIf `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.\nThis requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.\nHowever, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.\nThe operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.\n\nIt is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.\nAfter the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.\nThe `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.\n\n**No operation (noop) updates**\n\nWhen updating a document by using this API, a new version of the document is always created even if the document hasn't changed.\nIf this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.\nThe `detect_noop` option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.\n\nThere isn't a definitive rule for when noop updates aren't acceptable.\nIt's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.\n\n**Versioning**\n\nEach indexed document is given a version number.\nBy default, internal versioning is used that starts at 1 and increments with each update, deletes included.\nOptionally, the version number can be set to an external value (for example, if maintained in a database).\nTo enable this functionality, `version_type` should be set to `external`.\nThe value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.\n\nNOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations.\nIf no version is provided, the operation runs without any version checks.\n\nWhen using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document.\nIf true, the document will be indexed and the new version number used.\nIf the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail.
For example:\n\n```\nPUT my-index-000001/_doc/1?version=2&version_type=external\n{\n \"user\": {\n \"id\": \"elkbee\"\n }\n}\n```\n\nIn this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.\nIf the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).\n\nA nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.\nEven the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.", "generics": [ { "name": "TDocument", @@ -32628,7 +32628,7 @@ } ] }, - "description": "Reindex documents.\n\nCopy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as wanted before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts`
request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\" but the order isn't usually predictable so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n \"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n** Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:\n\n* You can see these requests in the tasks 
API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities! 
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*\"]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", + "description": "Reindex documents.\n\nCopy documents from a source to a destination.\nYou can copy all documents to the destination index or reindex a subset of the documents.\nThe source can be any existing index, alias, or data stream.\nThe destination must differ from the source.\nFor example, you cannot reindex a data stream into itself.\n\nIMPORTANT: Reindex requires `_source` to be enabled for all documents in the source.\nThe destination should be configured as desired before calling the reindex API.\nReindex does not copy the settings from the source or its associated template.\nMappings, shard counts, and replicas, for example, must be configured ahead of time.\n\nIf the Elasticsearch security features are enabled, you must have
the following security privileges:\n\n* The `read` index privilege for the source data stream, index, or alias.\n* The `write` index privilege for the destination data stream, index, or index alias.\n* To automatically create a data stream or index with a reindex API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege for the destination data stream, index, or alias.\n* If reindexing from a remote cluster, the `source.remote.user` must have the `monitor` cluster privilege and the `read` index privilege for the source data stream, index, or alias.\n\nIf reindexing from a remote cluster, you must explicitly allow the remote host in the `reindex.remote.whitelist` setting.\nAutomatic data stream creation requires a matching index template with data stream enabled.\n\nThe `dest` element can be configured like the index API to control optimistic concurrency control.\nOmitting `version_type` or setting it to `internal` causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.\n\nSetting `version_type` to `external` causes Elasticsearch to preserve the `version` from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.\n\nSetting `op_type` to `create` causes the reindex API to create only missing documents in the destination.\nAll existing documents will cause a version conflict.\n\nIMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an `op_type` of `create`.\nA reindex can only add new documents to a destination data stream.\nIt cannot update existing documents in a destination data stream.\n\nBy default, version conflicts abort the reindex process.\nTo continue reindexing if there are conflicts, set the `conflicts` request body property to `proceed`.\nIn this case, the response includes a count of the version conflicts that were encountered.\nNote that the handling of other error types is unaffected by the `conflicts` property.\nAdditionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.\n\nNOTE: The reindex API makes no effort to handle ID collisions.\nThe last document written will \"win\", but the order isn't usually predictable, so it is not a good idea to rely on this behavior.\nInstead, make sure that IDs are unique by using a script.\n\n**Running reindex asynchronously**\n\nIf the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.\nElasticsearch creates a record of this task as a document at `_tasks/`.\n\n**Reindex from multiple sources**\n\nIf you have many sources to reindex, it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.\nThat way you can resume the process if there are any errors by removing the partially completed source and starting over.\nIt also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.\n\nFor example, you can use a bash script like this:\n\n```\nfor index in i1 i2 i3 i4 i5; do\n curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{\n
\"source\": {\n \"index\": \"'$index'\"\n },\n \"dest\": {\n \"index\": \"'$index'-reindexed\"\n }\n }'\ndone\n```\n\n**Throttling**\n\nSet `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.\nRequests are throttled by padding each batch with a wait time.\nTo turn off throttling, set `requests_per_second` to `-1`.\n\nThe throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.\nThe padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.\nBy default, the batch size is `1000`, so if `requests_per_second` is set to `500`:\n\n```\ntarget_time = 1000 / 500 per second = 2 seconds\nwait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds\n```\n\nSince the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.\nThis is \"bursty\" instead of \"smooth\".\n\n**Slicing**\n\nReindex supports sliced scroll to parallelize the reindexing process.\nThis parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nYou can slice a reindex request manually by providing a slice ID and total number of slices to each request.\nYou can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.\nThe `slices` parameter specifies the number of slices to use.\n\nAdding `slices` to the reindex request just automates the manual process, creating sub-requests, which means it has some quirks:\n\n* You can see these requests in the tasks API. These sub-requests are \"child\" tasks of the task for the request with slices.\n* Fetching the status of the task for the request with `slices` only contains the status of completed slices.\n* These sub-requests are individually addressable for things like cancellation and rethrottling.\n* Rethrottling the request with `slices` will rethrottle the unfinished sub-requests proportionally.\n* Canceling the request with `slices` will cancel each sub-request.\n* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.\n* Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request.
Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.\n* Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.\n\nIf slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.\nIf slicing manually or otherwise tuning automatic slicing, use the following guidelines.\n\nQuery performance is most efficient when the number of slices is equal to the number of shards in the index.\nIf that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.\nSetting slices higher than the number of shards generally does not improve efficiency and adds overhead.\n\nIndexing performance scales linearly across available resources with the number of slices.\n\nWhether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.\n\n**Modify documents during reindexing**\n\nLike `_update_by_query`, reindex operations support a script that modifies the document.\nUnlike `_update_by_query`, the script is allowed to modify the document's metadata.\n\nJust as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.\nFor example, set `ctx.op` to `noop` if your script decides that the document doesn't have to be indexed in the destination. This \"no operation\" will be reported in the `noop` counter in the response body.\nSet `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.\nThe deletion will be reported in the `deleted` counter in the response body.\nSetting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.\n\nThink of the possibilities!
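\n\nFor instance, here is a minimal sketch of a reindex request whose script drops some documents during the copy; the index names and the `flag` field are hypothetical:\n\n```\nPOST _reindex\n{\n  \"source\": {\n    \"index\": \"my-source-index\"\n  },\n  \"dest\": {\n    \"index\": \"my-dest-index\"\n  },\n  \"script\": {\n    \"lang\": \"painless\",\n    \"source\": \"if (ctx._source.flag == 'delete') { ctx.op = 'delete' }\"\n  }\n}\n```\n\nDocuments whose `flag` field equals `delete` are removed from the destination and counted in the `deleted` counter; everything else is reindexed as usual.\n\n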
Just be careful; you are able to change:\n\n* `_id`\n* `_index`\n* `_version`\n* `_routing`\n\nSetting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.\nIt will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.\n\n**Reindex from remote**\n\nReindex supports reindexing from a remote Elasticsearch cluster.\nThe `host` parameter must contain a scheme, host, port, and optional path.\nThe `username` and `password` parameters are optional and, when they are present, the reindex operation will connect to the remote Elasticsearch node using basic authentication.\nBe sure to use HTTPS when using basic authentication, or the password will be sent in plain text.\nThere are a range of settings available to configure the behavior of the HTTPS connection.\n\nWhen using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.\nRemote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.\nIt can be set to a comma-delimited list of allowed remote host and port combinations.\nScheme is ignored; only the host and port are used.\nFor example:\n\n```\nreindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]\n```\n\nThe list of allowed hosts must be configured on any nodes that will coordinate the reindex.\nThis feature should work with remote clusters of any version of Elasticsearch.\nThis should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.\n\nWARNING: Elasticsearch does not support forward compatibility across major versions.\nFor example, you cannot reindex from a 7.x cluster into a 6.x cluster.\n\nTo enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.\n\nNOTE: Reindexing from remote clusters does not support manual or automatic slicing.\n\nReindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.\nIf the remote index includes very large documents, you'll need to use a smaller batch size.\nIt is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.\nBoth default to 30 seconds.\n\n**Configuring SSL parameters**\n\nReindex from remote supports configurable SSL settings.\nThese must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.\nIt is not possible to configure SSL in the body of the reindex request.", "inherits": { "type": { "name": "RequestBase",