From 2f49457b3af44bd2fcc0d4b7fca4d21aeb640640 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Tue, 31 Oct 2023 15:19:10 +0000 Subject: [PATCH] [OSCI][DOCs] Replacing 'indices' terms for 'indexes' terms ONLY for description texts (#5353) * Fixing documentation for Wildcard in term-level queries section for Query DSL Signed-off-by: Samuel Valdes Gutierrez * replacing 'indices' term for 'indexes' term ONLY for description texts (not variables, links or properties) Signed-off-by: Samuel Valdes Gutierrez * Update creating-custom-workloads.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * updating changes suggested by Naarcha-AWS Signed-off-by: Samuel Valdes Gutierrez * updating changes suggested by Naarcha-AWS Signed-off-by: Samuel Valdes Gutierrez * Rename _benchmark/workloads/index.md to _benchmark/workloads/reference/index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Rename _benchmark/workloads/indices.md to _benchmark/workloads/reference/indices.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Samuel Valdes Gutierrez Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> (cherry picked from commit 7012af124a3fe719f80794714fed211d415f6009) Signed-off-by: github-actions[bot] --- _api-reference/cat/cat-aliases.md | 2 +- _api-reference/cat/cat-indices.md | 4 +- _api-reference/count.md | 4 +- _api-reference/document-apis/bulk.md | 6 +- .../document-apis/delete-by-query.md | 10 +- _api-reference/popular-api.md | 8 +- _benchmark/user-guide/concepts.md | 12 +- .../user-guide/creating-custom-workloads.md | 127 ++++++++++-------- _benchmark/workloads/reference/index.md | 109 +++++++++++++++ _benchmark/workloads/reference/indices.md | 30 +++++ _dashboards/sm-dashboards.md | 21 ++- .../configuration/sources/opensearch.md | 22 +-- 
.../access-control/default-action-groups.md | 6 +- _security/access-control/permissions.md | 6 +- .../multi-tenancy/multi-tenancy-config.md | 5 +- about.md | 12 +- 16 files changed, 267 insertions(+), 117 deletions(-) create mode 100644 _benchmark/workloads/reference/index.md create mode 100644 _benchmark/workloads/reference/indices.md diff --git a/_api-reference/cat/cat-aliases.md b/_api-reference/cat/cat-aliases.md index 6dcf3ddcf9..9e4407dced 100644 --- a/_api-reference/cat/cat-aliases.md +++ b/_api-reference/cat/cat-aliases.md @@ -53,7 +53,7 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- local | Boolean | Whether to return information from the local node only instead of from the master node. Default is false. -expand_wildcards | Enum | Expands wildcard expressions to concrete indices. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. +expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. ## Response diff --git a/_api-reference/cat/cat-indices.md b/_api-reference/cat/cat-indices.md index daab13938c..a5a70d8f0e 100644 --- a/_api-reference/cat/cat-indices.md +++ b/_api-reference/cat/cat-indices.md @@ -12,7 +12,7 @@ redirect_from: **Introduced 1.0** {: .label .label-purple } -The CAT indices operation lists information related to indexes, that is, how much disk space they are using, how many shards they have, their health status, and so on. +The CAT indexes operation lists information related to indexes, that is, how much disk space they are using, how many shards they have, their health status, and so on. ## Example @@ -44,7 +44,7 @@ GET _cat/indices ## URL parameters -All CAT indices URL parameters are optional. +All CAT indexes URL parameters are optional. 
In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters: diff --git a/_api-reference/count.md b/_api-reference/count.md index 6a61a93866..3e777a413e 100644 --- a/_api-reference/count.md +++ b/_api-reference/count.md @@ -2,7 +2,7 @@ layout: default title: Count nav_order: 21 -redirect_from: +redirect_from: - /opensearch/rest-api/count/ --- @@ -61,7 +61,7 @@ GET _count ``` {% include copy-curl.html %} -Alternatively, you could use the [cat indices]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-count/) APIs to see the number of documents per index or data stream. +Alternatively, you could use the [cat indexes]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-count/) APIs to see the number of documents per index or data stream. {: .note } diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md index 1ea163f6fe..48ae0e2902 100644 --- a/_api-reference/document-apis/bulk.md +++ b/_api-reference/document-apis/bulk.md @@ -57,7 +57,7 @@ refresh | Enum | Whether to refresh the affected shards after performing the ind require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`. routing | String | Routes the request to the specified shard. timeout | Time | How long to wait for the request to return. Default `1m`. -type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indices. +type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes. 
wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. {% comment %}_source | List | asdf _source_excludes | list | asdf @@ -114,7 +114,7 @@ All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If { "update": { "_index": "movies", "_id": "tt0816711" } } { "doc" : { "title": "World War Z" } } ``` - + It can also include a script or upsert for more complex document updates. - Script @@ -122,7 +122,7 @@ All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If { "update": { "_index": "movies", "_id": "tt0816711" } } { "script" : { "source": "ctx._source.title = \"World War Z\"" } } ``` - + - Upsert ```json { "update": { "_index": "movies", "_id": "tt0816711" } } diff --git a/_api-reference/document-apis/delete-by-query.md b/_api-reference/document-apis/delete-by-query.md index eedf79f9f6..b205ed760f 100644 --- a/_api-reference/document-apis/delete-by-query.md +++ b/_api-reference/document-apis/delete-by-query.md @@ -3,7 +3,7 @@ layout: default title: Delete by query parent: Document APIs nav_order: 40 -redirect_from: +redirect_from: - /opensearch/rest-api/document-apis/delete-by-query/ --- @@ -39,16 +39,16 @@ All URL parameters are optional. Parameter | Type | Description :--- | :--- | :--- | :--- -<index> | String | Name or list of the data streams, indices, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indices. -allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indices. Default is `true`. +<index> | String | Name or list of the data streams, indexes, or aliases to delete from. Supports wildcards. 
If left blank, OpenSearch searches all indexes. +allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`. analyzer | String | The analyzer to use in the query string. analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false. conflicts | String | Indicates to OpenSearch what should happen if the delete by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`. default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR. df | String | The default field in case a field prefix is not provided in the query string. -expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indices), `closed` (match closed, non-hidden indices), `hidden` (match hidden indices), and `none` (deny wildcard expressions). Default is `open`. +expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`. from | Integer | The starting index to search from. Default is 0. -ignore_unavailable | Boolean | Specifies whether to include missing or closed indices in the response. Default is false. +ignore_unavailable | Boolean | Specifies whether to include missing or closed indexes in the response. Default is false. lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false. max_docs | Integer | How many documents the delete by query operation should process at most. 
Default is all documents. preference | String | Specifies which shard or node OpenSearch should perform the delete by query operation on. diff --git a/_api-reference/popular-api.md b/_api-reference/popular-api.md index c258943c3c..2191b75666 100644 --- a/_api-reference/popular-api.md +++ b/_api-reference/popular-api.md @@ -81,14 +81,14 @@ POST _bulk ``` -## List all indices +## List all indexes ``` GET _cat/indices?v&expand_wildcards=all ``` -## Open or close all indices that match a pattern +## Open or close all indexes that match a pattern ``` POST my-logs*/_open @@ -96,7 +96,7 @@ POST my-logs*/_close ``` -## Delete all indices that match a pattern +## Delete all indexes that match a pattern ``` DELETE my-logs* @@ -119,7 +119,7 @@ GET _cat/aliases?v ``` -## Search an index or all indices that match a pattern +## Search an index or all indexes that match a pattern ``` GET my-logs/_search?q=test diff --git a/_benchmark/user-guide/concepts.md b/_benchmark/user-guide/concepts.md index 265f698b56..81a71d008f 100644 --- a/_benchmark/user-guide/concepts.md +++ b/_benchmark/user-guide/concepts.md @@ -94,8 +94,8 @@ A workload usually includes the following elements: - [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/): Defines the relevant indexes and index templates used for the workload. - [corpora]({{site.url}}{{site.baseurl}}/benchmark/workloads/corpora/): Defines all document corpora used for the workload. -- `schedule`: Defines operations and the order in which the operations run inline. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations. -- `operations`: **Optional**. Describes which operations are available for the workload and how they are parameterized. +- `schedule`: Defines operations and the order in which the operations run inline. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations. 
+- `operations`: **Optional**. Describes which operations are available for the workload and how they are parameterized. ### Indices @@ -105,9 +105,9 @@ To create an index, specify its `name`. To add definitions to your index, use th The `corpora` element requires the name of the index containing the document corpus, for example, `movies`, and a list of parameters that define the document corpora. This list includes the following parameters: -- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name. -- `document-count`: The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents. -- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs. +- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name. +- `document-count`: The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents. 
+- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs. - `compressed-bytes`: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents. ### Operations @@ -116,7 +116,7 @@ The `operations` element lists the OpenSearch API operations performed by the wo ### Schedule -The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`: +The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`: ```json "schedule": [ diff --git a/_benchmark/user-guide/creating-custom-workloads.md b/_benchmark/user-guide/creating-custom-workloads.md index 6ad284c2d1..17c2a69cfa 100644 --- a/_benchmark/user-guide/creating-custom-workloads.md +++ b/_benchmark/user-guide/creating-custom-workloads.md @@ -8,44 +8,53 @@ redirect_from: /benchmark/creating-custom-workloads/ # Creating custom workloads -OpenSearch Benchmark includes a set of [workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) that you can use to benchmark data from your cluster. 
Additionally, if you want to create a workload that is tailored to your own data, you can create a custom workload using one of the following options: - -- [Creating a workload from an existing cluster](#creating-a-workload-from-an-existing-cluster) -- [Creating a workload without an existing cluster](#creating-a-workload-without-an-existing-cluster) +OpenSearch Benchmark includes a set of [workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) that you can use to benchmark data from your cluster. Additionally, if you want to create a workload that is tailored to your own data, you can create a custom workload using one of the following options: + +- [Creating custom workloads](#creating-custom-workloads) + - [Creating a workload from an existing cluster](#creating-a-workload-from-an-existing-cluster) + - [Prerequisites](#prerequisites) + - [Customizing the workload](#customizing-the-workload) + - [Creating a workload without an existing cluster](#creating-a-workload-without-an-existing-cluster) + - [Invoking your custom workload](#invoking-your-custom-workload) + - [Advanced options](#advanced-options) + - [Test mode](#test-mode) + - [Adding variance to test procedures](#adding-variance-to-test-procedures) + - [Separate operations and test procedures](#separate-operations-and-test-procedures) + - [Next steps](#next-steps) ## Creating a workload from an existing cluster -If you already have an OpenSearch cluster with indexed data, use the following steps to create a custom workload for your cluster. +If you already have an OpenSearch cluster with indexed data, use the following steps to create a custom workload for your cluster. ### Prerequisites -Before creating a custom workload, make sure you have the following prerequisites: +Before creating a custom workload, make sure you have the following prerequisites: - An OpenSearch cluster with an index that contains 1000 or more documents. 
If your cluster's index does not contain at least 1000 documents, the workload can still run tests, however, you cannot run workloads using `--test-mode`. -- You must have the correct permissions to access your OpenSearch cluster. For more information about cluster permissions, see [Permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/). +- You must have the correct permissions to access your OpenSearch cluster. For more information about cluster permissions, see [Permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/). ### Customizing the workload -To begin creating a custom workload, use the `opensearch-benchmark create-workload` command. +To begin creating a custom workload, use the `opensearch-benchmark create-workload` command. ``` opensearch-benchmark create-workload \ --workload="" \ --target-hosts="" \ --client-options="basic_auth_user:'',basic_auth_password:''" \ ---indices="" \ +--indices="" \ --output-path="" ``` Replace the following options in the preceding example with information specific to your existing cluster: - `--workload`: A custom name for your custom workload. -- `--target-hosts:` A comma-separated list of host:port pairs from which the cluster extracts data. -- `--client-options`: The basic authentication client options that OpenSearch Benchmark uses to access the cluster. -- `--indices`: One or more indexes inside your OpenSearch cluster that contain data. -- `--output-path`: The directory in which OpenSearch Benchmark creates the workload and its configuration files. +- `--target-hosts:` A comma-separated list of host:port pairs from which the cluster extracts data. +- `--client-options`: The basic authentication client options that OpenSearch Benchmark uses to access the cluster. +- `--indices`: One or more indexes inside your OpenSearch cluster that contain data. +- `--output-path`: The directory in which OpenSearch Benchmark creates the workload and its configuration files. 
-The following example response creates a workload named `movies` from a cluster with an index named `movies-info`. The `movies-info` index contains over 2,000 documents. +The following example response creates a workload named `movies` from a cluster with an index named `movies-info`. The `movies-info` index contains over 2,000 documents. ``` ____ _____ __ ____ __ __ @@ -68,13 +77,13 @@ Extracting documents for index [movies]... 2000/2000 docs [10 ------------------------------- ``` -As part of workload creation, OpenSearch Benchmark generates the following files. You can access them in the directory specified by the `--output-path` option. +As part of workload creation, OpenSearch Benchmark generates the following files. You can access them in the directory specified by the `--output-path` option. -- `workload.json`: Contains general workload specifications. -- `.json`: Contains mappings and settings for the extracted indexes. -- `-documents.json`: Contains the sources of every document from the extracted indexes. Any sources suffixed with `-1k` encompass only a fraction of the document corpus of the workload and are only used when running the workload in test mode. +- `workload.json`: Contains general workload specifications. +- `.json`: Contains mappings and settings for the extracted indexes. +- `-documents.json`: Contains the sources of every document from the extracted indexes. Any sources suffixed with `-1k` encompass only a fraction of the document corpus of the workload and are only used when running the workload in test mode. -By default, OpenSearch Benchmark does not contain a reference to generate queries. Because you have the best understanding of your data, we recommend adding a query to `workload.json` that matches your index's specifications. Use the following `match_all` query as an example of a query added to your workload: +By default, OpenSearch Benchmark does not contain a reference to generate queries. 
Because you have the best understanding of your data, we recommend adding a query to `workload.json` that matches your index's specifications. Use the following `match_all` query as an example of a query added to your workload: ```json { @@ -100,17 +109,17 @@ If you want to create a custom workload but do not have an existing OpenSearch c To build a workload with source files, create a directory for your workload and perform the following steps: -1. Build a `-documents.json` file that contains rows of documents that comprise the document corpora of the workload and houses all data to be ingested and queried into the cluster. The following example shows the first few rows of a `movies-documents.json` file that contains rows of documents about famous movies: +1. Build a `-documents.json` file that contains rows of documents that comprise the document corpora of the workload and houses all data to be ingested and queried into the cluster. The following example shows the first few rows of a `movies-documents.json` file that contains rows of documents about famous movies: ```json - # First few rows of movies-documents.json + # First few rows of movies-documents.json {"title": "Back to the Future", "director": "Robert Zemeckis", "revenue": "$212,259,762 USD", "rating": "8.5 out of 10", "image_url": "https://imdb.com/images/32"} {"title": "Avengers: Endgame", "director": "Anthony and Joe Russo", "revenue": "$2,800,000,000 USD", "rating": "8.4 out of 10", "image_url": "https://imdb.com/images/2"} {"title": "The Grand Budapest Hotel", "director": "Wes Anderson", "revenue": "$173,000,000 USD", "rating": "8.1 out of 10", "image_url": "https://imdb.com/images/65"} {"title": "The Godfather: Part II", "director": "Francis Ford Coppola", "revenue": "$48,000,000 USD", "rating": "9 out of 10", "image_url": "https://imdb.com/images/7"} ``` -2. In the same directory, build a `index.json` file. 
The workload uses this file as a reference for data mappings and index settings for the documents contained in `-documents.json`. The following example creates mappings and settings specific to the `movie-documents.json` data from the previous step: +2. In the same directory, build a `index.json` file. The workload uses this file as a reference for data mappings and index settings for the documents contained in `-documents.json`. The following example creates mappings and settings specific to the `movie-documents.json` data from the previous step: ```json { @@ -140,21 +149,21 @@ To build a workload with source files, create a directory for your workload and } ``` -3. Next, build a `workload.json` file that provides a high-level overview of your workload and determines how your workload runs benchmark tests. The `workload.json` file contains the following sections: - - - `indices`: Defines the name of the index to be created in your OpenSearch cluster using the mappings from the workload's `index.json` file created in the previous step. - - `corpora`: Defines the corpora and the source file, including the: - - `document-count`: The number of documents in `-documents.json`. To get an accurate number of documents, run `wc -l -documents.json`. - - `uncompressed-bytes`: The number of bytes inside the index. To get an accurate number of bytes, run `stat -f %z -documents.json` on macOS or `stat -c %s -documents.json` on GNU/Linux. Alternatively, run `ls -lrt | grep -documents.json`. +3. Next, build a `workload.json` file that provides a high-level overview of your workload and determines how your workload runs benchmark tests. The `workload.json` file contains the following sections: + + - `indices`: Defines the name of the index to be created in your OpenSearch cluster using the mappings from the workload's `index.json` file created in the previous step. 
+ - `corpora`: Defines the corpora and the source file, including the: + - `document-count`: The number of documents in `-documents.json`. To get an accurate number of documents, run `wc -l -documents.json`. + - `uncompressed-bytes`: The number of bytes inside the index. To get an accurate number of bytes, run `stat -f %z -documents.json` on macOS or `stat -c %s -documents.json` on GNU/Linux. Alternatively, run `ls -lrt | grep -documents.json`. - `schedule`: Defines the sequence of operations and available test procedures for the workload. - The following example `workload.json` file provides the entry point for the `movies` workload. The `indices` section creates an index called `movies`. The corpora section refers to the source file created in step one, `movie-documents.json`, and provides the document count and the amount of uncompressed bytes. Lastly, the schedule section defines a few operations the workload performs when invoked, including: +The following example `workload.json` file provides the entry point for the `movies` workload. The `indices` section creates an index called `movies`. The corpora section refers to the source file created in step one, `movie-documents.json`, and provides the document count and the amount of uncompressed bytes. Lastly, the schedule section defines a few operations the workload performs when invoked, including: - - Deleting any current index named `movies`. - - Creating an index named `movies` based on data from `movie-documents.json` and the mappings from `index.json`. - - Verifying that the cluster is in good health and can ingest the new index. - - Ingesting the data corpora from `workload.json` into the cluster. - - Querying the results. +- Deleting any current index named `movies`. +- Creating an index named `movies` based on data from `movie-documents.json` and the mappings from `index.json`. +- Verifying that the cluster is in good health and can ingest the new index. 
+- Ingesting the data corpora from `workload.json` into the cluster. +- Querying the results. ```json { @@ -230,15 +239,25 @@ To build a workload with source files, create a directory for your workload and } ``` -4. For all the workload files created, verify that the workload is functional by running a test. To verify the workload, run the following command, replacing `--workload-path` with a path to your workload directory: +The corpora section refers to the source file created in step one, `movie-documents.json`, and provides the document count and the amount of uncompressed bytes. Lastly, the schedule section defines a few operations the workload performs when invoked, including: - ``` - opensearch-benchmark list workloads --workload-path= - ``` +- Deleting any current index named `movies`. +- Creating an index named `movies` based on data from `movie-documents.json` and the mappings from `index.json`. + - Verifying that the cluster is in good health and can ingest the new index. + - Ingesting the data corpora from `workload.json` into the cluster. + - Querying the results. + + + +For all the workload files created, verify that the workload is functional by running a test. To verify the workload, run the following command, replacing `--workload-path` with a path to your workload directory: + +``` +opensearch-benchmark list workloads --workload-path= +``` ## Invoking your custom workload -Use the `opensearch-benchmark execute-test` command to invoke your new workload and run a benchmark test against your OpenSearch cluster, as shown in the following example. Replace `--workload-path` with the path to your custom workload, `--target-host` with the `host:port` pairs for your cluster, and `--client-options` with any authorization options required to access the cluster. +Use the `opensearch-benchmark execute-test` command to invoke your new workload and run a benchmark test against your OpenSearch cluster, as shown in the following example. 
Replace `--workload-path` with the path to your custom workload, `--target-host` with the `host:port` pairs for your cluster, and `--client-options` with any authorization options required to access the cluster. ``` opensearch-benchmark execute_test \ @@ -256,7 +275,7 @@ You can enhance your custom workload's functionality with the following advanced ### Test mode -If you want run the test in test mode to make sure your workload operates as intended, add the `--test-mode` option to the `execute-test` command. Test mode ingests only the first 1000 documents from each index provided and runs query operations against them. +If you want run the test in test mode to make sure your workload operates as intended, add the `--test-mode` option to the `execute-test` command. Test mode ingests only the first 1000 documents from each index provided and runs query operations against them. To use test mode, create a `-documents-1k.json` file that contains the first 1000 documents from `-documents.json` using the following command: @@ -277,13 +296,13 @@ opensearch-benchmark execute_test \ ### Adding variance to test procedures -After using your custom workload several times, you might want to use the same workload but perform the workload's operations in a different order. Instead of creating a new workload or reorganizing the procedures directly, you can provide test procedures to vary workload operations. +After using your custom workload several times, you might want to use the same workload but perform the workload's operations in a different order. Instead of creating a new workload or reorganizing the procedures directly, you can provide test procedures to vary workload operations. -To add variance to your workload operations, go to your `workload.json` file and replace the `schedule` section with a `test_procedures` array, as shown in the following example. 
Each item in the array contains the following: +To add variance to your workload operations, go to your `workload.json` file and replace the `schedule` section with a `test_procedures` array, as shown in the following example. Each item in the array contains the following: -- `name`: The name of the test procedure. -- `default`: When set to `true`, OpenSearch Benchmark defaults to the test procedure specified as `default` in the workload if no other test procedures are specified. -- `schedule`: All the operations the test procedure will run. +- `name`: The name of the test procedure. +- `default`: When set to `true`, OpenSearch Benchmark defaults to the test procedure specified as `default` in the workload if no other test procedures are specified. +- `schedule`: All the operations the test procedure will run. ```json @@ -347,11 +366,11 @@ To add variance to your workload operations, go to your `workload.json` file and ### Separate operations and test procedures -If you want to make your `workload.json` file more readable, you can separate your operations and test procedures into different directories and reference the path to each in `workload.json`. To separate operations and procedures, perform the following steps: +If you want to make your `workload.json` file more readable, you can separate your operations and test procedures into different directories and reference the path to each in `workload.json`. To separate operations and procedures, perform the following steps: -1. Add all test procedures to a single file. You can give the file any name. Because the `movies` workload in the preceding contains and index task and queries, this step names the test procedures file `index-and-query.json`. -2. Add all operations to a file named `operations.json`. -3. Reference the new files in `workloads.json` by adding the following syntax, replacing `parts` with the relative path to each file, as shown in the following example: +1. Add all test procedures to a single file. 
You can give the file any name. Because the `movies` workload in the preceding example contains an index task and queries, this step names the test procedures file `index-and-query.json`. +2. Add all operations to a file named `operations.json`. +3. Reference the new files in `workload.json` by adding the following syntax, replacing `parts` with the relative path to each file, as shown in the following example: ```json "operations": [ @@ -365,11 +384,5 @@ If you want to make your `workload.json` file more readable, you can separate yo ## Next steps -- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/). -- To show a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository. - - - - - - +- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/). +- To show a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository. diff --git a/_benchmark/workloads/reference/index.md b/_benchmark/workloads/reference/index.md new file mode 100644 index 0000000000..234fb7f964 --- /dev/null +++ b/_benchmark/workloads/reference/index.md @@ -0,0 +1,109 @@ +--- +layout: default +title: Workload reference +nav_order: 60 +has_children: true +--- + +# OpenSearch Benchmark workload reference + +A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following: + +- One or more data streams that are ingested into indexes +- A set of queries and operations that are invoked as part of the benchmark + +This section provides a list of options and examples you can use when customizing or using a workload. 
+ +For more information about what comprises a workload, see [Anatomy of a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/concepts#anatomy-of-a-workload). + + +## Workload examples + +If you want to try certain workloads before creating your own, use the following examples. + +### Running unthrottled + +In the following example, OpenSearch Benchmark runs an unthrottled bulk index operation for 1 hour against the `movies` index: + +```json +{ + "description": "Tutorial benchmark for OpenSearch Benchmark", + "indices": [ + { + "name": "movies", + "body": "index.json" + } + ], + "corpora": [ + { + "name": "movies", + "documents": [ + { + "source-file": "movies-documents.json", + "document-count": 11658903, # Fetch document count from command line + "uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line + } + ] + } + ], + "schedule": [ + { + "operation": "bulk", + "warmup-time-period": 120, + "time-period": 3600, + "clients": 8 + } +] +} +``` + +### Workload with a single task + +The following workload runs a benchmark with a single task: a `match_all` query. Because no `clients` are indicated, only one client is used. 
According to the `schedule`, the workload runs the `match_all` query at 10 operations per second with 1 client, uses 100 iterations to warm up, and uses the next 100 iterations to measure the benchmark: + +```json +{ + "description": "Tutorial benchmark for OpenSearch Benchmark", + "indices": [ + { + "name": "movies", + "body": "index.json" + } + ], + "corpora": [ + { + "name": "movies", + "documents": [ + { + "source-file": "movies-documents.json", + "document-count": 11658903, # Fetch document count from command line + "uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line + } + ] + } + ], +{ + "schedule": [ + { + "operation": { + "operation-type": "search", + "index": "_all", + "body": { + "query": { + "match_all": {} + } + } + }, + "warmup-iterations": 100, + "iterations": 100, + "target-throughput": 10 + } + ] +} +} +``` + +## Next steps + +- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/). +- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository. diff --git a/_benchmark/workloads/reference/indices.md b/_benchmark/workloads/reference/indices.md new file mode 100644 index 0000000000..3b7e916b3e --- /dev/null +++ b/_benchmark/workloads/reference/indices.md @@ -0,0 +1,30 @@ +--- +layout: default +title: indices +parent: Workload reference +nav_order: 65 +--- + +# indices + +The `indices` element contains a list of all indexes used in the workload. + +## Example + +```json +"indices": [ + { + "name": "geonames", + "body": "geonames-index.json", + } +] +``` + +## Configuration options + +Use the following options with `indices`: + +Parameter | Required | Type | Description +:--- | :--- | :--- | :--- +`name` | Yes | String | The name of the index template. 
+`body` | No | String | The file name corresponding to the index definition used in the body of the Create Index API. diff --git a/_dashboards/sm-dashboards.md b/_dashboards/sm-dashboards.md index 9325f945a6..3f6cd11d85 100644 --- a/_dashboards/sm-dashboards.md +++ b/_dashboards/sm-dashboards.md @@ -28,13 +28,13 @@ Snapshots have two main uses: ## Creating a repository -Before you create an SM policy, set up a repository for snapshots. +Before you create an SM policy, set up a repository for snapshots. 1. From the OpenSearch Dashboards main menu, select **Management** > **Snapshot Management**. 2. In the left panel, under **Snapshot Management**, select **Repositories**. 3. Choose the **Create Repository** button. -4. Enter the repository name, type, and location. -5. (Optional) Select **Advanced Settings** and enter additional settings for this repository as a JSON object. +4. Enter the repository name, type, and location. +5. (Optional) Select **Advanced Settings** and enter additional settings for this repository as a JSON object. #### Example ```json { @@ -87,7 +87,7 @@ You can view, edit, or delete an SM policy on the policy details page. 1. From the OpenSearch Dashboards main menu, select **Management** > **Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Snapshot Policies**. 1. Click on the **Policy name** of the policy you want to view, edit, or delete.
-The policy settings, snapshot schedule, snapshot retention period, notifications, and last creation and deletion are displayed in the policy details page.
If a snapshot creation or deletion fails, you can view information about the failure in the **Last Creation/Deletion** section. To view the failure message, click on the **cause** in the **Info** column. +The policy settings, snapshot schedule, snapshot retention period, notifications, and last creation and deletion are displayed in the policy details page.
If a snapshot creation or deletion fails, you can view information about the failure in the **Last Creation/Deletion** section. To view the failure message, click on the **cause** in the **Info** column. 1. To edit or delete the SM policy, select the **Edit** or **Delete** button. ## Enable, disable, or delete SM policies @@ -131,7 +131,7 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps 1. From the OpenSearch Dashboards main menu, select **Management** > **Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Snapshots**. The **Snapshots** tab is selected by default. -1. Select the checkbox next to the snapshot you want to restore. An example is shown in the following image: +1. Select the checkbox next to the snapshot you want to restore. An example is shown in the following image: Snapshots{: .img-fluid} {::nomarkdown}star icon{:/} **Note:** You can only restore snapshots with the status of `Success` or `Partial`. The status of the snapshot is displayed in the **Snapshot status** column. @@ -142,7 +142,7 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps Restore Snapshot - For more information about the options in the **Restore snapshot** flyout, see [Restore snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#restore-snapshots). + For more information about the options in the **Restore snapshot** flyout, see [Restore snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#restore-snapshots). **Ignoring missing indexes** @@ -154,20 +154,20 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps  • Select the **Customize index settings** checkbox to provide new values for the specified index settings. All newly restored indexes will use these values instead of the ones in the snapshot.
 • Select the **Ignore index settings** checkbox to specify the settings in the snapshot to ignore. All newly restored indexes will use the cluster defaults for these settings. - The examples in the following image set `index.number_of_replicas` to `0`, `index.auto_expand_replicas` to `true`, and `index.refresh_interval` and `index.max_script_fields` to the cluster default values for all newly restored indexes. + The examples in the following image set `index.number_of_replicas` to `0`, `index.auto_expand_replicas` to `true`, and `index.refresh_interval` and `index.max_script_fields` to the cluster default values for all newly restored indexes. Custom settings For more information about index settings, see [Index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/). For a list of settings that you cannot change or ignore, see [Restore snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#restore-snapshots). - + After choosing the options, select the **Restore snapshot** button. -1. (Optional) To monitor the restore progress, select **View restore activities** in the confirmation dialog. You can also monitor the restore progress at any time by selecting the **Restore activities in progress** tab, as shown in the following image. +1. (Optional) To monitor the restore progress, select **View restore activities** in the confirmation dialog. You can also monitor the restore progress at any time by selecting the **Restore activities in progress** tab, as shown in the following image. Restore Activities{: .img-fluid} - You can view the percentage of the job that has been completed in the **Status** column. Once the snapshot restore is complete, the **Status** changes to `Completed (100%)`. + You can view the percentage of the job that has been completed in the **Status** column. Once the snapshot restore is complete, the **Status** changes to `Completed (100%)`. 
{::nomarkdown}star icon{:/} **Note:** The **Restore activities in progress** panel is not persistent. It displays only the progress of the current restore operation. If multiple restore operations are running, the panel displays the most recent one. {: .note purple} @@ -178,4 +178,3 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps After the restore operation is complete, the restored indexes are listed in the **Indices** panel. To view the indexes, in the left panel, under **Index Management**, choose **Indices**. View Indices{: .img-fluid} - \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/sources/opensearch.md b/_data-prepper/pipelines/configuration/sources/opensearch.md index faa5b0b68b..d5397a38b0 100644 --- a/_data-prepper/pipelines/configuration/sources/opensearch.md +++ b/_data-prepper/pipelines/configuration/sources/opensearch.md @@ -1,6 +1,6 @@ --- layout: default -title: opensearch +title: opensearch parent: Sources grand_parent: Pipelines nav_order: 30 @@ -39,7 +39,7 @@ opensearch-source-pipeline: include: - index_name_regex: "test-index-.*" exclude: - - index_name_regex: "\..*" + - index_name_regex: "\..*" scheduling: interval: "PT1H" index_read_count: 2 @@ -103,15 +103,15 @@ Option | Required | Type | Description `aws` | No | Object | The AWS configuration. For more information, see [aws](#aws). `acknowledgments` | No | Boolean | When `true`, enables the `opensearch` source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/#end-to-end-acknowledgments) when events are received by OpenSearch sinks. Default is `false`. `connection` | No | Object | The connection configuration. For more information, see [Connection](#connection). -`indices` | No | Object | The configuration for filtering which indexes are processed. Defaults to all indexes, including system indexes. For more information, see [Indices](#indices). 
+`indices` | No | Object | The configuration for filtering which indexes are processed. Defaults to all indexes, including system indexes. For more information, see [Indices](#indices). `scheduling` | No | Object | The scheduling configuration. For more information, see [Scheduling](#scheduling). `search_options` | No | Object | A list of search options performed by the source. For more information, see [Search options](#search_options). ### Scheduling -The `scheduling` configuration allows the user to configure how indexes are reprocessed in the source based on the the `index_read_count` and recount time `interval`. +The `scheduling` configuration allows the user to configure how indexes are reprocessed in the source based on the `index_read_count` and recount time `interval`. -For example, setting `index_read_count` to `3` with an `interval` of `1h` will result in all indexes being reprocessed 3 times, 1 hour apart. By default, indexes will only be processed once. +For example, setting `index_read_count` to `3` with an `interval` of `1h` will result in all indexes being reprocessed 3 times, 1 hour apart. By default, indexes will only be processed once. Use the following options under the `scheduling` configuration. @@ -119,12 +119,12 @@ Option | Required | Type | Description :--- | :--- |:----------------| :--- `index_read_count` | No | Integer | The number of times each index will be processed. Default is `1`. `interval` | No | String | The interval that determines the amount of time between reprocessing. Supports ISO 8601 notation strings, such as "PT20.345S" or "PT15M", as well as simple notation strings for seconds ("60s") and milliseconds ("1500ms"). Defaults to `8h`. -`start_time` | No | String | The time when processing should begin. The source will not start processing until this time. The string must be in ISO 8601 format, such as `2007-12-03T10:15:30.00Z`. The default option starts processing immediately. 
+`start_time` | No | String | The time when processing should begin. The source will not start processing until this time. The string must be in ISO 8601 format, such as `2007-12-03T10:15:30.00Z`. The default option starts processing immediately. ### indices -The following options help the `opensearch` source determine which indexes are processed from the source cluster using regex patterns. An index will only be processed if it matches one of the `index_name_regex` patterns under the `include` setting and does not match any of the +The following options help the `opensearch` source determine which indexes are processed from the source cluster using regex patterns. An index will only be processed if it matches one of the `index_name_regex` patterns under the `include` setting and does not match any of the patterns under the `exclude` setting. Option | Required | Type | Description @@ -137,7 +137,7 @@ Use the following setting under the `include` and `exclude` options to indicate Option | Required | Type | Description :--- |:----|:-----------------| :--- -`index_name_regex` | Yes | Regex string | The regex pattern to match indexes against. +`index_name_regex` | Yes | Regex string | The regex pattern to match indexes against. ### search_options @@ -145,13 +145,13 @@ Use the following settings under the `search_options` configuration. Option | Required | Type | Description :--- |:---------|:--------| :--- -`batch_size` | No | Integer | The number of documents to read while paginating from OpenSearch. Default is `1000`. +`batch_size` | No | Integer | The number of documents to read while paginating from OpenSearch. Default is `1000`. `search_context_type` | No | Enum | An override for the type of search/pagination to use on indexes. Can be [point_in_time]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#point-in-time-with-search_after)), [scroll]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#scroll-search), or `none`. 
The `none` option will use the [search_after]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#the-search_after-parameter) parameter. For more information, see [Default Search Behavior](#default-search-behavior). ### Default search behavior -By default, the `opensearch` source will look up the cluster version and distribution to determine -which `search_context_type` to use. For versions and distributions that support [Point in Time](https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#point-in-time-with-search_after), `point_in_time` will be used. +By default, the `opensearch` source will look up the cluster version and distribution to determine +which `search_context_type` to use. For versions and distributions that support [Point in Time](https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#point-in-time-with-search_after), `point_in_time` will be used. If `point_in_time` is not supported by the cluster, then [scroll](https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#scroll-search) will be used. For Amazon OpenSearch Serverless collections, [search_after](https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#the-search_after-parameter) will be used because neither `point_in_time` nor `scroll` are supported by collections. ### Connection diff --git a/_security/access-control/default-action-groups.md b/_security/access-control/default-action-groups.md index 79dd39baa1..aeac294721 100644 --- a/_security/access-control/default-action-groups.md +++ b/_security/access-control/default-action-groups.md @@ -43,14 +43,14 @@ Name | Description indices_all | Grants all permissions on the index. Equates to `indices:*`. get | Grants permissions to use `get` and `mget` actions only. read | Grants read permissions such as search, get field mappings, `get`, and `mget`. -write | Grants permissions to create and update documents within *existing indices*. 
To create new indices, see `create_index`. +write | Grants permissions to create and update documents within *existing indices*. To create new indexes, see `create_index`. delete | Grants permissions to delete documents. crud | Combines the `read`, `write`, and `delete` action groups. Included in the `data_access` action group. search | Grants permissions to search documents. Includes `suggest`. suggest | Grants permissions to use the suggest API. Included in the `read` action group. -create_index | Grants permissions to create indices and mappings. +create_index | Grants permissions to create indexes and mappings. indices_monitor | Grants permissions to execute all index monitoring actions (e.g. recovery, segments info, index stats, and status). index | A more limited version of the `write` action group. data_access | Combines the `crud` action group with `indices:data/*`. manage_aliases | Grants permissions to manage aliases. -manage | Grants all monitoring and administration permissions for indices. +manage | Grants all monitoring and administration permissions for indexes. diff --git a/_security/access-control/permissions.md b/_security/access-control/permissions.md index 50b973e0a7..36dfc2460f 100644 --- a/_security/access-control/permissions.md +++ b/_security/access-control/permissions.md @@ -93,10 +93,10 @@ System index permissions also work with the wildcard to include all variations o * Specifying the full name of a system index limits access to only that index: `.opendistro-alerting-config`. * Specifying a partial name for a system index along with the wildcard provides access to all system indexes that begin with that name: `.opendistro-anomaly-detector*`. * Although not recommended---given the wide-reaching access granted by this role definition---using `*` for the index pattern along with `system:admin/system_index` as an allowed action grants access to all system indexes. 
- + Entering the wildcard `*` by itself under `allowed_actions` does not automatically grant access to system indexes. The allowed action `system:admin/system_index` must be explicitly added. {: .note } - + The following example shows a role that grants access to all system indexes: ```yml @@ -474,4 +474,4 @@ Allowing access to these endpoints has the potential to trigger operational chan - restapi:admin/rolesmapping - restapi:admin/ssl/certs/info - restapi:admin/ssl/certs/reload -- restapi:admin/tenants \ No newline at end of file +- restapi:admin/tenants diff --git a/_security/multi-tenancy/multi-tenancy-config.md b/_security/multi-tenancy/multi-tenancy-config.md index 8b05386fe5..e6b1e16eb3 100644 --- a/_security/multi-tenancy/multi-tenancy-config.md +++ b/_security/multi-tenancy/multi-tenancy-config.md @@ -136,9 +136,9 @@ _meta: ``` -## Manage OpenSearch Dashboards indices +## Manage OpenSearch Dashboards indexes -The open source version of OpenSearch Dashboards saves all objects to a single index: `.kibana`. The Security plugin uses this index for the global tenant, but separate indices for every other tenant. Each user also has a private tenant, so you might see a large number of indices that follow two patterns: +The open source version of OpenSearch Dashboards saves all objects to a single index: `.kibana`. The Security plugin uses this index for the global tenant, but separate indexes for every other tenant. Each user also has a private tenant, so you might see a large number of indexes that follow two patterns: ``` .kibana__ @@ -149,4 +149,3 @@ The Security plugin scrubs these index names of special characters, so they migh {: .tip } To back up your OpenSearch Dashboards data, [take a snapshot]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/) of all tenant indexes using an index pattern such as `.kibana*`. 
- diff --git a/about.md b/about.md index 9b81727bb9..7542c3e3ab 100644 --- a/about.md +++ b/about.md @@ -13,7 +13,7 @@ redirect_from: # Introduction to OpenSearch -OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indices, boost fields, rank results by score, sort results by field, and aggregate results. +OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indexes, boost fields, rank results by score, sort results by field, and aggregate results. Unsurprisingly, people often use search engines like OpenSearch as the backend for a search application---think [Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:FAQ/Technical#What_software_is_used_to_run_Wikipedia?) or an online store. It offers excellent performance and can scale up and down as the needs of the application grow or shrink. @@ -29,9 +29,9 @@ You can run OpenSearch locally on a laptop---its system requirements are minimal In a single node cluster, such as a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state. For more information on setting node types, see [Cluster formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). -## Indices and documents +## Indexes and documents -OpenSearch organizes data into *indices*. 
Each index is a collection of JSON *documents*. If you have a set of raw encyclopedia articles or log lines that you want to add to OpenSearch, you must first convert them to [JSON](https://www.json.org/). A simple JSON document for a movie might look like this: +OpenSearch organizes data into *indexes*. Each index is a collection of JSON *documents*. If you have a set of raw encyclopedia articles or log lines that you want to add to OpenSearch, you must first convert them to [JSON](https://www.json.org/). A simple JSON document for a movie might look like this: ```json { @@ -55,14 +55,14 @@ When you add the document to an index, OpenSearch adds some metadata, such as th } ``` -Indices also contain mappings and settings: +Indexes also contain mappings and settings: - A *mapping* is the collection of *fields* that documents in the index have. In this case, those fields are `title` and `release_date`. - Settings include data like the index name, creation date, and number of shards. ## Primary and replica shards -OpenSearch splits indices into *shards* for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into ten shards, each one 40 GB, OpenSearch can distribute the shards across ten nodes and work with each shard individually. +OpenSearch splits indexes into *shards* for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into ten shards, each one 40 GB, OpenSearch can distribute the shards across ten nodes and work with each shard individually. By default, OpenSearch creates a *replica* shard for each *primary* shard. If you split your index into ten shards, for example, OpenSearch also creates ten replica shards. 
These replica shards act as backups in the event of a node failure---OpenSearch distributes replica shards to different nodes than their corresponding primary shards---but they also improve the speed and rate at which the cluster can process search requests. You might specify more than one replica per index for a search-heavy workload. @@ -93,4 +93,4 @@ To delete the document: DELETE https://://_doc/ ``` -You can change most OpenSearch settings using the REST API, modify indices, check the health of the cluster, get statistics---almost everything. +You can change most OpenSearch settings using the REST API, modify indexes, check the health of the cluster, get statistics---almost everything.
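
Reviewer note on the `about.md` hunk: the shard arithmetic in the "Primary and replica shards" paragraph (a 400 GB index split into ten primary shards, each with one replica) can be sanity-checked with a small Python sketch. The function name `shard_layout` and its parameters are hypothetical and are not part of any OpenSearch API; the sizes are idealized and ignore compression and per-shard overhead.

```python
def shard_layout(index_size_gb, primary_shards, replicas_per_primary=1):
    """Idealized shard arithmetic for illustration only (hypothetical helper).

    Assumes data is spread evenly across primary shards and that every
    replica is a full copy of its primary.
    """
    if primary_shards < 1:
        raise ValueError("primary_shards must be at least 1")
    if replicas_per_primary < 0:
        raise ValueError("replicas_per_primary must not be negative")

    shard_size_gb = index_size_gb / primary_shards
    replica_shards = primary_shards * replicas_per_primary
    return {
        "shard_size_gb": shard_size_gb,
        "replica_shards": replica_shards,
        "total_shards": primary_shards + replica_shards,
        # Primaries plus all replica copies.
        "total_storage_gb": index_size_gb * (1 + replicas_per_primary),
    }


# The example from the documentation text: 400 GB, ten primary shards,
# one replica per primary.
layout = shard_layout(400, 10)
for key, value in layout.items():
    print(f"{key}: {value}")
```

Under these assumptions the sketch agrees with the text: ten 40 GB primary shards plus ten replica shards. Raising `replicas_per_primary` models the search-heavy case the paragraph mentions.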