From 3167840299063af8f8a35c10b1f487bc83002500 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS
Date: Wed, 6 Dec 2023 15:16:30 -0600
Subject: [PATCH 01/21] Add operations and schedule sections

Signed-off-by: Naarcha-AWS
---
 _benchmark/reference/workloads/operations.md |  11 ++
 _benchmark/reference/workloads/schedule.md   | 173 +++++++++++++++++++
 2 files changed, 184 insertions(+)
 create mode 100644 _benchmark/reference/workloads/operations.md
 create mode 100644 _benchmark/reference/workloads/schedule.md

diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md
new file mode 100644
index 0000000000..050c2c7065
--- /dev/null
+++ b/_benchmark/reference/workloads/operations.md
@@ -0,0 +1,11 @@
+---
+layout: default
+title: operations
+parent: Workload reference
+grand_parent: OpenSearch Benchmark Reference
+nav_order: 100
+---
+
+# operations
+
+The `operations` element contains a list of all operations that are available when specifying a schedule.
\ No newline at end of file
diff --git a/_benchmark/reference/workloads/schedule.md b/_benchmark/reference/workloads/schedule.md
new file mode 100644
index 0000000000..f2b19bd388
--- /dev/null
+++ b/_benchmark/reference/workloads/schedule.md
@@ -0,0 +1,173 @@
+---
+layout: default
+title: schedule
+parent: Workload reference
+grand_parent: OpenSearch Benchmark Reference
+nav_order: 100
+---
+
+# schedule
+
+The `schedule` element contains a list of tasks, which are operations supported by OpenSearch Benchmark (OSB) that the workload runs during the benchmark test.
+
+## Usage
+
+The `schedule` element can define tasks in the following ways:
+
+### Using the operations element
+
+The following example defines a `force-merge` task and a `match-all-query` task using the `operations` element. The `force-merge` operation does not use any parameters, so only the `name` and `operation-type` are needed. `match-all-query` requires a query `body` and an `operation-type`.
+
+Operations defined in the `operations` element can be reused more than once in the schedule:
+
+```yml
+{
+  "operations": [
+    {
+      "name": "force-merge",
+      "operation-type": "force-merge"
+    },
+    {
+      "name": "match-all-query",
+      "operation-type": "search",
+      "body": {
+        "query": {
+          "match_all": {}
+        }
+      }
+    }
+  ],
+  "schedule": [
+    {
+      "operation": "force-merge",
+      "clients": 1
+    },
+    {
+      "operation": "match-all-query",
+      "clients": 4,
+      "warmup-iterations": 1000,
+      "iterations": 1000,
+      "target-throughput": 100
+    }
+  ]
+}
+```
+
+### Defining operations inline
+
+If you don't want to reuse an operation in the schedule, you can also define operations inside the `schedule` element, as shown in the following example:
+
+```yml
+{
+  "schedule": [
+    {
+      "operation": {
+        "name": "force-merge",
+        "operation-type": "force-merge"
+      },
+      "clients": 1
+    },
+    {
+      "operation": {
+        "name": "match-all-query",
+        "operation-type": "search",
+        "body": {
+          "query": {
+            "match_all": {}
+          }
+        }
+      },
+      "clients": 4,
+      "warmup-iterations": 1000,
+      "iterations": 1000,
+      "target-throughput": 100
+    }
+  ]
+}
+```
+
+## Task options
+
+Each task contains the following options.
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`operation` | Yes | List | Refers to either the name of an operation, defined in the `operations` element, or includes the entire operation inline.
+`name` | No | String | Specifies a unique name for the task when multiple tasks use the same operation.
+`tags` | No | String | Unique identifiers that can be used to filter between tasks.
+`clients` | No | Integer | The number of clients that concurrently run the task. Default is `1`.
+
+### Target options
+
+OpenSearch Benchmark supports the following target options when running a task.
+
+`target-throughput` | No | Integer | Defines the benchmark mode. When not defined, OpenSearch Benchmark assumes that this is a throughput benchmark and runs the task as fast as possible. This is useful for batch operations, where achieving the best throughput is more important than achieving better latency. When defined, the target specifies the number of requests per second across all clients. For example, if you specify `target-throughput: 1000` with 8 clients, each client issues 125 (= 1000 / 8) requests per second.
+`target-interval` | No | Interval | Defines an interval of 1 / `target-throughput` (in seconds), which is useful when the target throughput is less than one operation per second. Define either `target-throughput` or `target-interval` but not both; otherwise, OpenSearch Benchmark raises an error.
+`ignore-response-error-level` | No | Boolean | Controls whether to ignore errors encountered during the task when a benchmark is run with the `on-error=abort` command flag.
+
+## Parallel tasks
+
+The `parallel` element runs tasks wrapped inside the element concurrently.
+
+When running tasks in parallel, each task requires the `clients` option to make sure that clients inside your benchmark are reserved for that task. Otherwise, when the `clients` option is specified on the `parallel` element itself rather than on a task, the benchmark uses that number of clients for all tasks.
+
+### Usage
+
+In the following example, `parallel-task-1` and `parallel-task-2` execute a `bulk` operation concurrently:
+
+```yml
+{
+  "name": "parallel-any",
+  "description": "Track completed-by property",
+  "schedule": [
+    {
+      "parallel": {
+        "tasks": [
+          {
+            "name": "parallel-task-1",
+            "operation": {
+              "operation-type": "bulk",
+              "bulk-size": 1000
+            },
+            "clients": 8
+          },
+          {
+            "name": "parallel-task-2",
+            "operation": {
+              "operation-type": "bulk",
+              "bulk-size": 500
+            },
+            "clients": 8
+          }
+        ]
+      }
+    }
+  ]
+}
+```
+
+### Options
+
+The `parallel` element supports all `schedule` parameters, in addition to the following:
+
+`tasks` | Yes | Array | Defines a list of tasks that should be executed concurrently.
+`completed-by` | No | String | Allows you to define the name of one task in the `tasks` list or the value `any`. If a specific task name is provided, the whole parallel task structure is considered completed as soon as the named task has completed. If the value `any` is provided, the first task to complete renders all other tasks in the parallel structure complete. If this property is not explicitly defined, the parallel task structure is considered completed as soon as all tasks in the element complete.
+
+## Iteration-based options
+
+Iteration-based options allow you to warmup clients before the workload outputs benchmark data. It can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks).
+
+`iterations` | No | Integer | Defines a default value for all tasks of the parallel element. Default is `1`.
+`warmup-iterations` | No | Integer | Number of iterations that each client should execute to warm up the benchmark candidate. Warmup iterations will not show up in the measurement results. Default is `0`.
+
+## Time-based options
+
+Use the following time-based options with batch-style operations which may require an additional warmup period, including batch style operations.
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`time-period` | No | Duration | The time period, in seconds, that OpenSearch Benchmark considers for measurement. Usually not required for bulk indexing, since OpenSearch Benchmark indexes all documents and treats every sample after the `warmup-time-period` as a measurement sample.
+`ramp-up-time-period` | No | Integer | Starts clients gradually so that the number of clients specified in `clients` is reached at the end of the specified time period, in seconds. This helps increase load gradually and prevents load spikes from occurring before the benchmark is warmed up. This property requires `warmup-time-period` to be set as well, and the warmup time period must be greater than or equal to the ramp-up time period. Default is `0`.
+`warmup-time-period` | No | Integer | The time period, in seconds, used to warm up the benchmark candidate. Response data captured during the warmup period does not appear in the measurement results.
+
+

From 7a3795255a79c66fbfde13220c9e13d86c5bd9f9 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS
Date: Thu, 7 Dec 2023 14:00:21 -0600
Subject: [PATCH 02/21] Add bulk operation documentation

Signed-off-by: Naarcha-AWS
---
 _benchmark/reference/operations/bulk.md       | 71 +++++++++++++++++++
 .../{workloads => operations}/operations.md   |  7 +-
 _benchmark/reference/workloads/schedule.md    |  9 ++-
 3 files changed, 82 insertions(+), 5 deletions(-)
 create mode 100644 _benchmark/reference/operations/bulk.md
 rename _benchmark/reference/{workloads => operations}/operations.md (54%)

diff --git a/_benchmark/reference/operations/bulk.md b/_benchmark/reference/operations/bulk.md
new file mode 100644
index 0000000000..759e4e03c1
--- /dev/null
+++ b/_benchmark/reference/operations/bulk.md
@@ -0,0 +1,71 @@
+---
+layout: default
+title: bulk
+parent: operations
+grand_parent: OpenSearch Benchmark Reference
+nav_order: 105
+---
+
+# bulk
+
+The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task.
+
+## Usage
+
+The following example shows a `bulk` operation with a `bulk-size` of 5000 documents.
+
+```yml
+{
+  "name": "index-append",
+  "operation-type": "bulk",
+  "bulk-size": 5000
+}
+```
+
+## Split documents among clients
+
+With multiple `clients`, OpenSearch Benchmark splits each document based on the number of clients set. This ensures that the bulk index operations are efficiently parallelized but has the drawback that the ingestion is not done in the order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other indexes starting from the middle.
+
+Additionally, if there are multiple documents or corpora, OpenSearch Benchmark tries to index all documents in parallel in two ways:
+
+1. Each client starts at a different point in the corpus. For example, in a workload with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus.
+2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it moves on to the first split of the first document of the second corpus, and so on.
+
+## Options
+
+Use the following options to customize the bulk operation.
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`bulk-size` | Yes | Number | Sets the number of documents ingested in each bulk request.
+`ingest-percentage` | No | Range [0, 100] | Defines, as a number in the range [0, 100], how much of the document corpus is indexed.
+`corpora` | No | List | A list of document corpus names that should be targeted by the bulk operation. Only needed if the `corpora` section contains more than one document corpus and you don't want to index all of them during the bulk request.
+`indices` | No | List | A list of index names that defines which indexes should be used in the bulk index operation. OpenSearch Benchmark only selects document files that have a matching `target-index`.
+`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads at once. This is an expert setting and is only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a `bulk-size` of 1, you should set `batch-size` higher.
+`pipeline` | No | String | Defines the name of an existing ingest pipeline that should be used.
+`conflicts` | No | String | The type of index conflicts to simulate. If not specified, no conflicts are simulated. Valid values are `sequential`, where a document ID is replaced with a sequentially increasing ID, and `random`, where a document ID is replaced with a random document ID.
+`conflict-probability` | No | Percentage | A number in the range [0, 100] that defines the percentage of documents that are replaced when conflicts are simulated. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate the document IDs itself instead of relying on OpenSearch's automatic ID generation. Default is `25%`.
+`on-conflict` | No | String | Determines whether OpenSearch should use the `index` or `update` action on ID conflicts. Default is `index`, which replaces the existing document on an ID conflict.
+`recency` | No | Number | A number in the range [0, 1] that indicates recency. A value closer to `1` biases conflicting IDs toward more recent IDs. A value closer to `0` considers all IDs for ID conflicts.
+`detailed-results` | No | Boolean | Records more detailed [meta-data](#meta-data) for bulk requests. Because OpenSearch Benchmark analyzes the corresponding bulk response in more detail, this might incur additional overhead, which can skew measurement results. This property must be set to `true` for individual bulk request failures to be logged by OpenSearch Benchmark.
+`timeout` | No | Duration (In minutes) | Defines the time period that OpenSearch waits per action until it has finished processing the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. Default is `1m`.
+`refresh` | No | String | Controls OpenSearch's refresh behavior for bulk requests using the `refresh` bulk API query parameter. Valid values are `true`, where OpenSearch refreshes target shards in the background; `wait_for`, where OpenSearch blocks bulk requests until affected shards have been refreshed; and `false`, where OpenSearch uses the default refresh behavior.
+
+## Meta-data
+
+The `bulk` operation always returns the following meta-data:
+
+- `index`: The name of the affected index. If an index cannot be derived, returns `null`.
+- `weight`: An operation-agnostic representation of the bulk size, denoted by `unit`.
+- `unit`: The unit in which to interpret `weight`.
+- `success`: A Boolean indicating whether the `bulk` request succeeded.
+- `success-count`: The number of successfully processed bulk items for this request. This value is only determined in case of errors or if the `bulk-size` has been specified in the documents.
+- `error-count`: The number of failed bulk items for this request.
+- `took`: The value of the `took` property in the bulk response.
+
+If `detailed-results` is `true`, the following meta-data is also returned:
+
+- `ops`: A nested document with the operation name as the key, such as `index`, `update`, or `delete`, and various counts as values. `item-count` contains the total number of items for this key. Additionally, OpenSearch Benchmark returns a separate counter for each result, for example, the number of created items or the number of deleted items.
+- `shards_histogram`: An array of hashes where each hash has two keys: `item-count`, which contains the number of items to which a shard distribution applies, and `shards`, which contains another hash with the actual distribution of `total`, `successful`, and `failed` shards.
+- `bulk-request-size-bytes`: The total size of the bulk request body in bytes.
+- `total-document-size-bytes`: The total size of all documents within the bulk request body in bytes.
\ No newline at end of file
diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/operations/operations.md
similarity index 54%
rename from _benchmark/reference/workloads/operations.md
rename to _benchmark/reference/operations/operations.md
index 050c2c7065..a4e5a1a553 100644
--- a/_benchmark/reference/workloads/operations.md
+++ b/_benchmark/reference/operations/operations.md
@@ -1,11 +1,12 @@
 ---
 layout: default
 title: operations
-parent: Workload reference
-grand_parent: OpenSearch Benchmark Reference
+has_children: true
+parent: OpenSearch Benchmark Reference
 nav_order: 100
 ---
 
 # operations
 
-The `operations` element contains a list of all operations that are available when specifying a schedule.
\ No newline at end of file
+The `operations` element contains a list of all operations that are available when specifying a schedule.
+
diff --git a/_benchmark/reference/workloads/schedule.md b/_benchmark/reference/workloads/schedule.md
index f2b19bd388..781df96ef5 100644
--- a/_benchmark/reference/workloads/schedule.md
+++ b/_benchmark/reference/workloads/schedule.md
@@ -155,14 +155,19 @@ The `parallel` element supports all `schedule` parameters, in addition to the fo
 
 ## Iteration-based options
 
-Iteration-based options allow you to warmup clients before the workload outputs benchmark data. It can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks).
+Iteration-based options determine the number of times an operation should run. They can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options.
 
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
 `iterations` | No | Integer | Defines a default value for all tasks of the parallel element. Default is `1`.
 `warmup-iterations` | No | Integer | Number of iterations that each client should execute to warm up the benchmark candidate. Warmup iterations will not show up in the measurement results. Default is `0`.
 
 ## Time-based options
 
-Use the following time-based options with batch-style operations which may require an additional warmup period, including batch style operations.
+Time-based options determine the duration, in seconds, for which operations should run. This is ideal for batch-style operations, which may require an additional warmup period.
+
+To configure a time-based schedule, use the following options.
 
 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---

From 39118583ad623b060548ec175c3f8424e1601a95 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS
Date: Tue, 12 Dec 2023 14:55:33 -0600
Subject: [PATCH 03/21] Add operations

Signed-off-by: Naarcha-AWS
---
 _benchmark/reference/operations/bulk.md       |   2 +-
 _benchmark/reference/operations/operations.md | 262 ++++++++++++++++++
 .../{schedule.md => test-procedures.md}       |  37 ++-
 3 files changed, 289 insertions(+), 12 deletions(-)
 rename _benchmark/reference/workloads/{schedule.md => test-procedures.md} (81%)

diff --git a/_benchmark/reference/operations/bulk.md b/_benchmark/reference/operations/bulk.md
index 759e4e03c1..65ab14a8d2 100644
--- a/_benchmark/reference/operations/bulk.md
+++ b/_benchmark/reference/operations/bulk.md
@@ -24,7 +24,7 @@ The following example shows a `bulk` operations with a `bulk-size` of 5000 docum
 
 ## Split documents among clients
 
-With multiple `clients`, OpenSearch Benchmark splits each document based on the number of clients set. This ensures that the bulk index operations are efficiently parallelized but has the drawback that the ingestion is not done in the order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other indexes starting from the middle.
+With multiple `clients`, OpenSearch Benchmark splits each document based on the number of clients set. This ensures that the bulk index operations are efficiently parallelized but has the drawback that the ingestion is not done in the order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other indexes starting from the middle.
 
 Additionally, if there are multiple documents or corpora, OpenSearch Benchmark tries to index all documents in parallel in two ways:
 
diff --git a/_benchmark/reference/operations/operations.md b/_benchmark/reference/operations/operations.md
index a4e5a1a553..c22976af82 100644
--- a/_benchmark/reference/operations/operations.md
+++ b/_benchmark/reference/operations/operations.md
@@ -10,3 +10,265 @@ nav_order: 100
 
 The `operations` element contains a list of all operations that are available when specifying a schedule.
 
+## bulk
+
+The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task.
+
+### Usage
+
+The following example shows a `bulk` operation with a `bulk-size` of 5000 documents.
+
+```yml
+{
+  "name": "index-append",
+  "operation-type": "bulk",
+  "bulk-size": 5000
+}
+```
+
+### Split documents among clients
+
+With multiple `clients`, OpenSearch Benchmark splits each document based on the number of clients set. This ensures that the bulk index operations are efficiently parallelized but has the drawback that the ingestion is not done in the order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other indexes starting from the middle.
+
+Additionally, if there are multiple documents or corpora, OpenSearch Benchmark tries to index all documents in parallel in two ways:
+
+1. Each client starts at a different point in the corpus. For example, in a workload with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus.
+2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it moves on to the first split of the first document of the second corpus, and so on.
+
+### Options
+
+Use the following options to customize the bulk operation.
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`bulk-size` | Yes | Number | Sets the number of documents ingested in each bulk request.
+`ingest-percentage` | No | Range [0, 100] | Defines, as a number in the range [0, 100], how much of the document corpus is indexed.
+`corpora` | No | List | A list of document corpus names that should be targeted by the bulk operation. Only needed if the `corpora` section contains more than one document corpus and you don't want to index all of them during the bulk request.
+`indices` | No | List | A list of index names that defines which indexes should be used in the bulk index operation. OpenSearch Benchmark only selects document files that have a matching `target-index`.
+`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads at once. This is an expert setting and is only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a `bulk-size` of 1, you should set `batch-size` higher.
+`pipeline` | No | String | Defines the name of an existing ingest pipeline that should be used.
+`conflicts` | No | String | The type of index conflicts to simulate. If not specified, no conflicts are simulated. Valid values are `sequential`, where a document ID is replaced with a sequentially increasing ID, and `random`, where a document ID is replaced with a random document ID.
+`conflict-probability` | No | Percentage | A number in the range [0, 100] that defines the percentage of documents that are replaced when conflicts are simulated. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate the document IDs itself instead of relying on OpenSearch's automatic ID generation. Default is `25%`.
+`on-conflict` | No | String | Determines whether OpenSearch should use the `index` or `update` action on ID conflicts. Default is `index`, which replaces the existing document on an ID conflict.
+`recency` | No | Number | A number in the range [0, 1] that indicates recency. A value closer to `1` biases conflicting IDs toward more recent IDs. A value closer to `0` considers all IDs for ID conflicts.
+`detailed-results` | No | Boolean | Records more detailed [meta-data](#meta-data) for bulk requests. Because OpenSearch Benchmark analyzes the corresponding bulk response in more detail, this might incur additional overhead, which can skew measurement results. This property must be set to `true` for individual bulk request failures to be logged by OpenSearch Benchmark.
+`timeout` | No | Duration (In minutes) | Defines the time period that OpenSearch waits per action until it has finished processing the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. Default is `1m`.
+`refresh` | No | String | Controls OpenSearch's refresh behavior for bulk requests using the `refresh` bulk API query parameter. Valid values are `true`, where OpenSearch refreshes target shards in the background; `wait_for`, where OpenSearch blocks bulk requests until affected shards have been refreshed; and `false`, where OpenSearch uses the default refresh behavior.
+
+### Meta-data
+
+The `bulk` operation always returns the following meta-data:
+
+- `index`: The name of the affected index. If an index cannot be derived, returns `null`.
+- `weight`: An operation-agnostic representation of the bulk size, denoted by `unit`.
+- `unit`: The unit in which to interpret `weight`.
+- `success`: A Boolean indicating whether the `bulk` request succeeded.
+- `success-count`: The number of successfully processed bulk items for this request. This value is only determined in case of errors or if the `bulk-size` has been specified in the documents.
+- `error-count`: The number of failed bulk items for this request.
+- `took`: The value of the `took` property in the bulk response.
+
+If `detailed-results` is `true`, the following meta-data is also returned:
+
+- `ops`: A nested document with the operation name as the key, such as `index`, `update`, or `delete`, and various counts as values. `item-count` contains the total number of items for this key. Additionally, OpenSearch Benchmark returns a separate counter for each result, for example, the number of created items or the number of deleted items.
+- `shards_histogram`: An array of hashes where each hash has two keys: `item-count`, which contains the number of items to which a shard distribution applies, and `shards`, which contains another hash with the actual distribution of `total`, `successful`, and `failed` shards.
+- `bulk-request-size-bytes`: The total size of the bulk request body in bytes.
+- `total-document-size-bytes`: The total size of all documents within the bulk request body in bytes.
+
+## create-index
+
+The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation:
+
+- Create all indexes specified in the workload's `indices` section.
+- Create one specific index defined in the operation itself.
+
+### Usage
+
+The following example creates all indexes defined in the `indices` section of the workload. It uses all of the index settings defined in the workload but overrides the number of shards:
+
+```yml
+{
+  "name": "create-all-indices",
+  "operation-type": "create-index",
+  "settings": {
+    "index.number_of_shards": 1
+  },
+  "request-params": {
+    "wait_for_active_shards": "true"
+  }
+}
+```
+
+The next example creates a new index, with all index settings specified in the body of the operation:
+
+```yml
+{
+  "name": "create-an-index",
+  "operation-type": "create-index",
+  "index": "people",
+  "body": {
+    "settings": {
+      "index.number_of_shards": 1
+    },
+    "mappings": {
+      "docs": {
+        "properties": {
+          "name": {
+            "type": "text"
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+### Options
+
+Use the following options when creating all indexes from the `indices` section of a workload.
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`settings` | No | Array | Specifies additional index settings to be merged with the index settings specified in the `indices` section of the workload.
+`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them as is.
+
+Use the following options when creating a single index in the operation.
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`index` | Yes | String | The name of the index.
+`body` | No | Request body | The request body for the Create Index API. For more information, see [Create Index API](/api-reference/index-apis/create-index/).
+`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them as is.
+
+### Meta-data
+
+The `create-index` operation returns the following meta-data:
+
+- `weight`: The number of indexes created by the operation.
+- `unit`: Always `ops`.
+- `success`: A Boolean indicating whether the operation has succeeded.
+
+## delete-index
+
+The `delete-index` operation runs the [Delete Index API](/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload, or you can delete one or more indexes based on the string passed in the `index` setting.
+
+### Usage
+
+The following example deletes all indexes found in the `indices` section of the workload:
+
+```yml
+{
+  "name": "delete-all-indices",
+  "operation-type": "delete-index"
+}
+```
+
+The following example deletes all `logs-*` indexes:
+
+```yml
+{
+  "name": "delete-logs",
+  "operation-type": "delete-index",
+  "index": "logs-*",
+  "only-if-exists": false,
+  "request-params": {
+    "expand_wildcards": "all",
+    "allow_no_indices": "true",
+    "ignore_unavailable": "true"
+  }
+}
+```
+
+### Options
+
+Use the following options when deleting all indexes indicated in the `indices` section of the workload.
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`only-if-exists` | No | Boolean | Decides whether an index should only be deleted if the index exists. Default is `true`.
+`request-params` | No | List of settings | Contains any request parameters allowed by the Delete Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them as is.
+
+Use the following options if you want to delete one or more indexes based on the pattern indicated in the `index` option:
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`index` | Yes | String | The name of the index or indexes you want to delete.
+`only-if-exists` | No | Boolean | Decides whether an index should only be deleted if the index exists. Default is `true`.
+`request-params` | No | List of settings | Contains any request parameters allowed by the Delete Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them as is.
+
+### Meta-data
+
+The `delete-index` operation returns the following meta-data.
+
+- `weight`: The number of indexes deleted by the operation.
+- `unit`: Always `ops`.
+- `success`: A Boolean indicating whether the operation has succeeded.
+
+## cluster-health
+
+The `cluster-health` operation runs the [Cluster Health API](/api-reference/cluster-apis/cluster-health/), which checks whether the cluster health status matches the expected status according to the parameters set in the `request-params` option. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.
+
+
+### Usage
+
+The following example creates a `cluster-health` operation that checks for a `green` health status on any `logs-*` indexes:
+
+```yml
+{
+  "name": "check-cluster-green",
+  "operation-type": "cluster-health",
+  "index": "logs-*",
+  "request-params": {
+    "wait_for_status": "green",
+    "wait_for_no_relocating_shards": "true"
+  },
+  "retry-until-success": true
+}
+
+```
+
+### Options
+
+Use the following options with the `cluster-health` operation.
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`index` | Yes | String | The name of the index or indexes whose health you want to check.
+`request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them as is.
+
+### Meta-data
+
+The `cluster-health` operation returns the following meta-data.
+
+- `weight`: Always `1`.
+- `unit`: Always `ops`.
+- `success`: A Boolean indicating whether the operation has succeeded.
+- `cluster-status`: The current cluster status.
+- `relocating-shards`: The number of shards currently relocating to a different node.
+
+## refresh
+
+The `refresh` operation runs the Refresh API. This operation returns no meta-data.
+ +### Usage + +The following example refreshes all `logs-*` indexes: + +```yml +{ + "name": "refresh", + "operation-type": "refresh", + "index": "logs-*" +} +``` + +### Options + +The `refresh` operation uses the following options. + +Parameter | Required | Type | Description +:--- | :--- | :--- | :--- +`index` | No | String | The name of the index(es) or data streams to refresh. + +## search + +The `search` operation diff --git a/_benchmark/reference/workloads/schedule.md b/_benchmark/reference/workloads/test-procedures.md similarity index 81% rename from _benchmark/reference/workloads/schedule.md rename to _benchmark/reference/workloads/test-procedures.md index 781df96ef5..dfdb8c751b 100644 --- a/_benchmark/reference/workloads/schedule.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -1,20 +1,35 @@ --- layout: default -title: schedule +title: test_procedures parent: Workload reference grand_parent: OpenSearch Benchmark Reference nav_order: 100 --- -# schedule +# test_procedures + +If your workload only defines one benchmarking scenario specify the schedule on top-level. Use the challenge element if you want to specify additional properties like a name or a description. You can think of a challenge as a benchmarking scenario. If you have multiple challenges, you can define an array of challenges. + +This section contains one or more challenges which describe the benchmark scenarios for this data set. A challenge can reference all operations that are defined in the operations section. + +Parameter | Required | Type | Description +:--- | :--- | :--- | :--- +`name` | Yes | String | The name of the test procedure. When naming the test procedure, do not use spaces so that the name is easy to enter on the command line. +`description` | No | String | A human readable description of the test procedure. 
+`user-info` | No | String | A message that is printed at the beginning of the test intended to notify the user about important information related to the test, such as deprecations. +`default` | No | Boolean | When set to `true`, OpenSearch Benchmark selects this test procedure by default if the user did not specify another `test-procedure` on the command line. If your workload only defines one challenge, it is implicitly selected as default, otherwise you need to define `"default": true` on exactly one challenge. +[schedule](#schedule) | Yes | Array | Defines the workload. + + +## schedule The schedule element contains a list of a tasks, which are operations supported by OpenSearch Benchmark (OSB), run by the workload during the benchmark test. -## Usage +### Usage The `schedule` element can define tasks in the following ways: -### Using the operations element +#### Using the operations element The following example defines a `force-merge` and `match-all` query task using the `operations` element. The `force-merge` operation does not use any parameters, so only the `name` and `operation-type` is needed. `match-all-query` requires a query `body` and `operation-type`. @@ -53,7 +68,7 @@ Operations defined in the `operations` element can be reused more than once in t } ``` -### Defining operations inline +#### Defining operations inline If you don't want reuse an operation in the schedule, you can also define operations inside the `schedule` element, as shown in the following example: @@ -86,7 +101,7 @@ If you don't want reuse an operation in the schedule, you can also define operat } ``` -## Task options +### Task options Each task contains the following options. @@ -105,13 +120,13 @@ OpenSearch Benchmark requires one of the following options when running a task. `target-interval` | No | Interval | Defines an internal of less 1 / target-throughput (in seconds) less than one operation per second. 
Define either target-throughput or target-interval but not both (otherwise Rally will raise an error). `ignore-response-error-level` | No | Boolean | Controls whether to ignore errors encountered during the task when a benchmark is run with the `on-error=abort` command flag. -## Parallel tasks +### Parallel tasks The `parallel` element runs tasks wrapped inside the element concurrently. When running tasks in parallel, each task requires the `client` option to make sure clients inside your benchmark are reserved for that task. Otherwise, when the client option is specified inside the parallel element without a connection to the task, the benchmark will use that number of clients for all tasks. -### Usage +#### Usage In the following example, `parallel-task-1` and `parallel-task-2` execute a `bulk` operation concurrently: @@ -146,14 +161,14 @@ In the following example, `parallel-task-1` and `parallel-task-2` execute a `bul } ``` -### Options +#### Options The `parallel` element supports all `schedule` parameters, in addition to the following: `tasks` | Yes | Array | Defines a list of tasks that should be executed concurrently. `completed-by` | No | String | Allows you define the name of one task in the tasks list, or the value `any`. If a specific task name has been provided then as soon as the named task has completed, the whole parallel task structure is considered completed. If the of value `any` is provided, then any task that completes first renders all other tasks specified in parallel structure complete. If this property is not explicitly defined, the parallel task structure is considered completed as soon as the tasks in the element complete. -## Iteration-based options +### Iteration-based options Iteration-based options determine the number of times an operation should fun. It can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options. 
@@ -163,7 +178,7 @@ Parameter | Required | Type | Description `iterations` | No | Integer | Defines a default value for all tasks of the parallel element. Default is `1`. `warmup-iterations` | No | Integer | Number of iterations that each client should execute to warmup the benchmark candidate. Warmup iterations will not show up in the measurement results. Default is `0`. -## Time-based options +### Time-based options Time-based options determines the duration of time, in seconds, that operations should run for. This is ideal with batch-style operations which may require an additional warmup period, including batch style operations. From 0adadb5e5e7cd70c63b4af726d220264fbdb4043 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Wed, 13 Dec 2023 14:20:16 -0600 Subject: [PATCH 04/21] Place operations under workloads. Add remaining operations. Signed-off-by: Naarcha-AWS --- _benchmark/reference/operations/bulk.md | 71 ------------------- .../{operations => workloads}/operations.md | 63 ++++++++++++++-- 2 files changed, 56 insertions(+), 78 deletions(-) delete mode 100644 _benchmark/reference/operations/bulk.md rename _benchmark/reference/{operations => workloads}/operations.md (79%) diff --git a/_benchmark/reference/operations/bulk.md b/_benchmark/reference/operations/bulk.md deleted file mode 100644 index 65ab14a8d2..0000000000 --- a/_benchmark/reference/operations/bulk.md +++ /dev/null @@ -1,71 +0,0 @@ ---- -layout: default -title: operations -parents: operations -grand_parent: OpenSearch Benchmark Reference -nav_order: 105 ---- - -# bulk - -The `bulk` operation type allows you run [bulk](/api-reference/document-apis/bulk/) requests as a task. - -## Usage - -The following example shows a `bulk` operations with a `bulk-size` of 5000 documents. 
- -```yml -{ - "name": "index-append", - "operation-type": "bulk", - "bulk-size": 5000 -} -``` - -## Split documents among clients - -With multiple `clients`, OpenSearch Benchmark splits each document based on the number of clients set. This ensures that the bulk index operations are efficiently parallelized but has the drawback that the ingestion is not done in the order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other indexes starting from the middle. - -Additionally, if there are multiple documents or corpora, OpenSearch Benchmark tries to index all documents in parallel in two ways: - -1. Each client starts at a different point in the corpus. For example, in a track with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus. -2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it will move on to the first split of the first document of the second corpus, and so on. - -## Options - -Use the following options to customize the bulk operation. - -Parameter | Required | Type | Description -:--- | :--- | :--- | :--- -`bulk-size` | Yes | Number | Sets the number of documents ingested in the bulk requested. -`ingest-percentage` No | Range [0, 100] | Defines using a number between [0, 100], how much of document corpus will be indexed. -`corpora` | No | List | A list of document corpus names that should be targeted by the bulk operation. Only needed if the `corpora` section contains more than one document corpus and you don’t want to index all of them during the bulk request. -`indices` | No | List | A list of index names that defines which indexes should be used in the bulk index operation. OpenSearch Benchmark will only select document files that have a matching `target-index`. 
-`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads at once. This is an expert setting and only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a bulk-size of 1, you should set batch-size higher. -`pipeline` | No | String | Defines the name of an existing ingest pipeline that should be used. -`conflicts` | No | String | The type of index conflicts to simulate. If not specified, no conflicts will be simulated. Valid values are ‘sequential’, a document ID is replaced with with a sequentially increasing ID, and ‘random’, where a document ID is replaced with a random document ID. -`conflict-probability` | No | Percentage | A number between [0, 100] that defines how many of the documents be replaced when a conflict exists. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate index ID by itself, instead of relying on OpenSearch’s automatic ID generation. Default is `25%`. -`on-conflict` | No | String | Determines whether OpenSearch should use the action `index` or `update` index on ID conflicts. Default is `index`, which creates a new index during ID conflicts. -`recency` | No | Number | A number between [0,1] that indicates recency. Recency towards `1` bias conflicting IDs towards more recent IDs. Recency towards 0 considers all IDs for ID conflicts. -`detailed-results` | No | Boolean | Records more detailed [meta-data](#meta-data) for bulk requests. As OpenSearch Benchmark analyzes the corresponding bulk response in more detail, this might incur additional overhead which can skew measurement results. This property must be set to true for individual bulk request failures to be logged by OpenSearch Benchmark. -`timeout` | No | Duration (In minutes) | Defines the time period that OpenSearch will wait per action until it has finished processing the following operations: automatic index creation, dynamic mapping updates, waiting for active shards. 
Defaults to `1m`. -`refresh` No | String | Controls OpenSearch's refresh behavior for bulk requests using the `refresh` bulk API query parameter. Valid values are `true`, where OpenSearch refreshes target shards in the background; `wait_for`, OpenSearch blocks bulk requests until affected shards have been refreshed; and `false`, where OpenSearch uses the default refresh behavior. - -## Meta-data - -The `bulk` operations always return the following meta-data: - -- `index`: The name of the affected index. If an index cannot be derived, returns `null`. -- `weight`: An operation-agnostic representation of the bulk size denoted by `units`. -- `unit`: The unit in which to interpret `weight`. -- `success`: A Boolean indicating whether the `bulk` request succeeded. -- `success-count`: The number of successfully processed bulk items for this request. This value will only be determined in case of errors or if the `bulk-size` has been specified in the documents. -- `error-count`: The number of failed bulk items for this request. -- `took`: The value of the `took` property in the bulk response. - -If `detailed-results` is `true` the following meta-data is also returned: - -- `ops`: A nested document with the operation name as key, such as `index`, `update`, or `delete` and various counts as values. `item-count` contains the total number of items for this key. Additionally, OpenSearch Benchmark returns a separate counter for each result, for example, a result for the number of created items or the number of deleted items. -- `shards_histogram`: An array of hashes where each hash has two keys: `item-count` which contains the number of items to which a shard distribution applies, and `shards` contains another hash with the actual distribution of `total`, `successful`, and `failed` shards. -- `bulk-request-size-bytes`: The total size of the bulk requests body in bytes. -- `total-document-size-bytes`: The total size of all documents within the bulk request body in bytes. 
\ No newline at end of file diff --git a/_benchmark/reference/operations/operations.md b/_benchmark/reference/workloads/operations.md similarity index 79% rename from _benchmark/reference/operations/operations.md rename to _benchmark/reference/workloads/operations.md index c22976af82..03aefc5bc4 100644 --- a/_benchmark/reference/operations/operations.md +++ b/_benchmark/reference/workloads/operations.md @@ -1,9 +1,9 @@ --- layout: default title: operations -has_children: true -parent: OpenSearch Benchmark Reference -nav_order: 100 +parent: Workload reference +grand_parent: OpenSearch Benchmark Reference +nav_order: 110 --- # operations @@ -137,7 +137,7 @@ Parameter | Required | Type | Description :--- | :--- | :--- | :--- `index` | Yes | String | The name of the index. `body` | No | Request body | The request body for the Create Index API. For more information, see [Create Index API](/api-reference/index-apis/create-index/) -`request-params` | No | List of settings | Contains any request parameters allowed by Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. ### Meta-data @@ -185,7 +185,7 @@ Use the following options when deleting all indexes indicated in the `indices` s Parameter | Required | Type | Description :--- | :--- | :--- | :--- `only-if-exists` | No | Boolean | Decides whether an index should only be deleted in the index exists. Default is `true`. -`request-params` | No | List of settings | Contains any request parameters allowed by Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. 
OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. Use the following options if you want to delete one or more indexes based on pattern indicated in the `index` option: @@ -233,7 +233,7 @@ Use the following options with the `cluster-health` operation. Parameter | Required | Type | Description :--- | :--- | :--- | :--- `index` | Yes | String | The name of the index or indexes you want to delete. -`request-params` | No | List of settings | Contains any request parameters allowed by Cluster Health API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. +`request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. ### Meta-data @@ -271,4 +271,53 @@ Parameter | Required | Type | Description ## search -The `search` operation +The `search` operation runs the [Search API](/api-reference/search/), which gives you the ability to run queries in OpenSearch Benchmark indexes. + +### Usage + +The following example runs a `match_all` query inside the `search` operation: + +```yml +{ + "name": "default", + "operation-type": "search", + "body": { + "query": { + "match_all": {} + } + }, + "request-params": { + "_source_include": "some_field", + "analyze_wildcard": "false" + } +} +``` + +### Options + +The `search` operation uses the following options. + +Parameter | Required | Type | Description +:--- | :--- | :--- | :--- +`index` | No | String | The name of the index(es) or data streams that the query targets. This option is only needed when the `indices` section contains more than one index. Otherwise, OpenSearch Benchmark automatically derives the index or data stream to use. To query against all indexes in the workload, specify `"index": "_all"`. +`cache` | No | Boolean | Whether to use the query request cache. OpenSearch Benchmark defines no value. 
The default depends on the benchmark candidate settings and OpenSearch version. +`request-params` | No | List of settings | Contains any request parameters allowed by the Search API. +`body` | Yes | Request body | The query body that indicates which query to use and the query parameters. +`detailed-results` | No | Boolean | Records more detailed meta-data about queries. When `true`, OpenSearch Benchmark might incur additional overhead to return the detailed results, which can skew measurement results. This option does not work with `scroll` queries. +`results-per-page` | No | Integer | The number of documents to retrieve per page. This maps to the Search API’s `size` parameter, and can be used for scroll and non-scroll searches. Default is 10. + +### Meta-data + +The following meta-data is always returned: +- `weight`: The “weight” of an operation. Always `1` for regular queries and the number of retrieved pages for scroll queries. +- `unit`: The unit in which to interpret weight. Always “ops” for regular queries and “pages” for scroll queries. +- `success`: A Boolean indicating whether the query has succeeded. + +If `detailed-results` is set to `true`, the following meta-data is also returned: +- `hits`: The total number of hits for this query. +- `hits_relation`: whether hits is accurate (eq) or a lower bound of the actual hit count (gte). +- `timed_out`: Whether the query has timed out. For scroll queries, this flag is true if the flag was true for any of the queries issued. + + took: Value of the the took property in the query response. For scroll queries, this value is the sum of all took values in query responses. 
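As a rough sketch of how the `search` options above might be combined, the following operation queries every index in the workload, enables `detailed-results`, and sets `results-per-page`. The operation name `match-all-detailed` and the page size of `50` are placeholders chosen for illustration and are not taken from an existing workload:

```yml
{
  "name": "match-all-detailed",
  "operation-type": "search",
  "index": "_all",
  "body": {
    "query": {
      "match_all": {}
    }
  },
  "detailed-results": true,
  "results-per-page": 50
}
```

With `detailed-results` enabled, the returned meta-data should also include `hits`, `hits_relation`, `timed_out`, and `took`, at the cost of the additional overhead noted in the options table.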
+ + From a2261ea95eab9d574704cba578b2cf2d42a83cea Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Wed, 13 Dec 2023 14:22:28 -0600 Subject: [PATCH 05/21] Fix link Signed-off-by: Naarcha-AWS --- _benchmark/reference/workloads/operations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md index 03aefc5bc4..f61ee0ff8d 100644 --- a/_benchmark/reference/workloads/operations.md +++ b/_benchmark/reference/workloads/operations.md @@ -205,7 +205,7 @@ The `delete-index` operation returns the following meta-data. ## cluster-health -The `cluster-health` operation runs the [Cluster Health API](/api-reference/cluster-apis/cluster-health/), which checks the cluster health status returns the expected status according the parameters set in the `request-params` option. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when an the health check fails. +The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status returns the expected status according the parameters set in the `request-params` option. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when an the health check fails. 
### Usage From 209035d5592ddc838d752a2d533264473b17c6a2 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 19 Dec 2023 09:44:55 -0600 Subject: [PATCH 06/21] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _benchmark/reference/workloads/operations.md | 12 +++++------ .../reference/workloads/test-procedures.md | 20 +++++++++---------- 2 files changed, 16 insertions(+), 16 deletions(-) diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md index f61ee0ff8d..967cd27b42 100644 --- a/_benchmark/reference/workloads/operations.md +++ b/_benchmark/reference/workloads/operations.md @@ -32,7 +32,7 @@ With multiple `clients`, OpenSearch Benchmark splits each document based on the Additionally, if there are multiple documents or corpora, OpenSearch Benchmark tries to index all documents in parallel in two ways: -1. Each client starts at a different point in the corpus. For example, in a track with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus. +1. Each client starts at a different point in the corpus. For example, in a workload with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus. 2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it will move on to the first split of the first document of the second corpus, and so on. ### Options @@ -47,7 +47,7 @@ Parameter | Required | Type | Description `indices` | No | List | A list of index names that defines which indexes should be used in the bulk index operation. OpenSearch Benchmark will only select document files that have a matching `target-index`. 
`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads at once. This is an expert setting and only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a bulk-size of 1, you should set batch-size higher. `pipeline` | No | String | Defines the name of an existing ingest pipeline that should be used. -`conflicts` | No | String | The type of index conflicts to simulate. If not specified, no conflicts will be simulated. Valid values are ‘sequential’, a document ID is replaced with with a sequentially increasing ID, and ‘random’, where a document ID is replaced with a random document ID. +`conflicts` | No | String | The type of index conflicts to simulate. If not specified, no conflicts will be simulated. Valid values are ‘sequential’, a document ID is replaced with a sequentially increasing ID, and ‘random’, where a document ID is replaced with a random document ID. `conflict-probability` | No | Percentage | A number between [0, 100] that defines how many of the documents be replaced when a conflict exists. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate index ID by itself, instead of relying on OpenSearch’s automatic ID generation. Default is `25%`. `on-conflict` | No | String | Determines whether OpenSearch should use the action `index` or `update` index on ID conflicts. Default is `index`, which creates a new index during ID conflicts. `recency` | No | Number | A number between [0,1] that indicates recency. Recency towards `1` bias conflicting IDs towards more recent IDs. Recency towards 0 considers all IDs for ID conflicts. @@ -184,7 +184,7 @@ Use the following options when deleting all indexes indicated in the `indices` s Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`only-if-exists` | No | Boolean | Decides whether an index should only be deleted in the index exists. Default is `true`. 
+`only-if-exists` | No | Boolean | Decides whether an index should only be deleted if the index exists. Default is `true`. `request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. Use the following options if you want to delete one or more indexes based on pattern indicated in the `index` option: @@ -192,7 +192,7 @@ Use the following options if you want to delete one or more indexes based on pat Parameter | Required | Type | Description :--- | :--- | :--- | :--- `index` | Yes | String | The name of the index or indexes you want to delete. -`only-if-exists` | No | Boolean | Decides whether an index should only be deleted in the index exists. Default is `true`. +`only-if-exists` | No | Boolean | Decides whether an index should only be deleted if the index exists. Default is `true`. `request-params` | No | List of settings | Contains any request parameters allowed by Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. ### Meta-data @@ -205,7 +205,7 @@ The `delete-index` operation returns the following meta-data. ## cluster-health -The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status returns the expected status according the parameters set in the `request-params` option. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when an the health check fails. +The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according the parameters set in the `request-params` option. 
If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails. ### Usage @@ -318,6 +318,6 @@ If `detailed-results` is set to `true`, the following meta-data is also returned - `hits_relation`: whether hits is accurate (eq) or a lower bound of the actual hit count (gte). - `timed_out`: Whether the query has timed out. For scroll queries, this flag is true if the flag was true for any of the queries issued. - took: Value of the the took property in the query response. For scroll queries, this value is the sum of all took values in query responses. + - `took`: The value of the `took` property in the query response. For scroll queries, this value is the sum of all took values in query responses. diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index dfdb8c751b..9ed0ff26b6 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -10,7 +10,7 @@ nav_order: 100 If your workload only defines one benchmarking scenario specify the schedule on top-level. Use the challenge element if you want to specify additional properties like a name or a description. You can think of a challenge as a benchmarking scenario. If you have multiple challenges, you can define an array of challenges. -This section contains one or more challenges which describe the benchmark scenarios for this data set. A challenge can reference all operations that are defined in the operations section. +This section contains one or more test procedures which describe the benchmark scenarios for this data set. A test procedure can reference all operations that are defined in the operations section. 
Parameter | Required | Type | Description :--- | :--- | :--- | :--- @@ -133,7 +133,7 @@ In the following example, `parallel-task-1` and `parallel-task-2` execute a `bul ```yml { "name": "parallel-any", - "description": "Track completed-by property", + "description": "Workload completed-by property", "schedule": [ { "parallel": { @@ -166,28 +166,28 @@ In the following example, `parallel-task-1` and `parallel-task-2` execute a `bul The `parallel` element supports all `schedule` parameters, in addition to the following: `tasks` | Yes | Array | Defines a list of tasks that should be executed concurrently. -`completed-by` | No | String | Allows you define the name of one task in the tasks list, or the value `any`. If a specific task name has been provided then as soon as the named task has completed, the whole parallel task structure is considered completed. If the of value `any` is provided, then any task that completes first renders all other tasks specified in parallel structure complete. If this property is not explicitly defined, the parallel task structure is considered completed as soon as the tasks in the element complete. +`completed-by` | No | String | Allows you to define the name of one task in the tasks list, or the value `any`. If `completed-by` is set to the name of one of the tasks in the list, the parallel task structure is considered complete once that specific task has been completed. If `completed-by` is set to `any`, the parallel task structure is considered complete when any of the tasks in the list has been completed. If `completed-by` is not explicitly defined, the parallel task structure is considered complete as soon as all of the tasks in the list has been completed. ### Iteration-based options -Iteration-based options determine the number of times an operation should fun. It can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options. 
+Iteration-based options determine the number of times an operation should run. They can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`iterations` | No | Integer | Defines a default value for all tasks of the parallel element. Default is `1`. -`warmup-iterations` | No | Integer | Number of iterations that each client should execute to warmup the benchmark candidate. Warmup iterations will not show up in the measurement results. Default is `0`. +`iterations` | No | Integer | The number of times a client should execute an operation. These are included in the measured results. Default is 1. +`warmup-iterations` | No | Integer | The number of times a client should execute an operation for the purpose of warming up the benchmark candidate. Warmup iterations will not show up in the measurement results. Default is 0. ### Time-based options -Time-based options determines the duration of time, in seconds, that operations should run for. This is ideal with batch-style operations which may require an additional warmup period, including batch style operations. +Time-based options determine the duration of time, in seconds, that operations should run for. This is ideal for batch-style operations which may require an additional warmup period. To configure a time-based schedule, use the following options. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`time-period` | No | Duration | The time period in seconds that OpenSearch Benchmark considers for measurement. Usually not required for bulk indexing, since OpenSearch Benchmark will index all documents at according to the `warmup-time-period`. -`ramp-up-time-period` | No | Integer | Determines the number of clients used at the end of the specified time period in seconds, which can help increase load gradually. 
This prevents load spikes from occurring before the benchmark is warmed up. This property requires a `warmup-time-period` to be set as well, which must be less then the ramp up time period. Default is `0`. -`warmup-time-period` | No | Integer | The time period in seconds to warmup of the benchmark candidate. All response data captured during the warmup period will not appear in the measurement results. +`time-period` | No | Integer | The time period in seconds that OpenSearch Benchmark considers for measurement. This is not required for bulk-indexing as OpenSearch Benchmark will bulk index all documents and naturally measure all samples after the `warmup-time-period` specified. +`ramp-up-time-period` | No | Integer | The time period in seconds in which OpenSearch Benchmark gradually adds clients and reaches the total number of clients specified for the operation. +`warmup-time-period` | No | Integer | The time period in seconds to warmup the benchmark candidate. All response data captured during the warmup period will not appear in the measurement results. From f0e52ddef37e55266b653fb8a1619f31a7df1a0d Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 19 Dec 2023 09:45:09 -0600 Subject: [PATCH 07/21] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _benchmark/reference/workloads/test-procedures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index 9ed0ff26b6..b528868a81 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -18,7 +18,7 @@ Parameter | Required | Type | Description `description` | No | String | A human readable description of the test procedure. 
`user-info` | No | String | A message that is printed at the beginning of the test intended to notify the user about important information related to the test, such as deprecations. `default` | No | Boolean | When set to `true`, OpenSearch Benchmark selects this test procedure by default if the user did not specify another `test-procedure` on the command line. If your workload only defines one challenge, it is implicitly selected as default, otherwise you need to define `"default": true` on exactly one challenge. -[schedule](#schedule) | Yes | Array | Defines the workload. +[schedule](#schedule) | Yes | Array | Defines the order in which tasks in the workload are run. ## schedule From bb571b2288a130a95f557b7e719ec540ed0dfa86 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 19 Dec 2023 09:46:19 -0600 Subject: [PATCH 08/21] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _benchmark/reference/workloads/test-procedures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index b528868a81..73391286e7 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -8,7 +8,7 @@ nav_order: 100 # test_procedures -If your workload only defines one benchmarking scenario specify the schedule on top-level. Use the challenge element if you want to specify additional properties like a name or a description. You can think of a challenge as a benchmarking scenario. If you have multiple challenges, you can define an array of challenges. +If your workload only defines one benchmarking scenario specify the schedule on top-level. Use the `test-procedures` element if you want to specify additional properties like a name or a description. 
You can think of a test procedure as a benchmarking scenario. If you have multiple test procedures, you can define an array of challenges. This section contains one or more test procedures which describe the benchmark scenarios for this data set. A test procedure can reference all operations that are defined in the operations section. From f7df355b21bf072ee641f842773264fb5c17a228 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 19 Dec 2023 09:46:31 -0600 Subject: [PATCH 09/21] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _benchmark/reference/workloads/test-procedures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index 73391286e7..39159e562e 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -17,7 +17,7 @@ Parameter | Required | Type | Description `name` | Yes | String | The name of the test procedure. When naming the test procedure, do not use spaces so that the name is easy to enter on the command line. `description` | No | String | A human readable description of the test procedure. `user-info` | No | String | A message that is printed at the beginning of the test intended to notify the user about important information related to the test, such as deprecations. -`default` | No | Boolean | When set to `true`, OpenSearch Benchmark selects this test procedure by default if the user did not specify another `test-procedure` on the command line. If your workload only defines one challenge, it is implicitly selected as default, otherwise you need to define `"default": true` on exactly one challenge. 
+`default` | No | Boolean | When set to `true`, OpenSearch Benchmark selects this test procedure by default if the user did not specify another `test-procedure` on the command line. If your workload only defines one test procedure, it is implicitly selected as default, otherwise you need to define `"default": true` on exactly one challenge. [schedule](#schedule) | Yes | Array | Defines the order in which tasks in the workload are run. From 7d1b4825f031825ff9a2525e15ce9f9b8f8f6744 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 19 Dec 2023 09:47:25 -0600 Subject: [PATCH 10/21] Update test-procedures.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../reference/workloads/test-procedures.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index 39159e562e..b65b65d18f 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -120,6 +120,16 @@ OpenSearch Benchmark requires one of the following options when running a task. `target-interval` | No | Interval | Defines an internal of less 1 / target-throughput (in seconds) less than one operation per second. Define either target-throughput or target-interval but not both (otherwise Rally will raise an error). `ignore-response-error-level` | No | Boolean | Controls whether to ignore errors encountered during the task when a benchmark is run with the `on-error=abort` command flag. +### Iteration-based options + +Iteration-based options determine the number of times an operation should run. It can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options. 
+ + +Parameter | Required | Type | Description +:--- | :--- | :--- | :--- +`iterations` | No | Integer | The number of times a client should execute an operation. These are included in the measured results. Default is 1. +`warmup-iterations` | No | Integer | The number of times a client should execute an operation for the purpose of warming up the benchmark candidate. Warmup iterations will not show up in the measurement results. Default is 0. + ### Parallel tasks The `parallel` element runs tasks wrapped inside the element concurrently. @@ -168,16 +178,6 @@ The `parallel` element supports all `schedule` parameters, in addition to the fo `tasks` | Yes | Array | Defines a list of tasks that should be executed concurrently. `completed-by` | No | String | Allows you to define the name of one task in the tasks list, or the value `any`. If `completed-by` is set to the name of one of the tasks in the list, the parallel task structure is considered complete once that specific task has been completed. If `completed-by` is set to `any`, the parallel task structure is considered complete when any of the tasks in the list has been completed. If `completed-by` is not explicitly defined, the parallel task structure is considered complete as soon as all of the tasks in the list has been completed. -### Iteration-based options - -Iteration-based options determine the number of times an operation should run. It can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options. - - -Parameter | Required | Type | Description -:--- | :--- | :--- | :--- -`iterations` | No | Integer | The number of times a client should execute an operation. These are included in the measured results. Default is 1. -`warmup-iterations` | No | Integer | The number of times a client should execute an operation for the purpose of warming up the benchmark candidate. 
Warmup iterations will not show up in the measurement results. Default is 0. - ### Time-based options Time-based options determines the duration of time, in seconds, that operations should run for. This is ideal for batch-style operations which may require an additional warmup period. From d7e74633961166decf3ee600566d0ebed79c7963 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 20 Dec 2023 08:43:14 -0600 Subject: [PATCH 11/21] Apply suggestions from code review Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _benchmark/reference/workloads/operations.md | 130 +++++++++--------- .../reference/workloads/test-procedures.md | 43 +++--- 2 files changed, 86 insertions(+), 87 deletions(-) diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md index 967cd27b42..6f23357fe3 100644 --- a/_benchmark/reference/workloads/operations.md +++ b/_benchmark/reference/workloads/operations.md @@ -12,11 +12,11 @@ The `operations` element contains a list of all operations that are available wh ## bulk -The `bulk` operation type allows you run [bulk](/api-reference/document-apis/bulk/) requests as a task. +The `bulk` operation type allows for running [bulk](/api-reference/document-apis/bulk/) requests as a task. ### Usage -The following example shows a `bulk` operations with a `bulk-size` of 5000 documents. +The following example shows a `bulk` operation type with a `bulk-size` of `5000` documents. ```yml { @@ -28,50 +28,50 @@ The following example shows a `bulk` operations with a `bulk-size` of 5000 docum ### Split documents among clients -With multiple `clients`, OpenSearch Benchmark splits each document based on the number of clients set. This ensures that the bulk index operations are efficiently parallelized but has the drawback that the ingestion is not done in the order of each document. 
For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other indexes starting from the middle. +With multiple `clients`, OpenSearch Benchmark splits each document based on the number of clients set. Multiple `clients` parallelizes the bulk index operations, but doesn't preserve the ingestion order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other indexes starting from the middle. -Additionally, if there are multiple documents or corpora, OpenSearch Benchmark tries to index all documents in parallel in two ways: +If there are multiple documents or corpora, OpenSearch Benchmark tries to index all documents in parallel in two ways: 1. Each client starts at a different point in the corpus. For example, in a workload with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus. -2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it will move on to the first split of the first document of the second corpus, and so on. +2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it moves to the first split of the first document of the second corpus, and so on. ### Options -Use the following options to customize the bulk operation. +Use the following options to customize the `bulk` operation. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`bulk-size` | Yes | Number | Sets the number of documents ingested in the bulk requested. -`ingest-percentage` No | Range [0, 100] | Defines using a number between [0, 100], how much of document corpus will be indexed. -`corpora` | No | List | A list of document corpus names that should be targeted by the bulk operation. 
Only needed if the `corpora` section contains more than one document corpus and you don’t want to index all of them during the bulk request. -`indices` | No | List | A list of index names that defines which indexes should be used in the bulk index operation. OpenSearch Benchmark will only select document files that have a matching `target-index`. -`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads at once. This is an expert setting and only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a bulk-size of 1, you should set batch-size higher. -`pipeline` | No | String | Defines the name of an existing ingest pipeline that should be used. -`conflicts` | No | String | The type of index conflicts to simulate. If not specified, no conflicts will be simulated. Valid values are ‘sequential’, a document ID is replaced with a sequentially increasing ID, and ‘random’, where a document ID is replaced with a random document ID. -`conflict-probability` | No | Percentage | A number between [0, 100] that defines how many of the documents be replaced when a conflict exists. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate index ID by itself, instead of relying on OpenSearch’s automatic ID generation. Default is `25%`. -`on-conflict` | No | String | Determines whether OpenSearch should use the action `index` or `update` index on ID conflicts. Default is `index`, which creates a new index during ID conflicts. -`recency` | No | Number | A number between [0,1] that indicates recency. Recency towards `1` bias conflicting IDs towards more recent IDs. Recency towards 0 considers all IDs for ID conflicts. -`detailed-results` | No | Boolean | Records more detailed [meta-data](#meta-data) for bulk requests. As OpenSearch Benchmark analyzes the corresponding bulk response in more detail, this might incur additional overhead which can skew measurement results. 
This property must be set to true for individual bulk request failures to be logged by OpenSearch Benchmark. -`timeout` | No | Duration (In minutes) | Defines the time period that OpenSearch will wait per action until it has finished processing the following operations: automatic index creation, dynamic mapping updates, waiting for active shards. Defaults to `1m`. -`refresh` No | String | Controls OpenSearch's refresh behavior for bulk requests using the `refresh` bulk API query parameter. Valid values are `true`, where OpenSearch refreshes target shards in the background; `wait_for`, OpenSearch blocks bulk requests until affected shards have been refreshed; and `false`, where OpenSearch uses the default refresh behavior. - -### Meta-data - -The `bulk` operations always returns the following meta-data: - -- `index`: The name of the affected index. If an index cannot be derived, returns `null`. +`bulk-size` | Yes | Number | Sets the number of documents ingested in the bulk request. +`ingest-percentage` | No | Range [0, 100] | Defines the portion of the document corpus to be indexed. Valid values are a range between 0 and 100. +`corpora` | No | List | Defines which document corpus names should be targeted by the bulk operation. Only needed if the `corpora` section contains more than one document corpus and you don’t want to index all of them during the bulk request. +`indices` | No | List | Defines which indexes should be used in the bulk index operation. OpenSearch Benchmark only selects document files that have a matching `target-index`. +`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads simultaneously. This is an expert setting and only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a `bulk-size` of `1`, you should set `batch-size` higher. +`pipeline` | No | String | Defines which existing ingest pipeline to use. 
+`conflicts` | No | String | Defines the type of index `conflicts` to simulate. If not specified, none are simulated. Valid values are ‘sequential’, in which a document ID is replaced with a sequentially increasing ID, and ‘random’, in which a document ID is replaced with a random document ID. +`conflict-probability` | No | Percentage | Defines how many of the documents are replaced when a conflict exists. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate the index ID itself, instead of using OpenSearch's automatic ID generation. Valid values are numbers between 0 and 100. Default is `25%`. +`on-conflict` | No | String | Determines whether OpenSearch should use the action `index` or `update` index for ID conflicts. Default is `index`, which creates a new index during ID conflicts. +`recency` | No | Number | Uses a number between 0 and 1 to indicate recency. Recency toward `1` bias conflicting IDs toward more recent IDs. Recency toward 0 considers all IDs for ID conflicts. +`detailed-results` | No | Boolean | Records more detailed [metadata](#metadata) for bulk requests. As OpenSearch Benchmark analyzes the corresponding bulk response in more detail, additional overhead may be incurred, which can skew measurement results. This property must be set to `true` so that OpenSearch Benchmark logs individual bulk request failures. +`timeout` | No | Duration | Defines the time period (in minutes) that OpenSearch waits per action until completing the processing of the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. Default is `1m`. +`refresh` | No | String | Controls OpenSearch's refresh behavior for bulk requests that use the `refresh` bulk API query parameter. 
Valid values are `true`, in which OpenSearch refreshes target shards in the background; `wait_for`, in which OpenSearch blocks bulk requests until affected shards have been refreshed; and `false`, in which OpenSearch uses the default refresh behavior. + +### Metadata + +The `bulk` operation always returns the following metadata: + +- `index`: The name of the affected index. If an index cannot be derived, it returns `null`. - `weight`: An operation-agnostic representation of the bulk size denoted by `units`. - `unit`: The unit in which to interpret `weight`. - `success`: A Boolean indicating whether the `bulk` request succeeded. -- `success-count`: The number of successfully processed bulk items for this request. This value will only be determined in case of errors or if the `bulk-size` has been specified in the documents. -- `error-count`: The number of failed bulk items for this request. +- `success-count`: The number of successfully processed bulk items for this request. This value is determined when there are errors or when the `bulk-size` has been specified in the documents. +- `error-count`: The number of failed bulk items for the request. - `took`: The value of the `took` property in the bulk response. -If `detailed-results` is `true` the following meta-data is also returned: +If `detailed-results` is `true`, the following metadata is returned: - `ops`: A nested document with the operation name as key, such as `index`, `update`, or `delete` and various counts as values. `item-count` contains the total number of items for this key. Additionally, OpenSearch Benchmark returns a separate counter for each result, for example, a result for the number of created items or the number of deleted items. -- `shards_histogram`: An array of hashes where each hash has two keys: `item-count` which contains the number of items to which a shard distribution applies, and `shards` contains another hash with the actual distribution of `total`, `successful`, and `failed` shards. 
-- `bulk-request-size-bytes`: The total size of the bulk requests body in bytes. +- `shards_histogram`: An array of hashes where each hash has two keys. The `item-count` key contains the number of items to which a shard distribution applies. The `shards` key contains a hash with the actual distribution of `total`, `successful`, and `failed` shards. +- `bulk-request-size-bytes`: The total size of the bulk request body in bytes. - `total-document-size-bytes`: The total size of all documents within the bulk request body in bytes. ## create-index @@ -98,7 +98,7 @@ The following example creates all indexes defined in the `indices` section of th } ``` -The next example creates a new index, with all index setting specified in the body of the operation: +The following example creates a new index with all index settings specified in the operation body: ```yml { @@ -129,27 +129,27 @@ Use the following options when creating all indexes from the `indices` section o Parameter | Required | Type | Description :--- | :--- | :--- | :--- `settings` | No | Array | Specifies additional index settings to be merged with the index settings specified in `indices` section of the workload. -`request-params` | No | List of settings | Contains any request parameters allowed by Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. Use the following options when creating a single index in the operation. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`index` | Yes | String | The name of the index. +`index` | Yes | String | The index name. `body` | No | Request body | The request body for the Create Index API. 
For more information, see [Create Index API](/api-reference/index-apis/create-index/) -`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. -### Meta-data +### Metadata -The `create-index` operation returns the following meta-data: +The `create-index` operation returns the following metadata: `weight`: The number of indexes created by the operation. -`unit`: Always “ops”. +`unit`: Always `ops`, for the number of operations inside the workload. `success`: A Boolean indicating whether the operation has succeeded. ## delete-index -The `delete-index` runs the [Delete Index API](api-reference/index-apis/delete-index/). Like the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or you can delete one or more indexes based on string passed in the `index` setting. +The `delete-index` operation runs the [Delete Index API](api-reference/index-apis/delete-index/). Like the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting. ### Usage @@ -184,20 +184,20 @@ Use the following options when deleting all indexes indicated in the `indices` s Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`only-if-exists` | No | Boolean | Decides whether an index should only be deleted if the index exists. Default is `true`. -`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. 
+`only-if-exists` | No | Boolean | Decides whether an index should be deleted only if the index exists. Default is `true`. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. -Use the following options if you want to delete one or more indexes based on pattern indicated in the `index` option: +Use the following options if you want to delete one or more indexes based on the pattern indicated in the `index` option: Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`index` | Yes | String | The name of the index or indexes you want to delete. -`only-if-exists` | No | Boolean | Decides whether an index should only be deleted if the index exists. Default is `true`. -`request-params` | No | List of settings | Contains any request parameters allowed by Create Index API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. +`index` | Yes | String | The index or indexes you want to delete. +`only-if-exists` | No | Boolean | Decides whether an index should be deleted only if the index exists. Default is `true`. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. -### Meta-data +### Metadata -The `delete-index` operation returns the following meta-data. +The `delete-index` operation returns the following metadata. `weight`: The number of indexes created by the operation. `unit`: Always “ops”. @@ -205,12 +205,12 @@ The `delete-index` operation returns the following meta-data. ## cluster-health -The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according the parameters set in the `request-params` option. 
If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when an the health check fails. +The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails. ### Usage -The following example creates a `cluster-health` operation which checks for a `green` health status on the any `log-*` indexes: +The following example creates a `cluster-health` operation that checks for a `green` health status on the any `log-*` indexes: ```yml { @@ -232,22 +232,22 @@ Use the following options with the `cluster-health` operation. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`index` | Yes | String | The name of the index or indexes you want to delete. -`request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark will not attempt to serialize the parameters and pass them as is. +`index` | Yes | String | The index or indexes you want to delete. +`request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. ### Meta-data -The `cluster-health` operation returns the following meta-data. +The `cluster-health` operation returns the following metadata. - `weight`: Always 1. - `unit`: Always “ops”. -- `success`: A Boolean which indicates whether the operation has succeeded. 
+- `success`: A Boolean that indicates whether the operation has succeeded. - `cluster-status`: Current cluster status. - `relocating-shards`: The number of shards currently relocating to a different node. ## refresh -The `refresh` operations runs the Refresh API. This `operation` returns no meta-data. +The `refresh` operations runs the Refresh API. The `operation` returns no metadata. ### Usage @@ -267,15 +267,15 @@ The `refresh` operation uses the following options. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`index` | No | String | The name of the index(es) or data streams to refresh. +`index` | No | String | The name of indexes or data streams to refresh. ## search -The `search` operation runs the [Search API](/api-reference/search/), which gives you the ability to run queries in OpenSearch Benchmark indexes. +The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes. ### Usage -The follow example runs a `match_all` query inside the `search` operation: +The following example runs a `match_all` query inside the `search` operation: ```yml { @@ -299,25 +299,25 @@ The `search` operation uses the following options. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`index` | No | String | The name of the index(es) or data streams that query targets. This options is only needed when the `indices` section contains more than one index. Otherwise, OpenSearch Benchmark automatically derives the index or data stream to use. To query against all indexes in the workload, specify `"index": "_all"`. -`cache` | No | Boolean | Whether to use the query request cache. OpenSearch Benchmark defines no value. The default depends on the benchmark candidate settings and OpenSearch version. +`index` | No | String | The indexes or data streams targeted by the query. The option is needed only when the `indices` section contains two or more indexes. 
Otherwise, OpenSearch Benchmark automatically derives the index or data stream to use. Specify `"index": "_all"` to query against all indexes in the workload. +`cache` | No | Boolean | Specifies whether to use the query request cache. OpenSearch Benchmark defines no value. The default depends on the benchmark candidate settings and the OpenSearch version. `request-params` | No | List of settings | Contains any request parameters allowed by the Search API. -`body` | Yes | Request body | The query body that indicates which query to use and the query parameters. -`detailed-results` | No | Boolean | Records more detailed meta-data about queries. When `true`, OpenSearch Benchmark might incur additional overhead to return the detailed results, which can skew measurement results. This option does not work with `scroll` queries. -`results-per-page` | No | Integer | The number of documents to retrieve per page. This maps to the Search API’s `size` parameter, and can be used for scroll and non-scroll searches. Default is 10. +`body` | Yes | Request body | Indicates which query to use and the query parameters. +`detailed-results` | No | Boolean | Records more detailed metadata about queries. When set to `true`, additional overhead may be incurred, which can skew measurement results. The option does not work with `scroll` queries. +`results-per-page` | No | Integer | Specifies the number of documents to retrieve per page. This maps to the Search API `size` parameter and can be used for scroll and non-scroll searches. Default is 10. -### Meta-data +### Metadata -The following meta-data is always returned: +The following metadata is always returned: - `weight`: The “weight” of an operation. Always `1` for regular queries and the number of retrieved pages for scroll queries. - `unit`: The unit in which to interpret weight. Always “ops” for regular queries and “pages” for scroll queries. - `success`: A Boolean indicating whether the query has succeeded. 
-If `detailed-results` is set to `true`, the following meta-data is also returned:
+If `detailed-results` is set to `true`, the following metadata is also returned:
 - `hits`: The total number of hits for this query.
 - `hits_relation`: whether hits is accurate (eq) or a lower bound of the actual hit count (gte).
 - `timed_out`: Whether the query has timed out. For scroll queries, this flag is true if the flag was true for any of the queries issued.

- - `took`: The value of the the `took` property in the query response. For scroll queries, this value is the sum of all took values in query responses.
+ - `took`: The value of the `took` property in the query response. For scroll queries, the value is the sum of all `took` values in the query responses.


diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md
index b65b65d18f..684af2167f 100644
--- a/_benchmark/reference/workloads/test-procedures.md
+++ b/_benchmark/reference/workloads/test-procedures.md
@@ -8,30 +8,30 @@ nav_order: 100

 # test_procedures

-If your workload only defines one benchmarking scenario specify the schedule on top-level. Use the `test-procedures` element if you want to specify additional properties like a name or a description. You can think of a test procedure as a benchmarking scenario. If you have multiple test procedures, you can define an array of challenges.
+If your workload only defines one benchmarking scenario, specify the schedule on top-level. Use the `test-procedures` element to specify additional properties, such as a name or description. A test procedure is like a benchmarking scenario. If you have multiple test procedures, you can define a variety of challenges.

-This section contains one or more test procedures which describe the benchmark scenarios for this data set. A test procedure can reference all operations that are defined in the operations section.
+The following table lists test procedures for the benchmark scenarios in this dataset. A test procedure can reference all operations that are defined in the operations section.

 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `name` | Yes | String | The name of the test procedure. When naming the test procedure, do not use spaces so that the name is easy to enter on the command line.
 `description` | No | String | A human readable description of the test procedure.
-`user-info` | No | String | A message that is printed at the beginning of the test intended to notify the user about important information related to the test, such as deprecations.
-`default` | No | Boolean | When set to `true`, OpenSearch Benchmark selects this test procedure by default if the user did not specify another `test-procedure` on the command line. If your workload only defines one test procedure, it is implicitly selected as default, otherwise you need to define `"default": true` on exactly one challenge.
-[schedule](#schedule) | Yes | Array | Defines the order in which tasks in the workload are run.
+`user-info` | No | String | Outputs a message at the start of the test to notify user about important test-related information, for example, deprecations.
+`default` | No | Boolean | When set to `true`, selects the default test procedure if the user did not specify a test procedure on the command line. If the workload only defines one test procedure, it is implicitly selected as default. Otherwise, you must define `"default": true` on exactly one challenge.
+[`schedule`](#Schedule) | Yes | Array | Defines the order in which tasks in the workload are run.

 ## schedule

-The schedule element contains a list of a tasks, which are operations supported by OpenSearch Benchmark (OSB), run by the workload during the benchmark test.
+The `schedule` element contains a list of a tasks, which are operations supported by OpenSearch Benchmark, that are run by the workload during the benchmark test.

 ### Usage

-The `schedule` element can define tasks in the following ways:
+The `schedule` element defines tasks using the following methods described in this section.

 #### Using the operations element

-The following example defines a `force-merge` and `match-all` query task using the `operations` element. The `force-merge` operation does not use any parameters, so only the `name` and `operation-type` is needed. `match-all-query` requires a query `body` and `operation-type`.
+The following example defines a `force-merge` and `match-all` query task using the `operations` element. The `force-merge` operation does not use any parameters, so only the `name` and `operation-type` are needed. The `match-all-query` parameter requires a query `body` and `operation-type`.

 Operations defined in the `operations` element can be reused more than once in the schedule:

@@ -70,7 +70,7 @@ Operations defined in the `operations` element can be reused more than once in t

 #### Defining operations inline

-If you don't want reuse an operation in the schedule, you can also define operations inside the `schedule` element, as shown in the following example:
+If you don't want reuse an operation in the schedule, you can define operations inside the `schedule` element, as shown in the following example:

 ```yml
 {
@@ -109,15 +109,15 @@ Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `operation` | Yes | List | Refers to either the name of an operation, defined in the `operations` element, or includes the entire operation inline.
 `name` | No | String | Specifies a unique name for the task when multiple tasks use the same operation.
-`tags` | No | String | Unique identifiers that can be used to filter between tasks.clients (optional, defaults to 1): The number of clients that should execute a task concurrently.
-`clients` | No | Integer | The number of clients that concurrently run the task. Default is `1`.
+`tags` | No | String | Unique identifiers that can be used to filter between `tasks.clients`, or the number of clients that should execute a task concurrently. Default is 1.
+`clients` | No | Integer | Specifies the number of clients that concurrently run the task. Default is `1`.

 ### Target options

-OpenSearch Benchmark requires one of the following options when running a task.
+OpenSearch Benchmark requires one of the following options when running a task:

-`target-throughput` | No | Integer | Defines the benchmark mode. When not defined, OpenSearch Benchmark assumes that this is a throughput benchmark and runs the task as fast as possible. This useful batch operations, where it is more important to achieve the best throughput as opposed to better latency. When defined, the target specifies the number of requests per second over all clients. For example, if you specify `target-throughput: 1000` with 8 clients, it means that each client will issue 125 (= 1000 / 8) requests per second.
-`target-interval` | No | Interval | Defines an internal of less 1 / target-throughput (in seconds) less than one operation per second. Define either target-throughput or target-interval but not both (otherwise Rally will raise an error).
+`target-throughput` | No | Integer | Defines the benchmark mode. When not defined, OpenSearch Benchmark assumes it is a throughput benchmark and runs the task as fast as possible. This is useful for batch operations, where achieving best throughput is preferred over better latency. When defined, the target specifies the number of requests per second over all clients. For example, if you specify `target-throughput: 1000` with eight clients, each client issues 125 (= 1000 / 8) requests per second.
+`target-interval` | No | Interval | Defines an interval of less 1 divided by the target-throughput (in seconds) less than one operation per second. Define either target-throughput or target-interval but not both (otherwise Rally will raise an error).
 `ignore-response-error-level` | No | Boolean | Controls whether to ignore errors encountered during the task when a benchmark is run with the `on-error=abort` command flag.

 ### Iteration-based options
@@ -127,14 +127,14 @@ Iteration-based options determine the number of times an operation should run. I

 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
-`iterations` | No | Integer | The number of times a client should execute an operation. These are included in the measured results. Default is 1.
-`warmup-iterations` | No | Integer | The number of times a client should execute an operation for the purpose of warming up the benchmark candidate. Warmup iterations will not show up in the measurement results. Default is 0.
+`iterations` | No | Integer | Specifies the number of times a client should execute an operation. These are included in the measured results. Default is `1`.
+`warmup-iterations` | No | Integer | Specifies the number of times a client should execute an operation for warming up the benchmark candidate. The `warmup-iterations` do not appear in the measurement results. Default is `0`.

 ### Parallel tasks

 The `parallel` element runs tasks wrapped inside the element concurrently.

-When running tasks in parallel, each task requires the `client` option to make sure clients inside your benchmark are reserved for that task. Otherwise, when the client option is specified inside the parallel element without a connection to the task, the benchmark will use that number of clients for all tasks.
+When running tasks in parallel, each task requires the `client` option to make sure clients inside your benchmark are reserved for that task. Otherwise, when the `client` option is specified inside the `parallel` element without a connection to the task, the benchmark uses that number of clients for all tasks.
 #### Usage

@@ -176,7 +176,7 @@ In the following example, `parallel-task-1` and `parallel-task-2` execute a `bul
 The `parallel` element supports all `schedule` parameters, in addition to the following:

 `tasks` | Yes | Array | Defines a list of tasks that should be executed concurrently.
-`completed-by` | No | String | Allows you to define the name of one task in the tasks list, or the value `any`. If `completed-by` is set to the name of one of the tasks in the list, the parallel task structure is considered complete once that specific task has been completed. If `completed-by` is set to `any`, the parallel task structure is considered complete when any of the tasks in the list has been completed. If `completed-by` is not explicitly defined, the parallel task structure is considered complete as soon as all of the tasks in the list has been completed.
+`completed-by` | No | String | Allows you to define the name of one task in the tasks list or the value `any`. If `completed-by` is set to the name of one task in the list, the `parallel-task` structure is considered complete once that specific task has been completed. If `completed-by` is set to `any`, the `parallel-task` structure is considered complete when any of the tasks in the list has been completed. If `completed-by` is not explicitly defined, the `parallel-task` structure is considered complete as soon as all the tasks in the list have been completed.

 ### Time-based options

@@ -186,8 +186,7 @@ To configure a time-based schedule, use the following options.

 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
-`time-period` | No | Integer | The time period in seconds that OpenSearch Benchmark considers for measurement. This is not required for bulk-indexing as OpenSearch Benchmark will bulk index all documents and naturally measure all samples after the `warmup-time-period` specified.
-`ramp-up-time-period` | No | Integer | The time period in seconds in which OpenSearch Benchmark gradually adds clients and reaches the total number of clients specified for the operation.
-`warmup-time-period` | No | Integer | The time period in seconds to warmup the benchmark candidate. All response data captured during the warmup period will not appear in the measurement results.
-
+`time-period` | No | Integer | Specifies the time period in seconds that OpenSearch Benchmark considers for measurement. This is not required for bulk indexing because OpenSearch Benchmark bulk indexes all documents and naturally measures all samples after the specified `warmup-time-period`.
+`ramp-up-time-period` | No | Integer | Specifies the time period in seconds in which OpenSearch Benchmark gradually adds clients and reaches the total number of clients specified for the operation.
+`warmup-time-period` | No | Integer | Specifies the time period in seconds to warm up the benchmark candidate. All response data captured during the warmup period do not appear in the measurement results.

From e2fc5b4bfd1c5ac95a0a947d044f61a86bed5a3d Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Wed, 20 Dec 2023 08:44:04 -0600
Subject: [PATCH 12/21] Update _benchmark/reference/workloads/operations.md

Co-authored-by: Melissa Vagi
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _benchmark/reference/workloads/operations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md
index 6f23357fe3..76991c434c 100644
--- a/_benchmark/reference/workloads/operations.md
+++ b/_benchmark/reference/workloads/operations.md
@@ -8,7 +8,7 @@ nav_order: 110

 # operations

-The `operations` element contains a list of all operations that are available when specifying a schedule.
+The `operations` element contains a list of all available operations for specifying a schedule.

 ## bulk

From 7ca24af91bf77385e9979c6de04dc93b5bfb185c Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Wed, 20 Dec 2023 08:49:40 -0600
Subject: [PATCH 13/21] Apply suggestions from code review

Co-authored-by: Melissa Vagi
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _benchmark/reference/workloads/operations.md      | 3 +--
 _benchmark/reference/workloads/test-procedures.md | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md
index 76991c434c..a53d962b6b 100644
--- a/_benchmark/reference/workloads/operations.md
+++ b/_benchmark/reference/workloads/operations.md
@@ -235,7 +235,7 @@ Parameter | Required | Type | Description
 `index` | Yes | String | The index or indexes you want to delete.
 `request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is.

-### Meta-data
+### Metadata

 The `cluster-health` operation returns the following metadata.

@@ -317,7 +317,6 @@ If `detailed-results` is set to `true`, the following metadata is also returned:
 - `hits`: The total number of hits for this query.
 - `hits_relation`: whether hits is accurate (eq) or a lower bound of the actual hit count (gte).
 - `timed_out`: Whether the query has timed out. For scroll queries, this flag is true if the flag was true for any of the queries issued.
-
  - `took`: The value of the `took` property in the query response. For scroll queries, the value is the sum of all `took` values in the query responses.
diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md
index 684af2167f..028dbdb625 100644
--- a/_benchmark/reference/workloads/test-procedures.md
+++ b/_benchmark/reference/workloads/test-procedures.md
@@ -15,7 +15,7 @@ The following table lists test procedures for the benchmark scenarios in this da
 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `name` | Yes | String | The name of the test procedure. When naming the test procedure, do not use spaces so that the name is easy to enter on the command line.
-`description` | No | String | A human readable description of the test procedure.
+`description` | No | String | Describes the test procedure in a human-readable format.
 `user-info` | No | String | Outputs a message at the start of the test to notify user about important test-related information, for example, deprecations.
 `default` | No | Boolean | When set to `true`, selects the default test procedure if the user did not specify a test procedure on the command line. If the workload only defines one test procedure, it is implicitly selected as default. Otherwise, you must define `"default": true` on exactly one challenge.
 [`schedule`](#Schedule) | Yes | Array | Defines the order in which tasks in the workload are run.
@@ -124,7 +124,6 @@ OpenSearch Benchmark requires one of the following options when running a task:

 Iteration-based options determine the number of times an operation should run. It can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options.

-
 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `iterations` | No | Integer | Specifies the number of times a client should execute an operation. These are included in the measured results. Default is `1`.
From 7e6c8686b0000729b90c5e8c896f0b915609a8ee Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Wed, 20 Dec 2023 10:47:49 -0600
Subject: [PATCH 14/21] Update operations.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _benchmark/reference/workloads/operations.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md
index a53d962b6b..8326667941 100644
--- a/_benchmark/reference/workloads/operations.md
+++ b/_benchmark/reference/workloads/operations.md
@@ -35,7 +35,7 @@ If there are multiple documents or corpora, OpenSearch Benchmark tries to index
 1. Each client starts at a different point in the corpus. For example, in a workload with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus.
 2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it moves to the first split of the first document of the second corpus, and so on.

-### Options
+### Configuration options

 Use the following options to customize the `bulk` operation.

@@ -122,7 +122,7 @@ The following example creates a new index with all index settings specified in t
 }
 ```

-### Options
+### Configuration options

 Use the following options when creating all indexes from the `indices` section of a workload.

@@ -178,7 +178,7 @@ The following example deletes all `logs_*` indexes:
 }
 ```

-### Options
+### Configuration options

 Use the following options when deleting all indexes indicated in the `indices` section of the workload.

@@ -226,7 +226,7 @@ The following example creates a `cluster-health` operation that checks for a `gr
 ```


-### Options
+### Configuration options

 Use the following options with the `cluster-health` operation.
@@ -261,7 +261,7 @@ The following example refreshes all `logs-*` indexes:
 }
 ```

-### Options
+### Configuration options

 The `refresh` operation uses the following options.

@@ -293,7 +293,7 @@ The following example runs a `match_all` query inside the `search` operation:
 }
 ```

-### Options
+### Configuration options

 The `search` operation uses the following options.

From cd16240c6cd834839de889a65a223060e6843c36 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Wed, 20 Dec 2023 10:48:25 -0600
Subject: [PATCH 15/21] Apply suggestions from code review

Co-authored-by: Melissa Vagi
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _benchmark/reference/workloads/operations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md
index 8326667941..0d7f5a11a2 100644
--- a/_benchmark/reference/workloads/operations.md
+++ b/_benchmark/reference/workloads/operations.md
@@ -310,7 +310,7 @@ Parameter | Required | Type | Description

 The following metadata is always returned:
 - `weight`: The “weight” of an operation. Always `1` for regular queries and the number of retrieved pages for scroll queries.
-- `unit`: The unit in which to interpret weight. Always “ops” for regular queries and “pages” for scroll queries.
+- `unit`: The unit used to interpret weight, which is `ops` for regular queries and `pages` for scroll queries.
 - `success`: A Boolean indicating whether the query has succeeded.
 If `detailed-results` is set to `true`, the following metadata is also returned:
From 21bbd5fb2b798f6457c20c140a773592d05ada9a Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Wed, 20 Dec 2023 10:51:39 -0600
Subject: [PATCH 16/21] Update operations.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _benchmark/reference/workloads/operations.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md
index 0d7f5a11a2..acc36430a3 100644
--- a/_benchmark/reference/workloads/operations.md
+++ b/_benchmark/reference/workloads/operations.md
@@ -200,7 +200,7 @@ Parameter | Required | Type | Description
 The `delete-index` operation returns the following metadata.

 `weight`: The number of indexes created by the operation.
-`unit`: Always “ops”.
+`unit`: Always `ops`, for the number of operations inside the workload.
 `success`: A Boolean indicating whether the operation has succeeded.

 ## cluster-health
@@ -232,17 +232,17 @@ Use the following options with the `cluster-health` operation.

 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
-`index` | Yes | String | The index or indexes you want to delete.
+`index` | Yes | String | The index or indexes you want to assess.
 `request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is.

 ### Metadata

 The `cluster-health` operation returns the following metadata.

-- `weight`: Always 1.
-- `unit`: Always “ops”.
-- `success`: A Boolean that indicates whether the operation has succeeded.
-- `cluster-status`: Current cluster status.
+`weight`: The number of indexes the `cluster-health` operation assesses. Alwasys `1`, since the operation runs once per index.
+`unit`: Always `ops`, for the number of operations inside the workload.
+`success`: A Boolean indicating whether the operation has succeeded.
 - `cluster-status`: The current cluster status.
 - `relocating-shards`: The number of shards currently relocating to a different node.

 ## refresh
@@ -309,11 +309,13 @@ Parameter | Required | Type | Description
 ### Metadata

 The following metadata is always returned:
+
 - `weight`: The “weight” of an operation. Always `1` for regular queries and the number of retrieved pages for scroll queries.
 - `unit`: The unit used to interpret weight, which is `ops` for regular queries and `pages` for scroll queries.
 - `success`: A Boolean indicating whether the query has succeeded.

 If `detailed-results` is set to `true`, the following metadata is also returned:
+
 - `hits`: The total number of hits for this query.
 - `hits_relation`: whether hits is accurate (eq) or a lower bound of the actual hit count (gte).
 - `timed_out`: Whether the query has timed out. For scroll queries, this flag is true if the flag was true for any of the queries issued.
From 015576ae729dc3608a787dcadd76ae2332934eb7 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Wed, 20 Dec 2023 11:57:42 -0600
Subject: [PATCH 17/21] Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _benchmark/reference/workloads/operations.md | 96 +++++++++----------
 .../reference/workloads/test-procedures.md   | 48 +++++-----
 2 files changed, 72 insertions(+), 72 deletions(-)

diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md
index acc36430a3..b89fc281ac 100644
--- a/_benchmark/reference/workloads/operations.md
+++ b/_benchmark/reference/workloads/operations.md
@@ -12,11 +12,11 @@ The `operations` element contains a list of all available operations for specify

 ## bulk

-The `bulk` operation type allows for running [bulk](/api-reference/document-apis/bulk/) requests as a task.
+The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task.

 ### Usage

-The following example shows a `bulk` operation type with a `bulk-size` of `5000` documents.
+The following example shows a `bulk` operation type with a `bulk-size` of `5000` documents:

 ```yml
 {
@@ -28,9 +28,9 @@ The following example shows a `bulk` operation type with a `bulk-size` of `5000`

 ### Split documents among clients

-With multiple `clients`, OpenSearch Benchmark splits each document based on the number of clients set. Multiple `clients` parallelizes the bulk index operations, but doesn't preserve the ingestion order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other indexes starting from the middle.
+When you have multiple `clients`, OpenSearch Benchmark splits each document based on the set number of clients. Having multiple `clients` parallelizes the bulk index operations but doesn't preserve the ingestion order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other client indexes the document starting from the middle.

-If there are multiple documents or corpora, OpenSearch Benchmark tries to index all documents in parallel in two ways:
+If there are multiple documents or corpora, OpenSearch Benchmark attempts to index all documents in parallel in two ways:

 1. Each client starts at a different point in the corpus. For example, in a workload with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus.
 2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it moves to the first split of the first document of the second corpus, and so on.
@@ -41,45 +41,45 @@ Use the following options to customize the `bulk` operation.

 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
-`bulk-size` | Yes | Number | Sets the number of documents ingested in the bulk request.
-`ingest-percentage` | No | Range [0, 100] | Defines the portion of the document corpus to be indexed. Valid values are a range between 0 and 100.
+`bulk-size` | Yes | Number | Specifies the number of documents to be ingested in the bulk request.
+`ingest-percentage` | No | Range [0, 100] | Defines the portion of the document corpus to be indexed. Valid values are numbers between 0 and 100.
 `corpora` | No | List | Defines which document corpus names should be targeted by the bulk operation. Only needed if the `corpora` section contains more than one document corpus and you don’t want to index all of them during the bulk request.
 `indices` | No | List | Defines which indexes should be used in the bulk index operation. OpenSearch Benchmark only selects document files that have a matching `target-index`.
-`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads simultaneously. This is an expert setting and only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a `bulk-size` of `1`, you should set `batch-size` higher.
+`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads simultaneously. This is an expert setting and is only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a `bulk-size` of `1`, you should set a higher `batch-size`.
 `pipeline` | No | String | Defines which existing ingest pipeline to use.
-`conflicts` | No | String | Defines the type of index `conflicts` to simulate. If not specified, none are simulated. Valid values are ‘sequential’, in which a document ID is replaced with a sequentially increasing ID, and ‘random’, in which a document ID is replaced with a random document ID.
-`conflict-probability` | No | Percentage | Defines how many of the documents are replaced when a conflict exists. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate the index ID itself, instead of using OpenSearch's automatic ID generation. Valid values are numbers between 0 and 100. Default is `25%`.
+`conflicts` | No | String | Defines the type of index `conflicts` to simulate. If not specified, none are simulated. Valid values are ‘sequential’, which replaces a document ID with a sequentially increasing document ID, and ‘random’, which replaces a document ID with a random document ID.
+`conflict-probability` | No | Percentage | Defines how many of the documents are replaced when a conflict exists. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate the index ID itself instead of using OpenSearch's automatic ID generation. Valid values are numbers between 0 and 100. Default is `25%`.
 `on-conflict` | No | String | Determines whether OpenSearch should use the action `index` or `update` index for ID conflicts. Default is `index`, which creates a new index during ID conflicts.
-`recency` | No | Number | Uses a number between 0 and 1 to indicate recency. Recency toward `1` bias conflicting IDs toward more recent IDs. Recency toward 0 considers all IDs for ID conflicts.
+`recency` | No | Number | Uses a number between 0 and 1 to indicate recency. A recency closer to `1` biases conflicting IDs toward more recent IDs. A recency closer to 0 considers all IDs for ID conflicts.
 `detailed-results` | No | Boolean | Records more detailed [metadata](#metadata) for bulk requests. As OpenSearch Benchmark analyzes the corresponding bulk response in more detail, additional overhead may be incurred, which can skew measurement results. This property must be set to `true` so that OpenSearch Benchmark logs individual bulk request failures.
-`timeout` | No | Duration | Defines the time period (in minutes) that OpenSearch waits per action until completing the processing of the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. Default is `1m`.
-`refresh` | No | String | Controls OpenSearch's refresh behavior for bulk requests that use the `refresh` bulk API query parameter. Valid values are `true`, in which OpenSearch refreshes target shards in the background; `wait_for`, in which OpenSearch blocks bulk requests until affected shards have been refreshed; and `false`, in which OpenSearch uses the default refresh behavior.
+`timeout` | No | Duration | Defines the amount of time (in minutes) that OpenSearch waits per action until completing the processing of the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. Default is `1m`.
+`refresh` | No | String | Controls OpenSearch refresh behavior for bulk requests that use the `refresh` bulk API query parameter. Valid values are `true`, which refreshes target shards in the background; `wait_for`, which blocks bulk requests until affected shards have been refreshed; and `false`, which uses the default refresh behavior. ### Metadata The `bulk` operation always returns the following metadata: - `index`: The name of the affected index. If an index cannot be derived, it returns `null`. -- `weight`: An operation-agnostic representation of the bulk size denoted by `units`. -- `unit`: The unit in which to interpret `weight`. +- `weight`: An operation-agnostic representation of the bulk size, denoted by `units`. +- `unit`: The unit used to interpret `weight`. - `success`: A Boolean indicating whether the `bulk` request succeeded. -- `success-count`: The number of successfully processed bulk items for this request. This value is determined when there are errors or when the `bulk-size` has been specified in the documents. +- `success-count`: The number of successfully processed bulk items for the request. This value is determined when there are errors or when the `bulk-size` has been specified in the documents. - `error-count`: The number of failed bulk items for the request. - `took`: The value of the `took` property in the bulk response. If `detailed-results` is `true`, the following metadata is returned: -- `ops`: A nested document with the operation name as key, such as `index`, `update`, or `delete` and various counts as values. `item-count` contains the total number of items for this key. Additionally, OpenSearch Benchmark returns a separate counter for each result, for example, a result for the number of created items or the number of deleted items. -- `shards_histogram`: An array of hashes where each hash has two keys. The `item-count` key contains the number of items to which a shard distribution applies. 
The `shards` key contains a hash with the actual distribution of `total`, `successful`, and `failed` shards. -- `bulk-request-size-bytes`: The total size of the bulk request body in bytes. -- `total-document-size-bytes`: The total size of all documents within the bulk request body in bytes. +- `ops`: A nested document with the operation name as its key, such as `index`, `update`, or `delete`, and various counts as values. `item-count` contains the total number of items for this key. Additionally, OpenSearch Benchmark returns a separate counter for each result, for example, a result for the number of created items or the number of deleted items. +- `shards_histogram`: An array of hashes, each of which has two keys. The `item-count` key contains the number of items to which a shard distribution applies. The `shards` key contains a hash with the actual distribution of `total`, `successful`, and `failed` shards. +- `bulk-request-size-bytes`: The total size of the bulk request body, in bytes. +- `total-document-size-bytes`: The total size of all documents within the bulk request body, in bytes. ## create-index The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation: -- Create all indexes specified in the workloads `indices` section. -- Creates one specific index defined in the operation itself. +- Creating all indexes specified in the workload's `indices` section +- Creating one specific index defined within the operation itself ### Usage @@ -128,7 +128,7 @@ Use the following options when creating all indexes from the `indices` section o Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`settings` | No | Array | Specifies additional index settings to be merged with the index settings specified in `indices` section of the workload.
+`settings` | No | Array | Specifies additional index settings to be merged with the index settings specified in the `indices` section of the workload. `request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. Use the following options when creating a single index in the operation. @@ -136,24 +136,24 @@ Use the following options when creating a single index in the operation. Parameter | Required | Type | Description :--- | :--- | :--- | :--- `index` | Yes | String | The index name. -`body` | No | Request body | The request body for the Create Index API. For more information, see [Create Index API](/api-reference/index-apis/create-index/) -`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. +`body` | No | Request body | The request body for the Create Index API. For more information, see [Create Index API](/api-reference/index-apis/create-index/). +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state. ### Metadata The `create-index` operation returns the following metadata: `weight`: The number of indexes created by the operation. -`unit`: Always `ops`, for the number of operations inside the workload. +`unit`: Always `ops`, indicating the number of operations inside the workload. `success`: A Boolean indicating whether the operation has succeeded. ## delete-index -The `delete-index` operation runs the [Delete Index API](api-reference/index-apis/delete-index/). 
Like the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting. +The `delete-index` operation runs the [Delete Index API](api-reference/index-apis/delete-index/). Like with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting. ### Usage -The following example deletes all indexes found in `indices` section of the workload: +The following example deletes all indexes found in the `indices` section of the workload: ```yml { @@ -185,19 +185,19 @@ Use the following options when deleting all indexes indicated in the `indices` s Parameter | Required | Type | Description :--- | :--- | :--- | :--- `only-if-exists` | No | Boolean | Decides whether an index should be deleted only if the index exists. Default is `true`. -`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state. -Use the following options if you want to delete one or more indexes based on the pattern indicated in the `index` option: +Use the following options if you want to delete one or more indexes based on the pattern indicated in the `index` option. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`index` | Yes | String | The index or indexes you want to delete. -`only-if-exists` | No | Boolean | Decides whether an index should be deleted only if the index exists. Default is `true`. 
-`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. +`index` | Yes | String | The index or indexes that you want to delete. +`only-if-exists` | No | Boolean | Decides whether an index should be deleted when the index exists. Default is `true`. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state. ### Metadata -The `delete-index` operation returns the following metadata. +The `delete-index` operation returns the following metadata: `weight`: The number of indexes created by the operation. `unit`: Always `ops`, for the number of operations inside the workload. @@ -205,12 +205,12 @@ The `delete-index` operation returns the following metadata. ## cluster-health -The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails. +The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails. 
### Usage -The following example creates a `cluster-health` operation that checks for a `green` health status on the any `log-*` indexes: +The following example creates a `cluster-health` operation that checks for a `green` health status on any `log-*` indexes: ```yml { @@ -233,11 +233,11 @@ Use the following options with the `cluster-health` operation. Parameter | Required | Type | Description :--- | :--- | :--- | :--- `index` | Yes | String | The index or indexes you want to assess. -`request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. +`request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state. ### Metadata -The `cluster-health` operation returns the following metadata. +The `cluster-health` operation returns the following metadata: `weight`: The number of indexes the `cluster-health` operation assesses. Alwasys `1`, since the operation runs once per index. `unit`: Always `ops`, for the number of operations inside the workload. @@ -247,7 +247,7 @@ The `cluster-health` operation returns the following metadata. ## refresh -The `refresh` operations runs the Refresh API. The `operation` returns no metadata. +The `refresh` operation runs the Refresh API. The `operation` returns no metadata. ### Usage @@ -267,7 +267,7 @@ The `refresh` operation uses the following options. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`index` | No | String | The name of indexes or data streams to refresh. +`index` | No | String | The names of the indexes or data streams to refresh. ## search @@ -299,26 +299,26 @@ The `search` operation uses the following options. 
Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`index` | No | String | The indexes or data streams targeted by the query. The option is needed only when the `indices` section contains two or more indexes. Otherwise, OpenSearch Benchmark automatically derives the index or data stream to use. Specify `"index": "_all"` to query against all indexes in the workload. +`index` | No | String | The indexes or data streams targeted by the query. This option is needed only when the `indices` section contains two or more indexes. Otherwise, OpenSearch Benchmark automatically derives the index or data stream to use. Specify `"index": "_all"` to query against all indexes in the workload. `cache` | No | Boolean | Specifies whether to use the query request cache. OpenSearch Benchmark defines no value. The default depends on the benchmark candidate settings and the OpenSearch version. `request-params` | No | List of settings | Contains any request parameters allowed by the Search API. -`body` | Yes | Request body | Indicates which query to use and the query parameters. -`detailed-results` | No | Boolean | Records more detailed metadata about queries. When set to `true`, additional overhead may be incurred, which can skew measurement results. The option does not work with `scroll` queries. -`results-per-page` | No | Integer | Specifies the number of documents to retrieve per page. This maps to the Search API `size` parameter and can be used for scroll and non-scroll searches. Default is 10. +`body` | Yes | Request body | Indicates which query and query parameters to use. +`detailed-results` | No | Boolean | Records more detailed metadata about queries. When set to `true`, additional overhead may be incurred, which can skew measurement results. This option does not work with `scroll` queries. +`results-per-page` | No | Integer | Specifies the number of documents to retrieve per page. 
This maps to the Search API `size` parameter and can be used for scroll and non-scroll searches. Default is `10`. ### Metadata The following metadata is always returned: -- `weight`: The “weight” of an operation. Always `1` for regular queries and the number of retrieved pages for scroll queries. +- `weight`: The “weight” of an operation. Always `1` for regular queries and the number of retrieved pages for scroll queries. - `unit`: The unit used to interpret weight, which is `ops` for regular queries and `pages` for scroll queries. - `success`: A Boolean indicating whether the query has succeeded. If `detailed-results` is set to `true`, the following metadata is also returned: -- `hits`: The total number of hits for this query. -- `hits_relation`: whether hits is accurate (eq) or a lower bound of the actual hit count (gte). -- `timed_out`: Whether the query has timed out. For scroll queries, this flag is true if the flag was true for any of the queries issued. - - `took`: The value of the `took` property in the query response. For scroll queries, the value is the sum of all `took` values in the query responses. +- `hits`: The total number of hits for the query. +- `hits_relation`: Whether the number of hits is accurate (eq) or a lower bound of the actual hit count (gte). +- `timed_out`: Whether the query has timed out. For scroll queries, this flag is `true` if the flag was `true` for any of the queries issued. + - `took`: The value of the `took` property in the query response. For scroll queries, the value is the sum of all `took` values in all query responses. diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index 028dbdb625..a8d5dccaf9 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -8,17 +8,17 @@ nav_order: 100 # test_procedures -If your workload only defines one benchmarking scenario, specify the schedule on top-level. 
Use the `test-procedures` element to specify additional properties, such as a name or description. A test procedure is like a benchmarking scenario. If you have multiple test procedures, you can define a variety of challenges. +If your workload only defines one benchmarking scenario, specify the schedule at the top level. Use the `test-procedures` element to specify additional properties, such as a name or description. A test procedure is like a benchmarking scenario. If you have multiple test procedures, you can define a variety of challenges. -The following table lists test procedures for the benchmark scenarios in this dataset. A test procedure can reference all operations that are defined in the operations section. +The following table lists test procedures for the benchmarking scenarios in this dataset. A test procedure can reference all operations that are defined in the operations section. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`name` | Yes | String | The name of the test procedure. When naming the test procedure, do not use spaces so that the name is easy to enter on the command line. +`name` | Yes | String | The name of the test procedure. When naming the test procedure, do not use spaces; this ensures that the name can be easily entered on the command line. `description` | No | String | Describes the test procedure in a human-readable format. -`user-info` | No | String | Outputs a message at the start of the test to notify user about important test-related information, for example, deprecations. -`default` | No | Boolean | When set to `true`, selects the default test procedure if the user did not specify a test procedure on the command line. If the workload only defines one test procedure, it is implicitly selected as default. Otherwise, you must define `"default": true` on exactly one challenge. -[`schedule`](#Schedule) | Yes | Array | Defines the order in which tasks in the workload are run. 
+`user-info` | No | String | Outputs a message at the start of the test to notify you about important test-related information, for example, deprecations. +`default` | No | Boolean | When set to `true`, selects the default test procedure if you did not specify a test procedure on the command line. If the workload only defines one test procedure, it is implicitly selected as the default. Otherwise, you must define `"default": true` on exactly one challenge. +[`schedule`](#Schedule) | Yes | Array | Defines the order in which workload tasks are run. ## schedule @@ -27,13 +27,13 @@ The `schedule` element contains a list of a tasks, which are operations supporte ### Usage -The `schedule` element defines tasks using the following methods described in this section. +The `schedule` element defines tasks using the methods described in this section. #### Using the operations element The following example defines a `force-merge` and `match-all` query task using the `operations` element. The `force-merge` operation does not use any parameters, so only the `name` and `operation-type` are needed. The `match-all-query` parameter requires a query `body` and `operation-type`. -Operations defined in the `operations` element can be reused more than once in the schedule: +Operations defined in the `operations` element can be reused in the schedule more than once: ```yml { @@ -70,7 +70,7 @@ Operations defined in the `operations` element can be reused more than once in t #### Defining operations inline -If you don't want reuse an operation in the schedule, you can define operations inside the `schedule` element, as shown in the following example: +If you don't want to reuse an operation in the schedule, you can define operations inside the `schedule` element, as shown in the following example: ```yml { @@ -107,33 +107,33 @@ Each task contains the following options. 
Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`operation` | Yes | List | Refers to either the name of an operation, defined in the `operations` element, or includes the entire operation inline. +`operation` | Yes | List | Either refers to the name of an operation, defined in the `operations` element, or includes the entire operation inline. `name` | No | String | Specifies a unique name for the task when multiple tasks use the same operation. -`tags` | No | String | Unique identifiers that can be used to filter between `tasks.clients`, or the number of clients that should execute a task concurrently. Default is 1. -`clients` | No | Integer | Specifies the number of clients that concurrently run the task. Default is `1`. +`tags` | No | String | Unique identifiers that can be used to filter between tasks. +`clients` | No | Integer | Specifies the number of clients that will run the task concurrently. Default is `1`. ### Target options OpenSearch Benchmark requires one of the following options when running a task: -`target-throughput` | No | Integer | Defines the benchmark mode. When not defined, OpenSearch Benchmark assumes it is a throughput benchmark and runs the task as fast as possible. This is useful for batch operations, where achieving best throughput is preferred over better latency. When defined, the target specifies the number of requests per second over all clients. For example, if you specify `target-throughput: 1000` with eight clients, each client issues 125 (= 1000 / 8) requests per second. -`target-interval` | No | Interval | Defines an interval of less 1 divided by the target-throughput (in seconds) less than one operation per second. Define either target-throughput or target-interval but not both (otherwise Rally will raise an error). +`target-throughput` | No | Integer | Defines the benchmark mode.
When not defined, OpenSearch Benchmark assumes that it is a throughput benchmark and runs the task as fast as possible. This is useful for batch operations, where achieving better throughput is preferred over better latency. When defined, the target specifies the number of requests per second across all clients. For example, if you specify `target-throughput: 1000` with 8 clients, each client issues 125 (= 1000 / 8) requests per second. +`target-interval` | No | Interval | Defines an interval of 1 divided by the target-throughput (in seconds) when the `target-throughput` is less than one operation per second. Define either `target-throughput` or `target-interval` but not both, otherwise OpenSearch Benchmark raises an error. `ignore-response-error-level` | No | Boolean | Controls whether to ignore errors encountered during the task when a benchmark is run with the `on-error=abort` command flag. ### Iteration-based options -Iteration-based options determine the number of times an operation should run. It can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options. +Iteration-based options determine the number of times that an operation should run. They can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`iterations` | No | Integer | Specifies the number of times a client should execute an operation. These are included in the measured results. Default is `1`. -`warmup-iterations` | No | Integer | Specifies the number of times a client should execute an operation for warming up the benchmark candidate. The `warmup-iterations` do not appear in the measurement results. Default is `0`. +`iterations` | No | Integer | Specifies the number of times that a client should execute an operation.
All iterations are included in the measured results. Default is `1`. +`warmup-iterations` | No | Integer | Specifies the number of times that a client should execute an operation in order to warm up the benchmark candidate. The `warmup-iterations` do not appear in the measurement results. Default is `0`. ### Parallel tasks -The `parallel` element runs tasks wrapped inside the element concurrently. +The `parallel` element concurrently runs tasks wrapped inside the element. -When running tasks in parallel, each task requires the `client` option to make sure clients inside your benchmark are reserved for that task. Otherwise, when the `client` option is specified inside the `parallel` element without a connection to the task, the benchmark uses that number of clients for all tasks. +When running tasks in parallel, each task requires the `clients` option in order to ensure that clients inside your benchmark are reserved for that task. Otherwise, when the `clients` option is specified inside the `parallel` element without a connection to the task, the benchmark uses that number of clients for all tasks. #### Usage @@ -172,20 +172,20 @@ In the following example, `parallel-task-1` and `parallel-task-2` execute a `bul #### Options -The `parallel` element supports all `schedule` parameters, in addition to the following: +The `parallel` element supports all `schedule` parameters, in addition to the following options. `tasks` | Yes | Array | Defines a list of tasks that should be executed concurrently. -`completed-by` | No | String | Allows you to define the name of one task in the tasks list or the value `any`. If `completed-by` is set to the name of one task in the list, the `parallel-task` structure is considered complete once that specific task has been completed. If `completed-by` is set to `any`, the `parallel-task` structure is considered complete when any of the tasks in the list has been completed.
If `completed-by` is not explicitly defined, the `parallel-task` structure is considered complete as soon as all the tasks in the list have been completed. +`completed-by` | No | String | Allows you to define the name of one task in the task list or the value `any`. If `completed-by` is set to the name of one task in the list, the `parallel-task` structure is considered to be complete once that specific task has been completed. If `completed-by` is set to `any`, the `parallel-task` structure is considered to be complete when any one of the tasks in the list has been completed. If `completed-by` is not explicitly defined, the `parallel-task` structure is considered to be complete as soon as all of the tasks in the list have been completed. ### Time-based options -Time-based options determines the duration of time, in seconds, that operations should run for. This is ideal for batch-style operations which may require an additional warmup period. +Time-based options determine the duration of time, in seconds, for which operations should run. This is ideal for batch-style operations, which may require an additional warmup period. To configure a time-based schedule, use the following options. Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`time-period` | No | Integer | Specifies the time period in seconds that OpenSearch Benchmark considers for measurement. This is not required for bulk indexing because OpenSearch Benchmark bulk indexes all documents and naturally measures all samples after the specified `warmup-time-period`. -`ramp-up-time-period` | No | Integer | Specifies the time period in seconds in which OpenSearch Benchmark gradually adds clients and reaches the total number of clients specified for the operation. +`time-period` | No | Integer | Specifies the time period, in seconds, that OpenSearch Benchmark considers for measurement. 
This is not required for bulk indexing because OpenSearch Benchmark bulk indexes all documents and naturally measures all samples after the specified `warmup-time-period`. +`ramp-up-time-period` | No | Integer | Specifies the time period, in seconds, during which OpenSearch Benchmark gradually adds clients and reaches the total number of clients specified for the operation. `warmup-time-period` | No | Integer | Specifies the time period in seconds to warm up the benchmark candidate. All response data captured during the warmup period do not appear in the measurement results. From 3e5a717c86ee73c3a4992971d76d9831db91652d Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 20 Dec 2023 12:11:46 -0600 Subject: [PATCH 18/21] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _benchmark/reference/workloads/operations.md | 4 ++-- _benchmark/reference/workloads/test-procedures.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md index b89fc281ac..dc8fa7e47d 100644 --- a/_benchmark/reference/workloads/operations.md +++ b/_benchmark/reference/workloads/operations.md @@ -129,7 +129,7 @@ Use the following options when creating all indexes from the `indices` section o Parameter | Required | Type | Description :--- | :--- | :--- | :--- `settings` | No | Array | Specifies additional index settings to be merged with the index settings specified in the `indices` section of the workload. -`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and pass them as is. +`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. 
OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state. Use the following options when creating a single index in the operation. @@ -184,7 +184,7 @@ Use the following options when deleting all indexes indicated in the `indices` s Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`only-if-exists` | No | Boolean | Decides whether an index should be deleted only if the index exists. Default is `true`. +`only-if-exists` | No | Boolean | Decides whether an existing index should be deleted. Default is `true`. `request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state. Use the following options if you want to delete one or more indexes based on the pattern indicated in the `index` option. diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index a8d5dccaf9..8edf0e3b8a 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -187,5 +187,5 @@ Parameter | Required | Type | Description :--- | :--- | :--- | :--- `time-period` | No | Integer | Specifies the time period, in seconds, that OpenSearch Benchmark considers for measurement. This is not required for bulk indexing because OpenSearch Benchmark bulk indexes all documents and naturally measures all samples after the specified `warmup-time-period`. `ramp-up-time-period` | No | Integer | Specifies the time period, in seconds, during which OpenSearch Benchmark gradually adds clients and reaches the total number of clients specified for the operation. -`warmup-time-period` | No | Integer | Specifies the time period in seconds to warm up the benchmark candidate. All response data captured during the warmup period do not appear in the measurement results. 
+`warmup-time-period` | No | Integer | Specifies the amount of time, in seconds, to warm up the benchmark candidate. None of the response data captured during the warmup period appears in the measurement results. From c8b909489e2ad134e4861319416929c4b1b70fb4 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 20 Dec 2023 12:15:44 -0600 Subject: [PATCH 19/21] Update _benchmark/reference/workloads/test-procedures.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _benchmark/reference/workloads/test-procedures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index 8edf0e3b8a..511033e6f1 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -117,7 +117,7 @@ Parameter | Required | Type | Description OpenSearch Benchmark requires one of the following options when running a task: `target-throughput` | No | Integer | Defines the benchmark mode. When not defined, OpenSearch Benchmark assumes that it is a throughput benchmark and runs the task as fast as possible. This is useful for batch operations, where achieving better throughput is preferred over better latency. When defined, the target specifies the number of requests per second across all clients. For example, if you specify `target-throughput: 1000` with 8 clients, each client issues 125 (= 1000 / 8) requests per second. -`target-interval` | No | Interval | Defines an interval of 1 divided by the target-throughput (in seconds) when the `target-throughput` is less than one operation per second. Define either `target-throughput` or `target-interval` but not both, otherwise OpenSearch Benchmark raises an error. 
+`target-interval` | No | Interval | Defines an interval of 1 divided by the target-throughput (in seconds) when the `target-throughput` is less than 1 operation per second. Define either `target-throughput` or `target-interval` but not both, otherwise OpenSearch Benchmark raises an error. `ignore-response-error-level` | No | Boolean | Controls whether to ignore errors encountered during the task when a benchmark is run with the `on-error=abort` command flag. ### Iteration-based options From 17bbd276faff27e37077b7a52dc7d79d0febf09d Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 20 Dec 2023 12:17:08 -0600 Subject: [PATCH 20/21] Update test-procedures.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _benchmark/reference/workloads/test-procedures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index 511033e6f1..dd37494a02 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -117,7 +117,7 @@ Parameter | Required | Type | Description OpenSearch Benchmark requires one of the following options when running a task: `target-throughput` | No | Integer | Defines the benchmark mode. When not defined, OpenSearch Benchmark assumes that it is a throughput benchmark and runs the task as fast as possible. This is useful for batch operations, where achieving better throughput is preferred over better latency. When defined, the target specifies the number of requests per second across all clients. For example, if you specify `target-throughput: 1000` with 8 clients, each client issues 125 (= 1000 / 8) requests per second. -`target-interval` | No | Interval | Defines an interval of 1 divided by the target-throughput (in seconds) when the `target-throughput` is less than 1 operation per second. 
Define either `target-throughput` or `target-interval` but not both, otherwise OpenSearch Benchmark raises an error. +`target-interval` | No | Interval | Defines an interval of 1 divided by the `target-throughput` (in seconds) when the `target-throughput` is less than 1 operation per second. Define either `target-throughput` or `target-interval` but not both, otherwise OpenSearch Benchmark raises an error. `ignore-response-error-level` | No | Boolean | Controls whether to ignore errors encountered during the task when a benchmark is run with the `on-error=abort` command flag. ### Iteration-based options From 37b7711fd05c9a5bb1c49ee1ba8e5530a8103e76 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Wed, 20 Dec 2023 12:31:15 -0600 Subject: [PATCH 21/21] Change nav order Signed-off-by: Naarcha-AWS --- _benchmark/reference/workloads/operations.md | 2 +- _benchmark/reference/workloads/test-procedures.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md index dc8fa7e47d..332a80ee2f 100644 --- a/_benchmark/reference/workloads/operations.md +++ b/_benchmark/reference/workloads/operations.md @@ -3,7 +3,7 @@ layout: default title: operations parent: Workload reference grand_parent: OpenSearch Benchmark Reference -nav_order: 110 +nav_order: 100 --- # operations diff --git a/_benchmark/reference/workloads/test-procedures.md b/_benchmark/reference/workloads/test-procedures.md index dd37494a02..440ed123ae 100644 --- a/_benchmark/reference/workloads/test-procedures.md +++ b/_benchmark/reference/workloads/test-procedures.md @@ -3,7 +3,7 @@ layout: default title: test_procedures parent: Workload reference grand_parent: OpenSearch Benchmark Reference -nav_order: 100 +nav_order: 110 --- # test_procedures
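
Reviewer note on the `target-throughput` and `target-interval` wording in this series: the arithmetic the descriptions rely on can be sketched in a few lines of Python. This is only an illustration of the documented relationships (the 1000 / 8 = 125 example and "1 divided by the `target-throughput`"), not OpenSearch Benchmark's scheduler code; the function names are hypothetical, and the even split across clients is an assumption taken from the documented example.

```python
def per_client_throughput(target_throughput: float, clients: int) -> float:
    """Requests per second issued by each client when the overall target
    is split evenly across all clients (assumed, per the documented example)."""
    if clients < 1:
        raise ValueError("clients must be at least 1")
    return target_throughput / clients


def equivalent_target_interval(target_throughput: float) -> float:
    """Interval in seconds between requests: 1 divided by the target throughput."""
    if target_throughput <= 0:
        raise ValueError("target_throughput must be positive")
    return 1.0 / target_throughput


# Documented example: a target of 1000 requests per second with 8 clients.
print(per_client_throughput(1000, 8))
# A target below 1 operation per second is easier to read as an interval.
print(equivalent_target_interval(0.5))
```

Under these assumptions, a throughput of 0.5 operations per second corresponds to an interval of 2 seconds, which is the case the `target-interval` description is written for.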