Search pipeline GA documentation (#4553)
* Search pipeline GA documentation

Signed-off-by: Fanit Kolchina <[email protected]>

* Add ad-hoc pipelines and ignore failure flag

Signed-off-by: Fanit Kolchina <[email protected]>

* Rewording

Signed-off-by: Fanit Kolchina <[email protected]>

* Update script-processor.md

Signed-off-by: kolchfa-aws <[email protected]>

* Update _search-plugins/search-pipelines/index.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

* Update _search-plugins/search-pipelines/index.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
2 people authored and vagimeli committed Dec 20, 2023
1 parent 2b51a31 commit 0b8e728
Showing 4 changed files with 234 additions and 25 deletions.
4 changes: 1 addition & 3 deletions _search-plugins/search-pipelines/filter-query-processor.md
@@ -9,9 +9,6 @@ grand_parent: Search

# Filter query processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `filter_query` search request processor intercepts a search request and applies an additional query to the request, filtering the results. This is useful when you don't want to rewrite existing queries in your application but need additional filtering of the results.
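As an illustration of what "applies an additional query" means, the following Python sketch rewrites a request body held in a dictionary. It assumes the combination can be modeled as a `bool` query with the original query in `must` and the processor's query in `filter`; this is a conceptual model, not a description of how OpenSearch builds the query internally, and `apply_filter_query` is a hypothetical name.

```python
import copy
from typing import Any, Dict, Optional

Query = Dict[str, Any]


def apply_filter_query(request_body: Dict[str, Any], filter_query: Query) -> Dict[str, Any]:
    """Return a copy of request_body whose query is additionally filtered.

    Assumption: modeled as bool { must: [original], filter: [filter_query] }.
    """
    body = copy.deepcopy(request_body)
    original: Optional[Query] = body.get("query")
    bool_query: Dict[str, Any] = {"filter": [copy.deepcopy(filter_query)]}
    if original is not None:
        # Keep the original query so that it still determines relevance.
        bool_query["must"] = [original]
    body["query"] = {"bool": bool_query}
    return body


request = {"query": {"match": {"text_field": "some search text"}}}
rewritten = apply_filter_query(request, {"term": {"visibility": "public"}})
print(rewritten)
```

Placing the extra query in a `filter` clause narrows the result set without changing relevance scores, which matches the goal of filtering results without rewriting the application's queries.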

## Request fields
@@ -23,6 +20,7 @@ Field | Data type | Description
`query` | Object | A query in query domain-specific language (DSL). For a list of OpenSearch query types, see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). Required.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

246 changes: 231 additions & 15 deletions _search-plugins/search-pipelines/index.md
@@ -8,20 +8,8 @@ has_toc: false

# Search pipelines

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

You can use _search pipelines_ to build new or reuse existing result rerankers, query rewriters, and other components that operate on queries or results. Search pipelines make it easier for you to process search queries and search results within OpenSearch. Moving some of your application functionality into an OpenSearch search pipeline reduces the overall complexity of your application. As part of a search pipeline, you specify a list of processors that perform modular tasks. You can then easily add or reorder these processors to customize search results for your application.

## Enabling search pipelines

Search pipeline functionality is disabled by default. To enable it, edit the configuration in `opensearch.yml` and then restart your cluster:

1. Navigate to the OpenSearch config directory.
1. Open the `opensearch.yml` configuration file.
1. Add `opensearch.experimental.feature.search_pipeline.enabled: true` and save the configuration file.
1. Restart your cluster.

## Terminology

The following is a list of search pipeline terminology:
@@ -123,7 +111,7 @@ Search pipelines are stored in the cluster state. To create a search pipeline, y

#### Example request

The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages:
The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`:

```json
PUT /_search/pipeline/my_pipeline
@@ -140,11 +128,79 @@ PUT /_search/pipeline/my_pipeline
        }
      }
    }
  ],
  "response_processors": [
    {
      "rename_field": {
        "field": "message",
        "target_field": "notification"
      }
    }
  ]
}
```
{% include copy-curl.html %}
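If you manage pipelines from application code, the same body can be assembled programmatically. The following Python sketch only builds and serializes the JSON from the request above; sending it to `PUT /_search/pipeline/my_pipeline` is left to whatever HTTP client you use, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def filter_query_processor(query: Dict[str, Any], **options: Any) -> Dict[str, Any]:
    """Wrap a query DSL object in a filter_query request processor definition."""
    return {"filter_query": {"query": query, **options}}


def rename_field_processor(field: str, target_field: str, **options: Any) -> Dict[str, Any]:
    """Build a rename_field response processor definition."""
    return {"rename_field": {"field": field, "target_field": target_field, **options}}


def pipeline_body(request_processors: List[Dict[str, Any]],
                  response_processors: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine processors into the body for PUT /_search/pipeline/<name>."""
    return {
        "request_processors": request_processors,
        "response_processors": response_processors,
    }


body = pipeline_body(
    [filter_query_processor(
        {"term": {"visibility": "public"}},
        tag="tag1",
        description="This processor is going to restrict to publicly visible documents",
    )],
    [rename_field_processor("message", "notification")],
)
print(json.dumps(body, indent=2))
```

Keeping each processor in its own helper makes it easy to add or reorder processors, which is the main customization point of a search pipeline.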

### Ignoring processor failures

By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline:

```json
"filter_query" : {
  "tag" : "tag1",
  "description" : "This processor is going to restrict to publicly visible documents",
  "ignore_failure": true,
  "query" : {
    "term": {
      "visibility": "public"
    }
  }
}
```

If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics](#search-pipeline-metrics).
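The stop-or-continue rule can be modeled in a few lines. The following Python sketch is a conceptual model only, not OpenSearch code: each hypothetical processor is a function that transforms a request dictionary, and its `ignore_failure` flag decides whether an exception ends the run or is logged and skipped.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline-model")

Request = Dict[str, Any]


@dataclass
class Processor:
    tag: str
    run: Callable[[Request], Request]
    ignore_failure: bool = False


class PipelineFailure(Exception):
    pass


def run_pipeline(processors: List[Processor], request: Request) -> Request:
    """Run processors in order; stop on the first failure unless it is ignored."""
    for processor in processors:
        try:
            request = processor.run(request)
        except Exception as exc:
            if not processor.ignore_failure:
                raise PipelineFailure(f"processor {processor.tag} failed") from exc
            # Model of the documented behavior: log the failure, then continue.
            log.warning("ignoring failure in processor %s: %s", processor.tag, exc)
    return request


def broken(_: Request) -> Request:
    raise ValueError("bad script")


def set_size(req: Request) -> Request:
    return {**req, "size": 5}


result = run_pipeline(
    [Processor("tag1", broken, ignore_failure=True), Processor("tag2", set_size)],
    {"size": 10},
)
print(result)
```

With `ignore_failure` set on the first processor, the second processor still runs; without it, the same pipeline raises `PipelineFailure` and no later processor runs.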

## Using a temporary search pipeline for a request

As an alternative to creating a search pipeline, you can define a temporary search pipeline to be used for only the current query:

```json
POST /my-index/_search
{
  "query" : {
    "match" : {
      "text_field" : "some search text"
    }
  },
  "pipeline" : {
    "request_processors": [
      {
        "filter_query" : {
          "tag" : "tag1",
          "description" : "This processor is going to restrict to publicly visible documents",
          "query" : {
            "term": {
              "visibility": "public"
            }
          }
        }
      }
    ],
    "response_processors": [
      {
        "rename_field": {
          "field": "message",
          "target_field": "notification"
        }
      }
    ]
  }
}
```
{% include copy-curl.html %}

With this syntax, the pipeline does not persist and is used only for the query for which it is specified.
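From application code, a temporary pipeline is simply a `pipeline` key in the search body. The following Python sketch uses only the standard library and a hypothetical helper, `search_with_temporary_pipeline`, to build the request above. It assumes a local cluster at `http://localhost:9200` without security; constructing the `Request` object does not send anything, so the sketch runs offline.

```python
import json
import urllib.request
from typing import Any, Dict, List


def search_with_temporary_pipeline(query: Dict[str, Any],
                                   request_processors: List[Dict[str, Any]],
                                   response_processors: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a search body that carries its own, non-persisted pipeline."""
    return {
        "query": query,
        "pipeline": {
            "request_processors": request_processors,
            "response_processors": response_processors,
        },
    }


body = search_with_temporary_pipeline(
    {"match": {"text_field": "some search text"}},
    [{"filter_query": {"query": {"term": {"visibility": "public"}}}}],
    [{"rename_field": {"field": "message", "target_field": "notification"}}],
)

# Assumption: local, unsecured cluster. Pass req to urllib.request.urlopen() to run the search.
req = urllib.request.Request(
    "http://localhost:9200/my-index/_search",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```

Note that no `search_pipeline` query parameter is involved: the pipeline definition travels in the body and is discarded after the query.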

## Retrieving search pipelines

To retrieve the details of an existing search pipeline, use the Search Pipeline API.
@@ -201,7 +257,7 @@ GET /_search/pipeline/my*

## Using a search pipeline

To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:
To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:

```json
GET /my_index/_search?search_pipeline=my_pipeline
@@ -393,4 +449,164 @@ The response contains the pipeline version:
}
}
```
</details>
</details>

## Search pipeline metrics

To view search pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/):

```json
GET /_nodes/stats/search_pipeline
```
{% include copy-curl.html %}

The response contains statistics for all search pipelines:

```json
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "runTask",
  "nodes" : {
    "CpvTK7KuRD6Oww8TTp8g2Q" : {
      "timestamp" : 1689007282929,
      "name" : "runTask-0",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "roles" : [
        "cluster_manager",
        "data",
        "ingest",
        "remote_cluster_client"
      ],
      "attributes" : {
        "testattr" : "test",
        "shard_indexing_pressure_enabled" : "true"
      },
      "search_pipeline" : {
        "total_request" : {
          "count" : 5,
          "time_in_millis" : 158,
          "current" : 0,
          "failed" : 0
        },
        "total_response" : {
          "count" : 2,
          "time_in_millis" : 1,
          "current" : 0,
          "failed" : 0
        },
        "pipelines" : {
          "public_info" : {
            "request" : {
              "count" : 3,
              "time_in_millis" : 71,
              "current" : 0,
              "failed" : 0
            },
            "response" : {
              "count" : 0,
              "time_in_millis" : 0,
              "current" : 0,
              "failed" : 0
            },
            "request_processors" : [
              {
                "filter_query:abc" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 1,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 4,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ],
            "response_processors" : [ ]
          },
          "guest_pipeline" : {
            "request" : {
              "count" : 2,
              "time_in_millis" : 87,
              "current" : 0,
              "failed" : 0
            },
            "response" : {
              "count" : 2,
              "time_in_millis" : 1,
              "current" : 0,
              "failed" : 0
            },
            "request_processors" : [
              {
                "script" : {
                  "type" : "script",
                  "stats" : {
                    "count" : 2,
                    "time_in_millis" : 86,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query:abc" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 1,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 3,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ],
            "response_processors" : [
              {
                "rename_field" : {
                  "type" : "rename_field",
                  "stats" : {
                    "count" : 2,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}
```

For descriptions of each field in the response, see the [Nodes Stats search pipeline section]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#search_pipeline).
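Because an ignored processor failure is only logged, the `failed` counters in this response are a convenient place to check for failures. The following Python sketch walks a Nodes Stats response with the shape shown above and reports every pipeline phase or processor whose `failed` counter is nonzero. The function name is hypothetical, and the sample input is a trimmed, made-up response in the same shape, not output from a real cluster.

```python
from typing import Any, Dict, List, Tuple

Failure = Tuple[str, str, str, int]


def failed_search_pipeline_stats(stats: Dict[str, Any]) -> List[Failure]:
    """Return (node, pipeline, component, failed) for every nonzero failure counter."""
    failures: List[Failure] = []
    for node_id, node in stats.get("nodes", {}).items():
        pipelines = node.get("search_pipeline", {}).get("pipelines", {})
        for name, pipeline in pipelines.items():
            # Pipeline-level counters for the request and response phases.
            for phase in ("request", "response"):
                failed = pipeline.get(phase, {}).get("failed", 0)
                if failed:
                    failures.append((node_id, name, phase, failed))
            # Each processor entry maps a name such as "filter_query:abc" to its stats.
            for key in ("request_processors", "response_processors"):
                for entry in pipeline.get(key, []):
                    for proc_name, proc in entry.items():
                        failed = proc.get("stats", {}).get("failed", 0)
                        if failed:
                            failures.append((node_id, name, proc_name, failed))
    return failures


# Hypothetical, trimmed input in the same shape as the example response.
sample = {
    "nodes": {
        "CpvTK7KuRD6Oww8TTp8g2Q": {
            "search_pipeline": {
                "pipelines": {
                    "guest_pipeline": {
                        "request": {"count": 2, "failed": 1},
                        "response": {"count": 2, "failed": 0},
                        "request_processors": [
                            {"script": {"type": "script", "stats": {"count": 2, "failed": 1}}}
                        ],
                        "response_processors": [],
                    }
                }
            }
        }
    }
}
print(failed_search_pipeline_stats(sample))
```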
4 changes: 1 addition & 3 deletions _search-plugins/search-pipelines/rename-field-processor.md
@@ -9,9 +9,6 @@ grand_parent: Search

# Rename field processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `rename_field` search response processor intercepts a search response and renames the specified field. This is useful when your index and your application use different names for the same field. For example, if you rename a field in your index, the `rename_field` processor can change the new name to the old one before sending the response to your application.
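To make the renaming concrete, the following Python sketch applies the same transformation to the hits of a search response held in a dictionary. It is a simplified model that assumes `field` is a top-level key in each hit's `_source`; it is not the processor's implementation, the function name is hypothetical, and it simply skips hits that lack the field (how the real processor treats a missing field is not covered here).

```python
import copy
from typing import Any, Dict


def rename_field_in_hits(response: Dict[str, Any], field: str, target_field: str) -> Dict[str, Any]:
    """Return a copy of a search response with field renamed in every hit's _source."""
    renamed = copy.deepcopy(response)
    for hit in renamed.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        if field in source:
            # Move the value to the new key; hits without the field are left unchanged.
            source[target_field] = source.pop(field)
    return renamed


response = {"hits": {"hits": [
    {"_id": "1", "_source": {"message": "Hello", "visibility": "public"}},
    {"_id": "2", "_source": {"visibility": "public"}},
]}}
print(rename_field_in_hits(response, "message", "notification"))
```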

## Request fields
@@ -24,6 +21,7 @@ Field | Data type | Description
`target_field` | String | The new field name. Required.
`tag` | String | The processor's identifier.
`description` | String | A description of the processor.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

5 changes: 1 addition & 4 deletions _search-plugins/search-pipelines/script-processor.md
@@ -9,9 +9,6 @@ grand_parent: Search

# Script processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `script` search request processor intercepts a search request and adds an inline Painless script that is run on incoming requests. The script can only run on the following request fields:

- `from`
@@ -37,6 +34,7 @@ Field | Data type | Description
`lang` | String | The script language. Optional. Only `painless` is supported.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

@@ -57,4 +55,3 @@ PUT /_search/pipeline/explain_one_result
}
```
{% include copy-curl.html %}
