Search pipeline GA documentation #4553

Merged
merged 7 commits into from
Jul 14, 2023
@@ -9,9 +9,6 @@ grand_parent: Search

# Filter query processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `filter_query` search request processor intercepts a search request and applies an additional query to the request, filtering the results. This is useful when you don't want to rewrite existing queries in your application but need additional filtering of the results.

## Request fields
@@ -23,6 +20,7 @@ Field | Data type | Description
`query` | Object | A query in query domain-specific language (DSL). For a list of OpenSearch query types, see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). Required.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

246 changes: 231 additions & 15 deletions _search-plugins/search-pipelines/index.md
@@ -8,20 +8,8 @@ has_toc: false

# Search pipelines

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

You can use _search pipelines_ to build new or reuse existing result rerankers, query rewriters, and other components that operate on queries or results. Search pipelines make it easier for you to process search queries and search results within OpenSearch. Moving some of your application functionality into an OpenSearch search pipeline reduces the overall complexity of your application. As part of a search pipeline, you specify a list of processors that perform modular tasks. You can then easily add or reorder these processors to customize search results for your application.

## Enabling search pipelines

Search pipeline functionality is disabled by default. To enable it, edit the configuration in `opensearch.yml` and then restart your cluster:

1. Navigate to the OpenSearch config directory.
1. Open the `opensearch.yml` configuration file.
1. Add `opensearch.experimental.feature.search_pipeline.enabled: true` and save the configuration file.
1. Restart your cluster.

## Terminology

The following is a list of search pipeline terminology:
@@ -123,7 +111,7 @@ Search pipelines are stored in the cluster state. To create a search pipeline, y

#### Example request

The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages:
The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`:

```json
PUT /_search/pipeline/my_pipeline
@@ -140,11 +128,79 @@ PUT /_search/pipeline/my_pipeline
        }
      }
    }
  ],
  "response_processors": [
    {
      "rename_field": {
        "field": "message",
        "target_field": "notification"
      }
    }
  ]
}
```
{% include copy-curl.html %}
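
If your application assembles the pipeline definition programmatically, the request body can be built as a plain dictionary and serialized to JSON. The following is a minimal Python sketch under that assumption; the helper name `build_pipeline_body` is hypothetical, and sending the body to `PUT /_search/pipeline/my_pipeline` is left to whichever HTTP client you use:

```python
import json


def build_pipeline_body(visibility, source_field, target_field):
    """Return a search pipeline definition as a dict (hypothetical helper)."""
    return {
        # Request processor: restrict results with a term query.
        "request_processors": [
            {"filter_query": {"query": {"term": {"visibility": visibility}}}}
        ],
        # Response processor: rename a field in the returned documents.
        "response_processors": [
            {"rename_field": {"field": source_field, "target_field": target_field}}
        ],
    }


body = build_pipeline_body("public", "message", "notification")
# Serialize for use as the body of PUT /_search/pipeline/my_pipeline
print(json.dumps(body, indent=2))
```

Keeping the definition in one function makes it easier to reuse the same processors across several pipelines.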

### Ignoring processor failures

By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline:

```json
"filter_query" : {
  "tag" : "tag1",
  "description" : "This processor is going to restrict to publicly visible documents",
  "ignore_failure": true,
  "query" : {
    "term": {
      "visibility": "public"
    }
  }
}
```

If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether any failures occurred, you can use [search pipeline metrics](#search-pipeline-metrics).
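
If you hold pipeline definitions in code, you can set the flag before sending the create request. The sketch below is illustrative only: `with_ignore_failure` is a hypothetical helper that works on a pipeline definition stored as a Python dictionary and does not call OpenSearch:

```python
import copy


def with_ignore_failure(pipeline, processor_type):
    """Return a copy of `pipeline` with ignore_failure set to True on every
    processor of the given type (hypothetical helper, no API calls)."""
    updated = copy.deepcopy(pipeline)
    for phase in ("request_processors", "response_processors"):
        for processor in updated.get(phase, []):
            # Each list entry is a single-key object: {"<type>": {<config>}}
            if processor_type in processor:
                processor[processor_type]["ignore_failure"] = True
    return updated


pipeline = {
    "request_processors": [
        {"filter_query": {"tag": "tag1", "query": {"term": {"visibility": "public"}}}}
    ]
}
tolerant = with_ignore_failure(pipeline, "filter_query")
```

The helper copies the definition rather than mutating it, so the original stays available if you want one strict and one tolerant variant of the same pipeline.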

## Using a temporary search pipeline for a request

As an alternative to creating a search pipeline, you can define a temporary search pipeline to be used for the current query only:

```json
POST /my-index/_search
{
  "query" : {
    "match" : {
      "text_field" : "some search text"
    }
  },
  "pipeline" : {
    "request_processors": [
      {
        "filter_query" : {
          "tag" : "tag1",
          "description" : "This processor is going to restrict to publicly visible documents",
          "query" : {
            "term": {
              "visibility": "public"
            }
          }
        }
      }
    ],
    "response_processors": [
      {
        "rename_field": {
          "field": "message",
          "target_field": "notification"
        }
      }
    ]
  }
}
```
{% include copy-curl.html %}

With this syntax, the pipeline does not persist and is used only for the query for which it is specified.
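
As a rough sketch of how an application might attach a temporary pipeline to an existing search body, consider the following Python helper. The function name `with_temporary_pipeline` is hypothetical, and the `pipeline` key follows the request syntax shown in this section:

```python
def with_temporary_pipeline(search_body, request_processors=None, response_processors=None):
    """Return a copy of a search body with an inline pipeline attached
    (hypothetical helper; the original body is left unchanged)."""
    body = dict(search_body)
    pipeline = {}
    if request_processors:
        pipeline["request_processors"] = list(request_processors)
    if response_processors:
        pipeline["response_processors"] = list(response_processors)
    # The inline definition applies to this request only and is not stored.
    body["pipeline"] = pipeline
    return body


search = {"query": {"match": {"text_field": "some search text"}}}
request = with_temporary_pipeline(
    search,
    request_processors=[
        {"filter_query": {"query": {"term": {"visibility": "public"}}}}
    ],
)
```

The resulting `request` dictionary can be serialized and sent as the body of `POST /my-index/_search`.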

## Retrieving search pipelines

To retrieve the details of an existing search pipeline, use the Search Pipeline API.
@@ -201,7 +257,7 @@ GET /_search/pipeline/my*

## Using a search pipeline

To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:
To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:

```json
GET /my_index/_search?search_pipeline=my_pipeline
@@ -393,4 +449,164 @@ The response contains the pipeline version:
}
}
```
</details>
</details>

## Search pipeline metrics

To view search pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/):

```json
GET /_nodes/stats/search_pipeline
```
{% include copy-curl.html %}

The response contains statistics for all search pipelines:

```json
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "runTask",
  "nodes" : {
    "CpvTK7KuRD6Oww8TTp8g2Q" : {
      "timestamp" : 1689007282929,
      "name" : "runTask-0",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "roles" : [
        "cluster_manager",
        "data",
        "ingest",
        "remote_cluster_client"
      ],
      "attributes" : {
        "testattr" : "test",
        "shard_indexing_pressure_enabled" : "true"
      },
      "search_pipeline" : {
        "total_request" : {
          "count" : 5,
          "time_in_millis" : 158,
          "current" : 0,
          "failed" : 0
        },
        "total_response" : {
          "count" : 2,
          "time_in_millis" : 1,
          "current" : 0,
          "failed" : 0
        },
        "pipelines" : {
          "public_info" : {
            "request" : {
              "count" : 3,
              "time_in_millis" : 71,
              "current" : 0,
              "failed" : 0
            },
            "response" : {
              "count" : 0,
              "time_in_millis" : 0,
              "current" : 0,
              "failed" : 0
            },
            "request_processors" : [
              {
                "filter_query:abc" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 1,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 4,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ],
            "response_processors" : [ ]
          },
          "guest_pipeline" : {
            "request" : {
              "count" : 2,
              "time_in_millis" : 87,
              "current" : 0,
              "failed" : 0
            },
            "response" : {
              "count" : 2,
              "time_in_millis" : 1,
              "current" : 0,
              "failed" : 0
            },
            "request_processors" : [
              {
                "script" : {
                  "type" : "script",
                  "stats" : {
                    "count" : 2,
                    "time_in_millis" : 86,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query:abc" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 1,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "filter_query" : {
                  "type" : "filter_query",
                  "stats" : {
                    "count" : 3,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ],
            "response_processors" : [
              {
                "rename_field" : {
                  "type" : "rename_field",
                  "stats" : {
                    "count" : 2,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}
```

For descriptions of each field in the response, see the [Nodes Stats search pipeline section]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#search_pipeline).
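
To spot processors that have failed, you can walk the `search_pipeline` section of the stats response. The following Python sketch assumes the response has already been parsed into a dictionary with the structure shown above; the function name `failed_processors` is hypothetical, and the sample data is trimmed and uses a made-up `failed` count for illustration:

```python
def failed_processors(nodes_stats):
    """Yield (node_id, pipeline, phase, processor_key, failed_count)
    for every processor that reports at least one failure."""
    for node_id, node in nodes_stats.get("nodes", {}).items():
        pipelines = node.get("search_pipeline", {}).get("pipelines", {})
        for pipeline_name, pipeline in pipelines.items():
            for phase in ("request_processors", "response_processors"):
                for entry in pipeline.get(phase, []):
                    # Each entry maps one processor key to its type and stats.
                    for key, info in entry.items():
                        failed = info.get("stats", {}).get("failed", 0)
                        if failed > 0:
                            yield node_id, pipeline_name, phase, key, failed


# Trimmed sample with a made-up failure count (not taken from a real cluster).
sample = {
    "nodes": {
        "node-1": {
            "search_pipeline": {
                "pipelines": {
                    "guest_pipeline": {
                        "request_processors": [
                            {"script": {"type": "script", "stats": {"count": 2, "failed": 2}}},
                            {"filter_query": {"type": "filter_query", "stats": {"count": 3, "failed": 0}}},
                        ],
                        "response_processors": [],
                    }
                }
            }
        }
    }
}

failures = list(failed_processors(sample))
```

Because the processor key includes the tag when one is set (for example, `filter_query:abc`), tagging processors makes the reported failures easier to trace.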
@@ -9,9 +9,6 @@ grand_parent: Search

# Rename field processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `rename_field` search response processor intercepts a search response and renames the specified field. This is useful when your index and your application use different names for the same field. For example, if you rename a field in your index, the `rename_field` processor can change the new name to the old one before sending the response to your application.
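
As a rough mental model only (this is not how OpenSearch implements the processor), the effect on the `_source` of each hit resembles the following Python sketch. The `rename_in_hits` helper and the sample hit are hypothetical:

```python
def rename_in_hits(hits, field, target_field):
    """Return new hit dicts whose _source has `field` renamed to `target_field`.
    Hits that do not contain `field` keep their _source contents as they are."""
    renamed = []
    for hit in hits:
        source = dict(hit.get("_source", {}))
        if field in source:
            source[target_field] = source.pop(field)
        renamed.append({**hit, "_source": source})
    return renamed


hits = [{"_id": "1", "_source": {"message": "hello", "visibility": "public"}}]
result = rename_in_hits(hits, "message", "notification")
```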

## Request fields
@@ -24,6 +21,7 @@ Field | Data type | Description
`target_field` | String | The new field name. Required.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

5 changes: 1 addition & 4 deletions _search-plugins/search-pipelines/script-processor.md
@@ -9,9 +9,6 @@ grand_parent: Search

# Script processor

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion in the [OpenSearch forum](https://forum.opensearch.org/t/rfc-search-pipelines/12099).
{: .warning}

The `script` search request processor intercepts a search request and adds an inline Painless script that is run on incoming requests. The script can only run on the following request fields:

- `from`
@@ -37,6 +34,7 @@ Field | Data type | Description
`lang` | String | The script language. Optional. Only `painless` is supported.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

@@ -57,4 +55,3 @@ PUT /_search/pipeline/explain_one_result
}
```
{% include copy-curl.html %}