Add Split Response Processor to 2.17 Search Pipeline docs
Signed-off-by: Daniel Widdis <[email protected]>
dbwiddis committed Aug 24, 2024
1 parent bb5eeeb commit 7250eb6
Showing 2 changed files with 237 additions and 0 deletions.
1 change: 1 addition & 0 deletions _search-plugins/search-pipelines/search-processors.md
@@ -45,6 +45,7 @@ Processor | Description | Earliest available version
[`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12
[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12)
[`sort`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/sort-processor/)| Sorts an array of items in either ascending or descending order. | 2.16
[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.17
[`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12

236 changes: 236 additions & 0 deletions _search-plugins/search-pipelines/split-processor.md
@@ -0,0 +1,236 @@
---
layout: default
title: Split
nav_order: 140
has_children: false
parent: Search processors
grand_parent: Search pipelines
---

# Split processor
Introduced 2.17
{: .label .label-purple }

The `split` response processor splits a string field in each search hit into an array of substrings based on a specified delimiter.

## Request fields

The following table lists all available request fields.

Field | Data type | Description
:--- | :--- | :---
`field` | String | The field containing the string to be split. Required.
`separator` | String | The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required.
`preserve_trailing` | Boolean | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Optional. Default is `false`.
`target_field` | String | The field in which the array of substrings is stored. If not specified, then the field is updated in place. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
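The following Python sketch makes the `separator`, `preserve_trailing`, and `target_field` semantics concrete by applying them to a single `_source` dictionary. It is a hypothetical helper (`split_field` is not part of OpenSearch). It assumes that the separator is a regular expression without capturing groups. Python's `re` syntax may also differ slightly from the regular expression flavor that OpenSearch uses on the server.

```python
import re
from typing import Any, Dict, List, Optional


def split_field(
    source: Dict[str, Any],
    field: str,
    separator: str,
    preserve_trailing: bool = False,
    target_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of `source` with `field` split on the `separator` regex.

    Hypothetical helper that approximates the documented options; it is not
    OpenSearch code. Assumes `separator` contains no capturing groups, because
    re.split would otherwise include the captured text in the result.
    """
    value = source.get(field)
    if not isinstance(value, str):
        # This sketch fails on missing or non-string fields.
        raise ValueError(f"field [{field}] is missing or is not a string")

    parts: List[str] = re.split(separator, value)
    if not preserve_trailing and len(parts) > 1:
        # Drop empty strings at the end of the array, e.g. "a,b,," -> ["a", "b"].
        while parts and parts[-1] == "":
            parts.pop()

    result = dict(source)  # leave the caller's dictionary untouched
    # Without a target_field, the original field is overwritten (in place).
    result[target_field or field] = parts
    return result
```

With the separator `", "` used in the following example, `and analyze data` stays in one element because the separator only matches a comma followed by a space. A regular expression such as `,\s*(?:and\s+)?` would also remove the leading `and`.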

## Example

The following example demonstrates using a search pipeline with a `split` processor.

### Setup

Create an index named `my_index` and index a document containing the field `message`:

```json
POST /my_index/_doc/1
{
  "message": "ingest, search, visualize, and analyze data",
  "visibility": "public"
}
```
{% include copy-curl.html %}

### Creating a search pipeline

The following request creates a search pipeline with a `split` response processor that splits the `message` field and stores the results in the `split_message` field:

```json
PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "split": {
        "field": "message",
        "separator": ", ",
        "target_field": "split_message"
      }
    }
  ]
}
```
{% include copy-curl.html %}
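To create the same pipeline from a client, you can send the request body as JSON. The following Python sketch builds the request with the standard library's `urllib`. The `http://localhost:9200` address is an assumption (a local cluster without security enabled), and the request is only sent if you uncomment the final lines.

```python
import json
import urllib.request

# Assumption: a local cluster reachable over plain HTTP with security disabled.
OPENSEARCH_URL = "http://localhost:9200"

# Same body as the PUT /_search/pipeline/my_pipeline request.
pipeline_body = {
    "response_processors": [
        {
            "split": {
                "field": "message",
                "separator": ", ",
                "target_field": "split_message",
            }
        }
    ]
}

create_pipeline_request = urllib.request.Request(
    f"{OPENSEARCH_URL}/_search/pipeline/my_pipeline",
    data=json.dumps(pipeline_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Uncomment to send the request to a running cluster:
# with urllib.request.urlopen(create_pipeline_request) as response:
#     print(json.load(response))
```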

### Using a search pipeline

Search for documents in `my_index` without a search pipeline:

```json
GET /my_index/_search
```
{% include copy-curl.html %}

The response contains the field `message`:

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "message": "ingest, search, visualize, and analyze data",
          "visibility": "public"
        }
      }
    ]
  }
}
```
</details>

To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:

```json
GET /my_index/_search?search_pipeline=my_pipeline
```
{% include copy-curl.html %}
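From a client, the pipeline is selected the same way, through the `search_pipeline` query parameter. In the following Python sketch, `build_search_request` is a hypothetical helper and `http://localhost:9200` is an assumed local, unsecured cluster. When a request body is supplied, the helper switches to `POST`.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import urlencode

OPENSEARCH_URL = "http://localhost:9200"  # assumption: local, unsecured cluster


def build_search_request(
    index: str,
    pipeline: Optional[str] = None,
    body: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a _search request, optionally using a pipeline."""
    url = f"{OPENSEARCH_URL}/{index}/_search"
    if pipeline:
        # Selects the search pipeline for this request only.
        url += "?" + urlencode({"search_pipeline": pipeline})

    data = None
    headers: Dict[str, str] = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"

    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

To run a request, pass it to `urllib.request.urlopen` and decode the body with `json.load`.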

The `message` field is split, and the resulting array is stored in the `split_message` field:

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}
```
</details>

You can also use the `fields` option to retrieve specific fields from the matching documents:

```json
POST /my_index/_search?pretty&search_pipeline=my_pipeline
{
  "fields": ["visibility", "message"]
}
```
{% include copy-curl.html %}

In the response, the `message` field is split, and the resulting array is stored in the `split_message` field in both `_source` and `fields`:

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        },
        "fields": {
          "visibility": [
            "public"
          ],
          "message": [
            "ingest, search, visualize, and analyze data"
          ],
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}
```
</details>
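If you process these responses in a client, a small helper can collect the split arrays for each hit. The following Python function is a hypothetical sketch that operates on an already parsed response. It reads the target field from `_source` and falls back to `fields`; the default name `split_message` matches the pipeline in this example.

```python
from typing import Any, Dict, List


def split_values_by_id(
    response: Dict[str, Any], target_field: str = "split_message"
) -> Dict[str, List[str]]:
    """Map each hit's _id to the array stored in `target_field`.

    Reads `_source` first and falls back to the `fields` section; hits that
    contain neither are skipped.
    """
    values: Dict[str, List[str]] = {}
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source") or {}
        fields = hit.get("fields") or {}
        if target_field in source:
            values[hit["_id"]] = source[target_field]
        elif target_field in fields:
            values[hit["_id"]] = fields[target_field]
    return values
```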