Skip to content

Commit

Permalink
Add Conversational Search changes for 2.11 (#5195)
Browse files Browse the repository at this point in the history
* Add Conversational Search changes for 2.11

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Update conversational-search.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
  • Loading branch information
3 people authored and vagimeli committed Oct 13, 2023
1 parent ee98a84 commit 24e5a39
Showing 1 changed file with 54 additions and 12 deletions.
66 changes: 54 additions & 12 deletions _ml-commons-plugin/conversational-search.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,11 @@ Conversational search is an experimental machine learning (ML) feature that enab
Currently, conversational search uses two systems to synthesize documents:

- [Conversation memory](#conversation-memory)
- [Retreival Augmented Generation (RAG) pipeline](#rag-pipeline)
- [Retrieval Augmented Generation (RAG) pipeline](#rag-pipeline)

## Conversation memory

Conversation memory consists of a simple CRUD-life API comprising two resources: **Conversations** and **Interactions**. Conversations are made up of interactions. An interaction represents a pair of messages: a human input and an artificial intelligence (AI) response. You cannot create any interactions until you've created a conversation.
Conversation memory consists of a simple CRUD-like API comprising two resources: **Conversations** and **Interactions**. Conversations are made up of interactions. An interaction represents a pair of messages: a human input and an artificial intelligence (AI) response. You cannot create any interactions until you've created a conversation.

To make it easier to build and debug applications that use conversation memory, `conversation-meta` and `conversation-interactions` are stored in two system indexes.

Expand All @@ -40,6 +40,7 @@ In the `conversation-meta` index, you can customize the `name` field to make it
}
}
```
{% include copy-curl.html %}

### `conversation-interactions` index

Expand Down Expand Up @@ -72,6 +73,7 @@ The `conversation-interactions` index creates a clean interaction abstraction an
}
}
```
{% include copy-curl.html %}

## Working with conversations and interactions

Expand All @@ -88,6 +90,7 @@ PUT /_cluster/settings
}
}
```
{% include copy-curl.html %}

After conversation memory is enabled, you can use the Memory API to create a conversation.

Expand All @@ -101,6 +104,7 @@ POST /_plugins/_ml/memory/conversation
"name": Example conversation
}
```
{% include copy-curl.html %}

The Memory API responds with the conversation ID, as shown in the following example response:

Expand All @@ -123,6 +127,7 @@ POST /_plugins/_ml/memory/conversation/4of2c9nhoIuhcr
A JSON or other semi-structured response"
}
```
{% include copy-curl.html %}

The Memory API then responds with an interaction ID, as shown in the following example response:

Expand All @@ -137,6 +142,7 @@ You can get a list of conversations using the following Memory API operation:
```json
GET /_plugins/_ml/memory/conversation?max_results=3&next_token=0
```
{% include copy-curl.html %}

Use the following path parameters to customize your results.

Expand Down Expand Up @@ -176,6 +182,7 @@ To see a list of interactions in a conversation, enter the `conversation_id` at
```json
GET /_plugins/_ml/memory/conversation/4of2c9nhoIuhcr
```
{% include copy-curl.html %}

The Memory API returns the following interaction information:

Expand All @@ -202,11 +209,12 @@ The Memory API returns the following interaction information:

### Deleting conversations

To delete a conversation, use the `DELETE` operation, as showing in the following example:
To delete a conversation, use the `DELETE` operation, as shown in the following example:

```json
DELETE /_plugins/_ml/memory/conversation/4of2c9nhoIuhcr
```
{% include copy-curl.html %}

The Memory API responds with the following:

Expand All @@ -216,7 +224,10 @@ The Memory API responds with the following:

## RAG pipeline

RAG is a technique that retrieves documents from an index, passes them through a seq2seq model, such as an LLM, and then generates more factual outputs.
RAG is a technique that retrieves documents from an index, passes them through a seq2seq model, such as an LLM, and then supplements the static LLM information with the dynamically retrieved data in context.

As of OpenSearch 2.11, the RAG technique has only been tested with OpenAI models and the Anthropic Claude model on Amazon Bedrock.
{: .warning}

### Enabling RAG

Expand All @@ -228,6 +239,7 @@ PUT /_cluster/settings
"persistent": {"plugins.ml_commons.rag_pipline_feature_enabled": "true"}
}
```
{% include copy-curl.html %}

### Connecting the model

Expand Down Expand Up @@ -268,16 +280,18 @@ Use the following steps to set up an HTTP connector using the OpenAI GPT 3.5 mod

2. Create a new model group for the connected model. You'll use the `model_group_id` returned by the Register API to register the model:

```json
POST /_plugins/_ml/model_group/_register
{
```json
POST /_plugins/_ml/model_group/_register
{
"name": "public_model_group",
"description": "This is a public model group"
}
}
```
{% include copy-curl.html %}

3. Register and deploy the model using the `connector_id` from the Connector API response in Step 1 and the `model_group_id` returned in Step 2:

```java
```json
POST /_plugins/_ml/models/_register
{
"name": "openAI-gpt-3.5-turbo",
Expand All @@ -287,18 +301,21 @@ Use the following steps to set up an HTTP connector using the OpenAI GPT 3.5 mod
"connector_id": "f5-iSYoBu0R6vVqGI3PA"
}
```
{% include copy-curl.html %}

4. With the model registered, use the `task_id` returned in the registration response to get the `model_id`. You'll use the `model_id` to deploy the model to OpenSearch:

```json
GET /_plugins/_ml/tasks/<task_id>
```
{% include copy-curl.html %}

5. Using the `model_id` from step 4, deploy the model:

```
```json
POST /_plugins/_ml/models/<model_id>/_deploy
```
{% include copy-curl.html %}

### Setting up the pipeline

Expand All @@ -313,12 +330,17 @@ PUT /_search/pipeline/<pipeline_name>
"tag": "openai_pipeline_demo",
"description": "Demo pipeline Using OpenAI Connector",
"model_id": "<model_id>",
"context_field_list": ["text"]
"context_field_list": ["text"],
"system_prompt": "You are a helpful assistant",
"user_instructions": "Generate a concise and informative answer in less than 100 words for the given question"
}
}
]
}
```
{% include copy-curl.html %}

### Context field list

`context_field_list` is the list of fields in document sources that the pipeline uses as context for the RAG. For example, when `context_field_list` parses through the following document, the pipeline sends the `text` field from the response to OpenAI model:

Expand All @@ -335,6 +357,19 @@ PUT /_search/pipeline/<pipeline_name>

You can customize `context_field_list` in your RAG pipeline to send any fields that exist in your documents to the LLM.

### RAG parameter options

Use the following options when setting up a RAG pipeline under the `retrieval_augmented_generation` argument.

Parameter | Required | Description
:--- | :--- | :---
`tag` | No | A tag to help identify the pipeline.
`description` | Yes | A description of the pipeline.
`model_id` | Yes | The ID of the model used in the pipeline.
`context_field_list` | Yes | The list of fields in document sources that the pipeline uses as context for the RAG. For more information, see [Context Field List](#context-field-list).
`system_prompt` | No | The message sent to the LLM with a `system` role. This is the message the user sees when the LLM receives an interaction.
`user_instructions` | No | An additional message sent by the LLM with a `user` role. This parameter allows for further customization of what the user receives when interacting with the LLM.

### Using the pipeline

Using the pipeline is similar to submitting [search queries]({{site.url}}{{site.baseurl}}/api-reference/search/#example) to OpenSearch, as shown in the following example:
Expand All @@ -347,11 +382,15 @@ GET /<index_name>/_search?search_pipeline=<pipeline_name>
"generative_qa_parameters": {
"llm_model": "gpt-3.5-turbo",
"llm_question": "Was Abraham Lincoln a good politician",
"conversation_id": "_ikaSooBHvd8_FqDUOjZ"
"conversation_id": "_ikaSooBHvd8_FqDUOjZ",
"context_size": 5,
"interaction_size": 5,
"timeout": 15
}
}
}
```
{% include copy-curl.html %}

The RAG search query uses the following request objects under the `generative_qa_parameters` option.

Expand All @@ -360,6 +399,9 @@ Parameter | Required | Description
`llm_question` | Yes | The question the LLM must answer.
`llm_model` | No | Overrides the original model set in the connection in cases where you want to use a different model (for example, GPT 4 instead of GPT 3.5). This option is required if a default model is not set during pipeline creation.
`conversation_id` | No | Integrates conversation memory into your RAG pipeline by adding the 10 most recent conversations into the context of the search query to the LLM.
`context_size` | No | The number of search results sent to the LLM. This is typically needed in order to meet the token size limit, which can vary by model. Alternatively, you can use the `size` parameter in the Search API to control the amount of information sent to the LLM.
`interaction_size` | No | The number of interactions sent to the LLM. Similarly to the number of search results, this affects the total number of tokens seen by the LLM. When not set, the pipeline uses the default interaction size of `10`.
`timeout` | No | The number of seconds that the pipeline waits for the remote model using a connector to respond. Default is `30`.

If your LLM includes a set token limit, set the `size` field in your OpenSearch query to limit the number of documents used in the search response. Otherwise, the RAG pipeline will send every document in the search results to the LLM.

Expand Down

0 comments on commit 24e5a39

Please sign in to comment.