Merge branch 'main' into default-params
kolchfa-aws authored Mar 29, 2024
2 parents 71b1706 + d676a79 commit 7bb6edd
Showing 46 changed files with 912 additions and 304 deletions.
2 changes: 2 additions & 0 deletions .github/vale/styles/Vocab/OpenSearch/Words/accept.txt
@@ -81,6 +81,7 @@ Levenshtein
[Oo]nboarding
pebibyte
[Pp]erformant
[Pp]laintext
[Pp]luggable
[Pp]reconfigure
[Pp]refetch
@@ -92,6 +93,7 @@ pebibyte
[Pp]reprocess
[Pp]retrain
[Pp]seudocode
[Qq]uantiz(e|ation|ing|er)
[Rr]ebalance
[Rr]ebalancing
[Rr]edownload
2 changes: 1 addition & 1 deletion _automating-configurations/index.md
@@ -8,7 +8,7 @@ redirect_from: /automating-configurations/
---

# Automating configurations
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

You can automate complex OpenSearch setup and preprocessing tasks by providing templates for common use cases. For example, automating machine learning (ML) setup tasks streamlines the use of OpenSearch ML offerings.
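As a rough sketch, creating and provisioning a workflow from a default template could look like the following Flow Framework API call. The `use_case` value and the `create_ingest_pipeline.model_id` override key are assumptions based on the default semantic search template, and the model ID is a placeholder:

```json
POST /_plugins/_flow_framework/workflow?use_case=semantic_search&provision=true
{
  "create_ingest_pipeline.model_id": "your-embedding-model-id"
}
```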
8 changes: 1 addition & 7 deletions _dashboards/dashboards-assistant/index.md
@@ -6,14 +6,11 @@ has_children: false
has_toc: false
---

This is an experimental feature and is not recommended for use in a production environment. For updates on the feature's progress or to leave feedback, go to the [`dashboards-assistant` repository](https://github.com/opensearch-project/dashboards-assistant) on GitHub or the associated [OpenSearch forum thread](https://forum.opensearch.org/t/feedback-opensearch-assistant/16741).
{: .warning}

Note that machine learning models are probabilistic and that some may perform better than others, so the OpenSearch Assistant may occasionally produce inaccurate information. We recommend evaluating outputs for accuracy as appropriate to your use case, including reviewing the output or combining it with other verification factors.
{: .important}

# OpenSearch Assistant for OpenSearch Dashboards
Introduced 2.12
**Introduced 2.13**
{: .label .label-purple }

The OpenSearch Assistant toolkit helps you create AI-powered assistants for OpenSearch Dashboards without requiring you to have specialized query tools or skills.
@@ -49,9 +46,6 @@ A screenshot of the interface is shown in the following image.

<img width="700" src="{{site.url}}{{site.baseurl}}/images/dashboards/opensearch-assistant-full-frame.png" alt="OpenSearch Assistant interface">

For more information about ways to enable experimental features, see [Experimental feature flags]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
{: .note}

## Configuring OpenSearch Assistant

You can use the OpenSearch Dashboards interface to configure OpenSearch Assistant. Go to the [Getting started guide](https://github.com/opensearch-project/dashboards-assistant/blob/main/GETTING_STARTED_GUIDE.md) for step-by-step instructions. For the chatbot template, go to the [Flow Framework plugin](https://github.com/opensearch-project/flow-framework) documentation. You can modify this template to use your own model and customize the chatbot tools.
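If the assistant does not appear in OpenSearch Dashboards, it typically also needs to be switched on in `opensearch_dashboards.yml`. The following is a minimal sketch that assumes the `assistant.chat.enabled` flag exposed by the `dashboards-assistant` plugin; verify the flag name for your version:

```yaml
# opensearch_dashboards.yml
# Assumed flag name from the dashboards-assistant plugin; confirm for your version.
assistant.chat.enabled: true
```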
1 change: 0 additions & 1 deletion _data-prepper/pipelines/configuration/sinks/opensearch.md
@@ -91,7 +91,6 @@ Option | Required | Type | Description
`document_root_key` | No | String | The key in the event that will be used as the root in the document. The default is the root of the event. If the key does not exist, then the entire event is written as the document. If `document_root_key` is of a basic value type, such as a string or integer, then the document will have a structure of `{"data": <value of the document_root_key>}`.
`serverless` | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`.
`serverless_options` | No | Object | The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).
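To show how these options fit together, the following is a minimal `opensearch` sink sketch targeting an Amazon OpenSearch Serverless collection. The endpoint, index name, and role ARN are placeholders:

```yaml
sink:
  - opensearch:
      # Placeholder collection endpoint and index name.
      hosts: ["https://your-collection-id.us-east-1.aoss.amazonaws.com"]
      index: "application-logs"
      # Required when the destination is an Amazon OpenSearch Serverless collection.
      serverless: true
      aws:
        # Placeholder Region and IAM role used to sign requests.
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-sink-role"
```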

<!-- vale on -->

## aws
114 changes: 3 additions & 111 deletions _ingest-pipelines/processors/text-chunking.md
@@ -157,119 +157,11 @@ The response confirms that, in addition to the `passage_text` field, the process
}
```

Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Step 2: Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-2-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-3-ingest-documents-into-the-index) of the [neural sparse search documentation]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).

## Chaining text chunking and embedding processors

You can use a `text_chunking` processor as a preprocessing step for a `text_embedding` or `sparse_encoding` processor in order to obtain embeddings for each chunked passage.

**Prerequisites**

Follow the steps outlined in the [pretrained model documentation]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/) to register an embedding model.

**Step 1: Create a pipeline**

The following example request creates an ingest pipeline that converts the text in the `passage_text` field into chunked passages, which will be stored in the `passage_chunk` field. The text in the `passage_chunk` field is then converted into text embeddings, and the embeddings are stored in the `passage_chunk_embedding` field:

```json
PUT _ingest/pipeline/text-chunking-embedding-ingest-pipeline
{
"description": "A text chunking and embedding ingest pipeline",
"processors": [
{
"text_chunking": {
"algorithm": {
"fixed_token_length": {
"token_limit": 10,
"overlap_rate": 0.2,
"tokenizer": "standard"
}
},
"field_map": {
"passage_text": "passage_chunk"
}
}
},
{
"text_embedding": {
"model_id": "LMLPWY4BROvhdbtgETaI",
"field_map": {
"passage_chunk": "passage_chunk_embedding"
}
}
}
]
}
```
{% include copy-curl.html %}

**Step 2 (Optional): Test the pipeline**

It is recommended that you test your pipeline before ingesting documents.
{: .tip}

To test the pipeline, run the following query:

```json
POST _ingest/pipeline/text-chunking-embedding-ingest-pipeline/_simulate
{
"docs": [
{
"_index": "testindex",
"_id": "1",
"_source":{
"passage_text": "This is an example document to be chunked. The document contains a single paragraph, two sentences and 24 tokens by standard tokenizer in OpenSearch."
}
}
]
}
```
{% include copy-curl.html %}

#### Response

The response confirms that, in addition to the `passage_text` and `passage_chunk` fields, the processor has generated text embeddings for each of the three passages in the `passage_chunk_embedding` field. The embedding vectors are stored in the `knn` field for each chunk:

```json
{
"docs": [
{
"doc": {
"_index": "testindex",
"_id": "1",
"_source": {
"passage_chunk_embedding": [
{
"knn": [...]
},
{
"knn": [...]
},
{
"knn": [...]
}
],
"passage_text": "This is an example document to be chunked. The document contains a single paragraph, two sentences and 24 tokens by standard tokenizer in OpenSearch.",
"passage_chunk": [
"This is an example document to be chunked. The document ",
"The document contains a single paragraph, two sentences and 24 ",
"and 24 tokens by standard tokenizer in OpenSearch."
]
},
"_ingest": {
"timestamp": "2024-03-20T03:04:49.144054Z"
}
}
}
]
}
```

Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Step 2: Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-2-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-3-ingest-documents-into-the-index) of the [neural sparse search documentation]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
Once you have created an ingest pipeline, you need to create an index for document ingestion. To learn more, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).
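As a brief sketch of that step, you can set the pipeline as the index's `default_pipeline` and map a nested k-NN field that matches the embedding output. The index name and the vector dimension of 768 are assumptions tied to the example model:

```json
PUT testindex
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "text-chunking-embedding-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "passage_text": { "type": "text" },
      "passage_chunk_embedding": {
        "type": "nested",
        "properties": {
          "knn": { "type": "knn_vector", "dimension": 768 }
        }
      }
    }
  }
}
```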

## Cascaded text chunking processors

You can chain multiple chunking processors together. For example, to split documents into paragraphs, apply the `delimiter` algorithm and specify the parameter as `\n\n`. To prevent a paragraph from exceeding the token limit, append another chunking processor that uses the `fixed_token_length` algorithm. You can configure the ingest pipeline for this example as follows:
You can chain multiple text chunking processors together. For example, to split documents into paragraphs, apply the `delimiter` algorithm and specify the parameter as `\n\n`. To prevent a paragraph from exceeding the token limit, append another text chunking processor that uses the `fixed_token_length` algorithm. You can configure the ingest pipeline for this example as follows:

```json
PUT _ingest/pipeline/text-chunking-cascade-ingest-pipeline
@@ -309,7 +201,7 @@ PUT _ingest/pipeline/text-chunking-cascade-ingest-pipeline

## Next steps

- For a complete example, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).
- To learn more about semantic search, see [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/).
- To learn more about sparse search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
- To learn more about using models in OpenSearch, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
- For a comprehensive example, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/agents-tools-tutorial.md
@@ -7,12 +7,9 @@ nav_order: 10
---

# Agents and tools tutorial
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The following tutorial illustrates creating a flow agent for retrieval-augmented generation (RAG). A flow agent runs its configured tools sequentially, in the order specified. In this example, you'll create an agent with two tools:

1. `VectorDBTool`: The agent will use this tool to retrieve OpenSearch documents relevant to the user question. You'll ingest supplementary information into an OpenSearch index. To facilitate vector search, you'll deploy a text embedding model that translates text into vector embeddings. OpenSearch will translate the ingested documents into embeddings and store them in the index. When you provide a user question to the agent, the agent will construct a query from the question, run vector search on the OpenSearch index, and pass the relevant retrieved documents to the `MLModelTool`.
23 changes: 1 addition & 22 deletions _ml-commons-plugin/agents-tools/index.md
@@ -7,12 +7,9 @@ nav_order: 27
---

# Agents and tools
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

You can automate machine learning (ML) tasks using agents and tools. An _agent_ orchestrates and runs ML models and tools. A _tool_ performs a set of specific tasks. Some examples of tools are the `VectorDBTool`, which supports vector search, and the `CATIndexTool`, which executes the `cat indices` operation. For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/).

## Agents
@@ -155,24 +152,6 @@ POST /_plugins/_ml/agents/_register
It is important to provide thorough descriptions of the tools so that the LLM can decide in which situations to use those tools.
{: .tip}
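For orientation, a minimal flow agent that wraps the `CatIndexTool` might be registered as follows. This is a sketch patterned on the register request shown above; the name and descriptions are illustrative:

```json
POST /_plugins/_ml/agents/_register
{
  "name": "Test_Agent_For_CatIndex_tool",
  "type": "flow",
  "description": "A demo agent that lists index information for the cluster",
  "tools": [
    {
      "type": "CatIndexTool",
      "description": "Use this tool to get index and shard information for the OpenSearch cluster."
    }
  ]
}
```

After registration, you can run the agent by calling `POST /_plugins/_ml/agents/<agent_id>/_execute`.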

## Enabling the feature

To enable agents and tools, configure the following setting:

```yaml
plugins.ml_commons.agent_framework_enabled: true
```
{% include copy.html %}
For conversational agents, you also need to enable RAG for use in conversational search. To enable RAG, configure the following setting:
```yaml
plugins.ml_commons.rag_pipeline_feature_enabled: true
```
{% include copy.html %}
For more information about ways to enable experimental features, see [Experimental feature flags]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
## Next steps

- For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/).
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/agent-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Agent tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `AgentTool` runs any agent.

## Step 1: Set up an agent for AgentTool to run
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/cat-index-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# CAT Index tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `CatIndexTool` retrieves index information for the OpenSearch cluster, similarly to the [CAT Indices API]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/).

## Step 1: Register a flow agent that will run the CatIndexTool
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/index-mapping-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Index Mapping tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `IndexMappingTool` retrieves mapping and setting information for indexes in your cluster.

## Step 1: Register a flow agent that will run the IndexMappingTool
2 changes: 1 addition & 1 deletion _ml-commons-plugin/agents-tools/tools/index.md
@@ -10,7 +10,7 @@ redirect_from:
---

# Tools
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

A _tool_ performs a set of specific tasks. The following table lists all tools that OpenSearch supports.
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/ml-model-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# ML Model tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `MLModelTool` runs a machine learning (ML) model and returns inference results.

## Step 1: Create a connector for a model
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Neural Sparse Search tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `NeuralSparseSearchTool` performs sparse vector retrieval. For more information about neural sparse search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).

## Step 1: Register and deploy a sparse encoding model
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/ppl-tool.md
@@ -9,12 +9,9 @@ grand_parent: Agents and tools
---

# PPL tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `PPLTool` translates natural language into a PPL query. The tool provides an `execute` flag to specify whether to run the query. If you set the flag to `true`, the `PPLTool` runs the query and returns the query and the results.
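Once the prerequisite model described below is set up, registering a flow agent with the `PPLTool` might look like the following sketch. The `model_id` and `model_type` parameter names are assumptions patterned on the other tools, while `execute` is the flag described above:

```json
POST /_plugins/_ml/agents/_register
{
  "name": "Test_Agent_For_PPL_tool",
  "type": "flow",
  "description": "A demo agent that translates natural language into PPL",
  "tools": [
    {
      "type": "PPLTool",
      "parameters": {
        "model_id": "your-LLM-model-id",
        "model_type": "FINETUNE",
        "execute": false
      }
    }
  ]
}
```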

## Prerequisite
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/rag-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# RAG tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `RAGTool` performs retrieval-augmented generation (RAG). For more information about RAG, see [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/).

RAG calls a large language model (LLM) and supplements its knowledge by providing relevant OpenSearch documents along with the user question. To retrieve relevant documents from an OpenSearch index, you'll need a text embedding model that facilitates vector search.
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/search-alerts-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Search Alerts tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `SearchAlertsTool` retrieves information about generated alerts. For more information about alerts, see [Alerting]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/index/).

## Step 1: Register a flow agent that will run the SearchAlertsTool
_ml-commons-plugin/agents-tools/tools/search-anomaly-detectors-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Search Anomaly Detectors tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `SearchAnomalyDetectorsTool` retrieves information about anomaly detectors set up on your cluster. For more information about anomaly detectors, see [Anomaly detection]({{site.url}}{{site.baseurl}}/observing-your-data/ad/index/).

## Step 1: Register a flow agent that will run the SearchAnomalyDetectorsTool

0 comments on commit 7bb6edd
