Merge branch 'main' into default-params
kolchfa-aws authored Mar 29, 2024
2 parents 71b1706 + d676a79 commit 7bb6edd
Showing 46 changed files with 912 additions and 304 deletions.
2 changes: 2 additions & 0 deletions .github/vale/styles/Vocab/OpenSearch/Words/accept.txt
@@ -81,6 +81,7 @@ Levenshtein
[Oo]nboarding
pebibyte
[Pp]erformant
[Pp]laintext
[Pp]luggable
[Pp]reconfigure
[Pp]refetch
@@ -92,6 +93,7 @@ pebibyte
[Pp]reprocess
[Pp]retrain
[Pp]seudocode
[Qq]uantiz(e|ation|ing|er)
[Rr]ebalance
[Rr]ebalancing
[Rr]edownload
2 changes: 1 addition & 1 deletion _automating-configurations/index.md
@@ -8,7 +8,7 @@ redirect_from: /automating-configurations/
---

# Automating configurations
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

You can automate complex OpenSearch setup and preprocessing tasks by providing templates for common use cases. For example, automating machine learning (ML) setup tasks streamlines the use of OpenSearch ML offerings.
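As a rough sketch, creating and provisioning a workflow from a default template could look like the following Flow Framework API call. The `use_case` value and the `create_ingest_pipeline.model_id` override key are assumptions based on the default semantic search template, and the model ID is a placeholder:

```json
POST /_plugins/_flow_framework/workflow?use_case=semantic_search&provision=true
{
  "create_ingest_pipeline.model_id": "your-embedding-model-id"
}
```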
8 changes: 1 addition & 7 deletions _dashboards/dashboards-assistant/index.md
@@ -6,14 +6,11 @@ has_children: false
has_toc: false
---

This is an experimental feature and is not recommended for use in a production environment. For updates on the feature's progress or to leave feedback, go to the [`dashboards-assistant` repository](https://github.com/opensearch-project/dashboards-assistant) on GitHub or the associated [OpenSearch forum thread](https://forum.opensearch.org/t/feedback-opensearch-assistant/16741).
{: .warning}

Note that machine learning models are probabilistic and that some may perform better than others, so the OpenSearch Assistant may occasionally produce inaccurate information. We recommend evaluating outputs for accuracy as appropriate to your use case, including reviewing the output or combining it with other verification factors.
{: .important}

# OpenSearch Assistant for OpenSearch Dashboards
Introduced 2.12
**Introduced 2.13**
{: .label .label-purple }

The OpenSearch Assistant toolkit helps you create AI-powered assistants for OpenSearch Dashboards without requiring you to have specialized query tools or skills.
@@ -49,9 +46,6 @@ A screenshot of the interface is shown in the following image.

<img width="700" src="{{site.url}}{{site.baseurl}}/images/dashboards/opensearch-assistant-full-frame.png" alt="OpenSearch Assistant interface">

For more information about ways to enable experimental features, see [Experimental feature flags]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
{: .note}

## Configuring OpenSearch Assistant

You can use the OpenSearch Dashboards interface to configure OpenSearch Assistant. Go to the [Getting started guide](https://github.com/opensearch-project/dashboards-assistant/blob/main/GETTING_STARTED_GUIDE.md) for step-by-step instructions. For the chatbot template, go to the [Flow Framework plugin](https://github.com/opensearch-project/flow-framework) documentation. You can modify this template to use your own model and customize the chatbot tools.
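If the assistant does not appear in OpenSearch Dashboards, it typically also needs to be switched on in `opensearch_dashboards.yml`. The following is a minimal sketch that assumes the `assistant.chat.enabled` flag exposed by the `dashboards-assistant` plugin; verify the flag name for your version:

```yaml
# opensearch_dashboards.yml
# Assumed flag name from the dashboards-assistant plugin; confirm for your version.
assistant.chat.enabled: true
```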
1 change: 0 additions & 1 deletion _data-prepper/pipelines/configuration/sinks/opensearch.md
@@ -91,7 +91,6 @@ Option | Required | Type | Description
`document_root_key` | No | String | The key in the event that will be used as the root in the document. The default is the root of the event. If the key does not exist, then the entire event is written as the document. If `document_root_key` is of a basic value type, such as a string or integer, then the document will have a structure of `{"data": <value of the document_root_key>}`.
`serverless` | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`.
`serverless_options` | No | Object | The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).
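To show how these options fit together, the following is a minimal `opensearch` sink sketch targeting an Amazon OpenSearch Serverless collection. The endpoint, index name, and role ARN are placeholders:

```yaml
sink:
  - opensearch:
      # Placeholder collection endpoint and index name.
      hosts: ["https://your-collection-id.us-east-1.aoss.amazonaws.com"]
      index: "application-logs"
      # Required when the destination is an Amazon OpenSearch Serverless collection.
      serverless: true
      aws:
        # Placeholder Region and IAM role used to sign requests.
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-sink-role"
```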

<!-- vale on -->

## aws
114 changes: 3 additions & 111 deletions _ingest-pipelines/processors/text-chunking.md
@@ -157,119 +157,11 @@ The response confirms that, in addition to the `passage_text` field, the process
}
```

Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Step 2: Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-2-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-3-ingest-documents-into-the-index) of the [neural sparse search documentation]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).

## Chaining text chunking and embedding processors

You can use a `text_chunking` processor as a preprocessing step for a `text_embedding` or `sparse_encoding` processor in order to obtain embeddings for each chunked passage.

**Prerequisites**

Follow the steps outlined in the [pretrained model documentation]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/) to register an embedding model.

**Step 1: Create a pipeline**

The following example request creates an ingest pipeline that converts the text in the `passage_text` field into chunked passages, which will be stored in the `passage_chunk` field. The text in the `passage_chunk` field is then converted into text embeddings, and the embeddings are stored in the `passage_chunk_embedding` field:

```json
PUT _ingest/pipeline/text-chunking-embedding-ingest-pipeline
{
"description": "A text chunking and embedding ingest pipeline",
"processors": [
{
"text_chunking": {
"algorithm": {
"fixed_token_length": {
"token_limit": 10,
"overlap_rate": 0.2,
"tokenizer": "standard"
}
},
"field_map": {
"passage_text": "passage_chunk"
}
}
},
{
"text_embedding": {
"model_id": "LMLPWY4BROvhdbtgETaI",
"field_map": {
"passage_chunk": "passage_chunk_embedding"
}
}
}
]
}
```
{% include copy-curl.html %}

**Step 2 (Optional): Test the pipeline**

It is recommended that you test your pipeline before ingesting documents.
{: .tip}

To test the pipeline, run the following query:

```json
POST _ingest/pipeline/text-chunking-embedding-ingest-pipeline/_simulate
{
"docs": [
{
"_index": "testindex",
"_id": "1",
"_source":{
"passage_text": "This is an example document to be chunked. The document contains a single paragraph, two sentences and 24 tokens by standard tokenizer in OpenSearch."
}
}
]
}
```
{% include copy-curl.html %}

#### Response

The response confirms that, in addition to the `passage_text` and `passage_chunk` fields, the processor has generated text embeddings for each of the three passages in the `passage_chunk_embedding` field. The embedding vectors are stored in the `knn` field for each chunk:

```json
{
"docs": [
{
"doc": {
"_index": "testindex",
"_id": "1",
"_source": {
"passage_chunk_embedding": [
{
"knn": [...]
},
{
"knn": [...]
},
{
"knn": [...]
}
],
"passage_text": "This is an example document to be chunked. The document contains a single paragraph, two sentences and 24 tokens by standard tokenizer in OpenSearch.",
"passage_chunk": [
"This is an example document to be chunked. The document ",
"The document contains a single paragraph, two sentences and 24 ",
"and 24 tokens by standard tokenizer in OpenSearch."
]
},
"_ingest": {
"timestamp": "2024-03-20T03:04:49.144054Z"
}
}
}
]
}
```

Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Step 2: Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-2-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-3-ingest-documents-into-the-index) of the [neural sparse search documentation]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
Once you have created an ingest pipeline, you need to create an index for document ingestion. To learn more, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).
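As a brief sketch of that step, you can set the pipeline as the index's `default_pipeline` and map a nested k-NN field that matches the embedding output. The index name and the vector dimension of 768 are assumptions tied to the example model:

```json
PUT testindex
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "text-chunking-embedding-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "passage_text": { "type": "text" },
      "passage_chunk_embedding": {
        "type": "nested",
        "properties": {
          "knn": { "type": "knn_vector", "dimension": 768 }
        }
      }
    }
  }
}
```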

## Cascaded text chunking processors

You can chain multiple chunking processors together. For example, to split documents into paragraphs, apply the `delimiter` algorithm and specify the parameter as `\n\n`. To prevent a paragraph from exceeding the token limit, append another chunking processor that uses the `fixed_token_length` algorithm. You can configure the ingest pipeline for this example as follows:
You can chain multiple text chunking processors together. For example, to split documents into paragraphs, apply the `delimiter` algorithm and specify the parameter as `\n\n`. To prevent a paragraph from exceeding the token limit, append another text chunking processor that uses the `fixed_token_length` algorithm. You can configure the ingest pipeline for this example as follows:

```json
PUT _ingest/pipeline/text-chunking-cascade-ingest-pipeline
@@ -309,7 +201,7 @@ PUT _ingest/pipeline/text-chunking-cascade-ingest-pipeline

## Next steps

- For a complete example, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).
- To learn more about semantic search, see [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/).
- To learn more about sparse search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
- To learn more about using models in OpenSearch, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
- For a comprehensive example, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/).
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/agents-tools-tutorial.md
@@ -7,12 +7,9 @@ nav_order: 10
---

# Agents and tools tutorial
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The following tutorial illustrates creating a flow agent for retrieval-augmented generation (RAG). A flow agent runs its configured tools sequentially, in the order specified. In this example, you'll create an agent with two tools:

1. `VectorDBTool`: The agent will use this tool to retrieve OpenSearch documents relevant to the user question. You'll ingest supplementary information into an OpenSearch index. To facilitate vector search, you'll deploy a text embedding model that translates text into vector embeddings. OpenSearch will translate the ingested documents into embeddings and store them in the index. When you provide a user question to the agent, the agent will construct a query from the question, run vector search on the OpenSearch index, and pass the relevant retrieved documents to the `MLModelTool`.
23 changes: 1 addition & 22 deletions _ml-commons-plugin/agents-tools/index.md
@@ -7,12 +7,9 @@ nav_order: 27
---

# Agents and tools
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

You can automate machine learning (ML) tasks using agents and tools. An _agent_ orchestrates and runs ML models and tools. A _tool_ performs a set of specific tasks. Some examples of tools are the `VectorDBTool`, which supports vector search, and the `CATIndexTool`, which executes the `cat indices` operation. For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/).

## Agents
@@ -155,24 +152,6 @@ POST /_plugins/_ml/agents/_register
It is important to provide thorough descriptions of the tools so that the LLM can decide in which situations to use those tools.
{: .tip}
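For orientation, a minimal flow agent that wraps the `CatIndexTool` might be registered as follows. This is a sketch patterned on the register request shown above; the name and descriptions are illustrative:

```json
POST /_plugins/_ml/agents/_register
{
  "name": "Test_Agent_For_CatIndex_tool",
  "type": "flow",
  "description": "A demo agent that lists index information for the cluster",
  "tools": [
    {
      "type": "CatIndexTool",
      "description": "Use this tool to get index and shard information for the OpenSearch cluster."
    }
  ]
}
```

After registration, you can run the agent by calling `POST /_plugins/_ml/agents/<agent_id>/_execute`.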

## Enabling the feature

To enable agents and tools, configure the following setting:

```yaml
plugins.ml_commons.agent_framework_enabled: true
```
{% include copy.html %}
For conversational agents, you also need to enable RAG for use in conversational search. To enable RAG, configure the following setting:
```yaml
plugins.ml_commons.rag_pipeline_feature_enabled: true
```
{% include copy.html %}
For more information about ways to enable experimental features, see [Experimental feature flags]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
## Next steps

- For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/).
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/agent-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Agent tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `AgentTool` runs any agent.

## Step 1: Set up an agent for AgentTool to run
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/cat-index-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# CAT Index tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `CatIndexTool` retrieves index information for the OpenSearch cluster, similarly to the [CAT Indices API]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/).

## Step 1: Register a flow agent that will run the CatIndexTool
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/index-mapping-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Index Mapping tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `IndexMappingTool` retrieves mapping and setting information for indexes in your cluster.

## Step 1: Register a flow agent that will run the IndexMappingTool
2 changes: 1 addition & 1 deletion _ml-commons-plugin/agents-tools/tools/index.md
@@ -10,7 +10,7 @@ redirect_from:
---

# Tools
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

A _tool_ performs a set of specific tasks. The following table lists all tools that OpenSearch supports.
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/ml-model-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# ML Model tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `MLModelTool` runs a machine learning (ML) model and returns inference results.

## Step 1: Create a connector for a model
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Neural Sparse Search tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `NeuralSparseSearchTool` performs sparse vector retrieval. For more information about neural sparse search, see [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).

## Step 1: Register and deploy a sparse encoding model
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/ppl-tool.md
@@ -9,12 +9,9 @@ grand_parent: Agents and tools
---

# PPL tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `PPLTool` translates natural language into a PPL query. The tool provides an `execute` flag to specify whether to run the query. If you set the flag to `true`, the `PPLTool` runs the query and returns the query and the results.
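Once the prerequisite model described below is set up, registering a flow agent with the `PPLTool` might look like the following sketch. The `model_id` and `model_type` parameter names are assumptions patterned on the other tools, while `execute` is the flag described above:

```json
POST /_plugins/_ml/agents/_register
{
  "name": "Test_Agent_For_PPL_tool",
  "type": "flow",
  "description": "A demo agent that translates natural language into PPL",
  "tools": [
    {
      "type": "PPLTool",
      "parameters": {
        "model_id": "your-LLM-model-id",
        "model_type": "FINETUNE",
        "execute": false
      }
    }
  ]
}
```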

## Prerequisite
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/rag-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# RAG tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `RAGTool` performs retrieval-augmented generation (RAG). For more information about RAG, see [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/).

RAG calls a large language model (LLM) and supplements its knowledge by providing relevant OpenSearch documents along with the user question. To retrieve relevant documents from an OpenSearch index, you'll need a text embedding model that facilitates vector search.
5 changes: 1 addition & 4 deletions _ml-commons-plugin/agents-tools/tools/search-alerts-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Search Alerts tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `SearchAlertsTool` retrieves information about generated alerts. For more information about alerts, see [Alerting]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/index/).

## Step 1: Register a flow agent that will run the SearchAlertsTool
_ml-commons-plugin/agents-tools/tools/search-anomaly-detectors-tool.md
@@ -10,13 +10,10 @@ grand_parent: Agents and tools

<!-- vale off -->
# Search Anomaly Detectors tool
**Introduced 2.12**
**Introduced 2.13**
{: .label .label-purple }
<!-- vale on -->

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/1161).
{: .warning}

The `SearchAnomalyDetectorsTool` retrieves information about anomaly detectors set up on your cluster. For more information about anomaly detectors, see [Anomaly detection]({{site.url}}{{site.baseurl}}/observing-your-data/ad/index/).

## Step 1: Register a flow agent that will run the SearchAnomalyDetectorsTool

0 comments on commit 7bb6edd
