forked from opensearch-project/ml-commons
Commit
add tutorial for rerank pipeline with Cohere rerank model
Signed-off-by: Yaliang Wu <[email protected]>
Showing 1 changed file with 306 additions and 0 deletions.
docs/tutorials/rerank/rerank_pipeline_with_Cohere_Rerank_model.md
@@ -0,0 +1,306 @@
# Topic

Rerank Pipeline is a feature released in OpenSearch 2.12. It reranks search results by the relevance score between the search query and each document in the results.
The relevance score is calculated by a cross-encoder model.

This tutorial explains how to use the [Cohere Rerank](https://docs.cohere.com/reference/rerank-1) model in a Rerank Pipeline (todo: add doc link).

Note: Replace the placeholders that start with `your_` with your own values.

# Steps

## 1. Create Cohere Rerank model

Create a connector:
```
POST /_plugins/_ml/connectors/_create
{
  "name": "cohere-rerank",
  "description": "The connector to Cohere reranker model",
  "version": "1",
  "protocol": "http",
  "credential": {
    "cohere_key": "your_cohere_api_key"
  },
  "parameters": {
    "model": "rerank-english-v2.0"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://api.cohere.ai/v1/rerank",
      "headers": {
        "Authorization": "Bearer ${credential.cohere_key}"
      },
      "request_body": "{ \"documents\": ${parameters.documents}, \"query\": \"${parameters.query}\", \"model\": \"${parameters.model}\", \"top_n\": ${parameters.top_n} }",
      "pre_process_function": "connector.pre_process.cohere.rerank",
      "post_process_function": "connector.post_process.cohere.rerank"
    }
  ]
}
```
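To see how the `request_body` template above turns into the JSON sent to Cohere, the following Python sketch fills the `${parameters.*}` placeholders by hand. It is only an illustration: `render_request_body` is a hypothetical helper, not ml-commons code, and the substitution logic inside OpenSearch may differ in details such as escaping.

```python
import json
import re

# The request_body template from the connector above, with the JSON escaping removed.
TEMPLATE = (
    '{ "documents": ${parameters.documents}, "query": "${parameters.query}", '
    '"model": "${parameters.model}", "top_n": ${parameters.top_n} }'
)


def render_request_body(template, parameters):
    """Replace each ${parameters.<name>} placeholder with its value.

    Hypothetical helper for illustration. Strings are inserted without
    surrounding quotes (the template supplies them); other values are
    JSON-encoded.
    """
    def substitute(match):
        value = parameters[match.group(1)]
        if isinstance(value, str):
            # json.dumps adds quotes; strip them because the template has its own.
            return json.dumps(value)[1:-1]
        return json.dumps(value)

    return re.sub(r"\$\{parameters\.(\w+)\}", substitute, template)


body = render_request_body(TEMPLATE, {
    "documents": ["doc one", "doc two"],
    "query": "What is the capital of the United States?",
    "model": "rerank-english-v2.0",  # comes from the connector's "parameters"
    "top_n": 2,
})
payload = json.loads(body)  # the rendered body must be valid JSON
```

Note that `documents` and `top_n` have no quotes around their placeholders in the template because they are a JSON array and a number, while `query` and `model` are quoted strings.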
Use the connector ID from the response to create a model:
```
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "cohere rerank model",
  "function_name": "remote",
  "description": "test rerank model",
  "connector_id": "your_connector_id"
}
```
Note the model ID in the response; you will use it in the following steps.

Test the model with the Predict API:
```
POST _plugins/_ml/models/your_model_id/_predict
{
  "parameters": {
    "query": "What is the capital of the United States?",
    "documents": [
      "Carson City is the capital city of the American state of Nevada.",
      "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
      "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
      "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
    ],
    "top_n": 4
  }
}
```
Explanation of the response:
1. Each entry in `output` holds, in its `data` array, the relevance score between the query and one document.
2. The entries follow the order of the input documents: the first `similarity` result belongs to the first document.
This differs from the default behavior of the Cohere Rerank model, which returns documents sorted by relevance score, highest first.
The post-process function `connector.post_process.cohere.rerank` changes the order back to the input order to stay compatible with the Rerank Pipeline.

Sample response:
```
{
  "inference_results": [
    {
      "output": [
        {
          "name": "similarity",
          "data_type": "FLOAT32",
          "shape": [1],
          "data": [0.10194652]
        },
        {
          "name": "similarity",
          "data_type": "FLOAT32",
          "shape": [1],
          "data": [0.0721122]
        },
        {
          "name": "similarity",
          "data_type": "FLOAT32",
          "shape": [1],
          "data": [0.98005307]
        },
        {
          "name": "similarity",
          "data_type": "FLOAT32",
          "shape": [1],
          "data": [0.27904198]
        }
      ],
      "status_code": 200
    }
  ]
}
```
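
The reordering described in the explanation of the response can be sketched in Python. This assumes the Cohere Rerank API returns a `results` list whose items carry the original document `index` and a `relevance_score`, sorted by score. The function name `to_input_order` is hypothetical, and the built-in `connector.post_process.cohere.rerank` function may be implemented differently.

```python
def to_input_order(cohere_results):
    """Convert score-sorted Cohere-style results into outputs in input order."""
    # Sorting by the original document index restores the input order.
    ordered = sorted(cohere_results, key=lambda r: r["index"])
    return [
        {
            "name": "similarity",
            "data_type": "FLOAT32",
            "shape": [1],
            "data": [r["relevance_score"]],
        }
        for r in ordered
    ]


# Scores from the sample response, arranged highest first as Cohere would
# return them (an assumption about the raw API output for this request).
cohere_results = [
    {"index": 2, "relevance_score": 0.98005307},
    {"index": 3, "relevance_score": 0.27904198},
    {"index": 0, "relevance_score": 0.10194652},
    {"index": 1, "relevance_score": 0.0721122},
]

output = to_input_order(cohere_results)
scores = [item["data"][0] for item in output]
```

After the conversion, `scores` lists one score per input document in the order the documents were sent, which matches the layout of the sample response.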
## 2. Rerank Pipeline
### 2.1 Ingest test data
```
POST _bulk
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Carson City is the capital city of the American state of Nevada." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states." }
```
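
If you prefer to generate the bulk body from a list of passages instead of writing it by hand, a minimal Python sketch follows. `build_bulk_body` is a hypothetical helper; it only builds the newline-delimited JSON text and does not send anything to a cluster.

```python
import json


def build_bulk_body(index, passages):
    """Build a _bulk request body: one action line plus one document line per passage."""
    lines = []
    for passage in passages:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({"passage_text": passage}))
    # The bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(
    "my-test-data",
    [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    ],
)
rows = [json.loads(line) for line in bulk_body.splitlines()]
```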
### 2.2 Create Rerank Pipeline
```
PUT /_search/pipeline/rerank_pipeline_cohere
{
  "description": "Pipeline for reranking with Cohere Rerank model",
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "your_model_id_created_in_step1"
        },
        "context": {
          "document_fields": ["passage_text"]
        }
      }
    }
  ]
}
```
### 2.3 Test rerank

```
GET my-test-data/_search?search_pipeline=rerank_pipeline_cohere
{
  "query": {
    "match_all": {}
  },
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "What is the capital of the United States?"
      }
    }
  }
}
```
Response:
```
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 0.98005307,
    "hits": [
      {
        "_index": "my-test-data",
        "_id": "zbUOw40B8vrNLhb9vBif",
        "_score": 0.98005307,
        "_source": {
          "passage_text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
        }
      },
      {
        "_index": "my-test-data",
        "_id": "zrUOw40B8vrNLhb9vBif",
        "_score": 0.27904198,
        "_source": {
          "passage_text": "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
        }
      },
      {
        "_index": "my-test-data",
        "_id": "y7UOw40B8vrNLhb9vBif",
        "_score": 0.10194652,
        "_source": {
          "passage_text": "Carson City is the capital city of the American state of Nevada."
        }
      },
      {
        "_index": "my-test-data",
        "_id": "zLUOw40B8vrNLhb9vBif",
        "_score": 0.0721122,
        "_source": {
          "passage_text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan."
        }
      }
    ]
  },
  "profile": {
    "shards": []
  }
}
```
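
Conceptually, the rerank step replaces each hit's `_score` with the model's relevance score and then sorts the hits by that score. The Python sketch below reproduces the order of the response using the document IDs and scores from this tutorial. `rerank_hits` is a hypothetical function written for illustration; it is not the OpenSearch rerank processor, and it assumes the scores arrive in the same order as the hits.

```python
def rerank_hits(hits, scores):
    """Give each hit its new relevance score, then sort highest score first."""
    rescored = [dict(hit, _score=score) for hit, score in zip(hits, scores)]
    return sorted(rescored, key=lambda hit: hit["_score"], reverse=True)


# Hits in the order match_all returns them, each with the default score of 1.
hits = [
    {"_id": "y7UOw40B8vrNLhb9vBif", "_score": 1},  # Carson City
    {"_id": "zLUOw40B8vrNLhb9vBif", "_score": 1},  # Northern Mariana Islands
    {"_id": "zbUOw40B8vrNLhb9vBif", "_score": 1},  # Washington, D.C.
    {"_id": "zrUOw40B8vrNLhb9vBif", "_score": 1},  # Capital punishment
]
# Relevance scores in the same order as the hits (from the predict sample).
scores = [0.10194652, 0.0721122, 0.98005307, 0.27904198]

reranked = rerank_hits(hits, scores)
top_ids = [hit["_id"] for hit in reranked]
```

The first ID in `top_ids` is the Washington, D.C. document, as in the response.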
Test without Rerank Pipeline:
```
GET my-test-data/_search
{
  "query": {
    "match_all": {}
  },
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "What is the capital of the United States?"
      }
    }
  }
}
```
Response:
Without the pipeline, every document keeps the `match_all` score of 1, and the response returns "Carson City is the capital city of the American state of Nevada." at the top.
```
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my-test-data",
        "_id": "y7UOw40B8vrNLhb9vBif",
        "_score": 1,
        "_source": {
          "passage_text": "Carson City is the capital city of the American state of Nevada."
        }
      },
      {
        "_index": "my-test-data",
        "_id": "zLUOw40B8vrNLhb9vBif",
        "_score": 1,
        "_source": {
          "passage_text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan."
        }
      },
      {
        "_index": "my-test-data",
        "_id": "zbUOw40B8vrNLhb9vBif",
        "_score": 1,
        "_source": {
          "passage_text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
        }
      },
      {
        "_index": "my-test-data",
        "_id": "zrUOw40B8vrNLhb9vBif",
        "_score": 1,
        "_source": {
          "passage_text": "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
        }
      }
    ]
  }
}
```