Skip to content

Commit

Permalink
add tutorial for rerank pipeline with Cohere rerank model
Browse files Browse the repository at this point in the history
Signed-off-by: Yaliang Wu <[email protected]>
  • Loading branch information
ylwu-amzn committed Feb 19, 2024
1 parent 311b971 commit 54f0681
Showing 1 changed file with 306 additions and 0 deletions.
306 changes: 306 additions & 0 deletions docs/tutorials/rerank/rerank_pipeline_with_Cohere_Rerank_model.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,306 @@
# Topic

Rerank Pipeline is a feature released in OpenSearch 2.12. It can rerank the search result with the relevance score between search query and each document in search result.
The relevance score will be calculated by some cross-encoder model.

This tutorial explains how to use [Cohere Rerank](https://docs.cohere.com/reference/rerank-1) model in Rerank Pipeline (todo: add doc link).

Note: Replace the placeholders that start with `your_` with your own values.

# Steps

## 1. Create Cohere Rerank model

Create connector
```
POST /_plugins/_ml/connectors/_create
{
"name": "cohere-rerank",
"description": "The connector to Cohere reanker model",
"version": "1",
"protocol": "http",
"credential": {
"cohere_key": "your_cohere_api_key"
},
"parameters": {
"model": "rerank-english-v2.0"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://api.cohere.ai/v1/rerank",
"headers": {
"Authorization": "Bearer ${credential.cohere_key}"
},
"request_body": "{ \"documents\": ${parameters.documents}, \"query\": \"${parameters.query}\", \"model\": \"${parameters.model}\", \"top_n\": ${parameters.top_n} }",
"pre_process_function": "connector.pre_process.cohere.rerank",
"post_process_function": "connector.post_process.cohere.rerank"
}
]
}
```
Use the connector id in response to create model
```
POST /_plugins/_ml/models/_register?deploy=true
{
"name": "cohere rerank model",
"function_name": "remote",
"description": "test rerank model",
"connector_id": "your_connector_id"
}
```
Note the model id in response, will use it in following steps.

Test model with predict API
```
POST _plugins/_ml/models/LrX5wo0B8vrNLhb9Dhjw/_predict
{
"parameters": {
"query": "What is the capital of the United States?",
"documents": [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
"Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
],
"top_n": 4
}
}
```
Sample response

Explanation of the response:
1. For each result, the `data` array contains relevance score between each doc and query.
2. The output corresponds to the order of the input documents; the first result of similarity pertains to the first document.
This differs from the default 'Cohere Re-rank' model, which prioritizes documents with higher relevance scores at the top.
The order is changed by the post process function `connector.post_process.cohere.rerank`, this is to keep compatible with Rerank Pipeline.
```
{
"inference_results": [
{
"output": [
{
"name": "similarity",
"data_type": "FLOAT32",
"shape": [
1
],
"data": [
0.10194652
]
},
{
"name": "similarity",
"data_type": "FLOAT32",
"shape": [
1
],
"data": [
0.0721122
]
},
{
"name": "similarity",
"data_type": "FLOAT32",
"shape": [
1
],
"data": [
0.98005307
]
},
{
"name": "similarity",
"data_type": "FLOAT32",
"shape": [
1
],
"data": [
0.27904198
]
}
],
"status_code": 200
}
]
}
```
## 2. Rerank Pipeline
### 2.1 Ingest test data
```
POST _bulk
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Carson City is the capital city of the American state of Nevada." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states." }
```
### 2.2 Create Rerank Pipeline
```
PUT /_search/pipeline/rerank_pipeline_cohere
{
"description": "Pipeline for reranking with Cohere Rerank model",
"response_processors": [
{
"rerank": {
"ml_opensearch": {
"model_id": "your_model_id_created_in_step1"
},
"context": {
"document_fields": ["passage_text"]
}
}
}
]
}
```
### 2.2 Test rerank

```
GET my-test-data/_search?search_pipeline=rerank_pipeline_cohere
{
"query": {
"match_all": {}
},
"ext": {
"rerank": {
"query_context": {
"query_text": "What is the capital of the United States?"
}
}
}
}
```
Response
```
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 0.98005307,
"hits": [
{
"_index": "my-test-data",
"_id": "zbUOw40B8vrNLhb9vBif",
"_score": 0.98005307,
"_source": {
"passage_text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
}
},
{
"_index": "my-test-data",
"_id": "zrUOw40B8vrNLhb9vBif",
"_score": 0.27904198,
"_source": {
"passage_text": "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
}
},
{
"_index": "my-test-data",
"_id": "y7UOw40B8vrNLhb9vBif",
"_score": 0.10194652,
"_source": {
"passage_text": "Carson City is the capital city of the American state of Nevada."
}
},
{
"_index": "my-test-data",
"_id": "zLUOw40B8vrNLhb9vBif",
"_score": 0.0721122,
"_source": {
"passage_text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan."
}
}
]
},
"profile": {
"shards": []
}
}
```
Test without Rerank Pipeline:
```
GET my-test-data/_search
{
"query": {
"match_all": {}
},
"ext": {
"rerank": {
"query_context": {
"query_text": "What is the capital of the United States?"
}
}
}
}
```
Response:
We can see the response returns ""Carson City is the capital city of the American state of Nevad" at top.
```
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "my-test-data",
"_id": "y7UOw40B8vrNLhb9vBif",
"_score": 1,
"_source": {
"passage_text": "Carson City is the capital city of the American state of Nevada."
}
},
{
"_index": "my-test-data",
"_id": "zLUOw40B8vrNLhb9vBif",
"_score": 1,
"_source": {
"passage_text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan."
}
},
{
"_index": "my-test-data",
"_id": "zbUOw40B8vrNLhb9vBif",
"_score": 1,
"_source": {
"passage_text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
}
},
{
"_index": "my-test-data",
"_id": "zrUOw40B8vrNLhb9vBif",
"_score": 1,
"_source": {
"passage_text": "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
}
}
]
}
}
```

0 comments on commit 54f0681

Please sign in to comment.