[Feature Request] Introduce a rescore search response processor #15631

mingshl · 2024-09-03T22:32:17Z

Is your feature request related to a problem? Please describe

When a response processor adds a field containing a score, we want to sort the response by that added field.

In a search response, hits contains multiple documents. A rescore search response processor can take the score from a field in each document and sort the search hits by that field's value. Users can opt to remove the field from the documents.

Describe the solution you'd like

The proposed solution is as follows:

PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "rescore": {
        "field": "similarity_score",
        "sort": "desc",
        "replace_score": true,
        "remove_sort_field": true
      }
    }
  ]
}
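To make the proposed semantics concrete, here is a minimal Python sketch that operates on hits as plain dicts shaped like the JSON responses in this issue. It is not OpenSearch code: `rescore_hits` is a hypothetical name, its parameters simply mirror the proposed `field`, `sort`, `replace_score`, and `remove_sort_field` options, and it assumes every hit has a single numeric value in `_source[field]`.

```python
from typing import Any

Hit = dict[str, Any]


def rescore_hits(
    hits: list[Hit],
    field: str,
    sort: str = "desc",
    replace_score: bool = True,
    remove_sort_field: bool = True,
) -> list[Hit]:
    """Hypothetical sketch: reorder hits by a numeric value stored in _source[field]."""
    if sort not in ("asc", "desc"):
        raise ValueError(f"sort must be 'asc' or 'desc', got {sort!r}")
    keyed = []
    for hit in hits:
        # Shallow-copy the hit and its _source so the input response is not mutated.
        source = dict(hit["_source"])
        value = source[field]  # raises KeyError if a hit lacks the field
        new_hit = dict(hit)
        if replace_score:
            new_hit["_score"] = value
        if remove_sort_field:
            del source[field]
        new_hit["_source"] = source
        keyed.append((value, new_hit))
    # Sort on the extracted value only; list.sort is stable, so ties keep their input order.
    keyed.sort(key=lambda pair: pair[0], reverse=(sort == "desc"))
    return [new_hit for _, new_hit in keyed]


hits = [
    {"_index": "diary_index", "_id": "1", "_score": 1,
     "_source": {"diary": "how are you", "similarity_score": -11.055182}},
    {"_index": "diary_index", "_id": "2", "_score": 1,
     "_source": {"diary": "today is sunny", "similarity_score": 8.969885}},
    {"_index": "diary_index", "_id": "3", "_score": 1,
     "_source": {"diary": "today is july fifth", "similarity_score": -5.736348}},
    {"_index": "diary_index", "_id": "4", "_score": 1,
     "_source": {"diary": "it is winter", "similarity_score": -10.045217}},
]
reranked = rescore_hits(hits, "similarity_score")
print([h["_id"] for h in reranked])  # ['2', '3', '4', '1']
```

Applied to the example response in this issue, the sketch yields the same hit order (2, 3, 4, 1) and the same replaced scores as the processed response. It leaves `max_score` untouched; a real processor would need to decide how to update it.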

For example, given this search response:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "diary_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "diary": "how are you",
          "similarity_score": -11.055182
        }
      },
      {
        "_index": "diary_index",
        "_id": "2",
        "_score": 1,
        "_source": {
          "diary": "today is sunny",
          "similarity_score": 8.969885
        }
      },
      {
        "_index": "diary_index",
        "_id": "3",
        "_score": 1,
        "_source": {
          "diary": "today is july fifth",
          "similarity_score": -5.736348
        }
      },
      {
        "_index": "diary_index",
        "_id": "4",
        "_score": 1,
        "_source": {
          "diary": "it is winter",
          "similarity_score": -10.045217
        }
      }
    ]
  }
}

The response after processing would be:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 8.969885,
    "hits": [
      {
        "_index": "diary_index",
        "_id": "2",
        "_score": 8.969885,
        "_source": {
          "diary": "today is sunny"
        }
      },
      {
        "_index": "diary_index",
        "_id": "3",
        "_score": -5.736348,
        "_source": {
          "diary": "today is july fifth"
        }
      },
      {
        "_index": "diary_index",
        "_id": "4",
        "_score": -10.045217,
        "_source": {
          "diary": "it is winter"
        }
      },
      {
        "_index": "diary_index",
        "_id": "1",
        "_score": -11.055182,
        "_source": {
          "diary": "how are you"
        }
      }
    ]
  }
}

Related component

Search

Describe alternatives you've considered

No response

Additional context

No response


mch2 · 2024-09-04T16:16:36Z

Tagging @msfroh @martin-gaievski

msfroh · 2024-09-12T21:04:39Z

Makes sense -- if a field was added by response processor, the only way that you can really do something with it (before sending back to the client) is with another response processor.

With the ever-present "naming is hard" challenge, I'm not sure I like the name rescore. Maybe sort? Or sort_by_field?

msfroh · 2024-09-12T21:22:49Z

In theory, if we really wanted to get into the "do one thing" philosophy for search processors, then we could use the remove_field response processor to remove the field after sorting, rather than including it in this processor.

Similarly/alternatively, we could add a copy response processor that copies the field value into _score. (That could also be done with a shared copy document processor, if we ever get those.) Of course, the copy ingest processor already does two things by having the remove_source parameter.
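The "do one thing" alternative can be sketched as a chain of single-purpose steps. In this Python sketch, `sort_by_field`, `copy_to_score`, and `remove_field` are hypothetical stand-ins for separate response processors (they are not the real OpenSearch processor implementations), and `run_pipeline` feeds each step's output into the next, the way a search pipeline chains its response processors.

```python
from functools import reduce
from typing import Any, Callable

Hit = dict[str, Any]
Processor = Callable[[list[Hit]], list[Hit]]


def sort_by_field(field: str, descending: bool = True) -> Processor:
    # Only reorders hits; _score and _source are left untouched.
    def run(hits: list[Hit]) -> list[Hit]:
        return sorted(hits, key=lambda h: h["_source"][field], reverse=descending)
    return run


def copy_to_score(field: str) -> Processor:
    # Only copies a _source value into _score.
    def run(hits: list[Hit]) -> list[Hit]:
        return [{**h, "_score": h["_source"][field]} for h in hits]
    return run


def remove_field(field: str) -> Processor:
    # Only drops one field from each hit's _source.
    def run(hits: list[Hit]) -> list[Hit]:
        return [
            {**h, "_source": {k: v for k, v in h["_source"].items() if k != field}}
            for h in hits
        ]
    return run


def run_pipeline(hits: list[Hit], processors: list[Processor]) -> list[Hit]:
    # Each processor's output becomes the next processor's input.
    return reduce(lambda acc, proc: proc(acc), processors, hits)


hits = [
    {"_id": "1", "_score": 1, "_source": {"similarity_score": -11.05}},
    {"_id": "2", "_score": 1, "_source": {"similarity_score": 8.97}},
]
out = run_pipeline(hits, [
    sort_by_field("similarity_score"),
    copy_to_score("similarity_score"),
    remove_field("similarity_score"),
])
print([(h["_id"], h["_score"]) for h in out])  # [('2', 8.97), ('1', -11.05)]
```

The trade-off is that order matters: the removal step has to run after both the sort and the copy, whereas a single combined processor can enforce that ordering internally.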

mingshl · 2024-09-12T21:30:46Z

> Makes sense -- if a field was added by response processor, the only way that you can really do something with it (before sending back to the client) is with another response processor.
>
> With the ever-present "naming is hard" challenge, I'm not sure I like the name rescore. Maybe sort? Or sort_by_field?

There is already a sort response processor, but it sorts an array field within a document.

Yep, naming is hard. sort_by_field is straightforward but a bit long.

brianf-aws · 2024-09-19T21:50:10Z

Hey y'all, I would like to work on this feature! I like rescore since it takes an already scored value, e.g. "_score": 1, and changes it to something more relevant. The closest alternative name I can think of is refineScore.

I have some questions I want to bring to the table:

How do we break ties? Should the hit that appears first among hits with the same score be shown first?
How do we want to handle failure? Should we just return the hits as they were given beforehand?
- For example, when the provided field doesn't exist, do we return the previous response or stop the pipeline?
What are some ways that users can misuse the processor? I'm new to crafting processors and would like some context.
What do we do when the field to sort by is not a single number?
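One possible set of answers to these questions, sketched in Python under stated assumptions: ties are broken by keeping the original hit order (a stable sort), a missing or non-numeric field (including a multi-valued list) is treated as an error, and a hypothetical `ignore_failure` flag, loosely modeled on the ignoreFailure setting referenced later in this thread, returns the original hits unchanged instead of failing. `SortFieldError`, `sort_value`, and `sort_hits` are illustrative names, not part of OpenSearch.

```python
from numbers import Real
from typing import Any

Hit = dict[str, Any]


class SortFieldError(ValueError):
    """Hypothetical error type for hits that cannot be sorted by the target field."""


def sort_value(hit: Hit, field: str) -> Real:
    source = hit.get("_source") or {}
    if field not in source:
        raise SortFieldError(f"hit {hit.get('_id')!r} has no field {field!r}")
    value = source[field]
    # Reject lists (multi-valued fields), strings, and bools (bool is a subclass of int).
    if isinstance(value, bool) or not isinstance(value, Real):
        raise SortFieldError(
            f"field {field!r} on hit {hit.get('_id')!r} is not a single number: {value!r}"
        )
    return value


def sort_hits(hits: list[Hit], field: str, ignore_failure: bool = False) -> list[Hit]:
    try:
        keyed = [(sort_value(hit, field), hit) for hit in hits]
    except SortFieldError:
        if ignore_failure:
            # Fall back to the response exactly as it was received.
            return hits
        raise
    # Descending sort; list.sort stays stable with reverse=True, so tied hits keep their order.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in keyed]


hits = [
    {"_id": "a", "_source": {"score": 1.0}},
    {"_id": "b", "_source": {"score": 2.0}},
    {"_id": "c", "_source": {"score": 1.0}},
]
print([h["_id"] for h in sort_hits(hits, "score")])  # ['b', 'a', 'c']
```

Whether a failure should stop the pipeline or fall back silently is exactly the open question above; the flag just makes both behaviors explicit.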

brianf-aws · 2024-09-24T18:56:16Z

Adding this meta GitHub enhancement issue for ML Inference

brianf-aws · 2024-09-28T00:55:08Z

I did some research, and I think we might be able to leave out the sorting functionality, since OpenSearch core already has a sort results feature (you can see it here) that lets you sort the search hits.

In essence, you could define a search pipeline with the rescore processor and then, once it is done, sort by the _score field. I feel that more discussion is needed, though.

Here is an example

GET /my-index/_search?search_pipeline=book_pipeline
{
  "sort": [
    {
      "rating": {
        "order": "desc"
      }
    }
  ]
}

mingshl · 2024-09-28T01:00:52Z

> I did some research and I think we might be able to leave out the sorting functionality as there exists a sort results functionality within OS core you can see here , this feature allows you to search within the search hits.
>
> In essence this allows you to define a search pipeline with the rescore processor and then after you are done you could sort by the _score field. I feel that more discussion though.
>
> Here is an example
> GET /my-index/_search?search_pipeline=book_pipeline
> {
>   "sort": [
>     {
>       "rating": {
>         "order": "desc"
>       }
>     }
>   ]
> }

Good that you found the sort option in the query. The intention behind rescore and sort is different: the new score (rather than the BM25 score) is added by search response processors, and that new score field does not exist in the indexed document. A query-level sort cannot use it, so a rescore processor is needed to rank the hits by that field and replace their scores.
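The ordering argument can be illustrated with a toy model of the two phases. This Python sketch is not how OpenSearch executes a search internally: `INDEX`, `MODEL_SCORES`, `search`, `add_similarity`, and `rescore` are hypothetical names, and a plain `KeyError` stands in for whatever error or missing-value behavior a real query-level sort on an absent field would produce. The point it shows is only the order of operations: the query-level sort runs before any response processor has added the field.

```python
from typing import Any, Callable, Optional

Hit = dict[str, Any]
ResponseProcessor = Callable[[list[Hit]], list[Hit]]

# Toy index: similarity_score is not stored in any document.
INDEX = [
    {"_id": "1", "_source": {"diary": "how are you"}},
    {"_id": "2", "_source": {"diary": "today is sunny"}},
]
# Hypothetical stand-in for scores a model would return inside a response processor.
MODEL_SCORES = {"1": -11.055182, "2": 8.969885}


def search(sort_field: Optional[str], processors: list[ResponseProcessor]) -> list[Hit]:
    hits = [dict(doc, _score=1.0) for doc in INDEX]
    # Query phase: a query-level sort can only use fields that exist in the indexed documents.
    if sort_field is not None:
        hits.sort(key=lambda h: h["_source"][sort_field], reverse=True)
    # Response phase: processors run afterwards, on hits that are already ordered.
    for processor in processors:
        hits = processor(hits)
    return hits


def add_similarity(hits: list[Hit]) -> list[Hit]:
    return [
        {**h, "_source": {**h["_source"], "similarity_score": MODEL_SCORES[h["_id"]]}}
        for h in hits
    ]


def rescore(hits: list[Hit]) -> list[Hit]:
    ranked = sorted(hits, key=lambda h: h["_source"]["similarity_score"], reverse=True)
    return [{**h, "_score": h["_source"]["similarity_score"]} for h in ranked]


try:
    search("similarity_score", [add_similarity])
except KeyError as missing:
    print("query-level sort failed, field not indexed:", missing)

print([h["_id"] for h in search(None, [add_similarity, rescore])])  # ['2', '1']
```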

brianf-aws · 2024-10-01T00:55:46Z

After discussing (offline) we have some suggestions to this processor

Naming it rescore may be a bit misleading: some devs thought it applied a scoring algorithm, when in reality it does not. Here are some names they thought of: ScoreAdjustmentProcessor, ScoreModifierProcessor, ScoreUpdaterProcessor, RankOverProcessor, RefineScoreProcessor, ReorderProcessor.
We decided to keep only field_to_sort_by as a mandatory field and to allow the sort order to be null. This gives the user the flexibility to apply the sorting in the search query instead, as mentioned above.
- Given that we are leaving the sort order optional, we will also have to think about renaming fields, for example field_to_sort_by -> target_field.
Add a manual check for the combination (replaceScore: false, remove_sort_field: true). In this scenario the user deletes the field that the results are sorted by and, because the field is removed, cannot tell why the hits are in that order. When this combination occurs we will throw an exception. Because remove_sort_field defaults to false, this combination only happens when the user explicitly turns it on.
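These decisions translate into a small config-validation step. This Python sketch assumes hypothetical key names (`target_field` as proposed above, plus an assumed `sort_order`) and assumed defaults (`replace_score` true; `remove_sort_field` false, per the discussion); the processor that was eventually merged may use different names and defaults.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RescoreConfig:
    target_field: str
    sort_order: Optional[str] = None  # None: leave ordering to the query's own "sort"
    replace_score: bool = True  # assumed default
    remove_sort_field: bool = False  # default discussed in this thread


def parse_config(raw: dict[str, Any]) -> RescoreConfig:
    if "target_field" not in raw:
        raise ValueError("target_field is required")
    sort_order = raw.get("sort_order")
    if sort_order not in (None, "asc", "desc"):
        raise ValueError(f"sort_order must be 'asc', 'desc', or omitted, got {sort_order!r}")
    config = RescoreConfig(
        target_field=raw["target_field"],
        sort_order=sort_order,
        replace_score=raw.get("replace_score", True),
        remove_sort_field=raw.get("remove_sort_field", False),
    )
    # The manual check: removing the field without copying it into _score
    # would leave the user with no visible reason for the ordering.
    if config.remove_sort_field and not config.replace_score:
        raise ValueError("remove_sort_field=true requires replace_score=true")
    return config


try:
    parse_config({"target_field": "similarity_score", "replace_score": False, "remove_sort_field": True})
except ValueError as err:
    print(err)  # remove_sort_field=true requires replace_score=true
```

Validating at pipeline-creation time, rather than per request, surfaces the misconfiguration once instead of on every search.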

ohltyler · 2024-11-19T20:15:17Z

@brianf-aws / @mingshl can this be resolved?

brianf-aws · 2024-11-19T20:35:45Z

Yes, we can close it, as this was implemented in opensearch-project/neural-search#932

mingshl added the enhancement and untriaged labels Sep 3, 2024

github-actions bot added the Search label Sep 3, 2024

github-project-automation bot added this to Search Project Board Sep 3, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Sep 3, 2024

mch2 removed the untriaged label Sep 4, 2024

mingshl mentioned this issue Sep 5, 2024

[RFC] Support pass query string to add in model input in ml inference search response processor opensearch-project/ml-commons#2897

Closed

mingshl assigned brianf-aws Sep 25, 2024

brianf-aws mentioned this issue Sep 26, 2024

[BUG] ignoreFailure boolean Value is not used in Search Response Processors #16100

Closed

mingshl moved this from 🆕 New to Now(This Quarter) in Search Project Board Sep 30, 2024

mingshl added the v2.18.0 label Sep 30, 2024

mingshl added this to OpenSearch Roadmap Sep 30, 2024

github-project-automation bot moved this to New in OpenSearch Roadmap Sep 30, 2024

mingshl moved this from New to In Progress in OpenSearch Roadmap Sep 30, 2024

mingshl added the Roadmap:Vector Database/GenAI label Sep 30, 2024

brianf-aws mentioned this issue Oct 7, 2024

[Meta] ReRankProcessor enhancement: ReRank by Field opensearch-project/neural-search#926

Closed

brianf-aws mentioned this issue Oct 14, 2024

ByFieldRerank Processor (ReRankProcessor enhancement) opensearch-project/neural-search#932

Merged


mingshl removed the Roadmap:Vector Database/GenAI label Oct 16, 2024

sean-zheng-amazon removed this from OpenSearch Roadmap Oct 21, 2024

mingshl closed this as completed Nov 19, 2024

github-project-automation bot moved this from Now(This Quarter) to ✅ Done in Search Project Board Nov 19, 2024
