[FEATURE] Add options to configure minimal score and single match score for normalization processor for Hybrid search #299

Open
martin-gaievski opened this issue Sep 7, 2023 · 4 comments
Labels
backlog All the backlog features should be marked with this label Enhancements Increases software capabilities beyond original client specifications help wanted Extra attention is needed

Comments

@martin-gaievski
Member

Is your feature request related to a problem?

There is currently no way to set a minimum score for a hit in the final result list of a Hybrid query. It is also not possible to set the score that is returned when there is only one match across all sub-queries for the min-max normalization technique. Both limitations may affect the relevance of the results.

What solution would you like?

For min score: set the minimal score as part of the configuration for the normalization processor. A possible request could look like:

{
    "description": "Post processor for hybrid search",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {
                    "technique": "l2",
                    "parameters" : { 
                          "min_score" : 0.001 
                    }
                },
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters" : { 
                          "min_score" : 0.001 
                    }
                }
            }
        }
    ]
}
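
To make the intent of the request above concrete, here is a minimal Python sketch of one possible reading of `min_score`: a floor applied to the combined score, so that a valid hit never ends up with exactly 0.0. This is an assumption about the proposed semantics, not the plugin's implementation; the function names and the input layout (one list of normalized sub-query scores per document) are hypothetical.

```python
import math


def l2_normalize(scores):
    """Divide each score by the L2 norm of the whole score list."""
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0.0:
        # All scores are zero, so there is nothing to scale.
        return [0.0 for _ in scores]
    return [s / norm for s in scores]


def combine_arithmetic_mean(per_doc_scores, min_score=0.0):
    """Average the normalized sub-query scores of each document.

    per_doc_scores maps a doc id to a list with one normalized score per
    sub-query (0.0 where the sub-query did not match the document).
    Assumption: min_score acts as a floor on the combined score.
    """
    combined = {}
    for doc_id, scores in per_doc_scores.items():
        mean = sum(scores) / len(scores)
        combined[doc_id] = max(mean, min_score)
    return combined
```

With `min_score=0.001`, a document whose combined score would otherwise be 0.0 is reported with 0.001 instead, which keeps it distinguishable from a non-match.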

For a single match score: set the single match score as a parameter for the normalization technique. Techniques that do not support this parameter will ignore it. A possible request could look like:

{
    "description": "Post processor for hybrid search",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {
                    "technique": "min_max",
                    "parameters" : { 
                          "single_match_score" : 0.0 
                    }
                }
            }
        }
    ]
}

What alternatives have you considered?

With the current implementation, additional post-processing is required.

Do you have any additional context?

As part of normalization, scores are adjusted to fall in the interval [0.0, 1.0]. That means the matching doc with the minimal score will receive a score of 0.0.
For the min-max normalization technique, if there is only one matching document there will be a single score, X, so min = max = X. As per the formula, that leads to a "division by zero" case, because the score is calculated as (X - min)/(max - min).
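
The two cases described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the plugin's code: the function name is hypothetical, `min_score` is treated as a floor on the normalized score, and `single_match_score` is the value returned when max equals min. The default of 1.0 for `single_match_score` follows the behavior reported later in this thread.

```python
def min_max_normalize(scores, min_score=0.0, single_match_score=1.0):
    """Min-max normalize scores with two hypothetical overrides.

    min_score: floor applied to each normalized score (assumed semantics).
    single_match_score: value used when max == min, where the formula
    (X - min) / (max - min) would divide by zero. This covers a single
    match as well as several matches that share the same score.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [single_match_score for _ in scores]
    return [max((s - lo) / (hi - lo), min_score) for s in scores]
```

With the defaults, `min_max_normalize([2.0, 4.0, 6.0])` gives the lowest-scoring document 0.0, while passing `min_score=0.001` lifts that document to 0.001 and leaves the other scores unchanged.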

@martin-gaievski martin-gaievski added Enhancements Increases software capabilities beyond original client specifications backlog All the backlog features should be marked with this label untriaged labels Sep 7, 2023
@navneet1v navneet1v added help wanted Extra attention is needed and removed untriaged labels Sep 15, 2023
@heemin32
Collaborator

@martin-gaievski, could you elaborate on why setting a minimum value is beneficial? You mentioned that 'As part of the normalization, scores are adjusted to be in the interval [0, .. 1.0]. That means a matching document with the minimal score will receive a score of 0.0.' Would there be any issues if a document with the minimal score is assigned 0.0 after normalization?

Regarding single_match_score, does it currently return Infinity? Wouldn't it be more intuitive to simply return 1 as the normalized score, rather than requiring the user to provide one?

@martin-gaievski
Member Author

martin-gaievski commented Nov 24, 2024

> @martin-gaievski, could you elaborate on why setting a minimum value is beneficial? You mentioned that 'As part of the normalization, scores are adjusted to be in the interval [0, .. 1.0]. That means a matching document with the minimal score will receive a score of 0.0.' Would there be any issues if a document with the minimal score is assigned 0.0 after normalization?

Yes, it's an issue: in most scenarios a score of 0.0 means the document is not a match, so we may drop a valid search hit.

> Regarding single_match_score, does it currently return Infinity? Wouldn't it be more intuitive to simply return 1 as the normalized score, rather than requiring the user to provide one?

This currently makes sense for min-max normalization, and yes, we do return 1.0 in the case of a single matching doc. My idea was to keep the default behavior the same, but override it if a value is provided for "single_match_score".

@heemin32
Collaborator

Got it. I think it's a good idea to provide a feature that allows users to override those values, but it would be great if we could also offer the best possible default values.

@jdomkline

jdomkline commented Dec 18, 2024

I love this feature request. As a workaround, I'm using the following kludge, which may be brittle. I'm running a hybrid (HYB) query with lexical (LEX) and neural sub-queries.

  1. I wrap the LEX query in a script_score so that any score below my threshold is lowered to 0.
    "script_score": {
      "query": {..},
      "script": {
        "source": "if (_score >= 10) { return _score; } else { return 0; }"
      }
    }
  2. Then, in the search request, I apply min_score with the minimum of the neural score range, e.g. "min_score" : 0.002.

This trims off both the undesired LEX and Neural results. Is it, however, too much of a hack? I plan to recheck it with each release (I'm on AWS OSS v2.17)
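
The two steps of this workaround can be assembled into a single request body, sketched here in Python. The helper name and its defaults are hypothetical; the cutoff of 10 and the `min_score` of 0.002 are simply the values from the comment above. Passing the cutoff through script `params` instead of hard-coding it is a small variation on the original snippet. Whether a top-level `min_score` behaves this way with a hybrid query on a given release is taken from the comment, not verified here.

```python
def build_workaround_request(lexical_query, neural_query,
                             lexical_cutoff=10.0, min_score=0.002):
    """Build a hybrid search body that zeroes weak lexical hits.

    lexical_query and neural_query are ordinary query DSL dicts
    supplied by the caller.
    """
    # Step 1: wrap the lexical query so that scores below the cutoff become 0.
    gated_lexical = {
        "script_score": {
            "query": lexical_query,
            "script": {
                "source": "if (_score >= params.cutoff) "
                          "{ return _score; } else { return 0; }",
                "params": {"cutoff": lexical_cutoff},
            },
        }
    }
    # Step 2: apply min_score to the whole request to trim low-scoring hits.
    return {
        "min_score": min_score,
        "query": {"hybrid": {"queries": [gated_lexical, neural_query]}},
    }
```

The resulting dict can be serialized with `json.dumps` and sent as the body of a search request that uses the normalization pipeline.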

Projects
Status: Backlog
4 participants