-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(search): make graphql query autoCompleteForMultiple to show exact matches first #11586
Merged
david-leifker
merged 9 commits into
datahub-project:master
from
deepgarg-visa:fix_autoCompleteForMultiple_query_result
Oct 15, 2024
Merged
Changes from 3 commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
f5181a8
fix(search): make graphql query autoCompleteForMultiple show exact ma…
deepgarg-visa 3910999
Merge branch 'refs/heads/master' into fix_autoCompleteForMultiple_que…
deepgarg-visa 39ff93a
Merge branch 'master' into fix_autoCompleteForMultiple_query_result
deepgarg-visa 0ae7f57
fix(search): fix testcases
deepgarg-visa 97f788a
Merge branch 'master' into fix_autoCompleteForMultiple_query_result
deepgarg-visa 1f8a7ce
Merge branch 'master' into fix_autoCompleteForMultiple_query_result
deepgarg-visa 0d38a8b
Merge remote-tracking branch 'origin/fix_autoCompleteForMultiple_quer…
deepgarg-visa ba9bcba
Merge branch 'master' into fix_autoCompleteForMultiple_query_result
deepgarg-visa 071d032
implement review comments and code refactor
deepgarg-visa File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hardcoded to 2.0 and looks like it treats URN matches and non-URN matches the same. Previously it weighted URNs higher which probably doesn't make sense for auto-complete purposes, but now the conditional on line 220 is the same for both conditions. So we should probably simplify the conditional and likely remove the boostScore entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I checked, we can't remove the boostscore entirely. We need to have higher boost score for ngram fields for non-urn fields in "multi_match" clause to get the expected result. I would suggest that we can have boostscore for non-urn fields from their respective searchable spec(which I already did) and urn field can have only default boost score (1.0), as you have rightly mention that it doesn't make sense for auto-complete purposes.
Please find the expected multi_match clause
{ "multi_match": { "query": "aucdl", "fields": [ "name.delimited^1.0", "name.ngram^10.0", "name.ngram._2gram^10.0", "name.ngram._3gram^10.0", "name.ngram._4gram^10.0", "urn.delimited^1.0", "urn.ngram^1.0", "urn.ngram._2gram^1.0", "urn.ngram._3gram^1.0", "urn.ngram._4gram^1.0" ], "type": "phrase", "operator": "OR", "slop": 0, "prefix_length": 0, "max_expansions": 50, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1.0 } }
In case of function at line 266, we can put condition to have default boostscore of 10.0 for non-urn field and boostscore of 1.0 for urn field ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, there are two things to consider here, the relative weights between the fields being used for the query AND the relative weights of the subfields (delimited, ngram, etc). Most likely we should be applying a multiplier. The URN while important as a unique identifier for general search is not as important here so the urn field weight is likely 1.0.
Next lets consider a multiplier for ngram. I am not seeing these being applied ->
datahub/metadata-service/configuration/src/main/resources/application.yaml
Line 240 in b74ba11
The field weights for ngrams should be impacted by that configuration and it doesn't appear to be based on. the example query above.
Generally the weight formula for ngram subfields should be something like:
<field annotation weight>
*<configuration ngram factor>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @david-leifker, As suggested done the following changes
Please find the final query
{ "multi_match": { "query": "container", "fields": [ "name.delimited^1.0", "name.ngram^10.0", "name.ngram._2gram^12.0", "name.ngram._3gram^15.0", "name.ngram._4gram^18.0", "urn.delimited^1.0", "urn.ngram^1.0", "urn.ngram._2gram^1.0", "urn.ngram._3gram^1.0", "urn.ngram._4gram^1.0" ], "type": "bool_prefix", "operator": "OR", "slop": 0, "prefix_length": 0, "max_expansions": 50, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1 } }, { "match": { "name.keyword": { "query": "container", "operator": "OR", "prefix_length": 0, "max_expansions": 50, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "boost": 10 } } }