PostgreSQL hybrid search #958
Include:
- Text search index
- Hybrid Search configuration
- Vector or hybrid search
- Index name should be different per table.
- Add missing filter
@@ -175,7 +183,8 @@ public async Task CreateTableAsync(
 {this._colContent} TEXT DEFAULT '' NOT NULL,
 {this._colPayload} JSONB DEFAULT '{{}}'::JSONB NOT NULL
 );
-CREATE INDEX IF NOT EXISTS idx_tags ON {tableName} USING GIN({this._colTags});
+CREATE INDEX IF NOT EXISTS {tableName}_idx_tags ON {tableName} USING GIN({this._colTags});
+CREATE INDEX IF NOT EXISTS {tableName}_idx_content ON {tableName} USING GIN(to_tsvector('english',{this._colContent}));
Should the language be configurable?
FROM {tableName}, plainto_tsquery('english', @query) query
WHERE {filterSqlHybridText} AND to_tsvector('english', {this._colContent}) @@ query
ORDER BY ts_rank_cd(to_tsvector('english', {this._colContent}), query) DESC
ditto: language should be configurable
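One possible way to address the two comments above is sketched below. It is only an illustration under stated assumptions: the `TextSearchSql` class, the `BuildQuery` method, and the `SELECT *` projection are hypothetical and do not come from the PR, and the tag filter (`{filterSqlHybridText}`) is left out for brevity. The configuration name is interpolated rather than passed as a query parameter so that the expression stays identical to the one in `CREATE INDEX`, which is what allows PostgreSQL to use the GIN expression index.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the PR: builds the full-text part of the
// query with a configurable text search configuration ("language").
public static class TextSearchSql
{
    // The name is interpolated into SQL, so accept only simple lowercase
    // identifiers such as "english", "simple", or "italian".
    private static readonly Regex s_validName = new Regex(@"^[a-z_][a-z0-9_]*\z");

    public static string BuildQuery(string tableName, string contentColumn, string language)
    {
        if (language is null || !s_validName.IsMatch(language))
        {
            throw new ArgumentException(
                $"Invalid text search configuration: '{language}'", nameof(language));
        }

        // Assumption: the index was created with the same configuration name,
        // i.e. GIN(to_tsvector('<language>', <contentColumn>)).
        return $@"
SELECT *
FROM {tableName}, plainto_tsquery('{language}', @query) query
WHERE to_tsvector('{language}', {contentColumn}) @@ query
ORDER BY ts_rank_cd(to_tsvector('{language}', {contentColumn}), query) DESC";
    }
}
```

For example, `TextSearchSql.BuildQuery("km_docs", "content", "italian")` yields a query that uses `'italian'` in all three places, while a value such as `"english'); --"` is rejected with an `ArgumentException`. Changing the language on an existing table would also require recreating the content index with the new configuration.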
Looks almost ready to merge, only a few minor tweaks, please see the comments inline. Two most important:
- configurable language
- documenting the new SQL and the hard coded calculations
Looks like the PR got stale, with some unresolved errors and comments. We might have to archive it unless someone can kindly complete the task.
I will resolve the comments this weekend.
Add parametrization of the text search language dictionary and of the Reciprocal Rank Fusion (RRF) constant k used to score Hybrid Search results
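To make the role of the k parameter concrete, here is a minimal sketch of Reciprocal Rank Fusion in C#. It is not the SQL this PR runs: the `RankFusion` class and `Fuse` method are hypothetical names, the inputs are assumed to be record ids ordered best-first, and the default of 60 is a commonly used value for k, assumed here rather than taken from the PR. Each record scores the sum of 1 / (k + rank) over the lists it appears in, so a larger k flattens the difference between top and lower ranks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of Reciprocal Rank Fusion; names are hypothetical.
public static class RankFusion
{
    // Each input list is ordered best-first. A record at 1-based position
    // `rank` in a list contributes 1 / (k + rank) to its fused score.
    public static List<KeyValuePair<string, double>> Fuse(
        IReadOnlyList<string> vectorResults,
        IReadOnlyList<string> textResults,
        int k = 60)
    {
        var scores = new Dictionary<string, double>();

        foreach (var results in new[] { vectorResults, textResults })
        {
            for (int i = 0; i < results.Count; i++)
            {
                // TryGetValue leaves `current` at 0 for unseen ids.
                scores.TryGetValue(results[i], out double current);
                scores[results[i]] = current + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties broken by id for a stable order.
        return scores
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

With vector results `a, b, c` and text results `b, d`, the fused order is `b, a, d, c`: `b` appears in both lists and scores 1/62 + 1/61, ahead of `a` (1/61), `d` (1/62), and `c` (1/63).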
I finished reviewing the changes.
This PR Includes:
Motivation and Context (Why the change? What's the scenario?)
Vector search does not provide the best results in a significant number of scenarios. Hybrid search provides better results.
High level description (Approach, Design)
This change adds a new parameter that activates hybrid search in the PostgreSQL extension. The parameter defaults to the previous implementation (vector search).
The search uses vector search or hybrid search depending on the parameter.