Full-text search using SearchVector #8430
FTR, here are the SQL queries generated by the current code:
SELECT "repository"."id", "repository"."repository_group_id", "repository"."name", "repository"."dvcs_type", "repository"."url", "repository"."branch", "repository"."codebase", "repository"."description", "repository"."active_status", "repository"."life_cycle_order", "repository"."performance_alerts_enabled", "repository"."expire_performance_data", "repository"."is_try_repo", "repository"."tc_root_url" FROM "repository" WHERE "repository"."name" = 'mozilla-central' LIMIT 21
SELECT "repository"."id", "repository"."repository_group_id", "repository"."name", "repository"."dvcs_type", "repository"."url", "repository"."branch", "repository"."codebase", "repository"."description", "repository"."active_status", "repository"."life_cycle_order", "repository"."performance_alerts_enabled", "repository"."expire_performance_data", "repository"."is_try_repo", "repository"."tc_root_url" FROM "repository" WHERE "repository"."name" = 'mozilla-central' LIMIT 21
SELECT "push"."id", "push"."repository_id", "push"."revision", "push"."author", "push"."time", "repository"."id", "repository"."repository_group_id", "repository"."name", "repository"."dvcs_type", "repository"."url", "repository"."branch", "repository"."codebase", "repository"."description", "repository"."active_status", "repository"."life_cycle_order", "repository"."performance_alerts_enabled", "repository"."expire_performance_data", "repository"."is_try_repo", "repository"."tc_root_url" FROM "push" INNER JOIN "repository" ON ("push"."repository_id" = "repository"."id") WHERE ("push"."repository_id" = 1 AND "push"."id" IN (SELECT "subquery"."push_id" FROM (SELECT DISTINCT U0."push_id", U1."time" FROM "commit" U0 INNER JOIN "push" U1 ON (U0."push_id" = U1."id") WHERE (U1."repository_id" = 1 AND to_tsvector('english'::regconfig, COALESCE(U0."revision", '') || ' ' || COALESCE(U0."author", '') || ' ' || COALESCE(U0."comments", '')) @@ (plainto_tsquery('english'::regconfig, '1916016'))) ORDER BY U1."time" DESC LIMIT 200) subquery)) ORDER BY "push"."time" DESC LIMIT 10
SELECT "commit"."id", "commit"."push_id", "commit"."revision", "commit"."author", "commit"."comments" FROM "commit" WHERE "commit"."push_id" IN (34)
SELECT "commit"."id", "commit"."push_id", "commit"."revision", "commit"."author", "commit"."comments" FROM "commit" WHERE "commit"."push_id" = 34 ORDER BY "commit"."id" DESC LIMIT 20
I think only the 3rd one comes from this patch. Let me reformat it:
SELECT "push"."id", "push"."repository_id", "push"."revision", "push"."author", "push"."time", "repository"."id", "repository"."repository_group_id", "repository"."name", "repository"."dvcs_type", "repository"."url", "repository"."branch", "repository"."codebase", "repository"."description", "repository"."active_status", "repository"."life_cycle_order", "repository"."performance_alerts_enabled", "repository"."expire_performance_data", "repository"."is_try_repo", "repository"."tc_root_url"
FROM "push"
INNER JOIN "repository" ON ("push"."repository_id" = "repository"."id")
WHERE ("push"."repository_id" = 1 AND "push"."id" IN (
    SELECT "subquery"."push_id"
    FROM (
        SELECT DISTINCT U0."push_id", U1."time"
        FROM "commit" U0
        INNER JOIN "push" U1 ON (U0."push_id" = U1."id")
        WHERE (
            U1."repository_id" = 1 AND
            to_tsvector('english'::regconfig, COALESCE(U0."revision", '') || ' ' || COALESCE(U0."author", '') || ' ' || COALESCE(U0."comments", '')) @@ (plainto_tsquery('english'::regconfig, '1916016'))
        )
        ORDER BY U1."time" DESC LIMIT 200
    ) subquery)
)
ORDER BY "push"."time" DESC
LIMIT 10
That doesn't look too bad. I only wonder about the 3rd nested subquery, but I guess that's unavoidable with this strategy. Still, I asked ChatGPT about simplifying it; here is what I got:
SELECT DISTINCT ON (p.id)
    p.id,
    p.repository_id,
    p.revision,
    p.author,
    p.time,
    r.repository_group_id,
    r.name,
    r.dvcs_type,
    r.url,
    r.branch,
    r.codebase,
    r.description,
    r.active_status,
    r.life_cycle_order,
    r.performance_alerts_enabled,
    r.expire_performance_data,
    r.is_try_repo,
    r.tc_root_url
FROM push p
INNER JOIN repository r ON p.repository_id = r.id
INNER JOIN commit c ON c.push_id = p.id
WHERE
    p.repository_id = 1
    AND to_tsvector('english'::regconfig, COALESCE(c.revision, '') || ' ' || COALESCE(c.author, '') || ' ' || COALESCE(c.comments, ''))
        @@ plainto_tsquery('english'::regconfig, '1916016')
ORDER BY p.time DESC
LIMIT 10;
Interestingly, it still uses the GIN index. Also, I'd be totally fine landing the patch as it is now (especially with the new tests, the index, etc.) and tuning the query afterwards. (Note: it could be interesting to see where the last 2 requests come from, but that's out of scope for this patch.)
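For anyone wanting to repeat this kind of check from a Django shell, here is a minimal sketch. It assumes a PostgreSQL backend and relies only on `QuerySet.query` and `QuerySet.explain()`; the helper names `describe_queryset` and `mentions_index` are hypothetical and not part of the patch, and the substring test is a crude heuristic rather than a real plan parser.

```python
def describe_queryset(queryset, analyze=False):
    """Return (sql, plan) for a Django queryset.

    `sql` is Django's rendering of the query. Parameters are interpolated
    without quoting, so it is meant for reading, not for pasting into psql.
    `plan` is the text returned by the database's EXPLAIN.
    """
    sql = str(queryset.query)
    # On PostgreSQL, analyze=True runs EXPLAIN ANALYZE, which executes the query.
    plan = queryset.explain(analyze=analyze)
    return sql, plan


def mentions_index(plan, index_name):
    """Crude check: does the plan text name the given index?"""
    return index_name in plan
```

Calling `describe_queryset(pushes)` on whatever queryset the endpoint builds, then `mentions_index(plan, ...)` with the name of the GIN index from the migration, should be enough to confirm whether a rewritten query still hits the index.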
Bugzilla link
Work in progress document
Before
The search functionality allowed users to search only by author email address or ID. For example:
Endpoint:
http://localhost:8000/api/project/try/push/[email protected]
Result: Returns all revisions authored by [email protected]
After
The search functionality has been enhanced to support more advanced use cases. Users can now search by additional attributes like bug_numbers, summary, etc., using a unified search parameter (search).
The implementation leverages Django's SearchVector to perform full-text searches across multiple fields dynamically.
The search combines the revision, author, and comments fields.
Endpoint:
http://localhost:8000/api/project/try/push/?search=1906541
Result: Returns results matching the query across relevant fields such as bug_numbers, summary, author, and revisions.
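As a rough illustration of the approach described above, the following sketch combines the three commit fields into one SearchVector and filters pushes by the matching commits. It is an assumption-laden outline, not the code in this PR: the function name `search_pushes`, its parameters, and the `limit` default are hypothetical, and it assumes the commit model has `revision`, `author`, and `comments` fields plus a `push` foreign key, as the generated SQL in the review comment suggests.

```python
def search_pushes(pushes, commit_model, term, limit=200):
    """Filter `pushes` to those that have a commit matching `term`.

    `pushes` is a queryset of pushes and `commit_model` is the commit model
    class; both are passed in so the sketch does not guess at import paths.
    """
    # Imported here so the sketch can be loaded outside a Django project.
    from django.contrib.postgres.search import SearchQuery, SearchVector

    # One tsvector over the three text fields, using the English configuration.
    vector = SearchVector("revision", "author", "comments", config="english")
    # The default search_type is "plain", which maps to plainto_tsquery.
    query = SearchQuery(term, config="english")

    matching_push_ids = (
        commit_model.objects.annotate(search=vector)
        .filter(search=query)
        .order_by("-push__time")  # most recent pushes first
        .values_list("push_id", flat=True)
        .distinct()[:limit]
    )
    return pushes.filter(id__in=matching_push_ids)
```

Slicing the inner queryset before using it with `id__in` is what makes Django emit a nested subquery, so a query of roughly the shape quoted in the review comment is to be expected from this pattern.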