feat: add architecture for record linkage #160
This enables searching through the CVE containers available in the database using a very simple PostgreSQL TS vector search.
According to the specs of the VM.
With the fields and indices added, searching for `python` in /triage takes ~1s (previously, it would take ~7.5s). The triggers added will make sure that the vector searches get updated as rows of data get added, deleted or updated.
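For readers unfamiliar with the setup, here is a minimal sketch of the kind of PostgreSQL statements that a dedicated search vector column, a GIN index and an update trigger boil down to. It is not the code of this PR: the table name `cve_container`, the text columns `title` and `description`, and the helper names are hypothetical, and the real migration is generated through Django. `tsvector_update_trigger`, `plainto_tsquery` and the `@@` operator are PostgreSQL built-ins.

```python
def search_vector_ddl(table, text_columns, vector_column="search_vector",
                      config="pg_catalog.english"):
    """Build the DDL for a tsvector column, its GIN index and an update trigger.

    All identifiers passed in are hypothetical examples; they are interpolated
    directly, so only use trusted, hard-coded names here.
    """
    columns = ", ".join(text_columns)
    return [
        # Dedicated column holding the precomputed search vector.
        f"ALTER TABLE {table} ADD COLUMN {vector_column} tsvector;",
        # GIN index so that `@@` lookups do not scan the whole table.
        f"CREATE INDEX {table}_{vector_column}_gin "
        f"ON {table} USING GIN ({vector_column});",
        # Trigger that recomputes the vector when a row is inserted or updated
        # (EXECUTE FUNCTION requires PostgreSQL 11 or newer).
        f"CREATE TRIGGER {table}_{vector_column}_update "
        f"BEFORE INSERT OR UPDATE ON {table} FOR EACH ROW "
        f"EXECUTE FUNCTION tsvector_update_trigger("
        f"{vector_column}, '{config}', {columns});",
    ]


def search_query(table, vector_column="search_vector", config="english"):
    """Build a parameterised query that filters on the indexed vector column."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {vector_column} @@ plainto_tsquery('{config}', %s);"
    )


if __name__ == "__main__":
    for statement in search_vector_ddl("cve_container", ["title", "description"]):
        print(statement)
    print(search_query("cve_container"))
```

The query only benefits from the index when the filter is applied to the stored vector column; computing `to_tsvector(...)` on the fly in the `WHERE` clause would bypass it.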
I took #84 as a starting point. The rough roadmap is:
I just noticed that I mistakenly removed the query filter in commit 7d58699, so the query was indeed not filtering and is not hitting the indices. I'll move on with the UI and architectural tasks and come back to performance later.
src/website/shared/migrations/0025_affectedproduct_search_vector_and_more.py
The previous query failed to hit the GIN indices created for the dedicated SearchVector fields introduced. Now the query makes use of them and returns results for a `python` search in ~1.5s.
The general workflow is introduced by generating random matches using the record linkage toolkit.
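As an illustration of what "random matches" can mean for test data, here is a minimal standard-library sketch that pairs records from two sets at random. It deliberately does not use the `recordlinkage` library, and the function name and the example identifiers are hypothetical, not taken from this PR.

```python
import random


def random_matches(left_ids, right_ids, n_pairs, seed=None):
    """Return up to n_pairs distinct (left, right) pairs chosen at random.

    left_ids and right_ids are hypothetical record identifiers, for example
    CVE container ids on one side and package ids on the other.
    """
    rng = random.Random(seed)  # a local generator keeps runs reproducible
    all_pairs = [(left, right) for left in left_ids for right in right_ids]
    # Sampling without replacement guarantees that no pair is repeated.
    return rng.sample(all_pairs, min(n_pairs, len(all_pairs)))


if __name__ == "__main__":
    print(random_matches(["CVE-A", "CVE-B"], ["pkg-1", "pkg-2", "pkg-3"], 3, seed=0))
```

Building the full cross product is only reasonable for small development datasets; a real linkage step would restrict the candidate pairs first.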
This increases its utility for inserting test data during development.
@alejandrosame can we salvage anything from here except for more self-descriptive names? Seems like the business logic was ported a while ago.
V1 is already implemented in #254.
I didn't notice this question before. I don't think so. The original discussions were nudging the implementation to use, or at least consider, the recordlinkage Python library, so if anything #254 should have taken these changes into account. Since code clearly wasn't reused there, anything going forward should just build on #254. The vision for the record linkage wasn't completely clear before attempting this anyway.