-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: push down where clause into vector index #503
Comments
DefinitionAs Index cond pushdown could only accept format like: Example: SELECT * FROM t
WHERE val <<->> ball('[0.5,0.5,0.5]'::vector, 0.99)
ORDER BY val <-> '[0.5,0.5,0.5]' limit 10;
SELECT * FROM t
WHERE val0 <<=>> ball('[0, 0, 0]'::vector, 1)
AND val1 <<=>> ball('[0, 0, 0]'::vecf16, 1);
SELECT * FROM t
WHERE val <<#>> ball('[0.5,0.5,0.5]'::vector, 0.9); Semantics:
OperationsValidate:
Pushdown:
Execution: graph LR
7[Index query] -->|select| 4[SELECT Result]
1[Where Clause] -->|not pushdown| 3[Filter]
3[Filter] -->|apply| 4[Index SCAN Result]
4[Index SCAN Result] -->|filter| 8[SELECT Result]
1[Where Clause] -->|pushdown| 2[Index Cond]
2[Index Cond] -->|early stop| 5[Index Query]
5[Index query] -->|select| 6[SELECT Result]
ExperimentConclusion: The source of the original query statement is outside the If input vectors: The result of original query:
|
filter | inserted | filtered vectors | searched / recall |
---|---|---|---|
cos dis < 0.97 |
0.45M | 202 / 0.045% | 3 / 1.4% |
cos dis < 0.98 |
0.45M | 957 / 0.2% | 12 / 1.3% |
cos dis < 0.99 |
0.45M | 3981 / 0.8% | 73 / 1.8% |
filter | inserted | filtered vectors | searched / recall |
---|---|---|---|
dot dis < -0.4 |
0.45M | 223 / 0.049% | 5 / 2.2% |
dot dis < -0.2 |
0.45M | 1843 / 0.4% | 20 / 1.1% |
dot dis < 0 |
0.45M | 9744 / 2.1% | 114 / 1.2% |
filter | inserted | filtered vectors | searched / recall |
---|---|---|---|
l2 dis < 190 |
0.45M | 137 / 0.034% | 8 / 5.8% |
l2 dis < 195 |
0.45M | 1418 / 0.3% | 195 / 13.7% |
l2 dis < 200 |
0.45M | 8448 / 1.8% | 1825 / 21.6% |
l2 dis < 205 |
0.45M | 33431 / 7.4% | 17712 / 52.9% |
The result of improved query: $source(640) = [0.5, 0.5, ..., 0.5]$
filter | inserted | filtered vectors | searched vectors / recall |
---|---|---|---|
cos dis < 0.11 |
0.45M | 12 / 0.026% | 10 / 83.3% |
cos dis < 0.115 |
0.45M | 423 / 0.94% | 376 / 88.9% |
cos dis < 0.117 |
0.45M | 1261 / 2.80% | 1232 / 97.7% |
cos dis < 0.12 |
0.45M | 5134 / 11.4% | 5104 / 99.4% |
filter | inserted | filtered vectors | searched / recall |
---|---|---|---|
dot dis < -175 |
0.45M | 16 / 0.035% | 16 / 100.0% |
dot dis < -173 |
0.45M | 96 / 0.21% | 93 / 96.8% |
dot dis < -171 |
0.45M | 613 / 1.36% | 586 / 95.5% |
dot dis < -170 |
0.45M | 1421 / 3.15% | 1361 / 95.7% |
dot dis < 168 |
0.45M | 6477 / 14.39% | 6130 / 94.6% |
filter | inserted | filtered vectors | searched vectors |
---|---|---|---|
l2 dis < 46 |
0.45M | 12 / 0.026% | 7 / 58.3% |
l2 dis < 47 |
0.45M | 143 / 0.31% | 56 / 39.1% |
l2 dis < 48 |
0.45M | 951 / 2.11% | 851 / 89.4% |
l2 dis < 49 |
0.45M | 4696 / 10.43% | 4621 / 98.4% |
l2 dis < 50 |
0.45M | 17033 / 37.85% | 16893 / 99.2% |
Thanks for working on this! Out of curiosity, does this handle relaxed monotonicity? Some time ago I experimented with a plpgsql function that iterates through results and stops at the first distance above the threshold, but I noticed the recall was lower because there can be tuples below the threshold after that. |
@mertalev Ideally this can be done with relaxed monotonicity, and vbase original paper covered this scenario with high recall at 98%. We're trying to replicate the results now |
Ref: #502
Early stop index iterator with
where distance<0.99
clauseThe text was updated successfully, but these errors were encountered: