-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question Re. Image Search Query With Distances Annotated As A New Column #501
Comments
You can get it to use the index and still do pre-filtering with a query like this: SELECT
segment.*,
embedding <=> :embedding AS distance
FROM
segment
INNER JOIN
asset ON segment.asset_id = asset.id
WHERE
distance < 0.99
AND segment.category = 'visual'
AND asset.collection_id = :collection_id
ORDER BY
embedding <=> :embedding
LIMIT 100 This will still be slower than the post-filtering approach because it needs to evaluate the cosine distance for (potentially many) assets in the WHERE condition, but the recall should be correct. I used |
The last SQL should be the right one. Does it still return only one result? Or multiple results but to be slow? And vector index is an approximate index which means the results might not be the exact same as the exact one (without using the index).' For your cases, if you don't have too many categories, you can put category_id and vector in one table and create partial index on each category. This will accelerate the filter condition. |
Unfortunately it's hard to co-optimize the vector search and filter condition together. We have some internal experiments and only a very limited cases can have improvments. |
@mertalev Have you had success running that exact query? When I tried running it, I get this error saying "distance" is not defined: ERROR: column "distance" does not exist
LINE 10: distance < 0.99 I also tried another variant of what you suggested but ran into the same error: WITH distances AS (
SELECT
segment.*,
embedding <=> :embedding
AS distance
FROM
segment
INNER JOIN
asset ON segment.asset_id = asset.id
WHERE
distance < 0.99
AND segment.category = 'visual'
AND asset.collection_id = :collection_id
ORDER BY
embedding <=> :embedding
)
SELECT distances.id, distances.distance, distances.category
FROM distances
LIMIT 100
; |
@VoVAllen the last SQL I'd shared in the opening comment does give correct number of results, and yes I'm keeping in mind that the index gives approx matches so the results won't necessarily be identical to brute force. Here's the query plan visualised: PS: You can click on "Raw" to see the exact query planner output and "Query" for the query that generated it.
I see. I'm slightly confused by this, because I was under the impression that this was one of the promises of the VBASE algorithm? Are there ways you would adjust what my query is trying to achieve to leverage the indexes and vbase correctly that I'm missing out on? |
Just a thought: are compound indexes with a vector + non-vector column supported? If so, perhaps that could help?
Unfortunately, I don't think this will work for our use case |
Did some further experiments, and we found that returning the distance itself in a new column has negligible cost, and the part of the query responsible for the slowdown is the To be more speicifc, the commented out line is responsible for this query to go from 269ms -> 16200ms: WITH distances AS (
SELECT
segment.id, segment.category,
embedding <=> '[0.048492431640625,0.004207611083984375,0.020538330078125,-0.06622314453125,-0.009918212890625,0.0819091796875,-0.0247344970703125,-0.01090240478515625,0.0679931640625,-0.040924072265625,-0.0034122467041015625,0.072021484375,0.0008335113525390625,-0.006832122802734375,0.00824737548828125,-0.03314208984375,0.025909423828125,0.04852294921875,-0.0171661376953125,0.0048370361328125,-0.0230712890625,-0.01190185546875,-0.0043792724609375,0.01007080078125,-0.052276611328125,0.0180816650390625,-0.00858306884765625,0.0078582763671875,-0.050811767578125,0.007328033447265625,0.0712890625,0.033416748046875,-0.03240966796875,-0.0029735565185546875,-0.0399169921875,-0.07537841796875,-0.0631103515625,-0.02191162109375,-0.0292510986328125,0.063232421875,0.093017578125,0.022674560546875,-0.0186920166015625,0.0041046142578125,-0.0300140380859375,-0.0941162109375,0.038970947265625,0.036651611328125,-0.0223388671875,0.002079010009765625,0.00818634033203125,-0.0187225341796875,0.031982421875,-0.01050567626953125,-0.00460052490234375,-0.0517578125,-0.0269317626953125,-0.038055419921875,0.007781982421875,0.038482666015625,-0.05145263671875,0.00878143310546875,0.021575927734375,0.053375244140625,0.0005512237548828125,0.03448486328125,-0.01157379150390625,0.0237274169921875,-0.05712890625,0.099365234375,-0.0019664764404296875,0.056243896484375,-0.00019276142120361328,-0.027069091796875,0.1966552734375,-0.01096343994140625,0.00528717041015625,-0.0244140625,-0.0211944580078125,0.0116424560546875,-0.053863525390625,-0.042266845703125,-0.0241546630859375,0.016571044921875,0.01535797119140625,0.0203704833984375,-0.036376953125,0.029815673828125,-0.048858642578125,-0.027984619140625,0.0044097900390625,0.049072265625,-0.06146240234375,0.002292633056640625,-0.01023101806640625,-0.0404052734375,0.03216552734375,-0.0093536376953125,-0.01009368896484375,0.03692626953125,-0.0257568359375,0.0193939208984375,0.004894256591796875,0.005229949951171875,-0.007106781005859375,0.06427001953125,0.007354736328125,0.050048828125,0.05267333984375,0.044952392578125,-0.0908203125,-0.02679443359375,0.01000213623046875,0.0247955322265625,0.0106658935546875,-0.0261688232421875,0.0207672119140625,0.045806884765625,0.035491943359375,-0.014556884765625,0.0421142578125,-0.045654296875,-0.048065185546875,0.0234832763671875,-0.14111328125,-0.004230499267578125,0.024566650390625,-0.0244293212890625,0.00693511962890625,0.01145172119140625,0.040283203125,-0.06060791015625,0.00921630859375,0.029083251953125,-0.09417724609375,0.028656005859375,-0.00201416015625,0.09478759765625,-0.01023101806640625,0.0249786376953125,0.018310546875,-0.02783203125,-0.022979736328125,-0.032073974609375,-0.0009412765502929688,0.00971221923828125,0.0071563720703125,-0.06329345703125,-0.01146697998046875,0.0149078369140625,0.0007505416870117188,0.01476287841796875,-0.0672607421875,-0.05224609375,-0.004695892333984375,0.006256103515625,0.00579833984375,0.0203704833984375,-0.01812744140625,-0.003818511962890625,0.007167816162109375,0.041107177734375,-0.01318359375,0.01480865478515625,-0.05267333984375,-0.0017557144165039062,0.011199951171875,-0.0153656005859375,0.0550537109375,0.03387451171875,-0.0196990966796875,0.033782958984375,-0.0158233642578125,0.0,0.037567138671875,0.0020351409912109375,-0.0206451416015625,-0.0242767333984375,0.037933349609375,0.03271484375,-0.035003662109375,-0.0303802490234375,-0.0282440185546875,0.0411376953125,0.023040771484375,0.183349609375,0.034759521484375,-0.01788330078125,0.07275390625,-0.025360107421875,0.0416259765625,0.037567138671875,-0.04315185546875,-0.06610107421875,0.001766204833984375,0.06866455078125,0.0164642333984375,-0.014373779296875,-0.033905029296875,-0.01079559326171875,-0.03521728515625,0.07550048828125,-0.01229095458984375,0.0278778076171875,0.039764404296875,-0.046722412109375,-0.0186309814453125,0.033660888671875,0.00748443603515625,0.0007295608520507812,-0.0947265625,0.04132080078125,-0.0140380859375,0.01447296142578125,-0.051544189453125,-0.055419921875,0.02691650390625,0.0095062255859375,-0.01568603515625,0.004299163818359375,-0.041656494140625,-0.0280303955078125,0.07684326171875,0.00970458984375,0.04351806640625,-0.078125,-0.0384521484375,0.0222930908203125,-0.003314971923828125,-0.055999755859375,0.01373291015625,0.01485443115234375,-0.04168701171875,0.04571533203125,-0.04693603515625,-0.0166473388671875,-0.072998046875,-0.0240478515625,0.002040863037109375,0.0220794677734375,-0.0009369850158691406,-0.027069091796875,0.04248046875,-0.0750732421875,-0.0960693359375,-0.03765869140625,-0.0173187255859375,-0.0325927734375,0.050750732421875,-0.052520751953125,-0.038818359375,-0.177001953125,-0.0188751220703125,0.0130462646484375,0.04058837890625,0.042236328125,-0.03338623046875,-0.031982421875,0.04229736328125,-0.05474853515625,0.0287017822265625,0.007640838623046875,-0.033447265625,0.0033664703369140625,-0.055694580078125,-0.0714111328125,-0.036346435546875,0.00142669677734375,-0.03070068359375,0.0299530029296875,-0.03369140625,-0.00966644287109375,0.005718231201171875,0.0187225341796875,-0.0213775634765625,-0.0185546875,-0.04180908203125,-0.0450439453125,-0.0229034423828125,0.04693603515625,0.036163330078125,0.061004638671875,-0.0672607421875,0.0009226799011230469,-0.046600341796875,-0.0242767333984375,-0.058624267578125,-0.1256103515625,0.023223876953125,0.0295867919921875,-0.046234130859375,0.0163421630859375,0.050384521484375,-0.025115966796875,0.037994384765625,0.027984619140625,0.0261383056640625,-0.0117645263671875,-0.004352569580078125,0.00960540771484375,0.07293701171875,0.062225341796875,-0.01007080078125,-0.02685546875,-0.0005059242248535156,-0.0122833251953125,-0.024749755859375,0.036529541015625,-0.03192138671875,0.015869140625,0.0777587890625,0.0109405517578125,-0.050018310546875,0.06787109375,0.08477783203125,-0.035736083984375,0.01482391357421875,0.05218505859375,-0.004047393798828125,-0.0165252685546875,0.1019287109375,-0.04510498046875,0.022186279296875,0.005382537841796875,0.004451751708984375,0.01129913330078125,-0.048980712890625,0.047027587890625,-0.0146026611328125,-0.012939453125,-0.0210418701171875,-0.0308990478515625,-0.057220458984375,0.0005764961242675781,0.01168060302734375,-0.004848480224609375,0.01641845703125,-0.0226898193359375,0.0187835693359375,0.1494140625,0.055328369140625,-0.0178985595703125,-0.0169219970703125,-0.043792724609375,-0.0238800048828125,-0.013580322265625,0.0104522705078125,0.0526123046875,-0.033050537109375,0.00775909423828125,-0.025238037109375,0.00724029541015625,-0.025909423828125,-0.00861358642578125,0.040679931640625,0.1619873046875,0.0263519287109375,-0.0229034423828125,0.0101165771484375,-0.041107177734375,-0.0301361083984375,0.006916046142578125,-0.0175323486328125,0.01502227783203125,-0.0161895751953125,-0.01323699951171875,-0.004383087158203125,-0.0087432861328125,-0.0511474609375,0.00356292724609375,0.011383056640625,-0.0548095703125,-0.039947509765625,0.045318603515625,-0.01392364501953125,-0.01788330078125,0.025634765625,0.00868988037109375,-0.00244903564453125,0.0030956268310546875,0.0032634735107421875,0.06744384765625,0.0064697265625,0.005268096923828125,-0.014190673828125,-0.01392364501953125,0.01105499267578125,0.0307769775390625,-0.017425537109375,0.002033233642578125,0.034698486328125,-0.0250244140625,-0.00201416015625,-0.0233154296875,-0.019287109375,0.0181884765625,0.008575439453125,-0.03564453125,0.045379638671875,0.003162384033203125,0.0053863525390625,0.011322021484375,-0.01247406005859375,-0.04608154296875,0.02166748046875,0.0238189697265625,0.022125244140625,0.0035915374755859375,0.022064208984375,0.00432586669921875,-0.03546142578125,0.01241302490234375,0.02740478515625,0.004619598388671875,-0.09027099609375,-0.052764892578125,0.0325927734375,-0.02593994140625,0.0094146728515625,-0.01071929931640625,0.01453399658203125,0.00960540771484375,0.05364990234375,0.003635406494140625,-0.020416259765625,-0.0279083251953125,-0.004726409912109375,-0.007053375244140625,0.004291534423828125,-0.00629425048828125,-0.0200958251953125,-0.049163818359375,0.02899169921875,-0.06494140625,-0.0117034912109375,-0.04913330078125,0.0148468017578125,-0.070556640625,0.04052734375,0.0009059906005859375,0.021759033203125,-0.0352783203125,-0.0341796875,-0.050262451171875,0.00033593177795410156,0.043792724609375,-0.024658203125,0.0182952880859375,-0.01245880126953125,-0.01078033447265625,-0.0157012939453125,0.045257568359375,0.0224609375,-0.0308074951171875,0.06298828125,0.0051727294921875,0.00559234619140625,-0.025604248046875,-0.0242919921875,0.01508331298828125,-0.01375579833984375,0.0262298583984375,-0.043365478515625,-0.032958984375,-0.0101776123046875,-0.0250244140625,-0.0118408203125,0.0208282470703125,-0.006626129150390625,0.047698974609375,0.0027217864990234375,-0.023956298828125,-0.0016622543334960938,0.051513671875,-0.001529693603515625,-0.0250396728515625,-0.0016460418701171875,-0.0206451416015625,0.047576904296875,-0.06134033203125,0.03338623046875,-0.0251922607421875,0.17041015625,0.04351806640625,-0.07464599609375,0.03369140625,1.4543533325195312e-05,-0.0011463165283203125,-0.048004150390625,0.00016164779663085938,0.0247344970703125,-0.0215301513671875,0.03076171875,-0.016937255859375,-0.0943603515625,-0.026824951171875,0.01163482666015625,0.039276123046875,-0.01502227783203125,0.00400543212890625,0.01165771484375,0.0169677734375,-0.033050537109375,-0.03863525390625,-0.09918212890625,0.04339599609375,-0.04229736328125,-0.005825042724609375,-0.030731201171875,-0.0289306640625,0.0068206787109375,0.0158538818359375,0.043853759765625,0.005344390869140625,-0.0231170654296875,0.0287017822265625,0.01654052734375,0.041961669921875,0.005596160888671875,0.0307769775390625,0.0102081298828125,0.03472900390625,0.0164337158203125,0.00878143310546875,0.061492919921875,-0.0435791015625,-0.08380126953125,0.0120391845703125,-0.020660400390625,-0.049346923828125,0.03900146484375,-0.03497314453125,0.00788116455078125,-0.053192138671875,0.0191497802734375,0.02227783203125,0.0333251953125,-0.00919342041015625,-0.104248046875,-0.049774169921875,-0.0645751953125,0.028350830078125,-0.03839111328125,-0.01654052734375,-0.00263214111328125,-0.007259368896484375,0.017120361328125,-0.00391387939453125,-0.0848388671875,0.006214141845703125,-0.0175018310546875,-0.051971435546875,-0.048828125,-0.08807373046875,0.00550079345703125,0.030303955078125,0.00983428955078125,-0.0084686279296875,0.01213836669921875,0.01532745361328125,0.003978729248046875,0.03924560546875,-0.02044677734375,0.0391845703125,-0.051513671875,-0.02532958984375,-0.0233612060546875,0.03936767578125,-0.0286712646484375,-0.037811279296875,0.05712890625,0.0418701171875,0.0294036865234375,0.006137847900390625,-0.0252532958984375,0.003887176513671875,-0.0084991455078125,0.003604888916015625,-0.01824951171875,0.04150390625,-0.08380126953125,-0.00284576416015625,0.01535797119140625,0.0009307861328125,-0.00376129150390625,0.01319122314453125,-0.057586669921875,0.0261383056640625,0.01068878173828125,0.0295867919921875,-0.00908660888671875,-0.0201873779296875,-0.01419830322265625,0.0159149169921875,0.06597900390625,-0.0137176513671875,0.014495849609375,0.01540374755859375,0.01279449462890625,-0.0262298583984375,-0.0182952880859375,0.04644775390625,-0.00547027587890625,-0.0021648406982421875,-0.03570556640625,-0.04400634765625,0.00457000732421875,0.01441192626953125,0.0352783203125,0.043182373046875,-0.0010404586791992188,-0.032196044921875,-0.0079345703125,-0.043792724609375,-0.016204833984375,0.05029296875,0.004558563232421875,-0.055328369140625,-0.0265960693359375,0.028350830078125,0.03192138671875,0.029693603515625,0.040435791015625,0.004878997802734375,-0.034515380859375,0.025146484375,-0.01326751708984375,-0.0140838623046875,-0.0008788108825683594,-0.0230865478515625,0.0088653564453125,0.00522613525390625,-0.0179290771484375,0.0303802490234375,0.032928466796875,-0.0184326171875,0.033935546875,-0.002887725830078125,-0.0090484619140625,0.0176849365234375]'
AS distance
FROM
segment
INNER JOIN
asset ON segment.asset_id = asset.id
WHERE
segment.category = 'visual'
AND asset.collection_id = 'ed0bbe8d-ce06-4b49-9e01-0d763439fa8a'
ORDER BY
distance
)
SELECT distances.id, distances.distance, distances.category
FROM distances
-- WHERE distances.distance < 0.99
LIMIT 100
; |
@rsomani95 Sorry, the earlier one was from memory. The corrected query is this: SELECT
segment.*
FROM
segment
INNER JOIN
asset ON segment.asset_id = asset.id
WHERE
embedding <=> :embedding < 0.99
AND segment.category = 'visual'
AND asset.collection_id = :collection_id
ORDER BY
embedding <=> :embedding
LIMIT 100; |
@mertalev thank you. This query seems to have similar results to my last comment, where it takes ~16.2s, and removing the |
I think it's because the limit is 100, but there are only 96 results that meet the threshold. It will end up checking every embedding in the table to find another 4 matches. Setting the limit to 96 should be much faster. I wonder if a way to stop early would help performance here. The ordering should be informative of when there's no longer a point to keep searching. |
Originally started this discussion in Discord (I think with @VoVAllen) but posting here to keep the discussion more focused:
https://discord.com/channels/974584200327991326/1220203706553860227/1245115744224542770
In spirit, what I'm trying to do is a standard vector query with an additional condition where the results are filtered by the cosine distance to the query vector.
Dataset Context
My dataset has 3 tables
collection
,asset
, andsegment
, setup like so:I'm using the
0.3.0-alpha2
docker container for all my tests.I then build some standard indexes on theese tables, followed by the HNSW index. Here is a fully reproducible script to reproduce the dataset:
https://gist.github.com/rsomani95/56194e33364208019a09846cb9bacabb#file-build_dataset_with_indexes-sql-L1-L113
Queries
Before running any queries, I've set
vectors.hnsw_ef_search=100
globally.1. Correct results, but no index usage
This query does a brute force search. However, the results are useful because they serve as a ground truth for checking future results for other queries. In my case, I got 96 results (but this will vary because the embeddings are randomised)
2. Adding
LIMIT
to use index, but incorrect resultsAs suggested in the thread here, I added a
LIMIT
under theORDER BY
clause to activate the index.Here's the query with the
LIMIT
added:This does activate the index, but now only returns 1 row as opposed to 96 earlier. I suspect this is because the query is formulated incorrectly; in its current formulation, we must be doing post filtering (I am assuming that the vbase implementation is correct and not the root cause of the issue here)
I tried another approach, adding
LIMIT
to the bottom of the query:and while this uses the index, the query runs a lot slower, and still produces incorrect results.
I'd love some guidance on how to formulate the SQL above correctly to use the index and get correct results.
The text was updated successfully, but these errors were encountered: