FAISS Store: allow multiple write calls and fix potential memory leak in update_embeddings #422

Merged
merged 2 commits into from
Oct 5, 2020

Conversation

lalitpagaria
Contributor

  • Allow multiple write calls to an existing FAISS index.
  • Fix an issue where update_embeddings always creates a new FAISS index instead of clearing the existing one. Creating a new index may not free the memory used by the old one, which can cause a memory leak.

@lalitpagaria lalitpagaria changed the title FAISS Store: allow multiple write calls and fix potential memory leak in update_embeddings WIP FAISS Store: allow multiple write calls and fix potential memory leak in update_embeddings Sep 23, 2020
@tholor
Member

tholor commented Sep 23, 2020

Looking good. Thanks for adding this @lalitpagaria !

Let me know when it's ready for review

@lalitpagaria
Contributor Author

@tholor Technically the PR is ready, but there is one shortcoming. Please read below:

  1. Calling write multiple times for docs without embeddings -> there is no logical issue.
  2. Calling write multiple times for docs with embeddings -> the FAISS store currently converts embeddings (L2 to IP) to allow inner product search, specifically for the HNSWx index, and this conversion causes an issue.

Currently we follow this link to let an L2 index serve IP search. The method requires phi to be computed beforehand and uses it to add another dimension to each embedding. So if we call write multiple times, phi will be different each time, and the extra dimension that supports IP search will be computed incorrectly.
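
For reference, the identity behind the trick can be written out as follows. The notation is an assumption made for this note: x is a document embedding, q is a query, and phi is the largest squared norm among the indexed embeddings.

```latex
\begin{aligned}
\tilde{x} &= \bigl(x,\ \sqrt{\phi - \lVert x \rVert^2}\bigr), \qquad \tilde{q} = (q,\ 0), \\
\lVert \tilde{q} - \tilde{x} \rVert^2
  &= \lVert q - x \rVert^2 + \phi - \lVert x \rVert^2 \\
  &= \lVert q \rVert^2 + \phi - 2\, q \cdot x .
\end{aligned}
```

For a fixed query, the L2 ranking over the augmented vectors matches the inner product ranking only if phi is the same constant for every indexed vector. If two batches are augmented with different phi values, the additive constant differs between batches and the ranking across batches is no longer guaranteed to agree.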

That's why, in my PR #385, I abstracted this functionality out into a separate class, FaissIndexStore, so it is up to the user to choose which metric to use. FAISS also supports a few IP indexes, such as IVF and IndexFlatIP.

An alternative is to expose the _get_phi and _get_hnsw_vectors functions via a utils class, so the caller (the retriever in our case) can use them to precompute and add the extra dimension before writing the embeddings to the FAISS store.

BTW, update_embeddings does not suffer from this issue, as it gets all embeddings together.

Please let me know what you think.
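
The multi-write concern and the alternative of precomputing phi can be sketched with plain NumPy. This is only an illustration under stated assumptions: brute-force search stands in for the FAISS indexes, the second batch is scaled by 3 so that the per-batch maxima clearly differ, and the helper names (top_k_ip, top_k_l2) are made up for the example rather than taken from the code base.

```python
import numpy as np

def augment_xb(xb, phi):
    """Append sqrt(phi - ||x||^2) as an extra column; phi must be >= every squared norm."""
    norms = (xb ** 2).sum(axis=1)
    extra = np.sqrt(np.maximum(phi - norms, 0.0))  # clip guards against tiny negative rounding
    return np.hstack((xb, extra.reshape(-1, 1)))

def augment_xq(xq):
    """Queries get a zero in the extra dimension."""
    return np.hstack((xq, np.zeros((len(xq), 1), dtype=xq.dtype)))

def top_k_ip(xb, xq, k):
    """Brute-force top-k by inner product (hypothetical stand-in for an IP index)."""
    return np.argsort(-(xq @ xb.T), axis=1)[:, :k]

def top_k_l2(xb, xq, k):
    """Brute-force top-k by squared L2 distance (hypothetical stand-in for an L2 index)."""
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]

rng = np.random.default_rng(0)
d, k = 32, 10
xq = rng.standard_normal((20, d))
xb1 = rng.standard_normal((200, d))
xb2 = 3.0 * rng.standard_normal((200, d))  # assumption: larger norms in the second batch
xb = np.concatenate((xb1, xb2))

reference = top_k_ip(xb, xq, k)

# Two writes, each batch augmented with its own phi
per_batch = np.concatenate((
    augment_xb(xb1, (xb1 ** 2).sum(axis=1).max()),
    augment_xb(xb2, (xb2 ** 2).sum(axis=1).max()),
))
# Two writes, both batches augmented with one phi computed up front
phi = (xb ** 2).sum(axis=1).max()
shared = np.concatenate((augment_xb(xb1, phi), augment_xb(xb2, phi)))

per_batch_ok = np.array_equal(top_k_l2(per_batch, augment_xq(xq), k), reference)
shared_ok = np.array_equal(top_k_l2(shared, augment_xq(xq), k), reference)
print("per-batch phi matches IP:", per_batch_ok)
print("shared phi matches IP:", shared_ok)
```

A shared phi only helps if it is known before the first write and is an upper bound for every later batch, which is the practical limitation of this approach.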

@tholor tholor self-assigned this Sep 23, 2020
@lalitpagaria lalitpagaria changed the title WIP FAISS Store: allow multiple write calls and fix potential memory leak in update_embeddings FAISS Store: allow multiple write calls and fix potential memory leak in update_embeddings Sep 23, 2020
@lalitpagaria
Contributor Author

lalitpagaria commented Sep 23, 2020

@tholor It seems my assumption was wrong. I tested this scenario in a notebook and verified that multiple writes work perfectly. Refer to this notebook: https://colab.research.google.com/drive/11e3zdFP6kg2xhJ8LvplVt08B8Nf2iEGn?usp=sharing

So yes, my existing PR is ready for review, and it does not have the issue I raised in my previous comment.

In case you are not able to open the notebook, try this:

!pip install faiss-gpu

import numpy as np
import faiss


# see http://ulrichpaquet.com/Papers/SpeedUp.pdf theorem 5

def get_phi(xb):
    # Largest squared L2 norm among the rows of xb
    return (xb ** 2).sum(1).max()

def augment_xb(xb, phi=None):
    # Append sqrt(phi - ||x||^2) as an extra column; phi defaults to this batch's maximum
    norms = (xb ** 2).sum(1)
    if phi is None:
        phi = norms.max()
    extracol = np.sqrt(phi - norms)
    return np.hstack((xb, extracol.reshape(-1, 1)))

def augment_xq(xq):
    # Queries get a zero in the extra dimension
    extracol = np.zeros(len(xq), dtype='float32')
    return np.hstack((xq, extracol.reshape(-1, 1)))


nq = 100
nb = 1000
d = 32

# Search vectors
xq = faiss.randn((nq, d))

# Embeddings
xb1 = faiss.randn((nq, d))
xb2 = faiss.randn((nq, d))
# concatenate for single write
xb = np.concatenate((xb1, xb2), axis=0)


# Scenario 1: call add twice on both indexes
# Reference: IP search via an IP index
k = 10
index = faiss.IndexFlatIP(d)
# Add the two batches separately
index.add(xb1)
index.add(xb2)
Dref, Iref = index.search(xq, k)


# IP search emulated via an L2 index on augmented vectors
k = 10
index = faiss.IndexFlatL2(d + 1)

# Add the two batches separately (phi is computed per batch)
index.add(augment_xb(xb1))
index.add(augment_xb(xb2))
D, I = index.search(augment_xq(xq), k)

# Check whether both searches return the same neighbours
print("Secenario_1:", np.all(I == Iref))




# Scenario 2: call add twice for FlatL2 and once for FlatIP
# Reference: IP search via an IP index
k = 10
index = faiss.IndexFlatIP(d)
# Add the concatenated array in one call
index.add(xb)
Dref, Iref = index.search(xq, k)


# IP search emulated via an L2 index on augmented vectors
k = 10
index = faiss.IndexFlatL2(d + 1)

# Add the two batches separately (phi is computed per batch)
index.add(augment_xb(xb1))
index.add(augment_xb(xb2))
D, I = index.search(augment_xq(xq), k)

# Check whether both searches return the same neighbours
print("Secenario_2:", np.all(I == Iref))



# Scenario 3: call add once for FlatL2 and twice for FlatIP
# Reference: IP search via an IP index
k = 10
index = faiss.IndexFlatIP(d)
# Add the two batches separately
index.add(xb1)
index.add(xb2)
Dref, Iref = index.search(xq, k)


# IP search emulated via an L2 index on augmented vectors
k = 10
index = faiss.IndexFlatL2(d + 1)

# Add the concatenated array in one call (a single phi for all vectors)
index.add(augment_xb(xb))
D, I = index.search(augment_xq(xq), k)

# Check whether both searches return the same neighbours
print("Secenario_3:", np.all(I == Iref))




# Scenario 4: call add once for both FlatL2 and FlatIP
# Reference: IP search via an IP index
k = 10
index = faiss.IndexFlatIP(d)
# Add the concatenated array in one call
index.add(xb)
Dref, Iref = index.search(xq, k)


# IP search emulated via an L2 index on augmented vectors
k = 10
index = faiss.IndexFlatL2(d + 1)

# Add the concatenated array in one call (a single phi for all vectors)
index.add(augment_xb(xb))
D, I = index.search(augment_xq(xq), k)

# Check whether both searches return the same neighbours
print("Secenario_4:", np.all(I == Iref))

@tholor
Member

tholor commented Sep 24, 2020

Thanks. The code example is very helpful. I will review it in the next few days. I want to investigate the transformation via phi a bit deeper, as a bug here would be very hard to trace later and could impact performance a lot.

@lalitpagaria
Contributor Author

@tholor I hope you can find time to review this PR.

I just want to add one more point: this PR also fixes a potential memory leak in the update_embeddings function when it is called multiple times. update_embeddings always creates a new faiss_index, so the old memory may not be released when the faiss_index variable is overwritten. Refer to facebookresearch/faiss#872 and facebookresearch/faiss#257.
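
The "clear instead of recreate" idea can be sketched as below. TinyIndex and TinyStore are hypothetical stand-ins written for this note, not the actual FAISS or document store classes; to my understanding, a FAISS index offers a reset() method that plays the role TinyIndex.reset() plays here.

```python
import numpy as np

class TinyIndex:
    """Hypothetical stand-in for a vector index that owns its storage."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype="float32")

    @property
    def ntotal(self):
        # Number of vectors currently stored
        return len(self.vectors)

    def add(self, x):
        self.vectors = np.vstack((self.vectors, np.asarray(x, dtype="float32")))

    def reset(self):
        # Drop all stored vectors but keep the index object itself
        self.vectors = np.empty((0, self.dim), dtype="float32")


class TinyStore:
    """Hypothetical store that reuses one index across update_embeddings calls."""

    def __init__(self, dim):
        self.index = TinyIndex(dim)

    def update_embeddings(self, embeddings):
        # Clear the existing index in place instead of rebinding self.index
        self.index.reset()
        self.index.add(embeddings)


store = TinyStore(dim=4)
index_before = store.index
store.update_embeddings(np.ones((3, 4)))
store.update_embeddings(np.zeros((5, 4)))
same_object = store.index is index_before
print(same_object, store.index.ntotal)
```

Because the store never rebinds its index attribute, repeated updates do not leave behind old index objects whose memory has to be reclaimed separately.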

@tholor
Member

tholor commented Sep 30, 2020

I had a look at your test script. Very helpful scenarios you have in there!

However, I found one major issue:
xq, xb1 and xb2 are all exactly the same arrays in this test. This is because you init them with faiss.randn() which has a default seed argument.

When changing the init to:

# Search vectors
xq = faiss.randn((nq, d), seed=11)

# Embeddings
xb1 = faiss.randn((nq, d), seed=12)
xb2 = faiss.randn((nq, d), seed=13)
# concatenate for single write
xb = np.concatenate((xb1, xb2), axis=0)
print(f"Same array: {np.all(xb1 == xb2)}")

The output is

Same array: False
Secenario_1: False
Secenario_2: False
Secenario_3: True
Secenario_4: True 

So it's only working in the scenarios with one write to the L2 index.
I can dig deeper here into the phi augmentation and try to find a workaround.
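
The seeding pitfall itself is easy to reproduce outside FAISS. The sketch below uses a hypothetical randn helper built on NumPy that mimics a generator with a fixed default seed (the default value 12345 is arbitrary here), plus a small guard that a test script could call before comparing scenarios.

```python
import numpy as np

def randn(shape, seed=12345):
    """Hypothetical helper: a fresh generator per call, so equal seeds give equal arrays."""
    return np.random.default_rng(seed).standard_normal(shape).astype("float32")

def assert_distinct(*arrays):
    """Raise if any two of the given arrays are element-wise identical."""
    for i in range(len(arrays)):
        for j in range(i + 1, len(arrays)):
            if np.array_equal(arrays[i], arrays[j]):
                raise ValueError(f"arrays {i} and {j} are identical; check the seeds")

nq, d = 100, 32

# Default seed on every call: both arrays come out the same
same_a = randn((nq, d))
same_b = randn((nq, d))
default_seed_identical = np.array_equal(same_a, same_b)

# Explicit, different seeds: the arrays differ
diff_a = randn((nq, d), seed=12)
diff_b = randn((nq, d), seed=13)
explicit_seed_identical = np.array_equal(diff_a, diff_b)

print("default seed identical:", default_seed_identical)
print("explicit seeds identical:", explicit_seed_identical)
assert_distinct(diff_a, diff_b)  # passes silently
```

Calling assert_distinct on the query and document batches at the top of a test script would have flagged the identical arrays before any index was built.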

@lalitpagaria
Contributor Author

@tholor I'm not sure why the test failed; the failure is unrelated to this PR. No tests fail locally on my system. I just rebased onto the latest master. Is it possible to re-run the build?

@lalitpagaria
Contributor Author

lalitpagaria commented Oct 2, 2020

@tholor I'm not sure whether this is a valid scenario. Pardon my limited knowledge of embeddings.

I assume the same model will produce the same embedding for a given text if nothing changes in its parameters and it is merely called at a different time. I'm not sure whether a model updates its seed over time. I also presume that any change in the model's seed would require running update_embeddings again for all documents.

I have a question here that is relevant to the use case I am trying to solve: continuous fine-tuning.
Is it required to update the embeddings for all documents after every fine-tuning run?

Base Model  ------> Updated Model -----------> User  --------> Feedback -------> Fine tuning -------> Updated Model ------>

@tholor
Member

tholor commented Oct 2, 2020

I assume the same model will produce the same embedding for a given text if nothing changes in its parameters and it is merely called at a different time. I'm not sure whether a model updates its seed over time. I also presume that any change in the model's seed would require running update_embeddings again for all documents.

One model will always produce the same embedding for one text. There is no randomness (and therefore no seed) in these models at inference time. Not sure why you are asking this, but if it's related to the seed in the above test scenarios, this is something different: if we don't vary the seed there we basically simulate indexing the same batch of documents twice. However, we usually have different docs in the two batches (xb1 and xb2) and therefore different embeddings.

Is it required to update the embeddings for all documents after every fine-tuning run?

Yes, every time we have a new model (e.g. after fine-tuning), we'll need to update all embeddings.

@tholor I'm not sure why the test failed; the failure is unrelated to this PR. No tests fail locally on my system. I just rebased onto the latest master. Is it possible to re-run the build?

Yes, you can now re-run GitHub Actions. There is a button at the top right to do this when you are on the page for the failed test.

@lalitpagaria
Contributor Author

Thanks for the explanation. I created this notebook to test the end-to-end pipeline and verify the impact of phi. See if this helps.

@tholor
Member

tholor commented Oct 5, 2020

After some further investigation, I think we can actually get rid of the phi normalization trick.
The HNSW index in FAISS now also supports the inner product metric directly. While I had problems initializing the index via the factory, the direct init seems to work well (facebookresearch/faiss#1434).
When comparing both approaches, it seemed that L2 + phi normalization is a bit faster (0.009 vs. 0.006 sec/query), but accuracy was actually a bit worse on some random dummy vectors (~0.85 vs. 0.54).

Therefore, I think we can merge this PR now and switch from Phi normalization to the native faiss implementation in a separate PR.
