[Feature]: Support local disk caching and Mmap data #21866

Open
1 task done
xiaofan-luan opened this issue Jan 30, 2023 · 10 comments
Labels
kind/feature Issues related to feature request from users

Comments

@xiaofan-luan
Collaborator

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Milvus currently loads the vector index fully into memory to support query/search. This requires a lot of memory and can cause OOM when memory is insufficient.

To improve this, we could redefine load as putting the data on local disk and mmapping it into memory. All in-memory data would then be managed by the operating system's page cache, and users could load datasets larger than memory into Milvus. (If memory is sufficient, I would expect performance similar to the current in-memory version.)

There are a few things we need to investigate before putting this on our schedule:

  1. How do we mmap scalar data, including delta files (maybe keep those fully in memory for now)? What is the performance at memory/disk ratios of 1%, 10%, 50%, 90%, and 100%?
  2. How do we mmap the vector index, and what is the performance?
  3. What about scalar indexes, such as the marisa trie? An inverted index should be fine to mmap directly, since Lucene does so.
  4. How much longer does it take to load onto disk compared with loading directly into memory?
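
As a rough illustration of the "load to local disk, then mmap" idea, the sketch below writes a few float32 vectors to a local file, maps the file read-only, and runs a brute-force nearest-neighbor scan over the mapping. This is not Milvus or Knowhere code: the flat file layout, `DIM`, `write_vectors`, and `nearest` are hypothetical names invented for this example, and it uses only the Python standard library.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # hypothetical vector dimension, chosen only for this example


def write_vectors(path, vectors):
    """'Load' step: write float32 vectors back to back into a flat local file."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{DIM}f", *vec))


def nearest(path, query):
    """Scan the mmapped file; return (row index, squared L2 distance) of the closest row."""
    row_bytes = DIM * 4  # 4 bytes per float32
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        best_idx, best_dist = -1, float("inf")
        for i in range(len(mm) // row_bytes):
            # Pages are faulted in by the OS on access; nothing is read up front.
            vec = struct.unpack_from(f"<{DIM}f", mm, i * row_bytes)
            dist = sum((a - b) ** 2 for a, b in zip(vec, query))
            if dist < best_dist:
                best_idx, best_dist = i, dist
        return best_idx, best_dist


path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, [(0, 0, 0, 0), (1, 1, 1, 1), (5, 5, 5, 5)])
print(nearest(path, (0.9, 1.0, 1.1, 1.0)))
```

Because the mapping is read-only and file-backed, the kernel can evict its pages under memory pressure and re-read them from disk later. That is the behavior the questions above would need to benchmark at different memory/disk ratios.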

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

@xiaofan-luan xiaofan-luan added the kind/feature Issues related to feature request from users label Jan 30, 2023
@xiaofan-luan xiaofan-luan self-assigned this Jan 30, 2023
@yah01
Member

yah01 commented Jan 31, 2023

/assign

@yah01
Member

yah01 commented Mar 6, 2023

@yah01
Member

yah01 commented Mar 29, 2023

IVF_FLAT doesn't support mmap for now because it stores the original data separately. I will work on it after the C++ segment loader is ready.

@yah01
Member

yah01 commented Oct 20, 2023

I'm going to support the IVF index with mmap, as Knowhere has changed the IVF implementation to contain the data part.

@xiaofan-luan
Collaborator Author

I'm going to support the IVF index with mmap, as Knowhere has changed the IVF implementation to contain the data part.

@faiss already supports mmap.
Are we simply going to enable it?
BTW, is there a way to enable mmap only on some fields?

@yah01
Member

yah01 commented Oct 20, 2023

I'm going to support the IVF index with mmap, as Knowhere has changed the IVF implementation to contain the data part.

@faiss already supports mmap. Are we simply going to enable it? BTW, is there a way to enable mmap only on some fields?

I need to dive into the faiss implementation and file format.

@yah01
Member

yah01 commented Oct 20, 2023

@cydrain would this index contain the vector data if it was created in an old version?

@xiaofan-luan
Collaborator Author

I'm going to support the IVF index with mmap, as Knowhere has changed the IVF implementation to contain the data part.

@faiss already supports mmap. Are we simply going to enable it? BTW, is there a way to enable mmap only on some fields?

I need to dive into the faiss implementation and file format.

We actually have a user who wants to run mmap on a FLAT index.

@patelprateek

@yah01: does faiss also support adding metadata along with embeddings, or is this only done by Knowhere?

@xiaofan-luan
Collaborator Author

@yah01: does faiss also support adding metadata along with embeddings, or is this only done by Knowhere?

faiss doesn't have a concept of metadata.

xiaofan-luan pushed a commit that referenced this issue Dec 21, 2023
Support enabling/disabling mmap for an index; the user can alter the index's
mode via the `AlterIndex` method.
related: #21866

Signed-off-by: yah01 <[email protected]>
sre-ci-robot pushed a commit that referenced this issue Jan 11, 2024
This supports mmap for the marisa trie index.
related #21866

Signed-off-by: yah01 <[email protected]>