[Bug]: Load failed with error "failed to Deserialize index: faiss inner error.." when enable vector raw data mmap or vector index mmap for IVF series index with COSINE metric type #36052

Closed
1 task done
binbinlv opened this issue Sep 6, 2024 · 6 comments
Labels
kind/bug Issues or changes related a bug priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@binbinlv
Contributor

binbinlv commented Sep 6, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240904-a32f337e and 2.4-20240904-2c1fa504
- Deployment mode(standalone or cluster): both
- MQ type(rocksmq, pulsar or kafka): all
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc74
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Loading a collection fails with the error "failed to Deserialize index: faiss inner error.." when vector raw data mmap or vector index mmap is enabled for an IVF-series index with the COSINE metric type.

code=2001, message=show collection failed: At LoadSegment:  => failed to Deserialize index: faiss inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:609
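
For triage, the error string above can be split into its numeric code, message text, and C++ source location. The sketch below is a plain regular expression written against this one sample only. The function name `parse_load_error` is hypothetical, and the message layout is an assumption taken from the quoted string, not from any documented pymilvus contract.

```python
import re

# Layout assumed from the single error string quoted in this report:
#   code=<int>, message=<text> at <file>:<line>
_ERROR_RE = re.compile(
    r"code=(?P<code>\d+),\s*message=(?P<message>.*?)"
    r"(?:\s+at\s+(?P<file>\S+):(?P<line>\d+))?$"
)


def parse_load_error(text: str) -> dict:
    """Split an error string into code, message, source file, and line number."""
    m = _ERROR_RE.search(text.strip())
    if m is None:
        raise ValueError("unrecognized error format")
    return {
        "code": int(m["code"]),
        "message": m["message"].strip(),
        "file": m["file"],  # None when no trailing "at <file>:<line>" is present
        "line": int(m["line"]) if m["line"] else None,
    }
```

Grouping failures by `file` and `line` this way may help confirm that different index and metric combinations hit the same deserialization path.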

Expected Behavior

The collection loads successfully.

Steps To Reproduce

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema
from pymilvus import connections, utility
import random

connections.connect()

dim = 128
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
double_field = FieldSchema(name="nullableFid", dtype=DataType.DOUBLE, is_primary=False)
int32_field = FieldSchema(name="int32", dtype=DataType.INT64)
string_field = FieldSchema(name="string", dtype=DataType.VARCHAR, max_length=1000)
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim, mmap_enabled=True)
schema = CollectionSchema(fields=[int64_field, double_field, int32_field, string_field, float_vector])
utility.drop_collection("test")
collection = Collection("test", schema=schema)
res = collection.schema
print(res)

index = {"index_type": "IVF_FLAT", "metric_type": "COSINE", "params": {"nlist":100}}

nb = 1000000
batch_size = 10000  # renamed from `slice` to avoid shadowing the built-in
for i in range(nb // batch_size):
    vectors = [[random.random() for _ in range(dim)] for _ in range(batch_size)]
    # Column order follows the schema: int64 (primary key), nullableFid, int32, string, float_vector
    data = [
        list(range(i * batch_size, (i + 1) * batch_size)),
        [k * 1.0 for k in range(batch_size)],
        list(range(batch_size)),
        ["1"] * batch_size,
        vectors,
    ]
    collection.insert(data=data)
    print("inserted %d %d" % (i, batch_size))
collection.flush()
res = collection.num_entities
print(res)
collection.create_index("float_vector", index, index_name="index_name_1")
collection.load()
print("loaded")

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&panes=%7B%22CBL%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22mmap-test-24-eptov.%2A%5C%22%7D%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1

Anything else?

No response

@binbinlv binbinlv added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 6, 2024
@binbinlv binbinlv added this to the 2.4.11 milestone Sep 6, 2024
@yanliang567
Contributor

/assign @cqy123456
/unassign

@sre-ci-robot sre-ci-robot assigned cqy123456 and unassigned yanliang567 Sep 6, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 6, 2024
@binbinlv
Contributor Author

binbinlv commented Sep 6, 2024

With IVF_FLAT + L2, the load succeeds.
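
The L2 result suggests sweeping index and metric combinations to see exactly which ones fail. Below is a minimal sketch, assuming a caller-supplied `try_load` callable (hypothetical) that recreates the collection, builds the given index, calls `load()`, and raises on failure. The `IVF_SQ8` and `IVF_PQ` entries and their parameters are illustrative choices, not combinations tested in this report.

```python
from itertools import product

# Index types and parameters are illustrative; only IVF_FLAT with nlist=100
# appears in the original reproduction script.
IVF_PARAMS = {
    "IVF_FLAT": {"nlist": 100},
    "IVF_SQ8": {"nlist": 100},
    "IVF_PQ": {"nlist": 100, "m": 8, "nbits": 8},  # m must divide dim (128 in the script)
}
METRICS = ["L2", "IP", "COSINE"]


def index_matrix():
    """Yield one index-params dict per (index type, metric) pair."""
    for index_type, metric in product(IVF_PARAMS, METRICS):
        yield {
            "index_type": index_type,
            "metric_type": metric,
            "params": dict(IVF_PARAMS[index_type]),
        }


def sweep(try_load):
    """Call try_load(index) per combination; map (type, metric) to None or the error text."""
    results = {}
    for index in index_matrix():
        key = (index["index_type"], index["metric_type"])
        try:
            try_load(index)
            results[key] = None
        except Exception as exc:  # record the failure and keep sweeping
            results[key] = str(exc)
    return results
```

Passing a function that wraps the reproduction script's `create_index` and `load` calls would give a table of passing and failing combinations in a single run.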

@yanliang567 yanliang567 modified the milestones: 2.4.11, 2.4.12 Sep 18, 2024
@yanliang567 yanliang567 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 20, 2024
@binbinlv
Contributor Author

/assign

@binbinlv
Contributor Author

Working on verification.

@binbinlv
Contributor Author

Verified and fixed:
milvus: v2.4.12
pymilvus: 2.4.7
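
A client can guard against affected servers by comparing the server version with the release named in the verification above. This is a minimal sketch. `FIXED_IN` is taken from this comment, the helper names are hypothetical, and treating every later release as also carrying the fix is an assumption. Nightly build tags such as `2.4-20240904-2c1fa504` are deliberately not classified.

```python
import re

FIXED_IN = (2, 4, 12)  # release reported as verified in this thread


def parse_release(version: str):
    """Return (major, minor, patch) for strings like 'v2.4.12', else None."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    return tuple(int(part) for part in m.groups()) if m else None


def includes_fix(version: str) -> bool:
    """True if the release is at or after FIXED_IN (assumes later releases keep the fix)."""
    release = parse_release(version)
    if release is None:
        raise ValueError(f"cannot classify non-release version: {version!r}")
    return release >= FIXED_IN
```

With a live connection, the version string could come from pymilvus's `utility.get_server_version()`. The exact format it returns for a given deployment should be checked before relying on this parser.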

@amaypatil02

Since this was identified as a bug, is it fair to say that mmap supports the IVF_FLAT vector index on 2.4.12 and above?


4 participants