Inconsistent behavior in InvertedLists merge_from #2621
kuarora added a commit to kuarora/faiss that referenced this issue Mar 28, 2024

…Lists

Summary:

**Context**
1. [Issue 2621](facebookresearch#2621) discusses the inconsistency between OnDiskInvertedLists and InvertedLists. OnDiskInvertedLists is meant to handle multiple disk-based index shards, so its function for merging inverted lists from index shards should have a different name.
2. [Issue 2876](facebookresearch#2876) provides a use case for shifting ids when merging inverted lists from different shards.

**In this diff**
1. To address item 1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from in the base class. Why? So that the inverted lists of one index can still be merged into the on-disk inverted lists of another index.
2. To address item 2 above, I added support for shift_ids in merge_from_multiple to shift ids coming from different shards. This is useful when each shard uses the same set of ids for different data. It is not recommended if ids are already unique across shards.

Differential Revision: D55482518
facebook-github-bot pushed a commit that referenced this issue Apr 3, 2024

…Lists (#3327)

Summary:
Pull Request resolved: #3327

**Context**
1. [Issue 2621](#2621) discusses the inconsistency between OnDiskInvertedLists and InvertedLists. OnDiskInvertedLists is meant to handle multiple disk-based index shards, so its function for merging inverted lists from index shards should have a different name.
2. [Issue 2876](#2876) provides a use case for shifting ids when merging inverted lists from different shards.

**In this diff**
1. To address item 1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from in the base class. Why? So that the inverted lists of one index can still be merged into the on-disk inverted lists of another index.
2. To address item 2 above, I added support for shift_ids in merge_from_multiple to shift ids coming from different shards. This is useful when each shard uses the same set of ids for different data. It is not recommended if ids are already unique across shards.

Reviewed By: mdouze
Differential Revision: D55482518
fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083
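To make the shift_ids behaviour described in the commit message concrete, here is a minimal pure-Python sketch. It does not call faiss: each shard is a plain dict mapping a list number to `(id, code)` pairs, and the function below is a hypothetical stand-in that only borrows the name merge_from_multiple. The choice of offset (the number of entries in all previously merged shards) is an assumption made for illustration, not something the commit message states.

```python
def merge_from_multiple(shards, shift_ids=False):
    """Toy merge of several shards into one dict of inverted lists.

    Assumption: when shift_ids is True, the ids of each shard are shifted
    by the total number of entries in the shards merged before it.
    """
    merged = {}
    offset = 0
    for shard in shards:
        for list_no, entries in shard.items():
            merged.setdefault(list_no, []).extend(
                (id_ + offset, code) for id_, code in entries
            )
        if shift_ids:
            # Advance the offset only after the whole shard has been merged.
            offset += sum(len(entries) for entries in shard.values())
    return merged


# Two shards that reuse the same ids for different data.
shard_a = {0: [(0, "a0")], 1: [(1, "a1")]}
shard_b = {0: [(0, "b0"), (1, "b1")]}

plain = merge_from_multiple([shard_a, shard_b])
shifted = merge_from_multiple([shard_a, shard_b], shift_ids=True)
```

Without the shift, ids 0 and 1 each appear twice in the merged result. With the shift, the entries of shard_b become ids 2 and 3. If ids were already unique across shards, the shift would renumber them for no benefit, which matches the commit's advice against using it in that case.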
The merge_from of OnDiskInvertedLists takes several other InvertedLists as input:
https://github.com/facebookresearch/faiss/blob/main/faiss/invlists/OnDiskInvertedLists.h#L104
whereas the default merge_from of InvertedLists takes a single one:
https://github.com/facebookresearch/faiss/blob/main/faiss/invlists/InvertedLists.h#L112
This is inconsistent, and the on-disk version has no option to shift the ids.
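The mismatch can be illustrated with a toy model in plain Python. The classes below are hypothetical stand-ins, not the faiss classes: the base class merges from a single source and accepts an id shift (called `add_id` here as an assumption), while the on-disk stand-in reuses the name merge_from for a list of sources with no shift.

```python
from collections import defaultdict


class ToyInvertedLists:
    """Toy container: list number -> [(id, code), ...]. Not the faiss class."""

    def __init__(self, nlist):
        self.nlist = nlist
        self.lists = defaultdict(list)

    def add_entry(self, list_no, id_, code):
        self.lists[list_no].append((id_, code))

    def merge_from(self, other, add_id=0):
        """Single-source merge: move entries over, adding add_id to each id."""
        assert other.nlist == self.nlist
        for list_no, entries in other.lists.items():
            self.lists[list_no].extend((i + add_id, c) for i, c in entries)
        other.lists.clear()

    def all_ids(self):
        return sorted(i for entries in self.lists.values() for i, _ in entries)


class ToyOnDiskInvertedLists(ToyInvertedLists):
    def merge_from(self, others):
        """Same name, different contract: many sources, no way to shift ids."""
        total = 0
        for other in others:
            for list_no, entries in other.lists.items():
                self.lists[list_no].extend(entries)
                total += len(entries)
        return total


def make_shard():
    # Every shard reuses ids 0 and 1.
    shard = ToyInvertedLists(nlist=2)
    shard.add_entry(0, 0, "a")
    shard.add_entry(1, 1, "b")
    return shard


base = ToyInvertedLists(nlist=2)
base.merge_from(make_shard(), add_id=0)
base.merge_from(make_shard(), add_id=2)  # caller keeps ids unique

ondisk = ToyOnDiskInvertedLists(nlist=2)
n_merged = ondisk.merge_from([make_shard(), make_shard()])  # ids collide
```

In this toy, the overriding method cannot be called the way the base method is, and shards that reuse ids end up with duplicates after the multi-source merge. Those are the two complaints raised above.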