Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior in InvertedLists merge_from #2621

Closed
mdouze opened this issue Dec 14, 2022 · 0 comments
Closed

Inconsistent behavior in InvertedLists merge_from #2621

mdouze opened this issue Dec 14, 2022 · 0 comments
Assignees

Comments

@mdouze
Copy link
Contributor

mdouze commented Dec 14, 2022

The OnDiskInvertedLists merge_from takes several other InvertedLists as input

https://github.com/facebookresearch/faiss/blob/main/faiss/invlists/OnDiskInvertedLists.h#L104

while the default merge_from from InvertedLists does take a single one

https://github.com/facebookresearch/faiss/blob/main/faiss/invlists/InvertedLists.h#L112

this is inconsistent and the OnDisk one does not have an option to shift indices.

@mdouze mdouze self-assigned this Dec 14, 2022
kuarora added a commit to kuarora/faiss that referenced this issue Mar 28, 2024
…Lists

Summary:
**Context**
1. [Issue 2621](facebookresearch#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](facebookresearch#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address facebookresearch#1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address facebookresearch#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Differential Revision: D55482518
kuarora added a commit to kuarora/faiss that referenced this issue Mar 28, 2024
…Lists (facebookresearch#3327)

Summary:

**Context**
1. [Issue 2621](facebookresearch#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](facebookresearch#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address facebookresearch#1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address facebookresearch#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Differential Revision: D55482518
kuarora added a commit to kuarora/faiss that referenced this issue Apr 3, 2024
…Lists (facebookresearch#3327)

Summary:

**Context**
1. [Issue 2621](facebookresearch#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](facebookresearch#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address facebookresearch#1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address facebookresearch#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Reviewed By: mdouze

Differential Revision: D55482518
kuarora added a commit to kuarora/faiss that referenced this issue Apr 3, 2024
…Lists (facebookresearch#3327)

Summary:

**Context**
1. [Issue 2621](facebookresearch#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](facebookresearch#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address facebookresearch#1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address facebookresearch#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Reviewed By: mdouze

Differential Revision: D55482518
kuarora added a commit to kuarora/faiss that referenced this issue Apr 3, 2024
…Lists (facebookresearch#3327)

Summary:

**Context**
1. [Issue 2621](facebookresearch#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](facebookresearch#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address facebookresearch#1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address facebookresearch#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Reviewed By: mdouze

Differential Revision: D55482518
facebook-github-bot pushed a commit that referenced this issue Apr 3, 2024
…Lists (#3327)

Summary:
Pull Request resolved: #3327

**Context**
1. [Issue 2621](#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address #1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address #2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Reviewed By: mdouze

Differential Revision: D55482518

fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083
@kuarora kuarora closed this as completed Apr 3, 2024
abhinavdangeti pushed a commit to blevesearch/faiss that referenced this issue Jul 12, 2024
…Lists (facebookresearch#3327)

Summary:
Pull Request resolved: facebookresearch#3327

**Context**
1. [Issue 2621](facebookresearch#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](facebookresearch#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address #1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address #2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Reviewed By: mdouze

Differential Revision: D55482518

fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083
aalekhpatel07 pushed a commit to aalekhpatel07/faiss that referenced this issue Oct 17, 2024
…Lists (facebookresearch#3327)

Summary:
Pull Request resolved: facebookresearch#3327

**Context**
1. [Issue 2621](facebookresearch#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](facebookresearch#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address #1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address Enet4#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Reviewed By: mdouze

Differential Revision: D55482518

fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants