Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved ID hashing on indexing nodes #272

Closed
timonv opened this issue Sep 5, 2024 · 0 comments · Fixed by #277
Closed

Improved ID hashing on indexing nodes #272

timonv opened this issue Sep 5, 2024 · 0 comments · Fixed by #277
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@timonv
Copy link
Member

timonv commented Sep 5, 2024

Is your feature request related to a problem? Please describe.
Currently IDs for indexing nodes are generated with a hash based on the chunk and path, with the default_hasher from hash map via the hasher trait. The maximum amount of values is theoreticly 18446744073709551615. In storage, IDs are used to upsert data.

Overlap can happen with large amounts of data and would lead to incorrect results. There are better solutions. Additonally, the field is public and the method (calculate_hash()) does not set the field.

Describe the solution you'd like
Qdrant supports UUIDs now for a while. Idea is to use UUIDv3 (with md5) instead. Ideally, users can opt in to the new implementation, with a deprecation warning on the old implementation. For both solutions, id should just be retrieved by an id() function that lazilly retrieves or sets the id. Memory storage uses ordered ids for easier debugging, so some kind of overwrap might still be useful. All implementors of storage and node cache need to be updated.

@timonv timonv added good first issue Good for newcomers help wanted Extra attention is needed labels Sep 5, 2024
timonv added a commit that referenced this issue Sep 6, 2024
Fixing id generation properly as per #272, will be merged in together.

- **Clippy**
- **fix(qdrant)!: Default hasher changed in Rust 1.81**
@timonv timonv closed this as completed in 57fe4aa Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant