Is your feature request related to a problem? Please describe.
Currently, IDs for indexing nodes are generated by hashing the chunk and path, using the default hasher from the standard library's hash map via the Hasher trait. The result is a 64-bit value, so the theoretical maximum is 18446744073709551615. In storage, IDs are used to upsert data.
Collisions can happen with large amounts of data, and because IDs are used for upserts, a collision could overwrite an unrelated node and lead to incorrect results. There are better solutions. Additionally, the id field is public and the method (calculate_hash()) does not set the field.
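To make the concern concrete, here is a minimal sketch of hash-based ID generation and a rough collision estimate. The names hash_id and approx_collision_probability are hypothetical and not taken from the codebase; the estimate is the usual birthday-bound approximation and only holds while the result is small.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hypothetical stand-in for the current approach: feed the path and the
/// chunk into the standard library's default hasher and use the u64 result.
fn hash_id(path: &Path, chunk: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    chunk.hash(&mut hasher);
    hasher.finish()
}

/// Birthday-bound approximation of the chance that at least two of `n`
/// uniformly distributed 64-bit IDs collide: p ≈ n(n - 1) / (2 * 2^64).
/// Only meaningful while p is small; clamped to 1.0 otherwise.
fn approx_collision_probability(n: f64) -> f64 {
    let space = 2f64.powi(64);
    (n * (n - 1.0) / (2.0 * space)).min(1.0)
}
```

Under this approximation, a billion nodes give a collision chance of roughly 0.027, which is not negligible when a collision silently overwrites data on upsert.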
Describe the solution you'd like
Qdrant has supported UUIDs for a while now. The idea is to use UUIDv3 (with MD5) instead. Ideally, users can opt in to the new implementation, with a deprecation warning on the old implementation. For both solutions, the ID should be retrieved through an id() function that lazily retrieves or sets the ID. Memory storage uses ordered IDs for easier debugging, so some way to override the ID might still be useful. All implementors of storage and node cache need to be updated.
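A minimal sketch of what the lazy id() accessor could look like, using only the standard library. The Node struct here is a simplified, hypothetical stand-in, and the ID is still the 64-bit hash; in the proposed implementation the closure would instead derive a UUIDv3, for example with Uuid::new_v3 from the uuid crate (v3 feature).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;
use std::sync::OnceLock;

/// Simplified, hypothetical node: the id field is private and only
/// reachable through `id()`, so it cannot be set inconsistently.
pub struct Node {
    path: PathBuf,
    chunk: String,
    id: OnceLock<u64>,
}

impl Node {
    pub fn new(path: impl Into<PathBuf>, chunk: impl Into<String>) -> Self {
        Self {
            path: path.into(),
            chunk: chunk.into(),
            id: OnceLock::new(),
        }
    }

    /// Returns the cached ID, computing and storing it on first access.
    pub fn id(&self) -> u64 {
        *self.id.get_or_init(|| {
            let mut hasher = DefaultHasher::new();
            self.path.hash(&mut hasher);
            self.chunk.hash(&mut hasher);
            hasher.finish()
        })
    }
}
```

One design caveat: a cached ID goes stale if the path or chunk can change after the first call to id(), so a real implementation would need to either keep those fields immutable or reset the cache when they change.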