Bytes Vectors from r.hget vs Bytes string returned from r.ft().search(query="*") #2772

Closed
ghost opened this issue May 22, 2023 · 3 comments · Fixed by #3309

Comments

@ghost

ghost commented May 22, 2023

Redis Python Lib Version: version 4.5.5

Redis Stack Version: version 7.0.0

Platform: Python 3.10.6 and Ubuntu 22.04

Description:

After storing a set of NumPy vectors as bytes in Redis hashes (HSET) and creating a search index (FT.CREATE), I am trying to retrieve all of the embeddings using FT.SEARCH with a "*" query. However, the vector comes back as a string that differs from the bytes I get when using HGET. Here are a few lines of code as an example:

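import json
import os

import numpy as np
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

# Connection kwargs; assumes the environment variable holds a JSON object
_redis_match_config = json.loads(os.getenv("NQAI_REDIS_MATCH_CONFIG"))
r = redis.Redis(**_redis_match_config)

# Store a 4-dimensional float32 vector as raw bytes in a hash
fake_vec = np.array([0.1, 0.2, 0.3, 0.4])
expert_hash = {"person_id": 1, "vector_emb": fake_vec.astype(np.float32).tobytes()}
r.hset("person:1", mapping=expert_hash)

# Create an index over all hashes whose key starts with "person:"
index_name = "person"
person_prefix = f"{index_name}:"
vector_search_attributes = {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"}
schema = (
    TagField("person_id"),
    # Field name as given in the report; the hash stores the vector under "vector_emb"
    VectorField("embeddings_bio", algorithm="HNSW", attributes=vector_search_attributes),
)
r.ft(index_name).create_index(
    fields=schema,
    definition=IndexDefinition(prefix=[person_prefix], index_type=IndexType.HASH),
)

# Reading the field back with HGET returns the original bytes
bytes_person_1 = r.hget("person:1", "vector_emb")
print(bytes_person_1)
print(np.frombuffer(bytes_person_1, dtype=np.float32))
# output: b"\xcd\xcc\xcc=\xcd\xccL>\x9a\x99\x99>\xcd\xcc\xcc>"
# output: array([0.1, 0.2, 0.3, 0.4], dtype=float32)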

However, when I do:

query = Query("*").return_fields("id", "vector_emb")
all_of = r.ft(index_name).search(query=query, query_params={}).docs
print(all_of[0]["vector_emb"])
print(all_of[0]["vector_emb"].encode("utf-32"))
print(np.frombuffer(all_of[0]["vector_emb"].encode("utf-32"), dtype=np.float32))
# output: "=L>>>"
# output: b'\xff\xfe\x00\x00=\x00\x00\x00L\x00\x00\x00>\x00\x00\x00>\x00\x00\x00>\x00\x00\x00'
# output: array([9.1475e-41 8.5479e-44 1.0650e-43 8.6881e-44 8.6881e-44 8.6881e-44], dtype=float32)

I have tried different combinations of .encode("utf-xx") and dtype=np.floatxx to no avail! Please help. Thanks.
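
The "=L>>>" string above is consistent with the raw float32 bytes having been decoded as UTF-8 with invalid bytes discarded. That redis-py does exactly this internally is an assumption here, not something the report confirms. A minimal sketch using only the standard library (`struct` stands in for NumPy's `tobytes()` on a little-endian machine) shows why no later `.encode(...)` call can recover the vector:

```python
import struct

# The four float32 values from the report, packed little-endian
raw = struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)
print(raw)

# Decoding as UTF-8 while ignoring errors silently drops every byte
# that is not valid UTF-8; only the ASCII-range bytes survive
lossy = raw.decode("utf-8", errors="ignore")
print(repr(lossy))

# 16 bytes went in, only 5 characters came out: the rest is gone
print(len(raw), len(lossy))

# Unpacking the untouched bytes still yields the original values
values = struct.unpack("<4f", raw)
print(values)
```

Because the decoding step is lossy, the vector has to be kept as bytes before it is ever turned into a string.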

@trish11953

Yes, I had the same bug on my end when I tried retrieving the float vector from
results = self.client.ft(self.index).search(query_expression, query_params).docs
It came back with a strange encoding that could not be decoded back to the original vector.

@AdamAdLightning

I'm having a similar issue. I need to read these vectors back and do some processing on them, but I'm unable to decode them when I read them from a hash using hget.

uglide added a commit to uglide/redis-py that referenced this issue Jul 9, 2024
@gerzse gerzse closed this as completed in 1bb8eab Jul 10, 2024
gerzse pushed a commit to gerzse/redis-py that referenced this issue Jul 11, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: redis#2772, redis#2275
gerzse pushed a commit that referenced this issue Jul 11, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: #2772, #2275
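
The commit message describes field-level control over how search results are decoded. A minimal sketch of how that might be used follows. It assumes a redis-py release that contains this change, that `Query.return_field` accepts a `decode_field` argument, and that the client was not created with `decode_responses=True`. Check the installed version's signature before relying on it. `to_floats` and `fetch_raw_vectors` are hypothetical helper names, not part of redis-py.

```python
import struct


def to_floats(raw: bytes) -> tuple:
    """Interpret raw little-endian float32 bytes as a tuple of floats."""
    return struct.unpack(f"<{len(raw) // 4}f", raw)


def fetch_raw_vectors(client, index_name: str, field: str = "vector_emb"):
    """Return the vector field of every matching document as floats.

    Assumption: Query.return_field(..., decode_field=False) is available
    and leaves the field value as bytes instead of decoding it.
    """
    # Imported lazily so the pure-Python helper above works without redis-py
    from redis.commands.search.query import Query

    query = Query("*").return_field(field, decode_field=False)
    docs = client.ft(index_name).search(query).docs
    return [to_floats(getattr(doc, field)) for doc in docs]
```

With the client and index from the original report, this would be called as `fetch_raw_vectors(r, "person")`.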
agnesnatasya pushed a commit to agnesnatasya/redis-py that referenced this issue Jul 20, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: redis#2772, redis#2275
@marisancans

Is this fixed? I'm having the same problem. I just want my vector back :/

vladvildanov pushed a commit that referenced this issue Sep 27, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: #2772, #2275