byte vector is incorrectly decoded as utf-8 string in ft result class #2275

Open
AnneYang720 opened this issue Jul 13, 2022 · 3 comments
Labels
bug Bug

Comments

@AnneYang720
Contributor

Version:

$ pip3 show redis
Name: redis
Version: 4.3.4

Platform:
Python 3.9.2 on Debian GNU/Linux 11

Description: Field values in vector search results are converted from bytes to strings, and the conversion is lossy. Any byte that is not valid UTF-8, such as b'\x80', is dropped, so the returned string no longer matches the stored vector.

Example Code

from redis import Redis
from redis.commands.search.field import VectorField
from redis.commands.search.query import Query

r = Redis(host='localhost', port=6379)

# Index a one-dimensional FLOAT32 vector field named "v".
schema = (VectorField("v", "HNSW", {"TYPE": "FLOAT32", "DIM": 1, "DISTANCE_METRIC": "L2"}),)
r.ft().create_index(schema)

# Store a vector whose first byte (0x80) is not valid UTF-8.
r.hset('1', mapping={'v': b'\x80\x00\x00\x00'})

q = Query("*=>[KNN 1 @v $vec AS vector_score]").dialect(2)
results = r.ft().search(q, query_params={"vec": b'\x80\x00\x00\x00'}).docs

for m in results:
    print(m.v)  # already a str, not bytes
    print('match emb =', bytes(m.v, 'utf-8'))  # 3 bytes instead of the original 4

The original bytes b'\x80\x00\x00\x00' are converted to the string '\x00\x00\x00': the leading \x80 byte is lost.
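
The loss can be reproduced without a Redis server. The snippet below is a minimal sketch that assumes the client applies a UTF-8 decode with the "ignore" error handler to each field value; the variable names are illustrative only.

```python
# Standalone reproduction of the lossy conversion, no Redis server required.
original = b"\x80\x00\x00\x00"

# Assumption: this mirrors what the client does to each field value.
as_text = original.decode("utf-8", "ignore")

# Re-encoding, as the example code does with bytes(m.v, 'utf-8'),
# cannot bring back the byte that was dropped during decoding.
round_tripped = bytes(as_text, "utf-8")

print(repr(as_text))
print(round_tripped, len(original), len(round_tripped))
```

Because the result is one byte short, it can no longer be interpreted as a single FLOAT32 value.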

Reason

# /redis/commands/search/result.py
dict(
    dict(
        zip(
            map(to_string, res[i + fields_offset][::2]),
            map(to_string, res[i + fields_offset][1::2]),
        )
    )
)

# /redis/commands/search/_util.py
def to_string(s):
    if isinstance(s, str):
        return s
    elif isinstance(s, bytes):
        return s.decode("utf-8", "ignore")  # here: undecodable bytes are silently dropped
    else:
        return s
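
To get a feel for how often this matters, the following sketch runs a copy of `to_string` over a few packed FLOAT32 values. The `survives` helper is hypothetical, and little-endian packing is an assumption about how the vectors were produced.

```python
import struct


def to_string(s):
    # Copied from the snippet above.
    if isinstance(s, str):
        return s
    elif isinstance(s, bytes):
        return s.decode("utf-8", "ignore")
    else:
        return s


def survives(value):
    """Return True if a FLOAT32 value round-trips through to_string."""
    raw = struct.pack("<f", value)  # assumption: little-endian FLOAT32
    return to_string(raw).encode("utf-8") == raw


# 0.5 packs to ASCII-only bytes; 1.0 and -1.0 contain 0x80 and/or 0xbf.
results = {v: survives(v) for v in (0.5, 1.0, -1.0)}
print(results)
```

Values as ordinary as 1.0 and -1.0 are affected, so the problem is not limited to unusual inputs.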

chayim added the bug label Jul 13, 2022
@colibrisson

@AnneYang720 did you find a workaround?

@kamyabzad
Contributor

What about using "backslashreplace" mode instead of "ignore"?
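
For reference, here is a small sketch of what "backslashreplace" would produce for the bytes in this issue. It only illustrates the behaviour of Python's built-in error handler and is not taken from the library.

```python
original = b"\x80\x00\x00\x00"

# The invalid byte is replaced by the four characters \, x, 8, 0.
escaped = original.decode("utf-8", "backslashreplace")
print(repr(escaped), len(escaped))

# Encoding the string again yields seven bytes, not the original four.
re_encoded = escaped.encode("utf-8")
print(re_encoded)

# A payload that really contains the ASCII text "\x80" decodes to the
# same string, so the escaped form is ambiguous.
literal = b"\\x80\x00\x00\x00"
same = literal.decode("utf-8", "backslashreplace") == escaped
print(same)
```

Nothing is dropped in this mode, but the caller would still have to undo the escaping, and cannot do so reliably.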

@gaoyichuan

@kamyabzad I think in this case we should get the original bytes back as the result, rather than attempting any kind of Unicode decoding, since the user may need to convert the value back to a numpy array or a float array.

I don't see a good solution or workaround in the current search result parsing code, though. Maybe we need some ideas from the maintainers.
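
One possible shape for such a change is to let the caller name the fields that must stay as raw bytes. The sketch below is purely illustrative: `parse_fields` and `raw_fields` are hypothetical names and not part of redis-py, and little-endian FLOAT32 is an assumption about the stored vector.

```python
import struct


def parse_fields(flat, raw_fields=()):
    """Turn a flat [name, value, name, value, ...] reply into a dict.

    Hypothetical helper: names listed in raw_fields keep their bytes
    value, everything else is decoded as UTF-8 text.
    """
    doc = {}
    for name, value in zip(flat[::2], flat[1::2]):
        key = name.decode("utf-8") if isinstance(name, bytes) else name
        if key not in raw_fields and isinstance(value, bytes):
            value = value.decode("utf-8", "ignore")
        doc[key] = value
    return doc


reply = [b"v", b"\x80\x00\x00\x00", b"title", b"hello"]
doc = parse_fields(reply, raw_fields={"v"})

# The untouched bytes can be unpacked into floats again.
vector = struct.unpack("<1f", doc["v"])  # assumption: little-endian FLOAT32
print(doc, vector)
```

Text fields still come back as strings, while the vector field keeps all four bytes and can be handed to `struct` or numpy.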

uglide added a commit to uglide/redis-py that referenced this issue Jul 9, 2024
gerzse pushed a commit that referenced this issue Jul 10, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: #2772, #2275
gerzse pushed a commit to gerzse/redis-py that referenced this issue Jul 11, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: redis#2772, redis#2275
gerzse pushed a commit that referenced this issue Jul 11, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: #2772, #2275
agnesnatasya pushed a commit to agnesnatasya/redis-py that referenced this issue Jul 20, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: redis#2772, redis#2275
vladvildanov pushed a commit that referenced this issue Sep 27, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: #2772, #2275