byte vector is incorrectly decoded as utf-8 string in ft result class #2275

AnneYang720 · 2022-07-13T04:15:26Z

Version:

$ pip3 show redis
Name: redis
Version: 4.3.4

Platform:
Python 3.9.2 on Debian GNU/Linux 11

Description: Vector search results convert bytes values to strings, and the conversion is lossy. Bytes that contain b'\x80' are converted to the wrong string.

Example Code

from redis import Redis
from redis.commands.search.field import VectorField
from redis.commands.search.query import Query

r = Redis(host='localhost', port=6379)
# One-dimensional FLOAT32 vector field, so every stored vector is exactly 4 bytes.
schema = (VectorField("v", "HNSW", {"TYPE": "FLOAT32", "DIM": 1, "DISTANCE_METRIC": "L2"}),)
r.ft().create_index(schema)

r.hset('1', mapping={'v': b'\x80\x00\x00\x00'})

q = Query("*=>[KNN 1 @v $vec AS vector_score]").dialect(2)
results = r.ft().search(q, query_params={"vec": b'\x80\x00\x00\x00'}).docs

for m in results:
    print(m.v)
    # Re-encoding the returned string does not restore the stored bytes.
    print('match emb =', bytes(m.v, 'utf-8'))

The original bytes b'\x80\x00\x00\x00' are converted to the string '\x00\x00\x00'. The \x80 byte is dropped, so the original value cannot be recovered from the result.

Reason

# /redis/commands/search/result.py
dict(
    dict(
        zip(
            map(to_string, res[i + fields_offset][::2]),
            map(to_string, res[i + fields_offset][1::2]),
        )
    )
)

# /redis/commands/search/_util.py
def to_string(s):
    if isinstance(s, str):
        return s
    elif isinstance(s, bytes):
        return s.decode("utf-8", "ignore")  # here! bytes that are not valid UTF-8 are silently dropped
    else:
        return s
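
The lossy step can be reproduced without a Redis server. The following is a minimal sketch that copies the to_string helper quoted above and applies it to the vector from the example code; the variable names are illustrative only.

```python
def to_string(s):
    """Copy of the helper quoted above from redis/commands/search/_util.py."""
    if isinstance(s, str):
        return s
    elif isinstance(s, bytes):
        return s.decode("utf-8", "ignore")
    else:
        return s


original = b"\x80\x00\x00\x00"        # one FLOAT32 value, 4 bytes
decoded = to_string(original)         # the lone \x80 is not valid UTF-8, so it is skipped
round_trip = decoded.encode("utf-8")  # attempt to get the bytes back

print(repr(decoded))                   # '\x00\x00\x00'
print(len(original), len(round_trip))  # 4 3
```

Because the byte is discarded rather than replaced, nothing in the returned string indicates that data was lost.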

colibrisson · 2023-07-13T13:43:49Z

@AnneYang720 did you find a workaround?

kamyabzad · 2024-02-26T07:03:10Z

What about using "backslashreplace" mode instead of "ignore"?
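
To make the suggestion concrete, here is a small sketch of what "backslashreplace" does to the vector from the report. The comparison with "surrogateescape" is an addition for illustration and was not proposed in the thread; both are standard Python codec error handlers.

```python
raw = b"\x80\x00\x00\x00"

# "backslashreplace" keeps the information, but as a literal escape sequence.
replaced = raw.decode("utf-8", "backslashreplace")
print(repr(replaced))                   # '\\x80\x00\x00\x00'
print(replaced.encode("utf-8") == raw)  # False: the text now encodes to 7 bytes

# "surrogateescape" maps each undecodable byte to a lone surrogate that
# encodes back to the same byte, provided the same handler is used again.
escaped = raw.decode("utf-8", "surrogateescape")
print(escaped.encode("utf-8", "surrogateescape") == raw)  # True
```

Either handler avoids silently dropping bytes, but the caller still receives a string and has to know how to turn it back into bytes.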

gaoyichuan · 2024-02-28T11:41:27Z

@kamyabzad I think that in this case we should get the original bytes back as the result rather than attempt any kind of Unicode decoding, since the user may need to convert the value back to a NumPy array or float array.
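
The conversion mentioned here only works when every byte is intact. A minimal sketch using the standard struct module, assuming the vector is stored as little-endian FLOAT32 values; unpack_float32 is a hypothetical helper name:

```python
import struct


def unpack_float32(blob: bytes) -> list:
    """Interpret blob as consecutive little-endian FLOAT32 values."""
    if len(blob) % 4 != 0:
        raise ValueError(f"expected a multiple of 4 bytes, got {len(blob)}")
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


print(unpack_float32(b"\x00\x00\x80\x3f"))  # [1.0]
print(unpack_float32(b"\x80\x00\x00\x00"))  # the stored vector: one tiny subnormal value

try:
    unpack_float32(b"\x00\x00\x00")  # what is left after the "ignore" decoding
except ValueError as exc:
    print("cannot unpack:", exc)
```

If NumPy is available, numpy.frombuffer(blob, dtype="<f4") is the array counterpart of this helper under the same byte-order assumption.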

I don't see a good solution or workaround in the current search result parsing code, though. Maybe we need some ideas from the maintainers.
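
One possible stopgap, not suggested by anyone in the thread, is to use the search only to obtain document ids and then read the vector field straight from each hash, which bypasses the result parsing. The sketch below assumes that the client was created without decode_responses=True (so hget returns bytes) and that document ids are the hash keys. fetch_raw_field and FakeClient are hypothetical names; FakeClient only exists so that the sketch runs without a server.

```python
def fetch_raw_field(client, doc_ids, field):
    """Return {doc_id: raw value} by reading each hash field directly."""
    return {doc_id: client.hget(doc_id, field) for doc_id in doc_ids}


class FakeClient:
    """Stand-in for redis.Redis that serves hget from an in-memory dict."""

    def __init__(self, hashes):
        self._hashes = hashes

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)


client = FakeClient({"1": {"v": b"\x80\x00\x00\x00"}})
vectors = fetch_raw_field(client, ["1"], "v")
print(vectors)  # {'1': b'\x80\x00\x00\x00'}
```

With a real client, the ids would come from the search results, for example fetch_raw_field(r, [m.id for m in results], "v"). This costs one extra round trip per document.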

Make it possible to configure at field level how search results are decoded. Fixes: #2772, #2275
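
The idea of field-level decoding can be sketched as follows. This is a hypothetical illustration, not the code merged in #3309: decode_fields and its field_encodings parameter are invented names, and the actual redis-py interface should be checked in that pull request.

```python
def decode_fields(flat_pairs, field_encodings=None, default_encoding="utf-8"):
    """Turn [name, value, name, value, ...] into a dict, decoding per field.

    field_encodings maps a field name to an encoding, or to None to keep
    the raw bytes. Hypothetical helper, not the redis-py implementation.
    """
    field_encodings = field_encodings or {}
    doc = {}
    for name, value in zip(flat_pairs[::2], flat_pairs[1::2]):
        key = name.decode("utf-8") if isinstance(name, bytes) else name
        encoding = field_encodings.get(key, default_encoding)
        if isinstance(value, bytes) and encoding is not None:
            value = value.decode(encoding, "ignore")
        doc[key] = value
    return doc


reply = [b"v", b"\x80\x00\x00\x00", b"vector_score", b"0"]
doc = decode_fields(reply, field_encodings={"v": None})
print(doc)  # {'v': b'\x80\x00\x00\x00', 'vector_score': '0'}
```

Text fields keep the old behavior by default, while a binary field such as the vector is returned untouched.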

chayim added the bug label Jul 13, 2022

uglide added a commit to uglide/redis-py that referenced this issue Jul 9, 2024

Decode search results at field level

fe30d4f

Fixes: redis#2772, redis#2275

uglide mentioned this issue Jul 9, 2024

Decode search results at field level #3309

Merged

6 tasks

uglide added a commit to uglide/redis-py that referenced this issue Jul 9, 2024

Decode search results at field level

1403c95

Fixes: redis#2772, redis#2275

uglide added a commit to uglide/redis-py that referenced this issue Jul 9, 2024

Decode search results at field level

7ca2f29

Fixes: redis#2772, redis#2275

gerzse pushed a commit that referenced this issue Jul 10, 2024

Decode search results at field level (#3309)

1bb8eab

Make it possible to configure at field level how search results are decoded. Fixes: #2772, #2275

gerzse pushed a commit to gerzse/redis-py that referenced this issue Jul 11, 2024

Decode search results at field level (redis#3309)

8c79060

Make it possible to configure at field level how search results are decoded. Fixes: redis#2772, redis#2275

gerzse pushed a commit to gerzse/redis-py that referenced this issue Jul 11, 2024

Decode search results at field level (redis#3309)

62fc873

Make it possible to configure at field level how search results are decoded. Fixes: redis#2772, redis#2275

gerzse pushed a commit that referenced this issue Jul 11, 2024

Decode search results at field level (#3309)

6a2a636

Make it possible to configure at field level how search results are decoded. Fixes: #2772, #2275

agnesnatasya pushed a commit to agnesnatasya/redis-py that referenced this issue Jul 20, 2024

Decode search results at field level (redis#3309)

d02d78a

Make it possible to configure at field level how search results are decoded. Fixes: redis#2772, redis#2275

bsbodden mentioned this issue Sep 13, 2024

Decoding vector fails redis/redis-vl-python#219

Closed

vladvildanov pushed a commit that referenced this issue Sep 27, 2024

Decode search results at field level (#3309)

68f9450

Make it possible to configure at field level how search results are decoded. Fixes: #2772, #2275

vladvildanov pushed a commit that referenced this issue Sep 27, 2024

Decode search results at field level (#3309)

f6351ee

Make it possible to configure at field level how search results are decoded. Fixes: #2772, #2275

vladvildanov pushed a commit that referenced this issue Sep 27, 2024

Decode search results at field level (#3309)

2717a0e

Make it possible to configure at field level how search results are decoded. Fixes: #2772, #2275
