Fix `normalize_embedding` using numba #2347

tstadel · 2022-03-22T17:56:25Z

Currently, passing a single vector to DocumentStore.normalize_embedding throws an error if numba is installed. normalize_embedding expects 1-dimensional or 2-dimensional vectors and selects an appropriate code path for each. However, numba needs to compile the whole function for the given argument types. Thus the 2-dim path is also compiled with 1-dim args, where indexing a row yields a scalar instead of an array. This results in a TypingError, as ascontiguousarray() only exists for ndarrays and not for floats:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tstad/git/haystack/haystack/document_stores/weaviate.py", line 477, in write_documents
    self.normalize_embedding(vector)
  File "/home/tstad/miniconda3/envs/haystack-dev/lib/python3.7/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/home/tstad/miniconda3/envs/haystack-dev/lib/python3.7/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function ascontiguousarray>) found for signature:
 
 >>> ascontiguousarray(float32)

Impact:
normalize_embedding is used by FAISSDocumentStore, Milvus1DocumentStore, PineconeDocumentStore and WeaviateDocumentStore when using cosine similarity.

Proposed changes:

Split the "dynamic-typed" normalize_embedding into "static-typed" normalize_embedding_1D and normalize_embedding_2D, so that numba compiles each code path only for the array dimensionality it is meant for.
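
The split can be sketched roughly as follows. This is a minimal illustration and not the merged code: the function bodies, the lowercase names, the explicit loops, and the fallback `njit` shim (used only when numba is not installed) are assumptions made for the example. The point it shows is that the dimensionality check stays in plain Python, so numba never has to type the 2-dim path with 1-dim arguments.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: run as plain Python when numba is missing.
    def njit(func):
        return func


@njit
def normalize_embedding_1d(emb):
    # L2-normalize a single vector in place.
    sq = 0.0
    for i in range(emb.shape[0]):
        sq += emb[i] * emb[i]
    norm = np.sqrt(sq)
    if norm != 0.0:
        for i in range(emb.shape[0]):
            emb[i] = emb[i] / norm


@njit
def normalize_embedding_2d(emb):
    # L2-normalize every row of a matrix in place.
    for r in range(emb.shape[0]):
        sq = 0.0
        for c in range(emb.shape[1]):
            sq += emb[r, c] * emb[r, c]
        norm = np.sqrt(sq)
        if norm != 0.0:
            for c in range(emb.shape[1]):
                emb[r, c] = emb[r, c] / norm


def normalize_embedding(emb):
    # Plain-Python dispatcher: each compiled function only ever
    # sees arrays of the dimensionality it was written for.
    if emb.ndim == 1:
        normalize_embedding_1d(emb)
    elif emb.ndim == 2:
        normalize_embedding_2d(emb)
    else:
        raise ValueError("expected a 1-dim or 2-dim array")
```

Calling `normalize_embedding(np.array([3.0, 4.0], dtype=np.float32))` takes the 1-dim path only, so the 2-dim function is never compiled against a 1-dim array.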

Status (please check what you already did):

First draft (up for discussions & feedback)
Final code

Fixes the bad one in #2346

julian-risch

LGTM! 👍 If there were more vectors than just two in emb in _normalize_embedding_2D() then we could even consider @njit(parallel=True, fastmath=True) to get an additional speed boost: https://numba.pydata.org/numba-doc/latest/user/performance-tips.html It's definitely not a bottleneck though. 😄 Maybe a nice Hacky Friday project to find performance bottlenecks and speed them up...
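
A rough sketch of what the parallel variant mentioned above could look like. The function name `normalize_rows_parallel`, its body, and the fallback shim for environments without numba are assumptions for illustration only; nothing here was benchmarked, and for small batches the threading overhead may outweigh any gain.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch: degrade to a serial plain-Python loop.
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True, fastmath=True)
def normalize_rows_parallel(emb):
    # Hypothetical helper: rows are independent, so the outer loop
    # can be distributed across threads with prange.
    for r in prange(emb.shape[0]):
        sq = 0.0
        for c in range(emb.shape[1]):
            sq += emb[r, c] * emb[r, c]
        norm = np.sqrt(sq)
        if norm != 0.0:
            for c in range(emb.shape[1]):
                emb[r, c] = emb[r, c] / norm
```

Each iteration writes only to its own row, which is what makes the loop safe to parallelize.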

fix normalize_embedding using numba

3e06eea

tstadel requested review from bogdankostic and julian-risch March 22, 2022 17:56

Update Documentation & Code Style

d4d820a

julian-risch added the topic:document_store label Mar 22, 2022

fix too-many-public-methods pylint msg

d2cc5de

julian-risch approved these changes Mar 22, 2022


tstadel merged commit 851fe1c into master Mar 22, 2022

tstadel deleted the fix_normalize_embedding_numba branch March 22, 2022 22:04
