
Minor Difference in Embeddings Between Batch and Single Sentence Encoding #3109

Closed
XYZliang opened this issue Dec 3, 2024 · 2 comments

@XYZliang

XYZliang commented Dec 3, 2024

Hi,

I noticed a minor difference in the embeddings when using batch encoding versus encoding sentences one by one and then merging the results.

Here are the details:
1. Model: SentenceTransformer('jina-embeddings-v3')
2. Batch encoding: embeddings = model.encode(sentence_list)
3. Single encoding: Using a loop, embedding = model.encode(single_sentence) for each sentence and then combining them.

When comparing the embeddings of the first sentence from both methods, the cosine similarity was 0.99996984, showing a very slight difference.
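
For reference, a minimal sketch of the comparison described above. The Hub id "jinaai/jina-embeddings-v3", the trust_remote_code=True flag, and the example sentences are assumptions about how the model is loaded, not part of the original report:

```python
# Sketch: compare batch encoding vs. per-sentence encoding of the same inputs.
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed Hub id and loading flag; adjust to your own setup.
model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

sentence_list = [
    "The first sentence.",
    "The second sentence.",
    "The third sentence.",
]

# Batch encoding: all sentences in a single call.
batch_embeddings = model.encode(sentence_list)

# Single encoding: one call per sentence, then stacked into one array.
single_embeddings = np.stack([model.encode(s) for s in sentence_list])

# Cosine similarity between the two embeddings of the first sentence.
a, b = batch_embeddings[0], single_embeddings[0]
cos_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cos_sim:.8f}")  # very close to, but often not exactly, 1.0
```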

Could you please clarify if this is expected behavior? If so, is it due to internal optimizations, precision differences, or something else?

Thanks for your great work on this library!

@tomaarsen
Collaborator

Hello!

There are a few others who have found similar things (#2312, #2451). You can read through those as well, but in short: we believe this originates in torch, or even lower-level than that, rather than in sentence-transformers or transformers. I would not be surprised if it comes from hardware optimizations or approximations applied automatically for efficiency.

I could imagine that lower precision (e.g. model.bfloat16() or model.half()) could make this even a bit worse, but I don't think it will ever notably impact downstream performance such as retrieval, classification, clustering, etc.
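
As a rough torch-level illustration of the kind of effect described here (a sketch, not from the original comments): multiplying the same row alone versus as part of a batch can trigger different kernels or accumulation orders, so the results may differ by a few ULPs. The shapes and weight matrix below are arbitrary assumptions:

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 512)    # a "batch" of 8 inputs
w = torch.randn(512, 256)  # arbitrary weight matrix, stand-in for a linear layer

batch_out = x @ w           # all rows multiplied at once
single_out = x[0:1] @ w     # only the first row multiplied on its own

max_diff = (batch_out[0] - single_out[0]).abs().max().item()
# Often nonzero, especially on GPU; may be exactly 0 on some backends.
print(f"max abs difference: {max_diff:.2e}")
```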

See also the related issues reported against torch itself:

  • Tom Aarsen

@XYZliang
Author

XYZliang commented Dec 4, 2024

Hi Tom Aarsen,

Thank you so much for the detailed explanation! As this is my first time working with models like SentenceTransformers, your insights really helped me understand the subtle differences and the reasons behind them. It’s great to know that these minor variations are expected and generally won’t impact downstream tasks.

I truly appreciate the fantastic work you and the team have done on this library. It’s been a pleasure exploring its features, and I’m looking forward to learning more and contributing in the future!

Best regards
