
Why is there a small difference between the vector of a single sentence and the vector of a batch of sentences? #2451

Closed
wencan opened this issue Jan 27, 2024 · 2 comments

Comments

@wencan

wencan commented Jan 27, 2024

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('LaBSE')

# Encode the same Chinese sentence ("You can call someone you know.")
# on its own and as the second item of a batch.
zh_vec = model.encode('可以给你认识的人打个电话。')
vecs = model.encode(['contacts who may know about a job?', '可以给你认识的人打个电话。'])

print(zh_vec.mean(), vecs[1].mean())

got:
-0.011027637 -0.011027641

@ir2718
Contributor

ir2718 commented Jan 27, 2024

@wencan

This is a known issue stemming from the underlying libraries (probably even torch):
#2312
huggingface/transformers#2401

Although, on second thought, when you instantiate a model this way it is in training mode, so the difference might be due to how dropout behaves during training. Try adding model.eval() before calculating the embeddings.
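
A minimal sketch of that suggestion, assuming the same 'LaBSE' model as in the snippet above:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('LaBSE')
model.eval()  # switch to eval mode so dropout (if any) is disabled
zh_vec = model.encode('可以给你认识的人打个电话。')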

@tomaarsen
Collaborator

Hello!

@ir2718 is right, those other issues contain a bit more information. In short: there are slight differences, but they are so small that the embeddings are not notably affected.

As for the second possible explanation: model.encode moves the model to eval() mode automatically, so that shouldn't be the cause.
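
For illustration, a rough sketch of how one could check that the single-sentence and batched embeddings agree within floating-point tolerance (the tolerance value here is an assumption, not a number measured in this issue):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('LaBSE')
sentence = '可以给你认识的人打个电话。'
single = model.encode(sentence)
batched = model.encode(['contacts who may know about a job?', sentence])[1]

# The discrepancy comes from float32 rounding in batched matrix operations,
# so the element-wise differences are expected to be tiny.
print(np.abs(single - batched).max())
print(np.allclose(single, batched, atol=1e-5))  # typically True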

  • Tom Aarsen
