Reasons for low accuracy in English retrieval #1

Open
zzkrzxx opened this issue Sep 20, 2024 · 1 comment

@zzkrzxx

zzkrzxx commented Sep 20, 2024

English retrieval

I chose BAAI/bge-large-en-v1.5 as the embedding model and followed examples/faiss_search.py to run English retrieval, but the results are very poor. What might be causing this?

@Tongjilibo
Owner

If you skip faiss and use BertSimilarity directly, does it work correctly?

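from bert4vector.core import BertSimilarity

# Load the embedding model from a local checkpoint directory
model = BertSimilarity('/data/pretrain_ckpt/embedding/BAAI--bge-base-en-v1.5')

# Add the corpus in two batches; both end up in the same default corpus
model.add_corpus(['hello', 'nice to meet you'])
model.add_corpus(['thank you very much', 'i love you'])
model.summary()  # print corpus name, size, and a few samples
# Retrieve the 2 corpus entries most similar to the query
print(model.search('hi', topk=2))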

Here is the output:

+------------------------------------------------+
| name    | size | few_samples                   |
+------------------------------------------------+
| default | 4    | ['hello', 'nice to meet you'] |
+------------------------------------------------+
{'hi': [{'text': 'hello', 'corpus_id': 0, 'score': 0.954134464263916}, {'text': 'nice to meet you', 'corpus_id': 1, 'score': 0.769316554069519}]}
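
When comparing this baseline against the faiss-based run, it may help to reduce each result to a ranked list of (text, score) pairs. The following is a minimal sketch that assumes only the result shape printed above (a dict mapping each query to a list of hits with 'text', 'corpus_id', and 'score' keys). The helper name `rank_hits` and the 0.8 threshold are hypothetical and are not part of bert4vector; the thread also does not say how the score is computed, so the threshold is purely illustrative.

```python
from typing import Dict, List, Tuple

# Result copied from the output above: {query: [{'text', 'corpus_id', 'score'}, ...]}
results: Dict[str, List[dict]] = {
    'hi': [
        {'text': 'hello', 'corpus_id': 0, 'score': 0.954134464263916},
        {'text': 'nice to meet you', 'corpus_id': 1, 'score': 0.769316554069519},
    ]
}


def rank_hits(results: Dict[str, List[dict]], query: str,
              min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Hypothetical helper: (text, score) pairs for `query`, best first.

    Hits scoring below `min_score` are dropped; an unknown query yields [].
    """
    hits = results.get(query, [])
    kept = [(h['text'], h['score']) for h in hits if h['score'] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Print the hits that clear the illustrative threshold
for text, score in rank_hits(results, 'hi', min_score=0.8):
    print(f'{score:.3f}  {text}')
```

Running the same helper on the output of the faiss-based search for identical queries would make it easier to see whether the two rankings diverge.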
