Embedding fails with large-nli-stsb model but works for base model #117
Comments
Hi @predoctech, this might be due to the indexing call to Elasticsearch timing out. The timeout is increased from the default of 10 seconds to 30 seconds in #119. Can you pull the latest master and try again to see if that resolves the issue?
Hi @tanaysoni, when I tested the changes today the timeout error still persisted. Is a 30-second timeframe still not enough, or is there something else going on with the indexing call?
Hi @predoctech, how large are the documents (number of characters or bytes) you're indexing? Can you check whether indexing works for a single document instead of a batch?
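One way to test this suggestion, and to keep each indexing call under the timeout, is to split the corpus into smaller bulk requests. A minimal sketch (the `chunk` helper and the batch size of 100 are illustrative, not from the thread):

```python
def chunk(docs, batch_size=100):
    """Yield fixed-size batches so each Elasticsearch bulk call
    stays small enough to finish within the client timeout."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

# 970 FAQ rows, as in the thread below
faq_rows = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(970)]
batches = list(chunk(faq_rows))
print(len(batches), len(batches[-1]))  # 10 batches; the last holds 70 rows
```

Indexing one batch at a time also narrows down whether the failure is a timeout (it disappears with small batches) or something in the payload itself.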
@tanaysoni, my document is 970 FAQ pairs, all in a single document. That isn't too large a document, is it? The base models worked; only the large model timed out.
Hi @predoctech, yes, you're right, it might be that 30 seconds is not enough. Can you try again with a very large timeout (e.g., 600) here?
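For reference, the bulk indexing call and the timeout being discussed can be sketched with nothing but the standard library (the index name, host, and helper names are illustrative; haystack itself goes through the official Elasticsearch client):

```python
import json
import urllib.request

def build_bulk_body(docs, index="faq"):
    """Build the NDJSON body of an Elasticsearch _bulk request:
    one action line plus one document line per doc, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return ("\n".join(lines) + "\n").encode("utf-8")

def bulk_index(docs, host="http://localhost:9200", timeout=600):
    """Send one _bulk request; `timeout` bounds the whole call,
    mirroring the 600-second value suggested in the thread."""
    req = urllib.request.Request(
        f"{host}/_bulk",
        data=build_bulk_body(docs),
        headers={"Content-Type": "application/x-ndjson"},
    )
    return urllib.request.urlopen(req, timeout=timeout)
```

The timeout covers the entire bulk call, so a single request carrying 970 documents with large embeddings can exceed it even when each document alone is small.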
Hi @tanaysoni, I tried your suggestion, and now the failure is no longer the timeout but the embedding itself. Specifically, I noticed the following error message:
The larger models produce 1024-dimensional embeddings instead of 768, so the embedding dimensionality needs to be set to 1024 when creating the index.
Thanks @tholor. That's right, setting the dimensionality to 1024 for the large models does work.
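The fix boils down to matching the index mapping to the model's output size: base sentence-transformers models emit 768-dimensional vectors, the large ones 1024. A sketch of what a matching Elasticsearch `dense_vector` mapping looks like (field names are illustrative):

```python
import json

# "dims" must equal the model's embedding size; a mismatch makes
# writes of the embedding vectors fail, as seen in this issue.
large_model_mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 1024},
        }
    }
}

print(json.dumps(large_model_mapping, indent=2))
```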
Fixed in #130 |
Trying to make use of the pre-trained models in sentence-transformers. It works for the base models (tried bert-base and roberta-base) but fails for the large models (roberta, and I think bert as well) with the following error at the time of embedding the corpus:
Seems like the size of the embeddings is too large to write to the Elasticsearch index?