Add script to merge results of mteb-fr with those of mteb-original #79
base: main
@@ -34,7 +34,8 @@ def model_name(self):
        return self._model_name

    def encode_documents(self, input: Documents) -> Embeddings:
        return self.model(input).numpy().tolist()
        truncated_documents = [" ".join(x.split(' ')[:self.max_token_length]) for x in input]
Why split on words instead of tokens? AbstractEmbeddingFunction already has a truncation method implemented, which is called before the encode_documents method, so there is no need to implement truncation here :)
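To illustrate the word-versus-token point, here is a minimal sketch. It assumes a hypothetical toy_tokenize helper that stands in for a real subword tokenizer (it is not part of this repository or of any library), and the sample text and limit are made up. It only shows that keeping max_token_length words can still leave far more than max_token_length tokens.

```python
def toy_tokenize(text: str, piece_len: int = 4) -> list[str]:
    """Hypothetical stand-in for a subword tokenizer:
    cut every word into pieces of at most piece_len characters."""
    tokens = []
    for word in text.split():
        tokens.extend(word[i:i + piece_len] for i in range(0, len(word), piece_len))
    return tokens


def truncate_by_words(text: str, max_len: int) -> str:
    """Word-level truncation, as done in the diff above."""
    return " ".join(text.split(" ")[:max_len])


# Made-up example values, for illustration only.
doc = "multilingual embeddings benchmark"
max_token_length = 3

# Word-level truncation keeps all three words...
kept = truncate_by_words(doc, max_token_length)
# ...but the toy tokenizer still produces more than three tokens from them.
tokens_after_word_truncation = toy_tokenize(kept)
# Token-level truncation cuts the token sequence itself.
tokens_after_token_truncation = toy_tokenize(doc)[:max_token_length]

print(len(tokens_after_word_truncation), len(tokens_after_token_truncation))
```

With a real tokenizer the exact counts would differ, but the mismatch between word count and token count is the same kind of issue.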
I think @wissam-sib added this because we couldn't run MUSE-large. But that was in another PR, which I think was already merged into main (I rebased this branch). Should I remove this?
    'devtest'
]


def split_model_name(model_name: str):
It would be more elegant to use os.path.basename(mypath) to get the last folder.
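A small sketch of that suggestion, assuming the goal is to extract the last path component from a results directory. The last_folder helper and the example paths are hypothetical and are not taken from this PR.

```python
import os


def last_folder(path: str) -> str:
    """Return the final component of a path, ignoring any trailing separator."""
    # normpath drops a trailing slash, which would otherwise make
    # os.path.basename return an empty string.
    return os.path.basename(os.path.normpath(path))


# Hypothetical paths, for illustration only.
print(last_folder("results/some-org/some-model"))
print(last_folder("results/some-model/"))
```

One caveat: os.path.basename alone returns an empty string when the path ends with a separator, which is why the sketch normalizes the path first.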
mteb/results repo (bash script to install git lfs and clone the mteb/results repository from HF)