Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to merge results of mteb-fr with those of mteb-original #79

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

imenelydiaker
Copy link

@imenelydiaker imenelydiaker commented Feb 19, 2024

  • Copy model folder with all tasks from mteb-fr when model ha snot been evaluated in mteb
  • Copy task file from mteb-fr to existing model folder in mteb
  • Add "fr" results from mteb-fr to existing model evaluation on a task in mteb.
  • [NOT] Replace existing "fr" results. (will not do because revision number is different)
  • Handle model name case (e.g. LASER2 in mteb == laser2 in mteb-fr)
  • Better handling of mteb/results repo (bash script to install git lfs and clone mteb/results repository from HF)

@imenelydiaker imenelydiaker linked an issue Feb 19, 2024 that may be closed by this pull request
@@ -34,7 +34,8 @@ def model_name(self):
return self._model_name

def encode_documents(self, input: Documents) -> Embeddings:
return self.model(input).numpy().tolist()
truncated_documents = [" ".join(x.split(' ')[:self.max_token_length]) for x in input]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why splitting on words instead of tokens ?

  • AbstractEmbeddingFunction already has a truncation method implemented, which is called before the encode_documents method so no need to implement truncation here :)

Copy link
Author

@imenelydiaker imenelydiaker Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @wissam-sib added this because we couldn't run MUSE-large. But it was on another PR and it was already merged into main I think (I rebased this branch). Should I remove this ?

'devtest'
]

def split_model_name(model_name: str):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more elegant to use os.path.basename(mypath) to get the last folder

@MathieuCiancone MathieuCiancone self-requested a review February 26, 2024 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Put Results on HF
2 participants