Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added model results to repo and updated CLI to create consistent folder structure. #254

Merged
merged 4 commits into from
Mar 19, 2024

Conversation

KennethEnevoldsen
Copy link
Contributor

This is a suggested change. The goal is to make it easier to add model evaluations along with a dataset (as a kind of test).

This includes a few changes:

  • the results folder is now added to contain results using the form results/{model_name}/{task_results}
  • Changes CLI to follow this format
    • Additionally also adds a model_meta.json with the model_name, time of run and versions during run.

Copy link
Contributor

@Muennighoff Muennighoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - is the intention to have one result file for every dataset? Could be a good idea, so it's easy to get an idea of what kind of performance to expect

README.md Outdated Show resolved Hide resolved
Co-authored-by: Niklas Muennighoff <[email protected]>
@KennethEnevoldsen
Copy link
Contributor Author

KennethEnevoldsen commented Mar 18, 2024

Yep exactly. This also makes it easier for us to review dataset submissions as they then also include at least 1-3 models that have run on the data.

We want it to be both:

  1. Meaningful variation: some variation between more or less capable models e.g. e5-multilingual-small and paraphrase-mpnet and ideally in the expected direction
  2. Non-trivial task: we also don't want e5 small "solve the task"

@KennethEnevoldsen
Copy link
Contributor Author

The assumption is to have the results at least for newly submitted datasets.

@Muennighoff
Copy link
Contributor

Makes sense, should this be specified somewhere e.g. in how to contribute? Also maybe it makes sense to explicitly tell people always to run model X so it's a bit easier to compare if it's always the same model? If it's just 1 model, it should probably be a multilingual one 🤔

@KennethEnevoldsen
Copy link
Contributor Author

Also maybe it makes sense to explicitly tell people always to run model X so it's a bit easier to compare if it's always the same model? If it's just 1 model, it should probably be a multilingual one 🤔

Yep I plan to make a PR with how to add datasets that include some standard models (small and multilingual). I was thinking e5-multilingual-small and the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.

@imenelydiaker
Copy link
Contributor

imenelydiaker commented Mar 18, 2024

Great idea! Both models are multilingual so yes why not! Just keep in mind that for some languages they may not perform very well.. [under-represented languages in training datasets]

@@ -0,0 +1 @@
{"model_name": "sentence-transformers/all-MiniLM-L6-v2", "time_of_run": "2024-03-18 11:22:22.739054", "versions": {"sentence_transformers": "2.0.0", "transformers": "4.6.1", "pytorch": "1.8.1"}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a revision number to make sure it's the same model version that is used? Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually love to, but couldn't figure out how to do it. I don't believe it is recorded in the model object. You can naturally fetch the latest from the repo, but then hitting the cache causes discrepancies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about doing the same as for datasets in mteb (revision_id). Just specifying the commit id from the HF repo that stores the model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I would love to do that. However, I am not sure the commit id is available in the model object (you would have to know it beforehand). I would love to add that (but seems like that is outside the scope of this PR)

@imenelydiaker do anyone from your team have the time to give it a go?

Copy link
Contributor

@imenelydiaker imenelydiaker Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I see what you mean. I can check this and open another PR.

@MartinBernstorff
Copy link
Contributor

MartinBernstorff commented Mar 19, 2024

I suppose this closes #248 - would add it to the PR description, but can't edit 👍

@KennethEnevoldsen
Copy link
Contributor Author

@MartinBernstorff this is #254

@MartinBernstorff
Copy link
Contributor

Ah yeah, have updated my comment 👍

@KennethEnevoldsen KennethEnevoldsen merged commit 8a758bc into main Mar 19, 2024
3 checks passed
@KennethEnevoldsen KennethEnevoldsen deleted the add-results-folder branch March 19, 2024 13:19
MartinBernstorff pushed a commit to MartinBernstorff/mteb that referenced this pull request Mar 20, 2024
…er structure. (embeddings-benchmark#254)

* Added model results to repo and updated CLI to create consistent folder structure.

* ci: updated ci to use make install

* Added missing pytest dependencies

* Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>

---------

Co-authored-by: Niklas Muennighoff <[email protected]>
MartinBernstorff added a commit that referenced this pull request Mar 21, 2024
* refactor: rename description to metadata dict

* refactor: add TaskMetadata and first example

* update 9 files

* update TaskMetadata.py

* update TaskMetadata.py

* update TaskMetadata.py

* update LICENSE, TaskMetadata.py and requirements.dev.txt

* update 151 files

* update 150 files

* update 43 files and delete 1 file

* update 106 files

* update 45 files

* update 6 files

* update 14 files

* Added model results to repo and updated CLI to create consistent folder structure. (#254)

* Added model results to repo and updated CLI to create consistent folder structure.

* ci: updated ci to use make install

* Added missing pytest dependencies

* Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>

---------

Co-authored-by: Niklas Muennighoff <[email protected]>

* Restructing the readme (#262)

* restructing the readme

* removed double specification of versions and moved all setup to pyproject.toml

* correctly use flat-layout for the package

* build(deps): update TaskMetadata.py and pyproject.toml

* update 221 files

* build(deps): update pyproject.toml

* build(deps): update pyproject.toml

* build(deps): update pyproject.toml

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants