Update figures - THIS WILL BE CLOSED DO NOT USE #123

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Binary file modified analyses_outputs/results.xlsx
Binary file modified analyses_outputs/statistical_tests/conover_friedman.pdf
Binary file modified analyses_outputs/statistical_tests/conover_friedman.png
Binary file modified analyses_outputs/statistical_tests/critical_difference_diagram.pdf
7 changes: 7 additions & 0 deletions rename_DiaBLaBitextMining_files.sh
@@ -0,0 +1,7 @@
for f in results/*/*/DiaBLa*; do
mv "$f" "${f/DiaBLa/DiaBla}"
done

for f in results/*/DiaBLa*; do
mv "$f" "${f/DiaBLa/DiaBla}"
done
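
For context, a minimal Python sketch with the same effect as the Bash loops above, useful where ${var/pat/repl} substitution is unavailable; the results/ layout is assumed from the glob patterns and is not confirmed elsewhere in the PR:

# Hedged sketch: rename DiaBLa* result files to DiaBla*, mirroring the script above.
from pathlib import Path

for pattern in ("*/*/DiaBLa*", "*/DiaBLa*"):
    for path in Path("results").glob(pattern):
        path.rename(path.with_name(path.name.replace("DiaBLa", "DiaBla")))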
3 changes: 2 additions & 1 deletion requirements.analysis.txt
@@ -1,5 +1,6 @@
# This environment is dedicated to result analysis
mteb @ git+https://github.com/Lyon-NLP/mteb-french.git #removing the other mteb above for now (to adapt when mteb-french gets merged)
#mteb @ git+https://github.com/Lyon-NLP/mteb-french.git #removing the other mteb above for now (to adapt when mteb-french gets merged)
mteb==1.7.56
matplotlib>=3.8.2
openpyxl>=3.1.2
pandas>=2.1.3
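Since the analysis code below branches on the mteb version recorded in result files, a quick check that the environment actually resolved the pinned release may help; a hedged sketch, not part of the diff:

# Hedged sketch: confirm the pinned mteb release from requirements.analysis.txt is installed.
from importlib.metadata import version

installed = version("mteb")
assert installed == "1.7.56", f"expected mteb 1.7.56, got {installed}"
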
@@ -0,0 +1,22 @@
{
  "dataset_revision": "80dc3040d19756742c9a18267ab30f54fb8e226b",
  "dev": {
    "eng_Latn-fra_Latn": {
      "accuracy": 0.9979939819458375,
      "f1": 0.9973253092611166,
      "main_score": 0.9973253092611166,
      "precision": 0.9969909729187563,
      "recall": 0.9979939819458375
    },
    "evaluation_time": 154.02,
    "fra_Latn-eng_Latn": {
      "accuracy": 0.9939819458375125,
      "f1": 0.9919759277833501,
      "main_score": 0.9919759277833501,
      "precision": 0.9909729187562688,
      "recall": 0.9939819458375125
    }
  },
  "mteb_dataset_name": "FloresBitextMining",
  "mteb_version": "1.7.56"
}
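
For orientation, a minimal sketch (not part of the diff) of how a pre-1.11 result file like the one above is read, pulling the main score for one language pair; the file path is illustrative:

# Hedged sketch: load a 1.7.x-style mteb result file and read one pair's main score.
import json

with open("FloresBitextMining.json") as f:  # illustrative path
    task_results = json.load(f)

score = task_results["dev"]["eng_Latn-fra_Latn"]["main_score"]
print(task_results["mteb_dataset_name"], score)  # FloresBitextMining 0.9973...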
26 changes: 22 additions & 4 deletions script_mteb_french/results_analysis/results_parser.py
@@ -7,15 +7,20 @@
import pandas as pd

DATASET_KEYS = {
"DiaBLaBitextMining": ["fr-en"],
"DiaBlaBitextMining": ["fr-en"],
"FloresBitextMining": MTEB(tasks=['FloresBitextMining'], task_langs=['fr', 'en']).tasks[0].langs,
"MasakhaNEWSClassification": MTEB(tasks=['MasakhaNEWSClassification'], task_langs=['fr']).tasks[0].langs,
"MasakhaNEWSClusteringS2S": MTEB(tasks=['MasakhaNEWSClusteringS2S'], task_langs=['fr']).tasks[0].langs,
"MasakhaNEWSClusteringP2P": MTEB(tasks=['MasakhaNEWSClusteringP2P'], task_langs=['fr']).tasks[0].langs,
"XPQARetrieval": MTEB(tasks=['XPQARetrieval'], task_langs=['fr']).tasks[0].langs,
}

HF_SUBSETS_VALUES = ["fra-fra"]
ISO3_LANGUAGE = ["fra-Latn"]

MODELS_TO_IGNORE = ['voyage-01', 'voyage-02', 'voyage-lite-01']
MODELS_TO_IGNORE = ['voyage-01', 'voyage-02', 'voyage-lite-01', 'Geotrend/distilbert-base-en-fr-es-pt-it-cased',
'Geotrend/bert-base-10lang-cased', 'Geotrend/bert-base-15lang-cased', 'Geotrend/bert-base-25lang-cased',
'dangvantuan/sentence-camembert-large', 'distilbert-base-uncased']
Review thread on the MODELS_TO_IGNORE change:

@wissam-sib (Jun 5, 2024): Why ignore sentence-camembert-large? It was one of the best models.

@imenelydiaker (author, Jun 5, 2024): Because there is a newer version, Lajavaness/sentence-camembert-large; see the description in this link. They say it is better than the older one. I also see that Lajavaness proposes a sentence-flaubert model; maybe we should add it instead of flaubert.

Reply: Just saw that! Nice.

class ResultsParser:
@@ -126,8 +131,21 @@ def _get_task_score(self, task_name:str, task_results:str, subkey:str|None = Non
result_name_score (tuple[str, str]): the name of the task and name of the main scoring metric
for that task
"""
key = subkey if subkey else self.lang
selected_split = split if split else self.split

if task_results["mteb_version"].startswith("1.11.1"):
result = None
for eval in task_results["scores"][selected_split]:
hf_subset = eval['hf_subset']
languages = eval['languages'] # used when hf_subset = "default"
if (hf_subset == subkey) or (hf_subset in HF_SUBSETS_VALUES) or (languages == ISO3_LANGUAGE):
result = eval["main_score"]
continue
main_score = self.tasks_main_scores_map[task_name]
result_name_score = (task_name, main_score)
return result, result_name_score

key = subkey if subkey else self.lang
result = task_results[selected_split]
if key in result:
result = result[key]
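
For readers comparing formats, a hedged sketch (assumptions drawn from this hunk, not part of the diff) of the two result-file shapes _get_task_score now handles: the pre-1.11 layout keyed by split and language pair, and the 1.11.1 layout whose "scores" entry holds a list of per-subset dictionaries:

# Hedged sketch of the two result layouts the parser branches on.
old_style = {  # mteb 1.7.x: split -> subkey -> metrics
    "mteb_version": "1.7.56",
    "test": {"fr-en": {"main_score": 0.99}},
}
new_style = {  # mteb 1.11.1: "scores" -> split -> list of per-subset entries
    "mteb_version": "1.11.1",
    "scores": {"test": [{"hf_subset": "fr-en", "languages": ["fra-Latn"], "main_score": 0.99}]},
}

def main_score(task_results, split, subkey):
    if task_results["mteb_version"].startswith("1.11.1"):
        for entry in task_results["scores"][split]:
            if entry["hf_subset"] == subkey or entry["languages"] == ["fra-Latn"]:
                return entry["main_score"]
        return None
    return task_results[split][subkey]["main_score"]

assert main_score(old_style, "test", "fr-en") == main_score(new_style, "test", "fr-en") == 0.99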
@@ -173,7 +191,7 @@ def _convert_to_results_dataframe(self, result_dict:dict):
else:
subkeys = [None]
for split in self.eval_splits_map[task_name]:
if split in task_results:
if (split in task_results) or ("scores" in task_results and split in task_results["scores"]):
for subkey in subkeys:
result, result_name_score = self._get_task_score(task_name, task_results, subkey, split)
dataset_name = f"{task_name}_{split}_{subkey}" if subkey and task_type == "BitextMining" else f"{task_name}_{split}"
2 changes: 0 additions & 2 deletions script_mteb_french/run_benchmark.py
@@ -70,8 +70,6 @@
"OrdalieTech/Solon-embeddings-base-0.1",
"manu/sentence_croissant_alpha_v0.3",
"manu/sentence_croissant_alpha_v0.2",
"manu/bge-m3-custom-fr",
"BAAI/bge-m3",
]

# these models' max_length is indicated to be 514 whereas the embedding layer actually supports 512
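
As an illustration of the work-around this comment refers to, a hedged sketch; the model name is a placeholder, not one from the benchmark list:

# Hedged sketch: some configs advertise max_length 514 although the position
# embeddings only cover 512 tokens, so the sequence length is clamped before encoding.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("example/french-roberta-like-model")  # hypothetical name
if model.max_seq_length is None or model.max_seq_length > 512:
    model.max_seq_length = 512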