
Pairwise comparison GPT evaluation #34

Open · wants to merge 3 commits into base: main

Conversation

@maxime-louis (Collaborator) commented Nov 20, 2024

  • Enable pairwise comparison using the OpenAI evaluator as well as the llm/vllm evaluators
  • Clean up eval.py
  • Rename eval.py to evaluate.py
  • Remove BEM
  • Refactor the llm and vllm evaluators

For usage, specify when running evaluate.py:
--opponent_folder # folder containing a JSON with exactly the same queries as the 'folder'
--opponent_name # used for metric naming
--folder must be specified explicitly; otherwise we don't know which opponent folder it matches.
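The requirement that both folders contain identical queries suggests the answers are paired by query text before judging. A minimal sketch of such a pairing step (illustrative only, not the PR's actual code; the `question`/`response` keys are assumptions about the JSON layout):

```python
import json

def pair_answers(folder_json, opponent_json):
    """Pair answers from two result files by their query text.

    Raises if the two files do not contain exactly the same queries,
    mirroring why --folder and --opponent_folder must match.
    """
    with open(folder_json) as f:
        ours = {r["question"]: r["response"] for r in json.load(f)}
    with open(opponent_json) as f:
        theirs = {r["question"]: r["response"] for r in json.load(f)}
    mismatched = set(ours) ^ set(theirs)
    if mismatched:
        raise ValueError(f"Queries do not match exactly: {sorted(mismatched)}")
    # One (query, our_answer, opponent_answer) triple per query,
    # ready to be sent to the pairwise judge.
    return [(q, ours[q], theirs[q]) for q in ours]
```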

@AlexandreMisrahi2005 (Collaborator)

I think this needs slight modifications in tests/zeroshot_test.py (I added them in branch multidomain, commit fed8375)

@maxime-louis (Collaborator, Author)

Example command:

python3 evaluate.py --llm --folder $folder_1 --opponent_folder $folder_2 --opponent_name 5_docs --sample 1000

And the metrics file will look like:

  "M": 0.5546428571428571,
  "EM": 0.0,
  "F1": 0.032544733511811585,
  "Precision": 0.017218690497507058,
  "Recall": 0.4792493758176808,
  "Recall_char3gram": 0.7297633463840201,
  "Rouge-1": 0.051238652578241985,
  "Rouge-2": 0.01682638307811561,
  "Rouge-L": 0.05061159507765502,
  "VLLMeval": "0.5866141732283464",
  "LLMeval_VS_5_docs_100_win": 8.24742268041237,
  "LLMeval_VS_5_docs_100_tie": 82.47422680412372,
  "LLMeval_VS_5_docs_100_lose": 9.278350515463918,
  "LLMeval_vllm_SOLAR-107B_VS_5_docs_100_win": 25.510204081632654,
  "LLMeval_vllm_SOLAR-107B_VS_5_docs_100_tie": 66.3265306122449,
  "LLMeval_vllm_SOLAR-107B_VS_5_docs_100_lose": 8.16326530612245,
  "LLMeval_VS_5_docs_1000_win": 10.717896865520729,
  "LLMeval_VS_5_docs_1000_tie": 81.19312436804853,
  "LLMeval_VS_5_docs_1000_lose": 8.088978766430738,
  "LLMeval_vllm_SOLAR-107B_VS_5_docs_1000_win": 19.854469854469855,
  "LLMeval_vllm_SOLAR-107B_VS_5_docs_1000_tie": 67.87941787941789,
  "LLMeval_vllm_SOLAR-107B_VS_5_docs_1000_lose": 12.266112266112266,
  "LLMeval_SOLAR-107B_logits_1000": 0.6162379224810404,
  "LLMeval_vllm_SOLAR-107B_1000": 0.5806122448979592
