add Contextual-KTO-Mistral-PairRM to AlpacaEval2 #246
Conversation
Thanks! My bad, have now submitted the outputs / annotations -- just had to comment out a line in
@YannDubs another question: when you say "script to run without GPU", do you mean we'd need to host it here: https://ui.endpoints.huggingface.co/endpoints?
It can be on any platform you want: we've had models on the HF API, on Together, and even on local servers using OpenAI's client. The point is to have a single alpaca_eval command that I can run to verify, without requiring any GPUs or OpenAI credits. That's why it's uncommon for models outside of industry labs to be verified. I'll go ahead with the community results until I get all of the above!
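For context, the single-command verification described above would look roughly like the following. This is a sketch based on alpaca_eval's documented CLI; the exact config name used here is an assumption, not something specified in this thread:

```shell
# Sketch: run generation + evaluation end-to-end with one command.
# Assumes a model config (hypothetically 'Contextual-KTO-Mistral-PairRM')
# that points at a hosted inference endpoint, so the verifier needs no
# local GPU to reproduce the generations.
alpaca_eval evaluate_from_model \
  --model_configs 'Contextual-KTO-Mistral-PairRM'
```

The design point is that everything GPU-bound (the model itself) lives behind an API, so verification reduces to one reproducible command.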
results/Contextual-KTO-Mistral-PairRM/weighted_alpaca_eval_gpt4_turbo/leaderboard.csv
@xwinxu please only add the files described in the README. Instead of changing .gitignore you just need to use
Thanks for the tip, rectified the files committed! (Also emailed you the appropriate keys for verification.)
Closing in favor of #250 (verified), congrats! very impressive 💯 |
We would also like to request being verified and added to the Verified leaderboard. Thanks so much for your help!
Edit [03/06/2024]: I have included a prompt and config file in `model_configs` for the `-Verified` run using our model served on the endpoint, and updated the model outputs and leaderboard with the results from this setting. Note that if the client used for generations does not apply a `chat_template` or `chatml` formatting the way the OpenAI client does (i.e. offline evaluation code), then please use the non-`Verified` prompt template so that the prompts are formatted correctly as inputs to the decoder.
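To illustrate the template distinction above: chat clients wrap each turn in ChatML-style markers before the text reaches the decoder, whereas an offline run sees only the raw prompt template. A minimal sketch (the `to_chatml` helper is illustrative, not the project's actual code):

```python
def to_chatml(messages):
    """Format a list of {role, content} dicts with ChatML-style markers,
    ending with a generation prompt for the assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant")  # cue the model to respond
    return "\n".join(parts)

# A chat-template client would send this wrapped form to the decoder;
# offline evaluation code sending the unwrapped text instead would need
# the non-Verified prompt template to produce equivalent inputs.
prompt = to_chatml([{"role": "user", "content": "Hello"}])
print(prompt)
```

If the serving stack already applies such a template, feeding it a pre-wrapped prompt would double-format the input, which is why the thread distinguishes the two prompt configs.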