add Contextual-KTO-Mistral-PairRM to AlpacaEval2 #246
Conversation
Thanks! My bad, have now submitted the outputs / annotations -- just had to comment out a line in
@YannDubs another question: when you say "script to run without GPU", do you mean we'd need to host it here: https://ui.endpoints.huggingface.co/endpoints?
It can be on any platform you want: we've had models on the HF API, on Together, and even on local servers using OpenAI's client. The point is to have a single alpaca_eval command that I can run to verify, without requiring any GPUs or OpenAI credits. That's why it's uncommon for models outside of industry labs to be verified. I'll go ahead with the community results until I get all of the above!
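For context, the single-command verification described above would look roughly like the following. This is a sketch based on alpaca_eval's documented CLI; the exact config name used here is an assumption, not something specified in this thread:

```shell
# Sketch: run generation + evaluation end-to-end with one command.
# Assumes a model config (hypothetically 'Contextual-KTO-Mistral-PairRM')
# that points at a hosted inference endpoint, so the verifier needs no
# local GPU to reproduce the generations.
alpaca_eval evaluate_from_model \
  --model_configs 'Contextual-KTO-Mistral-PairRM'
```

The design point is that everything GPU-bound (the model itself) lives behind an API, so verification reduces to one reproducible command.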
results/Contextual-KTO-Mistral-PairRM/weighted_alpaca_eval_gpt4_turbo/leaderboard.csv
@xwinxu please only add the files described in the README. Instead of changing .gitignore you just need to use
Thanks for the tip, rectified the files committed! (Also emailed you the appropriate keys for verification.)
Closing in favor of #250 (verified), congrats! very impressive 💯 |
We would also like to request being verified and added to the Verified leaderboard. Thanks so much for your help!
Edit [03/06/2024]: I have included a prompt and config file in `model_configs` for the `-Verified` run using our model served on the endpoint, and updated the model outputs and leaderboard with the results from this setting. Note that if the client used for generations does not apply a `chat_template` or `chatml` formatting the way the OpenAI client does (i.e. offline evaluation code), then please use the non-`Verified` prompt template so that the prompts are formatted correctly as inputs to the decoder.
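To illustrate the template distinction above: chat clients wrap each turn in ChatML-style markers before the text reaches the decoder, whereas an offline run sees only the raw prompt template. A minimal sketch (the `to_chatml` helper is illustrative, not the project's actual code):

```python
def to_chatml(messages):
    """Format a list of {role, content} dicts with ChatML-style markers,
    ending with a generation prompt for the assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant")  # cue the model to respond
    return "\n".join(parts)

# A chat-template client would send this wrapped form to the decoder;
# offline evaluation code sending the unwrapped text instead would need
# the non-Verified prompt template to produce equivalent inputs.
prompt = to_chatml([{"role": "user", "content": "Hello"}])
print(prompt)
```

If the serving stack already applies such a template, feeding it a pre-wrapped prompt would double-format the input, which is why the thread distinguishes the two prompt configs.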