Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Contextual-KTO-Mistral-PairRM to AlpacaEval2 #246

Closed
wants to merge 10 commits into from

Conversation

xwinxu
Copy link
Contributor

@xwinxu xwinxu commented Mar 5, 2024

We would also like to request being verified and added to the Verified leaderboard. Thanks so much for your help!

Edit [03/06/2024]: I have included a prompt and config file in model_configs for the -Verified run using our model served on the endpoint, and updated the model outputs and leaderboard with the results from this setting. Note that if the client for generations does not use chat_template or chatml like the openai client, i.e. offline evaluation code, then please use the non-Verified prompt template so that the prompts are formatted correctly as inputs to the decoder for the generations.

@YannDubs
Copy link
Collaborator

YannDubs commented Mar 5, 2024

Impressive results 💯

  1. For going in the leaderboard, you need to also submit the model_outputs and the annotations. Commands are described here
  2. Being verified is not required nor common for models submitted by the community. But here are the steps you need to follow.

@xwinxu
Copy link
Contributor Author

xwinxu commented Mar 5, 2024

Thanks! My bad, have now submitted the outputs / annotations -- just had to comment out a line in .gitignore. I have also emailed you with the appropriate things to run the verification -- can continue correspondence in that thread. Can you clarify what is meant by "api key to decode model"? Is this for the huggingface api, and is this necessary for public model repos?

@xwinxu
Copy link
Contributor Author

xwinxu commented Mar 5, 2024

@YannDubs another question, when you say "script to run without GPU", do you mean we'd need to host on here https://ui.endpoints.huggingface.co/endpoints?

@YannDubs
Copy link
Collaborator

YannDubs commented Mar 5, 2024

It can be on any platform you want: we had models on HF API, on Together, or even on local servers using OpenAI's client. The point is to have a single alpaca_eval command that I can run without requiring any GPUs nor OpenAI credits to verify.

That's why it's uncommon to be verified outside of industry labs. I'll go ahead with the community results until I get all the above!

@YannDubs
Copy link
Collaborator

YannDubs commented Mar 5, 2024

@xwinxu please only add the files described in the README. Instead of changing .gitignore you just need to use git add -f to force commit a file

@xwinxu
Copy link
Contributor Author

xwinxu commented Mar 5, 2024

@xwinxu please only add the files described in the README. Instead of changing .gitignore you just need to use git add -f to force commit a file

thanks for the tip, rectified the files committed! (also emailed you the appropriate keys for the verification).

@xwinxu xwinxu requested a review from YannDubs March 6, 2024 20:44
@YannDubs
Copy link
Collaborator

YannDubs commented Mar 7, 2024

Closing in favor of #250 (verified), congrats! very impressive 💯

@YannDubs YannDubs closed this Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants