
Create fairseq2 LM Evaluation Harness Wrapper #707

Open
mostafaelhoushi opened this issue Jul 26, 2024 · 0 comments
Labels: enhancement (New feature or request)

mostafaelhoushi commented Jul 26, 2024
Describe the solution you would like:
Create a fairseq2 wrapper class/script to enable evaluation with LM Evaluation Harness, https://github.com/EleutherAI/lm-evaluation-harness.

We need to create a wrapper class that implements the following functions:

  • loglikelihood()
  • loglikelihood_rolling()
  • generate_until()

Example wrapper classes/scripts from other frameworks that integrate LM Evaluation Harness:

  • TorchTune's eleuther_eval.py: the _EvalWrapper class
  • GPTFast's eval.py: this one is a script with a command-line interface

Both are single classes/scripts that could be ported.
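A minimal sketch of what such a wrapper could look like, written against the lm-evaluation-harness v0.4+ `LM` interface. The fairseq2-side details (the `model(ids) -> logits` forward call, `tokenizer.encode()`/`decode()`, and `tokenizer.bos_idx`) are assumptions for illustration only and would need to be adapted to the actual fairseq2 API:

```python
# Sketch of a fairseq2 wrapper for lm-evaluation-harness (v0.4+ LM interface).
# NOTE: the model/tokenizer calls below (model(ids) -> logits, tokenizer.encode/
# decode, tokenizer.bos_idx) are assumptions, not the actual fairseq2 API.
import torch
import torch.nn.functional as F

from lm_eval.api.instance import Instance
from lm_eval.api.model import LM


class Fairseq2EvalWrapper(LM):
    def __init__(self, model, tokenizer, device="cuda", max_gen_toks=256):
        super().__init__()
        self.model = model.to(device).eval()
        self.tokenizer = tokenizer
        self.device = device
        self.max_gen_toks = max_gen_toks

    def _score(self, context_ids, continuation_ids):
        """Sum the log-probs the model assigns to the continuation tokens."""
        if not context_ids:
            # Empty context (e.g. rolling loglikelihood): fall back to BOS.
            context_ids = [self.tokenizer.bos_idx]
        ids = torch.tensor([context_ids + continuation_ids], device=self.device)
        with torch.no_grad():
            logits = self.model(ids)  # assumed shape: [1, seq_len, vocab_size]
        logprobs = F.log_softmax(logits, dim=-1)
        cont_len = len(continuation_ids)
        # Logits at position t predict token t + 1, hence the shift by one.
        targets = ids[0, -cont_len:]
        preds = logprobs[0, -cont_len - 1 : -1]
        token_logprobs = preds.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        is_greedy = bool(preds.argmax(dim=-1).eq(targets).all())
        return token_logprobs.sum().item(), is_greedy

    def loglikelihood(self, requests: list[Instance]):
        results = []
        for req in requests:
            context, continuation = req.args
            results.append(
                self._score(
                    self.tokenizer.encode(context),
                    self.tokenizer.encode(continuation),
                )
            )
        return results

    def loglikelihood_rolling(self, requests: list[Instance]):
        # Perplexity-style scoring of the full string against an empty context.
        return [
            self._score([], self.tokenizer.encode(req.args[0]))[0]
            for req in requests
        ]

    def generate_until(self, requests: list[Instance]):
        # Naive greedy decoding; a real implementation would reuse fairseq2's
        # sequence generators and honor the sampling options in gen_kwargs.
        outputs = []
        for req in requests:
            context, gen_kwargs = req.args
            until = gen_kwargs.get("until", [])
            prompt_ids = self.tokenizer.encode(context)
            generated: list[int] = []
            for _ in range(self.max_gen_toks):
                ids = torch.tensor([prompt_ids + generated], device=self.device)
                with torch.no_grad():
                    logits = self.model(ids)
                generated.append(int(logits[0, -1].argmax()))
                text = self.tokenizer.decode(generated)
                if any(stop in text for stop in until):
                    break
            outputs.append(self.tokenizer.decode(generated))
        return outputs
```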

Describe the alternatives you have considered:
An alternative is to convert the fairseq2 model to HuggingFace format and then evaluate it with the LM Evaluation Harness codebase, but this is a lengthy process.

With the proposed solution, we can evaluate all LM Evaluation Harness tasks (and future tasks as well!) directly from within fairseq2.
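For illustration, an evaluation run could then look roughly like the following. The fairseq2 loading step is elided, and `Fairseq2EvalWrapper` is the hypothetical class sketched above:

```python
# Hypothetical end-to-end usage: load a fairseq2 model and tokenizer (elided),
# wrap them, and hand the wrapper to the harness.
import lm_eval

# model, tokenizer = ...  # loaded via fairseq2
wrapper = Fairseq2EvalWrapper(model, tokenizer)

results = lm_eval.simple_evaluate(model=wrapper, tasks=["hellaswag", "arc_easy"])
print(results["results"])
```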

Additional Context:
LM Evaluation Harness, https://github.com/EleutherAI/lm-evaluation-harness, is a widely used tool to evaluate language models on a wide range of tasks.
Although it mainly uses HuggingFace Transformers, it can evaluate models from any framework, provided we create a wrapper class as described above.

You can see in their README that they already support other frameworks such as NVIDIA NeMo, vLLM, OpenAI, JAX, Megatron-DeepSpeed, etc.

Stretch Goals:
In the future, we could also support the BigCode Evaluation Harness, which evaluates LLMs on coding tasks.
