Describe the solution you would like:
Create a fairseq2 wrapper class/script to enable LM Evaluation Harness, https://github.com/EleutherAI/lm-evaluation-harness. This requires a wrapper class that implements the following functions (see the sketch after this list):
loglikelihood()
loglikelihood_rolling()
generate_until()
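Below is a minimal sketch of what such a wrapper could look like, assuming the lm-eval >= 0.4 API (lm_eval.api.model.LM and lm_eval.api.instance.Instance). The FairSeq2LM name, the model/tokenizer arguments, and the _score/_generate helpers are hypothetical placeholders for the actual fairseq2 pieces, not existing fairseq2 APIs:

```python
# Minimal sketch of a fairseq2 wrapper for LM Evaluation Harness.
# Assumes the lm-eval >= 0.4 API (lm_eval.api.model.LM, Instance).
# The `model`/`tokenizer` handles and the `_score`/`_generate` helpers are
# placeholders for the actual fairseq2 calls, not real fairseq2 APIs.
from typing import List, Tuple

from lm_eval.api.instance import Instance
from lm_eval.api.model import LM


class FairSeq2LM(LM):
    """Exposes a fairseq2 decoder model to lm-evaluation-harness."""

    def __init__(self, model, tokenizer, device: str = "cuda"):
        super().__init__()
        self.model = model          # placeholder: a loaded fairseq2 model
        self.tokenizer = tokenizer  # placeholder: the matching tokenizer
        self.device = device

    def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
        # For each (context, continuation) pair, return the summed log-prob of
        # the continuation and whether it is the model's greedy completion.
        results = []
        for context, continuation in (req.args for req in requests):
            results.append(self._score(context, continuation))
        return results

    def loglikelihood_rolling(self, requests: List[Instance]) -> List[float]:
        # Full-text log-likelihood (used by perplexity-style tasks): score each
        # string against an empty context.
        return [self._score("", req.args[0])[0] for req in requests]

    def generate_until(self, requests: List[Instance]) -> List[str]:
        # Free-form generation that stops at the task's stop sequences.
        outputs = []
        for context, gen_kwargs in (req.args for req in requests):
            outputs.append(self._generate(context, **gen_kwargs))
        return outputs

    def _score(self, context: str, continuation: str) -> Tuple[float, bool]:
        # Placeholder: run the fairseq2 model over context + continuation and
        # return (sum of continuation token log-probs, is_greedy).
        raise NotImplementedError

    def _generate(self, context: str, until=None, **kwargs) -> str:
        # Placeholder: decode with the fairseq2 model, truncating at any of
        # the `until` stop strings.
        raise NotImplementedError
```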
Example wrapper classes/scripts from other frameworks that integrate LM Evaluation Harness:
eleuther_eval.py: the _EvalWrapper class
GPTFast's eval.py: this one has a script with a command line interface
They are both single classes/scripts to port.
Describe the alternatives you have considered:
An alternative is to convert the fairseq2 model to HuggingFace format and then evaluate it with the LM Evaluation Harness codebase, but this is a lengthy process.
With the proposed solution, we can evaluate all LM Evaluation Harness tasks (and future tasks as well!) naturally from within fairseq2.
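For instance, with a wrapper like the FairSeq2LM sketch above, an evaluation could be launched directly from Python via the harness's simple_evaluate entry point (again a sketch; the model and tokenizer objects are placeholders):

```python
# Sketch of running the harness from within fairseq2 code, assuming the
# FairSeq2LM wrapper above and lm-eval's simple_evaluate entry point.
import lm_eval

lm = FairSeq2LM(model=my_fairseq2_model, tokenizer=my_tokenizer)  # placeholders

results = lm_eval.simple_evaluate(
    model=lm,                        # pass the LM instance, not a model name
    tasks=["hellaswag", "arc_easy"],
    num_fewshot=0,
)
print(results["results"])
```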
Additional Context:
LM Evaluation Harness, https://github.com/EleutherAI/lm-evaluation-harness, is a widely used tool for evaluating language models on a wide range of tasks.
Although it mainly uses HuggingFace transformers, it can use any model from any framework, provided that we create a wrapper class as described above. You can see in their README that they already support other frameworks such as NVIDIA NeMo, vLLM, OpenAI, JAX, Megatron-DeepSpeed, etc.
Stretch Goals:
In the future, we could even support the BigCode Evaluation Harness, which evaluates LLMs on coding tasks.