Is it possible to run the harness against API hosted models? #148

pnewhook · 2023-10-16T18:55:39Z

I have a model that's only available through a RESTful API, and need to get some benchmarks. I'd like to run MultiPL-E benchmarks with a few languages. Has any work gone into using bigcode-evaluation-harness to perform generation with an API instead of on the local machine?

lm-evaluation-harness has the ability to run against commercial APIs, especially OpenAI.

loubnabnl · 2023-10-20T15:39:31Z

Hello, we currently don't support external APIs, only generations with transformers. Feel free to open a PR if you have something in mind.

If you're interested in HumanEvalPack benchmarks and OpenAI models there's a task that supports it(docs here)https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/lm_eval/tasks/humanevalpack_openai.py

krrishdholakia · 2023-11-03T14:41:51Z

is there a way i could 'fake' a local model and have it call a hosted API endpoint? @pnewhook @loubnabnl

loubnabnl · 2023-11-24T10:47:13Z

I don't think that's possible with current setup which uses transformers loading that assumes you have the model checkpoint

hmellor mentioned this issue Jan 23, 2024

Make main.py compatible with OpenAI compatible APIs #189

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is it possible to run the harness against API hosted models? #148

Is it possible to run the harness against API hosted models? #148

pnewhook commented Oct 16, 2023

loubnabnl commented Oct 20, 2023 •

edited

Loading

krrishdholakia commented Nov 3, 2023

loubnabnl commented Nov 24, 2023

Is it possible to run the harness against API hosted models? #148

Is it possible to run the harness against API hosted models? #148

Comments

pnewhook commented Oct 16, 2023

loubnabnl commented Oct 20, 2023 • edited Loading

krrishdholakia commented Nov 3, 2023

loubnabnl commented Nov 24, 2023

loubnabnl commented Oct 20, 2023 •

edited

Loading