We are planning to publish our model inference results soon. However, OpenAI updated their gpt-4-turbo models this month, and with the new model as the evaluator, the measured performance drops systematically. We used gpt-4-turbo-preview in our experiments, but the behaviour of that model has also changed considerably. We will update the model performance numbers with gpt-4-turbo-2024-04-09 soon. We are also training our own evaluator on an open-source model to replace these closed-source models.
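For anyone re-running the evaluation in the meantime, here is a minimal sketch of pinning the evaluator to a dated snapshot with the OpenAI Python SDK, so scores do not drift when the rolling gpt-4-turbo alias is updated. The model name comes from this thread; the prompt wording and the `judge_answer` helper are placeholder assumptions, not the actual StableToolBench evaluator code.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pin a dated snapshot instead of the rolling "gpt-4-turbo" alias,
# so the judge's behaviour stays fixed across evaluation runs.
EVALUATOR_MODEL = "gpt-4-turbo-2024-04-09"

def judge_answer(task: str, answer: str) -> str:
    """Ask the pinned evaluator model whether the answer solves the task.

    Illustrative only: the real benchmark uses its own prompts and parsing.
    """
    response = client.chat.completions.create(
        model=EVALUATOR_MODEL,
        temperature=0,  # keep judging as deterministic as possible
        messages=[
            {"role": "system", "content": "You are an evaluator. Reply with 'pass' or 'fail'."},
            {"role": "user", "content": f"Task: {task}\nAnswer: {answer}\nDoes the answer solve the task?"},
        ],
    )
    return response.choices[0].message.content.strip()
```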
Hi, thanks for your great work on StableToolBench. Is there any update on the release plan for the model inference results?
I'm building on StableToolBench to construct a benchmark with additional evaluation metrics, but rerunning all of the model inference is expensive. Even if the evaluation setup changes later, would it be possible to release the inference results first? The inference results should stay consistent throughout the whole evaluation process.
I'm testing the pass rate evaluation. Could you provide reproduction data similar to what ToolBench offers?
Thanks for your reply