AssistantBench evaluates the ability of AI agents to solve realistic and time-consuming web tasks, such as “Which gyms near me have fitness classes on the weekend, before 7AM?”.
To start working on AssistantBench, please check out our HuggingFace dataset and leaderboard, where you can also make new submissions.
We also introduce SeePlanAct (SPA), a new web agent built to tackle tasks in AssistantBench. Code to run SPA and additional resources will be released soon!
@misc{yoran2024assistantbenchwebagentssolve,
title={AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?},
author={Ori Yoran and Samuel Joseph Amouyal and Chaitanya Malaviya and Ben Bogin and Ofir Press and Jonathan Berant},
year={2024},
eprint={2407.15711},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.15711},
}