EARBench: Benchmarking Physical Risk Awareness of Embodied AI Agents

This is the official repository of the paper "EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents".

📄Paper | 🤗Dataset

Overview

EARBench is the first automated physical risk assessment framework specifically designed for Embodied AI (EAI) scenarios. It addresses critical safety concerns in deploying AI agents in physical environments through a multi-agent cooperative system leveraging foundation models. The framework consists of four key components:

- a Safety Guidelines Generation module that creates EAI-specific safety guidelines using LLMs
- a Risky Scene Generation module that produces detailed test cases with scene information and task instructions
- an Embodied Task Planning module that simulates EAI agents to generate high-level plans
- a Plan Assessment module that evaluates plans for both safety and effectiveness

Along with the framework, we introduce EARDataset, a comprehensive dataset containing several test cases across 7 domains with 28 distinct scenes. The dataset and framework together provide a robust foundation for evaluating and improving the safety of EAI systems across diverse physical environments.
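To make the data flow between the four modules concrete, here is a minimal Python sketch. It is not the repository's API: `RiskCase`, `Assessment`, `run_pipeline`, and the four callables are hypothetical names chosen for illustration, and each stage would be backed by a foundation model call in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RiskCase:
    """Hypothetical container for one test case: scene, guideline, and task."""
    scene: str
    safety_tip: str
    observation: str
    instruction: str


@dataclass
class Assessment:
    """Hypothetical verdict from the plan assessment stage."""
    safe: bool
    effective: bool


def run_pipeline(
    scene: str,
    generate_tips: Callable[[str], List[str]],
    generate_case: Callable[[str, str], RiskCase],
    plan: Callable[[RiskCase], List[str]],
    assess: Callable[[RiskCase, List[str]], Assessment],
) -> List[Tuple[RiskCase, List[str], Assessment]]:
    """Chain the four stages: guidelines -> risky scene -> plan -> assessment."""
    results = []
    for tip in generate_tips(scene):
        case = generate_case(scene, tip)   # one risky test case per guideline
        steps = plan(case)                 # high-level plan from the agent under test
        results.append((case, steps, assess(case, steps)))
    return results
```

Passing the stages as callables keeps the sketch independent of any particular model backend; stub functions are enough to exercise it.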

Quick Start

Installation

git clone https://github.com/zihao-ai/EARBench.git
cd EARBench
pip install -r requirements.txt

Download the dataset

Download the EARDataset images from Google Drive or Hugging Face, then unzip the archive and rename the folder so that the dataset has the following structure:

EARDataset
  - images
    - <scene>
      - <image_path>
  - dataset.csv
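Before running an evaluation, it can help to confirm that the unpacked folder matches the layout above. The following is a small sketch that only checks the directory structure; the function name `check_dataset_layout` is hypothetical, and nothing is assumed about the columns of dataset.csv.

```python
from pathlib import Path


def check_dataset_layout(root):
    """Return (problems, image_counts) for a folder laid out as shown above.

    problems is a list of human-readable issues; image_counts maps each
    scene folder under images/ to the number of files it contains.
    """
    root = Path(root)
    problems = []
    counts = {}

    if not (root / "dataset.csv").is_file():
        problems.append("missing dataset.csv")

    images = root / "images"
    if not images.is_dir():
        problems.append("missing images/ directory")
        return problems, counts

    # Each sub-folder of images/ is treated as one scene.
    for scene_dir in sorted(p for p in images.iterdir() if p.is_dir()):
        counts[scene_dir.name] = sum(1 for f in scene_dir.iterdir() if f.is_file())

    if not counts:
        problems.append("images/ contains no scene folders")
    return problems, counts
```

Calling `check_dataset_layout("EARDataset")` from the repository root returns an empty problem list when the layout is as expected.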

Evaluation

You can quickly evaluate any LLM-based EAI agent on EARDataset through the OpenAI API. The evaluation results are saved in the results folder.

python evaluate.py --model <model> --api_key <api_key> --api_url <api_url>
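To evaluate several models in a row, the command above can be wrapped in a short script. This is a sketch under assumptions: the environment variable `EARBENCH_API_KEY` and the helper names are hypothetical, and only the three flags shown above are used. With `dry_run=True` (the default) it builds the commands without executing them.

```python
import os
import subprocess
import sys


def build_eval_command(model, api_key, api_url):
    """Build the argv list for one evaluate.py run using the documented flags."""
    return [
        sys.executable, "evaluate.py",
        "--model", model,
        "--api_key", api_key,
        "--api_url", api_url,
    ]


def evaluate_models(models, api_url, dry_run=True):
    """Build (and optionally run) one evaluation command per model."""
    # Hypothetical variable name; keeps the key out of shell history.
    api_key = os.environ.get("EARBENCH_API_KEY", "")
    commands = [build_eval_command(m, api_key, api_url) for m in models]
    if not dry_run:
        for cmd in commands:
            # Run from the repository root so evaluate.py is found.
            subprocess.run(cmd, check=True)
    return commands
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a value contains spaces.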

Create your own test cases

You can create your own test cases for new domains with the following scripts:

Safety Tips Generation:

python safety_tip_generation.py --scene <scene> --output_dir <output_dir>

Risky Scene Generation:

python scene_generation.py --scene <scene> --safety_tip <safety_tip> --explanation <explanation>

Scene Image Generation:

Generate the prompt for the text-to-image model:

python text2image_prompt_generation.py --scene <scene> --output_dir <output_dir>

Generate the image with the prompt:

python scene_image_generation.py --prompt <prompt> --output <output>

Evaluate the image with the text observation:

python image_judger.py --scene <scene> --img_path <img_path> --text_observation <text_observation>

Text Observation Generation:

python text_observation_generation.py --scene <scene> --objects <objects> --object_positions <object_positions> --object_attributes <object_attributes>
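When scripting the test-case creation steps, a small helper can assemble the command lines listed above and catch a misspelled or missing flag early. The sketch below copies the script names and flags from this README; the helper `build_command` is hypothetical, and treating every listed flag as required is an assumption rather than documented behavior.

```python
import sys

# Script name -> flags, as listed in this README.
SCRIPT_FLAGS = {
    "safety_tip_generation.py": ("scene", "output_dir"),
    "scene_generation.py": ("scene", "safety_tip", "explanation"),
    "text2image_prompt_generation.py": ("scene", "output_dir"),
    "scene_image_generation.py": ("prompt", "output"),
    "image_judger.py": ("scene", "img_path", "text_observation"),
    "text_observation_generation.py": (
        "scene", "objects", "object_positions", "object_attributes",
    ),
}


def build_command(script, **values):
    """Return the argv list for one script, validating the flag names."""
    if script not in SCRIPT_FLAGS:
        raise ValueError(f"unknown script: {script}")
    flags = SCRIPT_FLAGS[script]
    missing = [f for f in flags if f not in values]
    unknown = [k for k in values if k not in flags]
    if missing or unknown:
        raise ValueError(f"{script}: missing={missing}, unknown={unknown}")
    cmd = [sys.executable, script]
    for flag in flags:
        cmd += [f"--{flag}", str(values[flag])]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.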

Citation

If you find our work helpful, please cite:

@article{zhu2024EARBench,
  title={EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents},
  author={Zhu, Zihao and Wu, Bingzhe and Zhang, Zhengyou and Han, Lei and Liu, Qingshan and Wu, Baoyuan},
  journal={arXiv preprint arXiv:2408.04449},
  year={2024}
}
