This is a guide to running the example scripts for the paper "Guiding Long-Horizon Task and Motion Planning with Vision Language Models" by Yang et al. See the project page or arXiv.
First, follow the main README in kitchen-worlds to install the dependencies.
Then run the following:
conda activate kitchen
pip install pddlgym anytree
Set your OpenAI API key or Anthropic API key as an environment variable (e.g., in ~/.bashrc or ~/.zshrc):
export OPENAI_API_KEY=<openai_api_key>
export ANTHROPIC_API_KEY=<anthropic_api_key>
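To confirm the key is visible to the scripts, a quick sanity check (not part of the repository) is:
echo $OPENAI_API_KEY
python -c "import os; print('OPENAI_API_KEY' in os.environ)"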
Alternatively, put them in text files inside the keys directory, which is ignored by git:
cd vlm_tools; mkdir keys
echo <openai_api_key> > keys/openai_api_key.txt
echo <anthropic_api_key> > keys/anthropic_api_key.txt
Ask GPT-4V to break down a high-level goal into a sequence of subgoals, then give them to PDDLStream in sequence.
cd pybullet_planning
python tutorials/test_vlm_tamp.py \
--open_goal "make chicken soup" \
--exp_subdir "test_fun" \
--planning_mode "sequence"
Common args to the script:
--open_goal: Natural language description of the goal
--exp_subdir: Output will be saved in experiments/{exp_subdir}/{auto_datetime}_vlm-tamp/
--problem_name: Name of a Python class in vlm_tools/problems_vlm_tamp.py that initiates the scene and problem; it initiates all objects required to solve the given open goal
--difficulty: Difficulty level of the task, which the scene builder function uses to determine how many movable and articulated obstacles to add; default=0
--dual_arm: action='store_true'; whether to use both arms or a single arm of the PR2 robot
--planning_mode: How the VLM output is used during planning; choices=['sequence', 'actions', 'sequence-reprompt', 'actions-reprompt']
--load_llm_memory: Subpath inside kitchen-worlds/experiments/, in the format {exp_subdir}/{auto_datetime}_vlm-tamp/, where the responses generated by the VLM in a previous run are saved, e.g. test_run_vlm_tamp_pr2_chicken_soup/241106_212402_vlm-tamp
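For example, a run that adds obstacles, uses both PR2 arms, and reuses VLM responses from an earlier run might look like the following (the difficulty value is illustrative, and the memory path is the example given above):
python tutorials/test_vlm_tamp.py \
--open_goal "make chicken soup" \
--exp_subdir "test_fun" \
--planning_mode "sequence" \
--difficulty 1 \
--dual_arm \
--load_llm_memory "test_run_vlm_tamp_pr2_chicken_soup/241106_212402_vlm-tamp"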
The output logs of previous runs can be viewed at http://0.0.0.0:9000/ by running the following in a different terminal.
(cd experiments/; python -m http.server 9000)
After the server is launched, the log of the last run can be viewed at http://0.0.0.0:9000/latest_run/log/.
Please cite the following paper if you use this code in your research:
@misc{yang2024guidinglonghorizontaskmotion,
title={Guiding Long-Horizon Task and Motion Planning with Vision Language Models},
author={Zhutian Yang and Caelan Garrett and Dieter Fox and Tomás Lozano-Pérez and Leslie Pack Kaelbling},
year={2024},
eprint={2410.02193},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2410.02193},
}