APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents
Authors: Jun Yu Chen, Tao Gao
APT is an advanced framework that leverages Large Language Models (LLMs) to enable autonomous agents to construct complex and creative structures within the Minecraft environment. Unlike previous approaches that focus on skill-based tasks or rely on image-based diffusion models for voxel structures, APT utilizes the intrinsic spatial reasoning capabilities of LLMs.
By employing chain-of-thought decomposition alongside multimodal inputs, the framework generates detailed architectural layouts and blueprints. This allows the agent to execute tasks under zero-shot or few-shot learning scenarios.
- LLM-Driven Spatial Reasoning: Utilizes GPT-based LLMs for advanced spatial planning.
- Chain-of-Thought Decomposition: Breaks down complex construction tasks into manageable steps.
- Multimodal Input Integration: Combines textual and visual instructions for comprehensive understanding.
- Memory and Reflection Modules: Facilitate lifelong learning, adaptive refinement, and error correction.
- Zero-Shot/Few-Shot Learning: Executes tasks without extensive task-specific training.
- Complex Structure Generation: Builds intricate, functional structures, including Redstone-powered systems.
- Emergent Behaviors: Exhibits unexpected behaviors such as scaffolding, showcasing advanced problem-solving techniques.
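As a rough illustration of the text-to-blueprint idea, the sketch below represents a blueprint as a list of block placements and turns it into Minecraft `/setblock` commands. The `Block` type, the `wall_blueprint` helper, and the layout are hypothetical and not taken from the APT codebase; the blueprint format the framework actually uses may differ, and this README does not state whether APT places blocks through commands or through agent actions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """One block placement (hypothetical format, not the APT schema)."""
    x: int
    y: int
    z: int
    name: str  # Minecraft block ID without namespace, e.g. "stone"


def wall_blueprint(x0: int, y0: int, z0: int, width: int, height: int,
                   name: str = "stone") -> List[Block]:
    """Return a flat wall along the x-axis, ordered bottom row first."""
    return [Block(x0 + dx, y0 + dy, z0, name)
            for dy in range(height)
            for dx in range(width)]


def to_setblock_commands(blueprint: List[Block]) -> List[str]:
    """Translate each placement into a /setblock command string."""
    return [f"/setblock {b.x} {b.y} {b.z} minecraft:{b.name}" for b in blueprint]


commands = to_setblock_commands(wall_blueprint(0, 64, 0, width=2, height=2))
for command in commands:
    print(command)
```

Ordering placements from the bottom row upward is one simple way to keep each block supported when an agent, rather than a command, places it.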
We introduce a comprehensive benchmark comprising diverse construction tasks designed to test:
- Creativity
- Spatial reasoning
- Adherence to in-game rules
- Effective integration of multimodal instructions
Experimental results demonstrate the agent's ability to accurately interpret extensive instructions involving numerous items, positions, and orientations. A/B testing indicates that the inclusion of a memory module significantly enhances performance, emphasizing its role in continuous learning and experience reuse.
Ensure you have Node.js installed on your machine. If not, follow these steps:
- Visit the Node.js Official Site.
- Download and install the appropriate version for your operating system.
- Open your terminal or command prompt.
- Run the following command to install Mineflayer:
npm install mineflayer
For more information, you can find the detailed tutorial at the Mineflayer GitHub Repository.
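To confirm that the tools from the steps above are available, a small check along these lines can help. It is only a sketch: the helper names `tool_version` and `check_prerequisites` are made up for this example, and it assumes both tools accept a `--version` flag, which Node.js and npm do.

```python
import shutil
import subprocess
from typing import Dict, Optional


def tool_version(tool: str) -> Optional[str]:
    """Return the `--version` output of a tool, or None if it is missing or prints nothing."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None


def check_prerequisites() -> Dict[str, Optional[str]]:
    """Look up the installed versions of Node.js and npm."""
    return {tool: tool_version(tool) for tool in ("node", "npm")}


for tool, version in check_prerequisites().items():
    print(f"{tool}: {version or 'not found'}")
```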
- Minecraft Java Edition
- A valid Mojang or Microsoft account
- Visit the Minecraft Official Website.
- Navigate to the "Get Minecraft" section.
- Select your platform (Windows, macOS, or Linux) and choose Java Edition.
- Follow the instructions to purchase (if you don't already own it) and download the launcher.
- Install the launcher and log in using your Mojang or Microsoft account.
- Start the launcher and select the release version compatible with Mineflayer. For our test setup, we used version 1.17. Then click Play to load the game.
- Create a new world and ensure the following settings are applied:
- Set the game mode to Creative.
- Enable Cheats to allow command usage required for our package.
- Once the world loads, open the Pause Menu, click Open to LAN, and note the port number displayed. The agent needs this port to connect to the world.
For more detailed instructions, visit the Minecraft Help Center.
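Before starting the agent, it can be useful to check that the LAN port is reachable from your machine. The snippet below is a minimal sketch using only the Python standard library; `is_port_open` is a helper written for this example, and the port value is a placeholder to be replaced with the number shown by Open to LAN.

```python
import socket


def is_port_open(port: int, host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    lan_port = 25565  # placeholder: replace with the port shown by Open to LAN
    print(f"Port {lan_port} reachable: {is_port_open(lan_port)}")
```

A result of `False` usually means the world is not open to LAN yet or the port number was copied incorrectly.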
- For a quick start, see the APTAgent.ipynb notebook, which provides step-by-step guidance for experimenting with our agent framework.
- Please ensure you use your own OpenAI API key for authentication.
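One common way to supply your own key without hard-coding it is to read it from the `OPENAI_API_KEY` environment variable, which the official OpenAI Python library also reads by default. The sketch below assumes this approach; whether APTAgent.ipynb loads the key this way is not stated here, and `load_openai_api_key` is a hypothetical helper.

```python
import os
from typing import Mapping


def load_openai_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the key stored in OPENAI_API_KEY, or raise if it is unset or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return key


# Demo with a stand-in environment; in practice, call load_openai_api_key()
# with no arguments so the real environment is used.
demo_key = load_openai_api_key({"OPENAI_API_KEY": "sk-example"})
print(f"Loaded a key of length {len(demo_key)}")
```

Keeping the key in the environment rather than in a notebook cell avoids committing it to version control by accident.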
If you find our agent framework or benchmark meaningful, please consider citing us!
@misc{chen2024aptarchitecturalplanningtexttoblueprint,
title={APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents},
author={Jun Yu Chen and Tao Gao},
year={2024},
eprint={2411.17255},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2411.17255},
}