Jiajian Li* · Qi Wang* · Yunbo Wang · Xin Jin · Yang Li · Wenjun Zeng · Xiaokang Yang
⚡ Quick Start | 📥 Checkpoints Download | 📝 Citation
Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a long short-term world model. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.
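To give a rough intuition before the setup instructions, the sketch below illustrates, in heavily simplified form, how an imagined rollout might interleave ordinary one-step transitions with goal-conditioned jumpy transitions. This is a conceptual sketch only, not the actual LS-Imagine implementation; all names (`world_model.step`, `world_model.jump`, `jump_prob`) are illustrative assumptions.

```python
# Conceptual sketch only -- not the official LS-Imagine code.
# Assumed interfaces: world_model.step / world_model.jump / world_model.jump_prob.
def imagine_rollout(world_model, policy, state, horizon):
    """Roll out imagined latent states, occasionally taking a jumpy
    (long-term) transition toward a promising region instead of a
    single one-step transition."""
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state)
        # Assumption: the world model predicts whether a long-horizon jump is
        # worthwhile from the current state (e.g. guided by an affordance map
        # that highlights a distant goal region in the observation).
        if world_model.jump_prob(state) > 0.5:
            state = world_model.jump(state)          # goal-conditioned jumpy transition
        else:
            state = world_model.step(state, action)  # ordinary short-term transition
        trajectory.append(state)
    return trajectory
```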
LS-Imagine is implemented and tested on Ubuntu 20.04 with python==3.9:
- Create an environment:

  ```bash
  conda create -n ls_imagine python=3.9
  conda activate ls_imagine
  ```

- Install Java: JDK `1.8.0_171`. Then install the MineDojo environment and MineCLIP following their official documents. During the installation of MineDojo, various errors may occur.
> **Note:** We provide the detailed installation process and solutions to common errors; please refer to here.
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the MineCLIP weight here and place it at `./weights/mineclip_attn.pth`.
- We provide two options for recording data during the training process: TensorBoard and Weights & Biases (wandb).
  - To use TensorBoard, set `use_wandb` to `False` in the `./config.yaml` file.
  - To use wandb (optional), set `use_wandb` to `True` in the `./config.yaml` file. Additionally, retrieve your wandb API key and set it in the `./config.yaml` file under the field `wandb_key: {your_wandb_api_key}`.
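For reference, the relevant part of `./config.yaml` might look like the sketch below. Only the `use_wandb` and `wandb_key` fields are described in this README; their exact placement within the file is an assumption.

```yaml
# Illustrative excerpt only; the surrounding structure of config.yaml may differ.
use_wandb: True                     # set to False to log with TensorBoard instead
wandb_key: {your_wandb_api_key}     # replace the placeholder with your actual API key
```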
We provide pretrained weights of LS-Imagine for the tasks mentioned in the paper. You can download them using the links in the table below and rename the downloaded file to `latest.pt`:

| Task Name | Weight File |
|---|---|
| harvest_log_in_plains | latest_log.pt |
| harvest_water_with_bucket | latest_water.pt |
| harvest_sand | latest_sand.pt |
| mine_iron_ore | latest_iron.pt |
| shear_sheep | latest_wool.pt |
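For example, after downloading the `harvest_log_in_plains` checkpoint (the destination directory below is arbitrary; use whatever path you later pass to the test script):

```bash
# Example only: the checkpoint can live anywhere, as long as it is named latest.pt.
mkdir -p ./checkpoints/harvest_log_in_plains
mv latest_log.pt ./checkpoints/harvest_log_in_plains/latest.pt
```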
To start an evaluation run from one of these checkpoints:

- Set up the task for evaluation (instructions here).

- Run the following command to test the success rate:

  ```bash
  sh ./scripts/test.sh /path/to/latest.pt 100 test_harvest_log_in_plains
  ```
LS-Imagine mainly consists of two stages: (1) fine-tuning a multimodal U-Net for generating affordance maps, and (2) learning the world model and behaviors.
You can either set up custom tasks in MineDojo (instructions here) or use the task setups mentioned in our paper. LS-Imagine allows you to start from any stage of the pipeline, as we provide corresponding checkpoint files for each stage to ensure flexibility.
- Download the pretrained U-Net weights from here and save them to `./affordance_map/pretrained_unet_checkpoint/swin_unet_checkpoint.pth`.

- Set up the task (instructions here) and run the following command to collect data:

  ```bash
  sh ./scripts/collect.sh your_task_name
  ```

- Annotate the collected data using a method based on sliding bounding box scanning and simulated exploration to generate the fine-tuning dataset (see the usage sketch after this list):

  ```bash
  sh ./scripts/affordance.sh your_task_name your_prompt
  ```

- Fine-tune the pretrained U-Net weights on the annotated dataset to generate task-specific affordance maps:

  ```bash
  sh ./scripts/finetune_unet.sh your_task_name
  ```

- After training, the fine-tuned multimodal U-Net weights for the specified task will be saved in `./affordance_map/model_out`.
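As a concrete illustration of the annotation step above, a run for the tree-chopping task might look like the following; the prompt string here is hypothetical, so check the task setup instructions for the prompt actually used for each task.

```bash
# Hypothetical example: task name taken from the tables in this README,
# prompt string invented for illustration only.
sh ./scripts/affordance.sh harvest_log_in_plains "chop a tree"
```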
Before starting the learning process for the world model and behavior, ensure you have obtained the multimodal U-Net weights. We provide the pretrained U-Net weights (link here) and the task-specific fine-tuned U-Net weights:
| Task Name | Weight File |
|---|---|
| harvest_log_in_plains | swin_unet_checkpoint_log.pth |
| harvest_water_with_bucket | swin_unet_checkpoint_water.pth |
| harvest_sand | swin_unet_checkpoint_sand.pth |
| mine_iron_ore | swin_unet_checkpoint_iron.pth |
| shear_sheep | swin_unet_checkpoint_wool.pth |

You can download these weights using the links provided in the table above and place them at `./affordance_map/finetune_unet/finetune_checkpoints/{task_name}/swin_unet_checkpoint.pth`.
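For instance, for the `harvest_log_in_plains` task, placing the fine-tuned weights would look like this (the downloaded `.pth` file is assumed to be in the current directory):

```bash
# Move the task-specific U-Net weights into the expected location, renaming the file.
mkdir -p ./affordance_map/finetune_unet/finetune_checkpoints/harvest_log_in_plains
mv swin_unet_checkpoint_log.pth \
   ./affordance_map/finetune_unet/finetune_checkpoints/harvest_log_in_plains/swin_unet_checkpoint.pth
```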
- Set up the task and correctly configure `unet_checkpoint_dir` to ensure the U-Net weights are properly located and loaded (instructions here; see the configuration sketch after this list).

- Run the following command to start training the world model and behavior:

  ```bash
  sh ./scripts/train.sh your_task_name
  ```
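A minimal sketch of the `unet_checkpoint_dir` setting, assuming it is set in `./config.yaml` and points to the directory containing the task-specific `swin_unet_checkpoint.pth` (both are assumptions; follow the linked instructions for the authoritative configuration):

```yaml
# Illustrative assumption: the exact key location and expected value may differ.
unet_checkpoint_dir: ./affordance_map/finetune_unet/finetune_checkpoints/harvest_log_in_plains
```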
After training completes, the agent's weight file `latest.pt` will be saved in the `./logdir` directory. You can then evaluate the performance of LS-Imagine as described here.
If you find this repo useful, please cite our paper:
```bibtex
@inproceedings{li2025open,
  title={Open-World Reinforcement Learning over Long Short-Term Imagination},
  author={Jiajian Li and Qi Wang and Yunbo Wang and Xin Jin and Yang Li and Wenjun Zeng and Xiaokang Yang},
  booktitle={ICLR},
  year={2025}
}
```
The code refers to the implementations of dreamerv3-torch and Swin-Unet. Thanks to the authors!