
[ICLR 2025] Open-World Reinforcement Learning over Long Short-Term Imagination

Jiajian Li* · Qi Wang* · Yunbo Wang · Xin Jin · Yang Li · Wenjun Zeng · Xiaokang Yang

Paper    |   arXiv    |    Website   

⚡ Quick Start | 📥 Checkpoints Download | 📝 Citation

Teaser image

Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a long short-term world model. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.
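For intuition only, here is a minimal conceptual sketch of the long short-term idea: an imagined rollout that interleaves ordinary one-step transitions with occasional goal-conditioned jumpy transitions, so a short rollout can cover behavior that would otherwise need many more steps. All names below (ToyWorldModel, imagine_rollout, the fixed jump probability) are hypothetical stand-ins for exposition, not the interfaces or the algorithm implemented in this repository.

    import torch

    # Illustrative stand-in for a learned world model; the real LS-Imagine
    # components (recurrent state-space model, affordance-driven jump
    # prediction, etc.) live in this repository and differ from this toy.
    class ToyWorldModel:
        def step(self, state, action):
            """One short-term (single-step) imagined transition."""
            return state + 0.1 * action

        def jump(self, state, goal):
            """One long-term (jumpy) imagined transition toward a promising goal."""
            return state + 0.5 * (goal - state)

    def imagine_rollout(model, policy, state, goal, horizon=15, jump_prob=0.2):
        """Interleave short-term steps with occasional jumpy transitions."""
        trajectory = [state]
        for _ in range(horizon):
            if torch.rand(()) < jump_prob:
                state = model.jump(state, goal)            # long-term imagination
            else:
                state = model.step(state, policy(state))   # short-term imagination
            trajectory.append(state)
        return trajectory

    if __name__ == "__main__":
        model = ToyWorldModel()
        policy = torch.tanh                                # placeholder policy
        start, goal = torch.zeros(4), torch.ones(4)
        print(len(imagine_rollout(model, policy, start, goal)))  # horizon + 1 states

In LS-Imagine, when to jump and where to jump are driven by the learned world model and the affordance maps described above, rather than by a fixed probability.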

Evaluation results

Quick Start

Install the Environment

LS-Imagine is implemented and tested on Ubuntu 20.04 with python==3.9:

  1. Create an environment

    conda create -n ls_imagine python=3.9
    conda activate ls_imagine 
  2. Install Java (JDK 1.8.0_171), then install the MineDojo environment and MineCLIP following their official documentation. Various errors may occur during the MineDojo installation.

Note

We provide a detailed installation walkthrough and solutions to common errors; please refer to here.

  3. Install dependencies

    pip install -r requirements.txt
  4. Download the MineCLIP weights here and place them at ./weights/mineclip_attn.pth.

  5. We provide two options for logging training metrics: TensorBoard and Weights & Biases (wandb). A minimal sketch of how this switch might be consumed is shown after this list.

    • To use TensorBoard, set use_wandb to False in the ./config.yaml file.
    • To use wandb (optional), set use_wandb to True in the ./config.yaml file. Additionally, retrieve your wandb API key and set it in the ./config.yaml file under the field wandb_key: {your_wandb_api_key}.
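
As a rough illustration of how the use_wandb switch might be consumed by training code, the snippet below reads ./config.yaml and sets up either a TensorBoard writer or a wandb run. This is a minimal sketch using the field names described above, not the repository's actual logging code; the wandb project name and log directory are placeholders.

    import yaml

    # Read the logging options from ./config.yaml (fields as described above).
    with open("./config.yaml") as f:
        cfg = yaml.safe_load(f)

    if cfg.get("use_wandb", False):
        import wandb

        # `wandb_key` is the API key placed in config.yaml.
        wandb.login(key=cfg["wandb_key"])
        wandb.init(project="ls_imagine", config=cfg)   # project name is illustrative

        def log_scalar(name, value, step):
            wandb.log({name: value}, step=step)
    else:
        from torch.utils.tensorboard import SummaryWriter

        writer = SummaryWriter(log_dir="./logdir")     # directory is illustrative

        def log_scalar(name, value, step):
            writer.add_scalar(name, value, step)

    log_scalar("train/episode_reward", 0.0, 0)         # example usage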

Pretrained Weights

We provide pretrained weights of LS-Imagine for the tasks mentioned in the paper. You can download them using the links in the table below and rename the downloaded file to latest.pt:

Task Name                   Weight File
harvest_log_in_plains       latest_log.pt
harvest_water_with_bucket   latest_water.pt
harvest_sand                latest_sand.pt
mine_iron_ore               latest_iron.pt
shear_sheep                 latest_wool.pt

To start an evaluation run from one of these checkpoints:

  1. Set up the task for evaluation (instructions here).

  2. Run the following command to test the success rate:

    sh ./scripts/test.sh /path/to/latest.pt 100 test_harvest_log_in_plains


Training LS-Imagine in MineDojo

LS-Imagine mainly consists of two stages: fine-tuning a multimodal U-Net to generate affordance maps, and learning world models and behaviors.

You can either set up custom tasks in MineDojo (instructions here) or use the task setups mentioned in our paper. LS-Imagine allows you to start from any stage of the pipeline, as we provide corresponding checkpoint files for each stage to ensure flexibility.

U-Net Finetuning for Affordance Map Generation

  1. Download the pretrained U-Net weights from here and save them to ./affordance_map/pretrained_unet_checkpoint/swin_unet_checkpoint.pth.

  2. Set up the task (instructions here) and run the following command to collect data:

    sh ./scripts/collect.sh your_task_name
  3. Annotate the collected data using a method based on sliding bounding box scanning and simulated exploration to generate the fine-tuning dataset (a conceptual sketch of the scanning idea follows after this list):

    sh ./scripts/affordance.sh your_task_name your_prompt
  4. Fine-tune the pretrained U-Net weights using the annotated dataset to generate task-specific affordance maps:

    sh ./scripts/finetune_unet.sh your_task_name
  5. After training, the fine-tuned multimodal U-Net weights for the specified task will be saved in ./affordance_map/model_out.
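
For intuition about the sliding-bounding-box scanning mentioned in step 3, the snippet below is a purely conceptual sketch: it slides a window across an image, scores each crop with a placeholder relevance function (standing in for the image-text similarity that LS-Imagine obtains from MineCLIP), and accumulates the scores into a rough heatmap. Function names and parameters are illustrative, not the repository's implementation.

    import numpy as np

    def crop_relevance(crop: np.ndarray) -> float:
        """Placeholder score for how relevant a crop is to the task prompt.
        In LS-Imagine this role is played by an image-text model (MineCLIP);
        mean brightness is used here only so the sketch runs."""
        return float(crop.mean())

    def sliding_box_affordance(image: np.ndarray, box: int = 64, stride: int = 32) -> np.ndarray:
        """Scan the image with a sliding bounding box and accumulate crop
        scores into a per-pixel heatmap (a crude stand-in for an affordance map)."""
        h, w = image.shape[:2]
        heat = np.zeros((h, w), dtype=np.float32)
        count = np.zeros((h, w), dtype=np.float32)
        for y in range(0, h - box + 1, stride):
            for x in range(0, w - box + 1, stride):
                score = crop_relevance(image[y:y + box, x:x + box])
                heat[y:y + box, x:x + box] += score
                count[y:y + box, x:x + box] += 1.0
        return heat / np.maximum(count, 1.0)

    if __name__ == "__main__":
        frame = np.random.rand(160, 256, 3).astype(np.float32)
        print(sliding_box_affordance(frame).shape)   # (160, 256)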

World Model and Behavior Learning

Before starting the learning process for the world model and behavior, ensure you have obtained the multimodal U-Net weights. We provide the pretrained U-Net weights (link here) and the task-specific fine-tuned U-Net weights:

Task Name                   Weight File
harvest_log_in_plains       swin_unet_checkpoint_log.pth
harvest_water_with_bucket   swin_unet_checkpoint_water.pth
harvest_sand                swin_unet_checkpoint_sand.pth
mine_iron_ore               swin_unet_checkpoint_iron.pth
shear_sheep                 swin_unet_checkpoint_wool.pth

You can download these weights using the links provided in the table above and place them at ./affordance_map/finetune_unet/finetune_checkpoints/{task_name}/swin_unet_checkpoint.pth:

  1. Set up the task and correctly configure the unet_checkpoint_dir to ensure the U-Net weights are properly located and loaded (instructions here).

  2. Run the following command to start training the world model and behavior:

    sh ./scripts/train.sh your_task_name

Success Rate Evaluation

After completing the training, the agent's weight file latest.pt will be saved in the ./logdir directory. You can evaluate the performance of LS-Imagine as described here.

Citation

If you find this repo useful, please cite our paper:

@inproceedings{li2025open,
    title={Open-World Reinforcement Learning over Long Short-Term Imagination}, 
    author={Jiajian Li and Qi Wang and Yunbo Wang and Xin Jin and Yang Li and Wenjun Zeng and Xiaokang Yang},
    booktitle={ICLR},
    year={2025}
}

Credits

The code builds on the implementations of dreamerv3-torch and Swin-Unet. Thanks to the authors!
