NJU-RL/Meta-DT

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

Zhi Wang, Li Zhang, Wenhao Wu, Yuanheng Zhu, Dongbin Zhao, Chunlin Chen*

A link to our paper can be found on arXiv

Overview

Official codebase for Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

Installation

Experiments require MuJoCo and D4RL; follow the installation instructions for [MuJoCo] and [D4RL]. Then create a virtual environment with conda and install the dependencies listed in requirements.txt:

conda create -n meta_dt python=3.8.18 -y
conda activate meta_dt
pip install -r requirements.txt

Data Collection

Note that we set done = False in all environments. For the walker and hopper environments, this has to be set manually in the rand_param_envs package. We also share our datasets below.

Train SAC

We use SAC to train agents on different environments and collect datasets.
Train agents on different tasks in AntDir:

python train_data_collection.py --env_type ant_dir --save_freq 4000 --task_id_start 0 --task_id_end 5

Here, task_id_start and task_id_end select the training tasks in the half-open range [task_id_start, task_id_end).
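As a quick illustration of the half-open convention, the snippet below lists the task ids covered by a given pair of arguments. The helper name task_ids is hypothetical and is not part of the repository.

```python
def task_ids(task_id_start: int, task_id_end: int) -> list:
    """Return the task ids in the half-open range [task_id_start, task_id_end)."""
    return list(range(task_id_start, task_id_end))


# --task_id_start 0 --task_id_end 5 covers five tasks; task 5 itself is excluded.
print(task_ids(0, 5))
```

To train the next batch of five tasks without overlap, the following call would use --task_id_start 5 --task_id_end 10.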

Generate Datasets

We use checkpoints from the training process to generate datasets. For medium and expert datasets, use:

python get_datassets.py --env_type ant_dir --data_type medium --task_id_start 0 --task_id_end 5 --capacity 20000

After obtaining the datasets for all tasks, manually merge all task_info_{task_id}.json files into a single file named task_info.json.
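The merge can also be scripted. The sketch below is not part of the repository and rests on two assumptions that should be checked against the generated files: each task_info_{task_id}.json holds a JSON object (a dict), and the top-level keys do not collide across tasks. The function name merge_task_info and its arguments are hypothetical.

```python
import glob
import json
import os
import re


def merge_task_info(data_dir: str, out_name: str = "task_info.json") -> dict:
    """Merge every task_info_{task_id}.json in data_dir into one JSON file.

    Assumption: each per-task file is a JSON object whose keys are unique
    across tasks. A key collision raises ValueError instead of overwriting.
    """
    pattern = re.compile(r"^task_info_(\d+)\.json$")
    candidates = glob.glob(os.path.join(data_dir, "task_info_*.json"))
    paths = [p for p in candidates if pattern.match(os.path.basename(p))]
    # Sort numerically by task id so that task 10 comes after task 2.
    paths.sort(key=lambda p: int(pattern.match(os.path.basename(p)).group(1)))

    merged = {}
    for path in paths:
        with open(path) as f:
            info = json.load(f)
        duplicates = merged.keys() & info.keys()
        if duplicates:
            raise ValueError(f"duplicate keys {sorted(duplicates)} in {path}")
        merged.update(info)

    with open(os.path.join(data_dir, out_name), "w") as f:
        json.dump(merged, f, indent=2)
    return merged
```

A call such as merge_task_info("path/to/dataset_dir") would write task_info.json next to the per-task files; the directory path here is a placeholder.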

For medium-expert datasets, we use a mix of 70% medium and 30% expert datasets.
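One way to build such a mixture is sketched below. The README only states the 70/30 ratio, so everything else is an assumption: that each dataset is a list of samples (transitions or trajectories), that samples are drawn uniformly without replacement, and that the result is shuffled. The function name mix_datasets and its parameters are hypothetical.

```python
import random


def mix_datasets(medium, expert, total, medium_frac=0.7, seed=0):
    """Draw a mixed dataset of `total` samples, `medium_frac` of them medium.

    Assumption: `medium` and `expert` are lists of samples; the repository's
    actual storage format and sampling scheme may differ.
    """
    rng = random.Random(seed)
    n_medium = round(total * medium_frac)
    n_expert = total - n_medium
    if n_medium > len(medium) or n_expert > len(expert):
        raise ValueError("not enough samples to build the requested mixture")
    # Sample without replacement from each source, then shuffle the result.
    mixed = rng.sample(medium, n_medium) + rng.sample(expert, n_expert)
    rng.shuffle(mixed)
    return mixed


# Toy data: each sample is tagged with the dataset it came from.
medium_data = [("medium", i) for i in range(20)]
expert_data = [("expert", i) for i in range(20)]
mixed_data = mix_datasets(medium_data, expert_data, total=10)
print(sum(1 for tag, _ in mixed_data if tag == "medium"), "medium samples out of", len(mixed_data))
```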

Download datasets and pretrained world model

  • We share our datasets via this link: datasets
  • We share our pretrained world model via this link: world_model

Run Experiments

Train the context encoder using the world model

python train_context.py --env_name AntDir-v0

Train the Meta Decision Transformer for few_shot Meta-DT

python train_meta_dt.py --env_name AntDir-v0 --zero_shot False --data_quality medium 

Train the Meta Decision Transformer for zero_shot Meta-DT

python train_meta_dt.py --env_name AntDir-v0 --zero_shot True --data_quality medium

Citation

Please cite our paper as:

@inproceedings{
wang2024metadt,
title={Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement},
author={Zhi Wang and Li Zhang and Wenhao Wu and Yuanheng Zhu and Dongbin Zhao and Chunlin Chen},
booktitle={Advances in Neural Information Processing Systems},
year={2024},
}
