USC-Lira/language-preference-learning

Trajectory Improvement and Reward Learning from Comparative Language Feedback (CoRL 2024)

Authors: Zhaojing Yang, Miru Jun, Jeremy Tien, Stuart Russell, Anca Dragan, Erdem Bıyık

Website: https://liralab.usc.edu/comparative-language-feedback

Paper: https://arxiv.org/abs/2410.06401

Installation

# create conda environment
conda create -n lang python=3.8
conda activate lang

# install dependencies
pip install -r requirements.txt

pip install -e .

Download Data

Please download the preprocessed data from here and put it in the data folder. If you want to collect your own data, please follow the instructions in this and this repo.

Feature Learning

We adopt a two-stage training procedure: first, we freeze the language model (T5) and train the trajectory encoder; then, we fine-tune the language model and the trajectory encoder jointly.

python -m feature_learning.learn_features --initial-loss-check \
--data-dir=data/robosuite_data --batch-size=1024 \
--use-lang-encoder --exp-name=EXP_NAME --lang-model=t5-base --traj-reg-coeff=1e-2
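
For reference, below is a minimal sketch of the two-stage schedule described above. It is not the repository's code: the TrajEncoder class, input/feature dimensions, learning rates, and the elided alignment objective on (trajectory, language) pairs are illustrative assumptions.

import torch
import torch.nn as nn
from transformers import T5EncoderModel

class TrajEncoder(nn.Module):
    # Hypothetical MLP trajectory encoder: flattened trajectory -> feature vector.
    def __init__(self, in_dim, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, traj):
        return self.net(traj)

lang_model = T5EncoderModel.from_pretrained("t5-base")   # language model, frozen in stage 1
traj_encoder = TrajEncoder(in_dim=64)                    # in_dim is an assumption

# Stage 1: freeze T5 and train only the trajectory encoder.
for p in lang_model.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(traj_encoder.parameters(), lr=1e-3)
# ... run the trajectory-language alignment objective here ...

# Stage 2: unfreeze T5 and fine-tune both models jointly.
for p in lang_model.parameters():
    p.requires_grad_(True)
optimizer = torch.optim.Adam(
    list(lang_model.parameters()) + list(traj_encoder.parameters()), lr=1e-4
)
# ... continue training the joint objective here ...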

Reward Learning

python -m lang_pref_learning.pref_learning.train_pref_learning --env=robosuite \
--data-dir=data/robosuite_pref_learning_2 \
--model-dir=MODEL_DIR \
--true-reward-dir=lang_pref_learning/pref_learning/true_rewards_rs/0 \
--method=lang \
--traj-encoder=mlp \
--lang-model-name=t5-small \
--seed=42 \
--lr=1e-2 \
--weight-decay=0.1 \
--num-iterations=1 \
--use-softmax \
--use-lang-pref \
--use-other-feedback \
--num-other-feedback=20
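
As a rough illustration of what this preference-learning stage optimizes (a hedged sketch, not the repository's implementation), the snippet below fits a linear reward on top of the learned trajectory features with a softmax (Bradley-Terry) preference likelihood. The learning rate and weight decay mirror the flags above; the variable names, feature dimension, and placeholder data are assumptions.

import torch
import torch.nn.functional as F

feat_dim = 128                                   # assumed to match the learned feature size
w = torch.zeros(feat_dim, requires_grad=True)    # linear reward weights to learn
optimizer = torch.optim.Adam([w], lr=1e-2, weight_decay=0.1)

def reward(features):
    # Reward is linear in the learned trajectory features.
    return features @ w

def preference_loss(feat_a, feat_b, label):
    # Softmax (Bradley-Terry) likelihood that the labeled trajectory is preferred.
    logits = torch.stack([reward(feat_a), reward(feat_b)]).unsqueeze(0)
    return F.cross_entropy(logits, torch.tensor([label]))

# One toy update step on placeholder features (label 0 means feat_a is preferred).
feat_a, feat_b = torch.randn(feat_dim), torch.randn(feat_dim)
loss = preference_loss(feat_a, feat_b, label=0)
optimizer.zero_grad()
loss.backward()
optimizer.step()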
