Trajectory Improvement and Reward Learning from Comparative Language Feedback (CoRL 2024)

Authors: Zhaojing Yang, Miru Jun, Jeremy Tien, Stuart Russell, Anca Dragan, Erdem Bıyık

Website: https://liralab.usc.edu/comparative-language-feedback

Paper: https://arxiv.org/abs/2410.06401

Installation

# create conda environment
conda create -n lang python=3.8
conda activate lang

# install dependencies
pip install -r requirements.txt

pip install -e .

Download Data

Please download the preprocessed data from here and put it in the data folder. If you want to collect your own data, please follow the instructions in this and this repo.

Feature Learning

We adopt a two-stage training procedure. First, we freeze the language model (T5) and train only the trajectory encoder. Then we fine-tune the language model and the trajectory encoder jointly.

python -m feature_learning.learn_features --initial-loss-check \
--data-dir=data/robosuite_data --batch-size=1024 \
--use-lang-encoder --exp-name=xxx --lang-model=t5-base --traj-reg-coeff 1e-2
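
To make the freeze-then-fine-tune idea concrete, here is a toy sketch in plain Python. It is not the repository's training code: the two "encoders" are reduced to single scalar weights, the objective is an ordinary squared error rather than the actual feature-learning loss, and the names `train`, `mse`, `data`, and `init` are hypothetical. It only illustrates that stage one updates the trajectory weight while the pretrained language weight stays fixed, and stage two updates both.

```python
def mse(params, data):
    """Mean squared error of the toy two-weight model."""
    return sum((params["traj"] * a + params["lang"] * b - y) ** 2
               for a, b, y in data) / len(data)


def train(params, trainable, data, lr=0.1, steps=500):
    """Gradient descent that updates only the weights named in `trainable`."""
    params = dict(params)  # leave the caller's dict untouched
    for _ in range(steps):
        grads = {"traj": 0.0, "lang": 0.0}
        for a, b, y in data:
            err = params["traj"] * a + params["lang"] * b - y
            grads["traj"] += 2 * err * a / len(data)
            grads["lang"] += 2 * err * b / len(data)
        for name in trainable:  # frozen weights are simply skipped
            params[name] -= lr * grads[name]
    return params


# Hypothetical data generated by y = 2.0 * a + 0.5 * b.
data = [(1.0, 0.0, 2.0), (0.0, 1.0, 0.5), (1.0, 1.0, 2.5), (2.0, 1.0, 4.5)]

# "lang" stands in for a pretrained weight; "traj" starts from scratch.
init = {"traj": 0.0, "lang": 1.0}

# Stage 1: language weight frozen, only the trajectory weight is trained.
stage1 = train(init, trainable=["traj"], data=data)
# Stage 2: both weights are fine-tuned jointly, starting from the stage-1 result.
stage2 = train(stage1, trainable=["traj", "lang"], data=data)

print("stage 1:", stage1, "loss:", mse(stage1, data))
print("stage 2:", stage2, "loss:", mse(stage2, data))
```

In this toy setting the frozen weight is untouched after stage one, and joint fine-tuning lowers the loss further because the pretrained value was not optimal for the data. Starting stage two from the stage-one weights, rather than from scratch, mirrors the ordering described above.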

Reward Learning

python -m lang_pref_learning.pref_learning.train_pref_learning --env=robosuite \
--data-dir=data/robosuite_pref_learning_2 \
--model-dir=MODEL_DIR \
--true-reward-dir=lang_pref_learning/pref_learning/true_rewards_rs/0 \
--method=lang \
--traj-encoder=mlp \
--lang-model-name=t5-small \
--seed=42 \
--lr=1e-2 \
--weight-decay=0.1 \
--num-iterations=1 \
--use-softmax \
--use-lang-pref \
--use-other-feedback \
--num-other-feedback=20
