Authors: Zhaojing Yang, Miru Jun, Jeremy Tien, Stuart Russell, Anca Dragan, Erdem Bıyık
Website: https://liralab.usc.edu/comparative-language-feedback
Paper: https://arxiv.org/abs/2410.06401
# create conda environment
conda create -n lang python=3.8
conda activate lang
# install dependencies
pip install -r requirements.txt
pip install -e .
Please download the preprocessed data from here and put it in the data folder.
If you want to collect your own data, please follow the instructions in this and this repo.
We adopt a two-stage training procedure. First, we freeze the language model (T5) and train only the trajectory encoder. Then we fine-tune the language model and the trajectory encoder jointly.
python -m feature_learning.learn_features --initial-loss-check \
--data-dir=data/robosuite_data --batch-size=1024 \
--use-lang-encoder --exp-name=xxx --lang-model=t5-base --traj-reg-coeff 1e-2
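For intuition, the freeze-then-fine-tune schedule described above can be sketched with a framework-free toy. This is not the repository's training code: two scalar weights stand in for the language model and the trajectory encoder, and every name here (`loss_and_grads`, `run_stage`, `params`, `data`) is hypothetical. In a PyTorch codebase, the same idea is typically expressed by toggling `requires_grad` on the frozen module's parameters.

```python
# Toy illustration only: two scalar weights stand in for the language model
# and the trajectory encoder. None of these names come from the repository.

def loss_and_grads(params, data):
    """Squared error of a linear score, plus its gradient for each weight."""
    loss = 0.0
    grads = {"lang": 0.0, "traj": 0.0}
    for x_lang, x_traj, target in data:
        err = params["lang"] * x_lang + params["traj"] * x_traj - target
        loss += err ** 2
        grads["lang"] += 2 * err * x_lang
        grads["traj"] += 2 * err * x_traj
    return loss, grads


def run_stage(params, trainable, data, lr=0.1, steps=50):
    """Gradient descent that updates only the weights named in `trainable`."""
    for _ in range(steps):
        _, grads = loss_and_grads(params, data)
        for name in trainable:  # weights not listed here stay frozen
            params[name] -= lr * grads[name]
    return loss_and_grads(params, data)[0]


# Targets were generated with lang=2.0 and traj=3.0.
data = [(1.0, 1.0, 5.0), (1.0, -1.0, -1.0)]
# A "pretrained" language weight and an untrained trajectory weight.
params = {"lang": 1.0, "traj": 0.0}

# Stage 1: freeze the language weight, train only the trajectory weight.
loss_stage1 = run_stage(params, trainable=["traj"], data=data)
lang_after_stage1 = params["lang"]

# Stage 2: fine-tune both weights jointly.
loss_stage2 = run_stage(params, trainable=["lang", "traj"], data=data)
```

In this toy, stage 1 leaves the language weight untouched while the trajectory weight is fitted, and stage 2 lowers the remaining loss by letting both weights move.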
python -m lang_pref_learning.pref_learning.train_pref_learning --env=robosuite \
--data-dir=data/robosuite_pref_learning_2 \
--model-dir=MODEL_DIR \
--true-reward-dir=lang_pref_learning/pref_learning/true_rewards_rs/0 \
--method=lang \
--traj-encoder=mlp \
--lang-model-name=t5-small \
--seed=42 \
--lr=1e-2 \
--weight-decay=0.1 \
--num-iterations=1 \
--use-softmax \
--use-lang-pref \
--use-other-feedback \
--num-other-feedback=20 \