Merge pull request #76 from stanfordnlp/amir/dpo
ReFT + DPO Tutorial
frankaging authored May 5, 2024
2 parents b4c82d3 + 652acf1 commit 56bd279
Showing 4 changed files with 6,249 additions and 1 deletion.
7 changes: 7 additions & 0 deletions examples/dpo/README.md
@@ -0,0 +1,7 @@
# Representation Fine-Tuning for Direct Preference Optimization

This is a tutorial for using ReFT with the [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) objective.

Follow the [`dpo.ipynb`](dpo.ipynb) notebook for a walk-through of training a ReFT model with DPO to answer questions truthfully based on the [TruthfulQA](https://arxiv.org/abs/2109.07958) dataset.

The DPO ReFT trainer is based on the `DPOTrainer` implementation in the [`trl`](https://github.com/huggingface/trl) library. The adapted trainer is implemented in [`dpo_trainer.py`](dpo_trainer.py).
