Everyone is welcome to contribute, and we value everybody's contribution. Code contributions are not the only way to help the community. Answering questions, helping others, and improving the documentation are also immensely valuable.
It also helps us if you spread the word! Reference the library in blog posts about the awesome projects it made possible, shout out on Twitter every time it has helped you, or simply ⭐️ the repository to say thank you.
However you choose to contribute, please be mindful and respect our code of conduct.
This guide was heavily inspired by the awesome scikit-learn guide to contributing.
There are several ways you can contribute to TRL:
- Fix outstanding issues with the existing code.
- Submit issues related to bugs or desired new features.
- Implement trainers for new post-training algorithms.
- Contribute to the examples or the documentation.
If you don't know where to start, there is a special Good First Issue listing. It will give you a list of open issues that are beginner-friendly and help you start contributing to open-source. The best way to do that is to open a Pull Request and link it to the issue that you'd like to work on. We try to give priority to opened PRs as we can easily track the progress of the fix, and if the contributor does not have time anymore, someone else can take the PR over.
For something slightly more challenging, you can also take a look at the Good Second Issue list. In general though, if you feel like you know what you're doing, go for it and we'll help you get there! 🚀
All contributions are equally valuable to the community. 🥰
Before you start contributing, make sure you have installed all the dev tools:
make dev
If you notice an issue with the existing code and have a fix in mind, feel free to start contributing and open a Pull Request!
Do your best to follow these guidelines when submitting a bug-related issue or a feature request. It will make it easier for us to come back to you quickly and with good feedback.
The TRL library is robust and reliable thanks to users who report the problems they encounter.
Before you report an issue, we would really appreciate it if you could make sure the bug was not already reported (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the library itself, and not your code.
Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:
- Your OS type and version, Python, PyTorch, TRL and Transformers versions.
- A short, self-contained code snippet that allows us to reproduce the bug in less than 30s.
- The full traceback if an exception is raised.
- Any other information, like screenshots, that you think may help.
To get the OS and software versions automatically, run the following command:
trl env
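If the trl env command is not available (for example, because the installation itself is broken), the same details can be gathered by hand. The sketch below is not how trl env is implemented. It is a minimal illustration using only the Python standard library, and the helper name collect_versions is hypothetical. It assumes the packages are installed under their PyPI distribution names torch, trl and transformers.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "trl", "transformers")):
    """Return a dict with the OS, the Python version, and installed package versions."""
    info = {
        "Platform": platform.platform(),
        "Python version": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing, so the output is always complete.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # Print one line per entry, ready to paste into an issue.
    for key, value in collect_versions().items():
        print(f"- {key}: {value}")
```

Pasting the printed lines into the issue gives maintainers the same kind of information as the command above.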
If there is a new feature you'd like to see in TRL, please open an issue and describe:
- What is the motivation behind this feature? Is it related to a problem or frustration with the library? Is it a feature related to something you need for a project? Is it something you worked on and think it could benefit the community? Whatever it is, we'd love to hear about it!
- Describe your requested feature in as much detail as possible. The more you can tell us about it, the better we'll be able to help you.
- Provide a code snippet that demonstrates the feature's usage.
- If the feature is related to a paper, please include a link.
If your issue is well written, we're already 80% of the way there by the time you create it.
New post-training methods are published frequently and those that satisfy the following criteria are good candidates to be integrated into TRL:
- Simplicity: Does the new method achieve performance similar to prior methods, but with less complexity? A good example is Direct Preference Optimization (DPO) [Rafailov et al., 2023], which provided a simpler and compelling alternative to RLHF methods.
- Efficiency: Does the new method provide a significant improvement in training efficiency? A good example is Odds Ratio Preference Optimization (ORPO) [Hong et al., 2024], which uses an objective similar to DPO's but requires half the GPU VRAM.
Methods that only provide incremental improvements at the expense of added complexity or compute costs are unlikely to be included in TRL.
If you want to implement a trainer for a new post-training method, first open an issue and provide the following information:
- A short description of the method and a link to the paper.
- Link to the implementation if it is open-sourced.
- Link to model weights trained with the method if they are available.
Based on the community and maintainer feedback, the next step will be to implement the trainer and config classes. See the following examples for inspiration:
- Paired preference optimisation: dpo_trainer.py and dpo_config.py
- RL-based optimisation: rloo_trainer.py and rloo_config.py
- Online optimisation: online_dpo_trainer.py and online_dpo_config.py
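The pattern these examples share, a config class that holds the hyperparameters and a trainer class that consumes it, can be sketched without any dependencies. Everything below is hypothetical and for illustration only: MyMethodConfig, MyMethodTrainer, beta and the compute_loss signature are made-up names rather than TRL API, and the loss is a placeholder rather than a real algorithm. The actual TRL classes typically build on their Transformers counterparts, so treat the files listed above as the reference for the real structure.

```python
import math
from dataclasses import dataclass


@dataclass
class MyMethodConfig:
    """Hypothetical config: groups the method's hyperparameters in one place."""

    learning_rate: float = 1e-6
    beta: float = 0.1  # hypothetical method-specific hyperparameter

    def __post_init__(self):
        # Validate early so that a bad value fails at construction time.
        if self.beta <= 0:
            raise ValueError(f"beta must be positive, got {self.beta}")


class MyMethodTrainer:
    """Hypothetical trainer: receives the config and implements the loss."""

    def __init__(self, args: MyMethodConfig):
        self.args = args

    def compute_loss(self, chosen_score: float, rejected_score: float) -> float:
        # Placeholder objective: a logistic loss on the scaled score margin.
        margin = self.args.beta * (chosen_score - rejected_score)
        return math.log1p(math.exp(-margin))


trainer = MyMethodTrainer(MyMethodConfig(beta=0.5))
loss = trainer.compute_loss(chosen_score=2.0, rejected_score=0.0)
```

Keeping the hyperparameters in a separate config class means they can be validated, documented and serialized independently of the training logic.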
We're always looking for improvements to the documentation that make it clearer and more accurate. Please let us know how the documentation can be improved, such as typos, dead links, and any missing, unclear, or inaccurate content. We'll be happy to make the changes or help you contribute if you're interested!
Before writing code, we strongly advise you to search through the existing PRs or issues to make sure that nobody is already working on the same thing. If you are unsure, it is always a good idea to open an issue to get some feedback.
You will need basic git proficiency to be able to contribute to TRL. git is not the easiest tool to use but it has the greatest manual. Type git --help in a shell and enjoy. If you prefer books, Pro Git is a very good reference.
Follow these steps to start contributing:
- Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
- Clone your fork to your local disk, and add the base repository as a remote. The following commands assume you have your public SSH key uploaded to GitHub. See GitHub's documentation on adding an SSH key to your account for more information.
  $ git clone git@github.com:<your Github handle>/trl.git
  $ cd trl
  $ git remote add upstream https://github.com/huggingface/trl.git
- Create a new branch to hold your development changes, and do this for every new PR you work on.
  Start by synchronizing your main branch with the upstream/main branch (more details in the GitHub Docs):
  $ git checkout main
  $ git fetch upstream
  $ git merge upstream/main
  Once your main branch is synchronized, create a new branch from it:
  $ git checkout -b a-descriptive-name-for-my-changes
  Do not work on the main branch.
- Set up a development environment by running the following command in a conda or a virtual environment you've created for working on this library:
  $ make dev
  (If TRL was already installed in the virtual environment, remove it with pip uninstall trl before reinstalling it.)
  Alternatively, if you are using Visual Studio Code, the fastest way to get set up is by using the provided Dev Container. Documentation on how to get started with dev containers is available in the Visual Studio Code documentation.
- Develop the features on your branch.
  As you work on the features, you should make sure that the test suite passes. You should run the tests impacted by your changes like this:
  $ pytest tests/<TEST_TO_RUN>.py
  For the following commands leveraging the make utility, we recommend using the WSL system when running on Windows. See Microsoft's WSL documentation for more information.
  You can also run the full suite with the following command:
  $ make test
  TRL relies on ruff for maintaining consistent code formatting across its source files. Before submitting any PR, you should apply automatic style corrections and run code verification checks.
  We provide a precommit target in the Makefile that simplifies this process by running all required checks and optimizations on only the files modified by your PR.
  To apply these checks and corrections in one step, use:
  $ make precommit
  This command runs the following:
  - Executes pre-commit hooks to automatically fix style issues with ruff and other tools.
  - Runs additional scripts such as adding copyright information.
  If you prefer to apply the style corrections separately or review them individually, the pre-commit hook will handle the formatting for the files in question.
  Once you're happy with your changes, add changed files using git add and make a commit with git commit to record your changes locally:
  $ git add modified_file.py
  $ git commit
  Please write good commit messages.
  It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:
  $ git fetch upstream
  $ git rebase upstream/main
  Push the changes to your account using:
  $ git push -u origin a-descriptive-name-for-my-changes
- Once you are satisfied (and the checklist below is happy too), go to the webpage of your fork on GitHub. Click on 'Pull request' to send your changes to the project maintainers for review.
- It's ok if maintainers ask you for changes. It happens to core contributors too! To ensure everyone can review your changes in the pull request, work on your local branch and push the updates to your fork. They will automatically appear in the pull request.
- The title of your pull request should be a summary of its contribution;
- If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it);
- To indicate a work in progress please prefix the title with [WIP], or mark the PR as a draft PR. These are useful to avoid duplicated work, and to differentiate it from PRs ready to be merged;
- Make sure existing tests pass;
- Add high-coverage tests. No quality testing = no merge.
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the tests folder.
We use pytest to run the tests. From the root of the repository, here's how to run tests with pytest for the library:
$ python -m pytest -sv ./tests
That's how make test is implemented (without the pip install line)!
You can specify a smaller set of tests to test only the feature you're working on.
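pytest accepts a file path, a node ID of the form path::test_name, or a -k expression that filters tests by name, so a single feature can be exercised in isolation. The sketch below shows the shape of a small test module. The helper clip_value and the file name tests/test_my_feature.py are hypothetical and are not part of TRL.

```python
# Hypothetical contents of tests/test_my_feature.py


def clip_value(value: float, low: float, high: float) -> float:
    """Hypothetical helper under test: clamp `value` to the range [low, high]."""
    return max(low, min(value, high))


def test_clip_value_inside_range():
    # A value already inside the range is returned unchanged.
    assert clip_value(0.5, 0.0, 1.0) == 0.5


def test_clip_value_outside_range():
    # Values below or above the range are clamped to the nearest bound.
    assert clip_value(-2.0, 0.0, 1.0) == 0.0
    assert clip_value(3.0, 0.0, 1.0) == 1.0
```

With this layout, python -m pytest -sv tests/test_my_feature.py -k outside_range would run only the second test, and python -m pytest -sv "tests/test_my_feature.py::test_clip_value_inside_range" only the first.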
Our approach to deprecation and backward compatibility is flexible and based on the feature’s usage and impact. Each deprecation is carefully evaluated, aiming to balance innovation with user needs.
When a feature or component is marked for deprecation, its use will emit a warning message. This warning will include:
- Transition Guidance: Instructions on how to migrate to the alternative solution or replacement.
- Removal Version: The target version when the feature will be removed, providing users with a clear timeframe to transition.
Example:
import warnings

warnings.warn(
    "The `Trainer.foo` method is deprecated and will be removed in version 0.14.0. "
    "Please use the `Trainer.bar` method instead.",
    FutureWarning,
)
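In practice such a warning usually sits inside the deprecated method, which then forwards to its replacement so that existing code keeps working during the transition. The sketch below illustrates this with a stand-in Trainer class. The class, foo, bar and the version number mirror the example above and are placeholders, not real TRL API. It also shows one way a test can check that the warning is emitted.

```python
import warnings


class Trainer:
    """Stand-in class for illustration only."""

    def bar(self, x: int) -> int:
        # The replacement method that new code should call.
        return x * 2

    def foo(self, x: int) -> int:
        # Deprecated entry point: warn, then forward to the replacement.
        warnings.warn(
            "The `Trainer.foo` method is deprecated and will be removed in version 0.14.0. "
            "Please use the `Trainer.bar` method instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller of `foo`
        )
        return self.bar(x)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = Trainer().foo(3)

assert result == 6
assert len(caught) == 1
assert issubclass(caught[0].category, FutureWarning)
```

Forwarding to the replacement keeps a single implementation to maintain, so removing the deprecated method later only deletes the thin wrapper.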
The deprecation and removal schedule is based on each feature's usage and impact, with examples at two extremes:
- Experimental or Low-Use Features: For a feature that is experimental or has limited usage, backward compatibility may not be maintained between releases. Users should therefore anticipate potential breaking changes from one version to the next.
- Widely-Used Components: For a feature with high usage, we aim for a more gradual transition period of approximately 5 months, generally scheduling removal around 5 minor releases after the initial warning.
These examples represent the two ends of a continuum. The specific timeline for each feature will be determined individually, balancing innovation with user stability needs.
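As a worked example of the arithmetic for a widely used component, a warning first emitted in version 0.9.0 with a window of five minor releases would name 0.14.0 as the removal version. The helper below is a hypothetical sketch and not part of TRL. It assumes plain major.minor.patch version strings and that the window is counted in minor releases from the version that first emits the warning. The actual timeline is decided per feature.

```python
def removal_version(deprecated_in: str, minor_releases: int = 5) -> str:
    """Return the version `minor_releases` minor releases after `deprecated_in`."""
    # Only the major and minor components matter; the patch number is discarded.
    major, minor, _patch = (int(part) for part in deprecated_in.split("."))
    return f"{major}.{minor + minor_releases}.0"


print(removal_version("0.9.0"))  # prints 0.14.0
```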