Themis is an open-source testing and evaluation framework for Reinforcement Learning experiments using PyTorch. It supports many popular environments and can automatically configure the RL algorithm for continuous or discrete action spaces with minimal intervention from the user. Themis can also train a reward model from human preferences collected through our web-based crowdsourcing platform, making it well suited for interactive experiments. Finally, it is designed with explainability in mind and offers three ready-to-use attribution methods from Captum.
This repository contains the RLHF system. For our web-based crowdsourcing platform, visit: https://anonymous.4open.science/r/rlhf_crowdsourcing_platform-B862
- PyTorch
- Captum
- Gymnasium
- Minigrid
- Hydra
- termcolor
- moviepy
- Matplotlib
- Pandas
- MuJoCo (e.g. domain=Control, env=Humanoid-v4)
- Atari (e.g. domain=ALE, env=Breakout-v5)
- Box2d (e.g. domain=Box2d, env=BipedalWalker-v3)
- Minigrid (e.g. domain=Minigrid, env=DistShift1-v0)
- BabyAI (e.g. domain=BabyAI, env=GoToRedBallGrey-v0)
You can manually add more environments as long as they follow the Gymnasium (Gym) API; see the sketch below.
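A custom environment only needs to expose the standard Gymnasium interface (reset, step, action_space, observation_space). A minimal check using one of the supported Minigrid environments (the environment id is assumed from the Minigrid naming convention):

```python
import gymnasium as gym
import minigrid  # registers the MiniGrid-* environment ids

# Roll out a few random steps to verify the environment follows the Gymnasium API.
env = gym.make("MiniGrid-DistShift1-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```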
- Uniform Sampling (feed_type=0)
- Disagreement Sampling (feed_type=1)
- Entropy Sampling (feed_type=2)
- K Center (feed_type=3)
- K Center + Disagreement (feed_type=4)
- K Center + Entropy (feed_type=5)
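For intuition, disagreement sampling (feed_type=1) queries the clip pairs on which the reward-model ensemble disagrees the most. A minimal sketch of the idea (tensor names are illustrative, not Themis' exact implementation):

```python
import torch

def rank_by_disagreement(ensemble_probs: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Select the clip pairs the reward-model ensemble is most unsure about.

    ensemble_probs: (n_members, n_pairs) tensor holding each ensemble member's
    predicted probability that the first clip of a pair is preferred.
    """
    disagreement = ensemble_probs.std(dim=0)            # high std = members disagree
    return torch.argsort(disagreement, descending=True)[:batch_size]

# Example: 3 ensemble members, 5 candidate pairs, query the 2 most disputed pairs.
probs = torch.tensor([[0.9, 0.5, 0.2, 0.6, 0.1],
                      [0.8, 0.4, 0.9, 0.6, 0.1],
                      [0.9, 0.6, 0.4, 0.5, 0.2]])
print(rank_by_disagreement(probs, batch_size=2))
```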
Experiments can be executed with the following scripts:
./themis_pretrain.sh
./themis_train.sh
Edit the files accordingly to change the experiment configuration.
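Since configuration is handled by Hydra, the scripts presumably forward Hydra-style overrides such as domain=..., env=..., and feed_type=.... A hypothetical entry point of that shape (the config name mirrors config/train_themis referenced below; the exact signature in the repository may differ):

```python
import hydra
from omegaconf import DictConfig

# Hypothetical Hydra entry point of the kind the training scripts invoke;
# cfg keys such as domain, env, and feed_type correspond to the options
# described in this README.
@hydra.main(config_path="config", config_name="train_themis", version_base=None)
def main(cfg: DictConfig) -> None:
    print(cfg.domain, cfg.env, cfg.feed_type)

if __name__ == "__main__":
    main()
```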
To run an experiment using a learned reward model, set the learn_reward flag to True; otherwise the environment reward will be used.
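Conceptually, this flag swaps the reward signal fed to the RL agent. A sketch of the idea (the reward_model object and its r_hat method are illustrative names, not necessarily Themis' API):

```python
import torch

def relabel_reward(cfg, reward_model, obs, action, env_reward):
    """Return the reward used for the policy update on a single transition."""
    if cfg.learn_reward:
        with torch.no_grad():
            # Learned reward: score the (state, action) pair with the reward model.
            return reward_model.r_hat(torch.cat([obs, action], dim=-1))
    # Otherwise fall back to the reward returned by the environment.
    return env_reward
```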
For experiments with real human feedback, be sure to also set the human_teacher flag to True.
The get_labels method in reward_model.py contains the logic to generate clips and receive input from the user. Explore the available tools in lib/human_interface.py.
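At a high level, each query renders two candidate clips, shows them to the annotator through the human interface, and records a preference. A simplified sketch (the function name, display helper, and return-value convention are all illustrative; the real interface in lib/human_interface.py is richer than a terminal prompt):

```python
def query_preference(clip_a, clip_b, show_clips):
    """Show two clips and return 0 (prefer A), 1 (prefer B), or -1 (skip)."""
    show_clips(clip_a, clip_b)                       # e.g. a viewer from lib/human_interface.py
    answer = input("Prefer [a], [b], or [s]kip? ").strip().lower()
    return {"a": 0, "b": 1}.get(answer, -1)
```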
To use the currently supported explainability techniques, set either the xplain_action or the xplain_state flag to True. Refer to lib/human_interface.py if you want to add more.
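For reference, a minimal example of the kind of Captum attribution these flags enable. Saliency is used purely for illustration, and policy / state stand in for the trained actor network and an observation (discrete action case):

```python
import torch
from captum.attr import Saliency

def explain_action(policy: torch.nn.Module, state: torch.Tensor) -> torch.Tensor:
    """Attribute the greedy action's logit to the input state features."""
    saliency = Saliency(policy)
    state = state.clone().requires_grad_(True)
    target = policy(state).argmax(dim=-1)    # index of the action being explained
    return saliency.attribute(state, target=target)
```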
Themis is based on BPref, so it incorporates the same logic for synthetic teachers. To adjust a synthetic teacher, tweak the relevant parameters in config/train_themis.py:
- teacher_beta: rationality constant of the stochastic preference model (default: -1 for a perfectly rational model)
- teacher_gamma: discount factor used to model myopic behavior (default: 1)
- teacher_eps_mistake: probability of making a mistake (default: 0)
- teacher_eps_skip: hyperparameter controlling the skip threshold (in [0, 1])
- teacher_eps_equal: hyperparameter controlling the equal threshold (in [0, 1])
Synthetic teacher examples:
- Oracle teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)
- Mistake teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0.1, teacher_eps_skip=0, teacher_eps_equal=0)
- Noisy teacher: (teacher_beta=1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)
- Skip teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0.1, teacher_eps_equal=0)
- Myopic teacher: (teacher_beta=-1, teacher_gamma=0.9, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)
- Equal teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0.1)
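Putting the parameters together, the synthetic teacher labels a clip pair roughly as sketched below. This follows the BPref-style preference model; exact details (in particular how the skip and equal thresholds are derived from teacher_eps_skip and teacher_eps_equal) may differ in Themis:

```python
import numpy as np

def synthetic_label(r1, r2, beta, gamma, eps_mistake, skip_thr, equal_thr):
    """Label a pair of per-step reward sequences r1, r2 (1-D numpy arrays).

    Returns 0 if the first clip is preferred, 1 if the second is,
    0.5 for an "equally good" label, or None if the query is skipped.
    """
    weights = gamma ** np.arange(len(r1))[::-1]        # discount earlier steps more (myopia)
    ret1, ret2 = float(np.sum(weights * r1)), float(np.sum(weights * r2))

    if max(ret1, ret2) < skip_thr:                     # both clips too poor: skip the query
        return None
    if abs(ret1 - ret2) < equal_thr:                   # too close to call: equal label
        return 0.5

    if beta > 0:                                       # stochastic (Bradley-Terry) preference
        p_first = 1.0 / (1.0 + np.exp(-beta * (ret1 - ret2)))
        label = 0 if np.random.rand() < p_first else 1
    else:                                              # beta = -1: perfectly rational choice
        label = 0 if ret1 > ret2 else 1

    if np.random.rand() < eps_mistake:                 # occasionally flip the answer
        label = 1 - label
    return label
```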