
This is the code for our NeurIPS 2022 paper "DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning".


archanabura/DOPE-DoublyOptimisticPessimisticExploration


This document contains details on how to run the experiments.


Python Dependencies:

1. PuLP: "pip install pulp"

2. Matplotlib, NumPy, Pandas
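The dependencies above can be collected in a requirements file for a one-command install. This is a suggested fragment, not one shipped with the repo; version pins are omitted because the repo does not specify any.

```
pulp
matplotlib
numpy
pandas
```

Install with "pip install -r requirements.txt".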



How to Run:

The code contains three subfolders: FactoredCMDP, InventoryControl, and MediaControl. These correspond to the experiments for each of those environments in the paper.

In each of these subfolders,

1. First, run model*.py. This creates the solution files used to compute regret and generates the baseline policies for OptPessLP and DOPE.

2. Run OptCMDP*.py for OptCMDP, DOPE*.py for DOPE, OptPessLP*.py for OptPessLP, and alwayssafe.py for AlwaysSafe. Each script generates .pckl files containing the cumulative regret for that environment.

3. To repeat the above with a different seed, change the RUN_NUMBER field. The plots in the paper are averaged over 20 runs each.

4. For plots with different baseline policies, change the C_b value in model*.py.
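The per-seed .pckl files produced in steps 2-3 can then be combined into the averaged curves used for plotting. A minimal sketch, assuming each pickle holds a list of cumulative regrets; the file names below (regret_run0.pckl, etc.) are placeholders, not the names the scripts actually emit.

```python
import os
import pickle
import tempfile

def average_regret(paths):
    """Element-wise average of cumulative-regret curves, one pickle per seed."""
    runs = []
    for path in paths:
        with open(path, "rb") as f:
            runs.append(pickle.load(f))  # each file: a list of cumulative regrets
    # zip(*runs) pairs up the t-th entry of every run
    return [sum(step) / len(step) for step in zip(*runs)]

# Demo with synthetic data standing in for the scripts' real .pckl output.
tmpdir = tempfile.mkdtemp()
paths = []
for run in range(3):
    path = os.path.join(tmpdir, f"regret_run{run}.pckl")
    with open(path, "wb") as f:
        pickle.dump([float(run), float(run + 1)], f)
    paths.append(path)

avg = average_regret(paths)  # [1.0, 2.0] for this synthetic data
```

The same helper works unchanged for 20 runs; pass it the 20 per-seed file paths for one algorithm and plot the returned curve.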
