Behavioral Cloning from Observation [Paper]

Update

2019/11/28:

Implement tensorflow 2.0 version and push to tf2.0 branch

Introduction

This is an implementation of BCO in Tensorflow on cartpole environment.

There are two phases in BCO: (1) Inverse dynamic model which experience in a self-supervised fashion. (2) Policy model which use behavioral cloning by observing the expert perform without actions and get the action by (1).

Algorithm

As method above, there are two phases in BCO. In lines 5-9, phase 1, improving the inverse dynamic model. In lines 10-12, phase 2, improving the policy model by behavioral cloning.

Training BCO

After collecting the observation of expert demonstration, you can then train BCO. The training script are in scripts directory. Let's see how to train BCO on cartople:

./scripts/run_bco_cartpole.sh
# python3 models/bco_cartpole.py --mode=train --input_file=demonstration/cartpole/cartpole.txt --model_dir=model/cartpole/

List of Args:

--input_filename   - the demonstration inputs
--mode             - train or test
--model_dir        - where to save/restore the model
--maxits           - the number of training iteration
--M                - the number of post demonstration examples
--batch_size       - number of examples in batch
--lr               - initial learning rate for adam SGD
--save_freq        - save model every save_freq iterations, 0 to disable
--print_freq       - print reward and loss every print_freq iterations, 0 to disable

Evaluation

Get your evaluation result by the testing script in scripts directory. Let's see the examples for evaluate BCO on cartpole:

./scripts/test_bco_cartpole.sh
# python3 models/bco_cartpole.py --mode=test --model_dir=model/cartpole/

Training on your own dataset and architectures

Prepare your own data

The representation expects trajectories to be in a text file in the form:

[state] [next_state]

Each line in the file represents an observation from state to next_state. The demonstration must be in a file that contains only of demonstrations of this form. See demonstration directory for examples.

Using your own architecture

First in __init__ function you could specific your own environment.

self.env = gym.make('Cartpole-v0') # which could change to any of your environment.

Later you have to implement following function which is related to your model or interact with your own environment.

implement your model.

build_policy_model builds your own policy model

build_idm_model builds your own inverse dynamic model
interact with environment.

pre_demonstration uniform sample action to generate (s_t, s_t+1) and action pairs

post_demonstration uses policy to generate (s_t, s_t+1) and action pairs

eval_rwd_policy gets reward by evaluate the policy function

Implement the above function. See bco_cartpole.py in models directory for examples.

Citation

@article{torabi2018behavioral,
  title={Behavioral Cloning from Observation},
  author={Torabi, Faraz and Warnell, Garrett and Stone, Peter},
  journal={arXiv preprint arXiv:1805.01954},
  year={2018}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
demonstration/cartpole		demonstration/cartpole
images		images
model/cartpole		model/cartpole
models		models
scripts		scripts
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Behavioral Cloning from Observation [Paper]

Update

2019/11/28:

Introduction

Algorithm

Training BCO

Evaluation

Training on your own dataset and architectures

Prepare your own data

Using your own architecture

Citation

About

Releases

Packages

Languages

jerrylin1121/BCO

Folders and files

Latest commit

History

Repository files navigation

Behavioral Cloning from Observation [Paper]

Update

2019/11/28:

Introduction

Algorithm

Training BCO

Evaluation

Training on your own dataset and architectures

Prepare your own data

Using your own architecture

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages