GARAT: Generative Adversarial Reinforced Action Transformation

Research code for the paper: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch.

Project Scope:

We work on a Sim2Real/Sim2Sim interface that resolves dynamics mismatch for find-and-touch tasks. Specifically, we use the GARAT framework to learn an action transformation policy (ATP) with an imitation learning method (TRPOGAIFO) from target-environment samples, and then use an RL algorithm (DDPG + HER) to update the target policy that is deployed in the target environment.
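
The grounding idea can be illustrated with a small wrapper: the agent's action, together with the current state, is passed through the ATP, and the simulator executes the transformed action instead. This is a minimal, stdlib-only sketch under stated assumptions; ToySimEnv, GroundedEnv, and the atp(state, action) callable are hypothetical names for illustration, not this repository's API.

```python
from typing import Callable, Tuple


class ToySimEnv:
    """Hypothetical 1-D simulator (illustration only): the state moves by the executed action."""

    def __init__(self) -> None:
        self.state = 0.0

    def reset(self) -> float:
        self.state = 0.0
        return self.state

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.state += action
        reward = -abs(self.state - 1.0)  # try to reach position 1.0
        done = abs(self.state - 1.0) < 0.05
        return self.state, reward, done


class GroundedEnv:
    """Hypothetical wrapper: every agent action is transformed by an ATP before the simulator runs it."""

    def __init__(self, sim_env: ToySimEnv, atp: Callable[[float, float], float]) -> None:
        self.sim_env = sim_env
        self.atp = atp
        self.state = 0.0

    def reset(self) -> float:
        self.state = self.sim_env.reset()
        return self.state

    def step(self, action: float) -> Tuple[float, float, bool]:
        # The ATP sees the current state and the agent's action and returns
        # the action the simulator actually executes.
        transformed = self.atp(self.state, action)
        self.state, reward, done = self.sim_env.step(transformed)
        return self.state, reward, done


# Usage: a fixed, hypothetical ATP that halves every action, standing in for
# a target robot whose actuators respond more weakly than the simulated ones.
env = GroundedEnv(ToySimEnv(), atp=lambda s, a: 0.5 * a)
env.reset()
print(env.step(1.0))
```

In the actual pipeline the ATP is a learned policy trained with TRPOGAIFO on target-environment samples; the fixed halving function above only stands in for it.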

Branch:

master: all of our changes since OPOLO
temp: temporary branch with miscellaneous updates that are not consistent with the current git log, but may show our most recent changes more clearly

Training GARAT:

Example: the FetchReach-v1 task
Obtain a source policy by running DDPG+HER on FetchReach-v1 (opolo-baselines/run/test.zip in our case)
Collect demonstration trajectories in the target environment with the function generate_target_traj(rollout_policy_path, env, save_path, n_episodes, n_transitions, seed)
- This function may also be called and executed while training an ATP
- One can reduce the dimensionality of the samples used
Train an ATP with opolo-baselines/run/train_agent_custom.py
Update the target policy to be deployed in the target environment with opolo-baselines/simulation_grounding/train_target_policy.py
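
Conceptually, the demonstration-collection step rolls a fixed policy out in the target environment and records its transitions. The sketch below is a hypothetical stand-in, not the implementation of generate_target_traj: ToyTargetEnv, collect_transitions, the max_transitions cap (loosely mirroring n_transitions), and the dictionary keys are all assumptions.

```python
import random
from typing import Callable, Dict, List, Tuple


class ToyTargetEnv:
    """Hypothetical 1-D 'target' environment with damped, noisy actuation."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.state = 0.0

    def reset(self) -> float:
        self.state = 0.0
        return self.state

    def step(self, action: float) -> Tuple[float, bool]:
        # Dynamics deliberately differ from an ideal simulator.
        self.state += 0.5 * action + self.rng.gauss(0.0, 0.01)
        done = abs(self.state - 1.0) < 0.05
        return self.state, done


def collect_transitions(policy: Callable[[float], float], env: ToyTargetEnv,
                        n_episodes: int, max_transitions: int,
                        max_steps: int = 50) -> Dict[str, List[float]]:
    """Roll out `policy` in `env`, recording (obs, action, next_obs) until a cap is hit."""
    data: Dict[str, List[float]] = {"obs": [], "actions": [], "next_obs": []}
    for _ in range(n_episodes):
        obs = env.reset()
        for _ in range(max_steps):
            if len(data["obs"]) >= max_transitions:
                return data
            action = policy(obs)
            next_obs, done = env.step(action)
            data["obs"].append(obs)
            data["actions"].append(action)
            data["next_obs"].append(next_obs)
            obs = next_obs
            if done:
                break
    return data


# Usage: a simple proportional controller as the hypothetical rollout policy.
demo = collect_transitions(lambda s: 1.0 - s, ToyTargetEnv(seed=0),
                           n_episodes=3, max_transitions=100)
print(len(demo["obs"]))
```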

Evaluating ATP:

python opolo-baselines/simulation_grounding/plot_state_distributions.py

Results can be found at: opolo-baselines/atp_plots/
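
The script produces plots. If a single number per state dimension is useful alongside them, one option (our assumption, not something plot_state_distributions.py is known to compute) is to compare the mean and standard deviation of states visited in the target environment with those visited in the grounded simulator. A smaller gap suggests the ATP brings the simulator closer to the target dynamics, although matching the first two moments does not guarantee matching distributions. The function name per_dimension_gap is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def per_dimension_gap(target_states: Sequence[Sequence[float]],
                      sim_states: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return (|mean difference|, |std difference|) for each state dimension."""
    n_dims = len(target_states[0])
    gaps = []
    for d in range(n_dims):
        tgt = [s[d] for s in target_states]
        sim = [s[d] for s in sim_states]
        gaps.append((abs(mean(tgt) - mean(sim)), abs(pstdev(tgt) - pstdev(sim))))
    return gaps


# Usage with made-up 2-D states: the grounded simulator is shifted in dimension 0.
target = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
grounded = [[0.5, 1.0], [1.5, 1.0], [2.5, 1.0]]
print(per_dimension_gap(target, grounded))
```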

Reminders:

Please contact us or read our final report for more details
For Windows users, WSL2 is recommended in order to use the MuJoCo library
Hyperparameters for the grounding algorithm can be tuned in:

opolo-baselines/hyperparams/

Because the connection to our UArm Swift robotic arm is unstable, we use a script to convert raw text into the desired trajectory format (a NumPy dictionary):

python opolo-baselines/sim_2_real/data_processing.py
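
The raw text format logged from the arm is not documented here, so the following sketch assumes one comma-separated state vector per line and builds consecutive (obs, next_obs) pairs into a dictionary of NumPy arrays. Malformed lines are treated as breaks so that no transition is formed across a gap. The line format, the function name text_to_trajectory, and the keys obs/next_obs are hypothetical and may differ from data_processing.py.

```python
from typing import Dict, Iterable, List, Optional

import numpy as np


def text_to_trajectory(lines: Iterable[str], state_dim: int) -> Dict[str, np.ndarray]:
    """Parse one comma-separated state per line into (obs, next_obs) arrays."""
    obs: List[List[float]] = []
    next_obs: List[List[float]] = []
    prev: Optional[List[float]] = None  # last valid state in the current unbroken segment
    for line in lines:
        parts = line.strip().split(",")
        try:
            state = [float(p) for p in parts] if len(parts) == state_dim else None
        except ValueError:
            state = None
        if state is None:
            prev = None  # a bad line breaks the segment; never pair across it
            continue
        if prev is not None:
            obs.append(prev)
            next_obs.append(state)
        prev = state
    return {
        "obs": np.asarray(obs, dtype=np.float32).reshape(-1, state_dim),
        "next_obs": np.asarray(next_obs, dtype=np.float32).reshape(-1, state_dim),
    }


# Usage with made-up 3-D states and one garbled line.
raw = ["0.0,0.0,0.1", "0.1,0.0,0.1", "garbled", "0.2,0.1,0.1", "0.3,0.1,0.1"]
traj = text_to_trajectory(raw, state_dim=3)
# np.save("target_traj.npy", traj) stores the dict; reload it with
# np.load("target_traj.npy", allow_pickle=True).item()
```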

ATPs can be found at:

opolo-baselines/run/test/logs/trpo-gaifo/trpogaifo/FetchReach-v1

where rank0 uses full-length samples with gamma = 0.95, rank1 uses reduced-length samples with gamma = 0.95, and rank2 uses reduced-length samples with gamma = 0.1
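
The two gamma values above lead to very different effective horizons. With discount factor gamma, the return of an episode is G = sum_t gamma^t * r_t, and a constant per-step reward is weighted over roughly 1 / (1 - gamma) steps: about 20 steps for gamma = 0.95 but only about 1.1 steps for gamma = 0.1. The helper functions below are our own illustration, not repository code.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute G = sum_t gamma**t * r_t for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # backward recursion: G_t = r_t + gamma * G_{t+1}
    return g


def effective_horizon(gamma: float) -> float:
    """Rough number of steps that meaningfully contribute to the return."""
    return 1.0 / (1.0 - gamma)


# Compare the two discount factors on 50 steps of constant reward 1.0.
for gamma in (0.95, 0.1):
    print(gamma, round(effective_horizon(gamma), 2),
          round(discounted_return([1.0] * 50, gamma), 2))
```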

Repository: kentwhf/opolo-code
