GARAT: Generative Adversarial Reinforced Action Transformation

Research code accompanying the paper: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch.

Project Scope:

We build a Sim2Real/Sim2Sim interface that resolves dynamics mismatch for find-and-touch tasks. Specifically, we use the GARAT framework to learn an action transformation policy (ATP) with an imitation learning method (TRPOGAIFO) from target-environment samples, and then train the target policy to be deployed in the target environment with RL algorithms (DDPG + HER).
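The role of the ATP can be illustrated with a minimal sketch: the source simulator is wrapped so that every action first passes through the ATP, and the target policy is then trained in this "grounded" simulator. Everything below (`ToySourceEnv`, `GroundedEnv`, `toy_atp`, and the 80% mismatch) is a hypothetical stand-in for illustration and is not taken from this repository; a learned ATP would replace the hand-written function.

```python
class ToySourceEnv:
    """Hypothetical 1-D point simulator standing in for the source environment."""

    def __init__(self):
        self.position = 0.0

    def reset(self):
        self.position = 0.0
        return self.position

    def step(self, action):
        # Source dynamics: the point moves by exactly `action`.
        self.position += action
        return self.position


class GroundedEnv:
    """Wraps a source env so that every action passes through an ATP first."""

    def __init__(self, source_env, atp):
        self.source_env = source_env
        self.atp = atp  # callable: (state, action) -> transformed action
        self.state = None

    def reset(self):
        self.state = self.source_env.reset()
        return self.state

    def step(self, action):
        transformed = self.atp(self.state, action)
        self.state = self.source_env.step(transformed)
        return self.state


def toy_atp(state, action):
    # Assumed mismatch (illustrative only): the target realizes 80% of each action.
    return 0.8 * action


grounded = GroundedEnv(ToySourceEnv(), toy_atp)
grounded.reset()
final_state = grounded.step(1.0)
print(final_state)
```

In this toy setup the agent requests an action of 1.0, but the grounded simulator moves the point by only 0.8, which is how the target environment is assumed to behave.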


Branches:

  • master: all of our changes since OPOLO
  • temp: temporary branch with miscellaneous updates that are not consistent with the current git log, but may show our immediate changes more clearly

Training GARAT:

  • Example: running on the FetchReach-v1 task
  • Obtain a source policy by running DDPG+HER in FetchReach-v1 (opolo-baselines/run/test.zip in our case)
  • Collect demonstration trajectories with the function generate_target_traj(rollout_policy_path, env, save_path, n_episodes, n_transitions, seed)
    • This function can also be called while training an ATP
    • One can reduce the dimensionality of the samples used
  • Train an ATP with opolo-baselines/run/train_agent_custom.py
  • Update the target policy to be deployed in the target environment with opolo-baselines/simulation_grounding/train_target_policy.py
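The trajectory-collection step can be sketched as follows. This is not the repository's generate_target_traj: it is a simplified, hypothetical helper (`collect_state_transitions`) that takes an in-memory policy instead of loading one from rollout_policy_path, skips saving to save_path, and runs on a toy environment. The dictionary keys `obs` and `next_obs` are also assumptions. Only state transitions are stored, since imitation from observation does not use the demonstrator's actions.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the target environment (noisy 1-D walk)."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.rng = random.Random()
        self.state = 0.0
        self.t = 0

    def seed(self, seed):
        self.rng.seed(seed)

    def reset(self):
        self.state, self.t = 0.0, 0
        return self.state

    def step(self, action):
        self.state += action + self.rng.uniform(-0.01, 0.01)
        self.t += 1
        done = self.t >= self.horizon
        return self.state, done


def collect_state_transitions(policy, env, n_episodes, n_transitions, seed):
    """Roll out `policy`, keeping at most `n_transitions` (state, next_state) pairs per episode."""
    env.seed(seed)
    trajectories = {"obs": [], "next_obs": []}
    for _ in range(n_episodes):
        obs, done, kept = env.reset(), False, 0
        while not done and kept < n_transitions:
            next_obs, done = env.step(policy(obs))
            trajectories["obs"].append(obs)
            trajectories["next_obs"].append(next_obs)
            obs = next_obs
            kept += 1
    return trajectories


demo = collect_state_transitions(
    policy=lambda obs: 0.1,  # constant-action placeholder policy
    env=ToyEnv(horizon=5),
    n_episodes=3,
    n_transitions=2,
    seed=0,
)
print(len(demo["obs"]))
```

With 3 episodes capped at 2 transitions each, the sketch stores 6 state pairs, which mirrors how n_transitions limits the samples kept per episode.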

Evaluating ATP:

python opolo-baselines/simulation_grounding/plot_state_distributions.py

Results can be found at: opolo-baselines/atp_plots/
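As a rough illustration of the kind of comparison a state-distribution plot conveys, the sketch below computes the overlap between two histograms of a single state dimension, for example states visited in the target environment versus states visited in the grounded simulator. The internals of plot_state_distributions.py are not described here, so the function names and the overlap metric are assumptions rather than what the script actually computes.

```python
def histogram(values, low, high, n_bins):
    """Normalized histogram of `values` over [low, high] with equal-width bins."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        # Clamp v == high into the last bin.
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def histogram_overlap(a, b, n_bins=10):
    """Return a value in [0, 1]; 1 means identical histograms, 0 means disjoint."""
    low, high = min(a + b), max(a + b)
    if high == low:
        return 1.0  # all samples identical
    ha = histogram(a, low, high, n_bins)
    hb = histogram(b, low, high, n_bins)
    return sum(min(x, y) for x, y in zip(ha, hb))


target_states = [0.0, 0.1, 0.2, 0.3]
grounded_states = [0.0, 0.1, 0.2, 0.3]
print(histogram_overlap(target_states, grounded_states))
```

A well-trained ATP should push this kind of overlap toward 1 for each state dimension, while a poorly grounded simulator leaves the two distributions visibly apart.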


Reminders:

  • Please contact us or read our final report for more details

  • For Windows users, WSL2 is recommended for the purpose of using the mujoco library

  • Hyperparameters for the grounding algorithm can be tuned at:

opolo-baselines/hyperparams/
  • Because the connection with our UArm Swift robotic arm is unstable, we use a script to convert raw text into the desired trajectory format (a numpy dictionary):
python opolo-baselines/sim_2_real/data_processing.py
  • ATPs can be found at:
opolo-baselines\run\test\logs\trpo-gaifo\trpogaifo\FetchReach-v1

where rank0 uses full-length samples with gamma = 0.95, rank1 uses reduced-length samples with gamma = 0.95, and rank2 uses reduced-length samples with gamma = 0.1
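To give a feel for how different the two gamma settings are, here is a small worked example, assuming gamma denotes the usual RL discount factor (the README does not state this explicitly). The helper names are hypothetical. The effective horizon 1 / (1 - gamma) approximates how many future steps contribute meaningfully to the discounted return.

```python
def effective_horizon(gamma):
    """Approximate number of steps that matter under discount factor `gamma`."""
    return 1.0 / (1.0 - gamma)


def discount_weights(gamma, n_steps):
    """Weight given to the reward at each of the first `n_steps` steps."""
    return [gamma ** t for t in range(n_steps)]


for gamma in (0.95, 0.1):
    print(gamma, round(effective_horizon(gamma), 2), discount_weights(gamma, 3))
```

Under this assumption, gamma = 0.95 looks roughly 20 steps ahead, whereas gamma = 0.1 weights the second step at only 10% of the first and is close to a one-step objective.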