Research code for the paper: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch.
We work on a Sim2Real/Sim2Sim interface that resolves dynamics mismatch in find-and-touch tasks. Specifically, we use the GARAT framework to learn an action transformation policy (ATP) with an imitation learning method (TRPOGAIFO) from target-environment samples, and then update the target policy to be deployed in the target environment with RL algorithms (DDPG + HER).
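The role of the action transformation can be illustrated with a toy example. The sketch below is not the repository's implementation: `ToySim`, `GroundedSim`, and `toy_atp` are hypothetical names, and a fixed scaling function stands in for a learned ATP. It only shows where the transformation sits, namely between the policy's action and the source simulator's step.

```python
class ToySim:
    """Hypothetical 1-D point; `gain` stands in for the environment dynamics."""

    def __init__(self, gain):
        self.gain = gain
        self.x = 0.0

    def reset(self):
        self.x = 0.0
        return self.x

    def step(self, action):
        # The next state depends on the action scaled by the dynamics gain.
        self.x = self.x + self.gain * action
        return self.x


class GroundedSim:
    """Wraps a source simulator and transforms each action before stepping."""

    def __init__(self, sim, atp):
        self.sim = sim
        self.atp = atp  # callable: (state, action) -> transformed action

    def reset(self):
        return self.sim.reset()

    def step(self, action):
        grounded_action = self.atp(self.sim.x, action)
        return self.sim.step(grounded_action)


def toy_atp(state, action):
    # Assumption: a hand-written stand-in for a learned ATP. In this toy
    # setup the target responds half as strongly to actions as the source.
    return 0.5 * action


source = ToySim(gain=1.0)
target = ToySim(gain=0.5)
grounded = GroundedSim(ToySim(gain=1.0), toy_atp)

for env in (source, target, grounded):
    env.reset()

actions = [1.0, -0.5, 2.0]
source_states = [source.step(a) for a in actions]
target_states = [target.step(a) for a in actions]
grounded_states = [grounded.step(a) for a in actions]
```

In this toy setup the grounded simulator reproduces the target's state sequence, so a policy trained in it sees target-like transitions without further target-environment interaction.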
- master: all of our changes since OPOLO
- temp: temporary branch with miscellaneous updates that are not consistent with the current git log, but may show our most recent changes more clearly
- Example: run on the FetchReach-v1 task
- Obtain a source policy by running DDPG+HER in FetchReach-v1 (opolo-baselines/run/test.zip in our case)
- Collect demonstrated trajectories with the function generate_target_traj(rollout_policy_path, env, save_path, n_episodes, n_transitions, seed)
- This function may also be called and executed while training an ATP
- One can reduce the dimensionality of the samples used
- Train an ATP with opolo-baselines/run/train_agent_custom.py
- Update the target policy to be deployed in the target environment with opolo-baselines/simulation_grounding/train_target_policy.py
- Plot the state distributions with python opolo-baselines/simulation_grounding/plot_state_distributions.py; results can be found at opolo-baselines/atp_plots/
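The trajectory-collection step above amounts to a rollout loop over episodes and transitions. The following is a minimal sketch under stated assumptions, not the body of generate_target_traj: `ToyEnv`, `toy_policy`, `collect_rollouts`, and the dictionary keys are hypothetical, and plain lists stand in for NumPy arrays so that the sketch has no dependencies. Only state pairs are stored, since imitation from observation does not use expert actions.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (not FetchReach-v1)."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0.0

    def reset(self):
        self.state = self.rng.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        self.state += action
        done = abs(self.state) < 0.05  # the goal at the origin is "touched"
        return self.state, done


def toy_policy(state):
    # Assumption: a proportional controller replaces the trained source policy.
    return -0.5 * state


def collect_rollouts(policy, env, n_episodes, n_transitions):
    """Roll out `policy` and store one (obs, next_obs) pair per transition."""
    data = {"obs": [], "next_obs": [], "episode_starts": []}
    for _ in range(n_episodes):
        state = env.reset()
        for t in range(n_transitions):
            next_state, done = env.step(policy(state))
            data["obs"].append(state)
            data["next_obs"].append(next_state)
            data["episode_starts"].append(t == 0)
            state = next_state
            if done:
                break
    return data


trajs = collect_rollouts(toy_policy, ToyEnv(seed=0), n_episodes=3, n_transitions=5)
```

Lowering `n_transitions` shortens each stored episode, which is one simple way to reduce the amount of data per trajectory.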
- Please contact us or read our final report for more details
- For Windows users, WSL2 is recommended for using the mujoco library
- One can tune hyperparameters for the grounding algorithm at: opolo-baselines/hyperparams/
- Because the connection to our UArm Swift robotic arm is unstable, we use a script to convert raw text into the desired trajectory format (a NumPy dictionary):
python opolo-baselines/sim_2_real/data_processing.py
- ATPs can be found at opolo-baselines\run\test\logs\trpo-gaifo\trpogaifo\FetchReach-v1, where rank0 uses full-length samples with gamma = 0.95, rank1 uses reduced-length samples with gamma = 0.95, and rank1 uses reduced-length samples with gamma = 0.1
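The raw-text conversion mentioned above can be sketched as follows. This is an illustration only, not the contents of data_processing.py: the log format (one comma-separated x, y, z reading per line), the function name `parse_raw_log`, and the dictionary keys are all assumptions. Plain lists are used in place of NumPy arrays to keep the sketch dependency-free.

```python
def parse_raw_log(text):
    """Turn a raw position log into a dictionary of state transitions.

    Assumption: each valid line holds three comma-separated floats (x, y, z).
    Lines that do not parse, e.g. after a dropped connection, are skipped.
    """
    states = []
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue
        try:
            states.append([float(p) for p in parts])
        except ValueError:
            continue
    # Consecutive valid readings form (obs, next_obs) pairs.
    return {"obs": states[:-1], "next_obs": states[1:]}


# Hypothetical log with one garbled line and one truncated reading.
raw = "0.0,0.0,0.1\n0.1,0.0,0.1\ngarbled line\n0.2,0.1,\n0.2,0.1,0.2\n"
traj = parse_raw_log(raw)
```

Note that skipping a damaged reading pairs the two valid readings around it, so the resulting transition spans a longer time step; a stricter variant could instead split the trajectory at each gap.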