Dear Developers,

I am looking into the implementation of the guided cost learning reward model. In the training process there is only the IOC loss, but not the regularization terms g_lcr and g_mono. Did I miss them, or are they simply not implemented in the code?

In addition, in the guided cost learning paper, the L_IOC loss considers different trajectories, each of which should be a complete episode. However, in DI-engine, the training data consists of time-steps sampled from trajectories, which means the time-steps in the training data do not come from a complete episode and might also be repeated during sampling. Is this designed on purpose, or am I misunderstanding the paper?
Best regards
Zhixiong
At the beginning of the GCL implementation, we discussed whether to use whole episodes or fixed-length, non-overlapping trajectories. We checked the theoretical details in the original paper and ran some comparison experiments; the episode version did not show an obvious performance gain, so we adopted the trajectory version as the default in DI-engine for simplicity.
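For reference, here is a minimal sketch of what the timestep-based IOC loss looks like under this design, following the L_IOC formula from the paper but evaluated on sampled transitions. The function signature, `cost_net`, and tensor names are illustrative placeholders, not the actual DI-engine API:

```python
# Illustrative sketch only: `cost_net`, the tensor names, and this signature are
# hypothetical and do not correspond to the actual DI-engine API.
import torch


def ioc_loss(cost_net, expert_obs, expert_act, agent_obs, agent_act, agent_logprob):
    """Timestep-based estimate of L_IOC = E_demo[c_theta] + log E_samp[exp(-c_theta) / q].

    Each batch element is a single transition sampled from stored trajectories,
    rather than a complete episode as in the original paper.
    """
    expert_cost = cost_net(expert_obs, expert_act)   # cost of expert transitions, shape (N,)
    agent_cost = cost_net(agent_obs, agent_act)      # cost of policy-sampled transitions, shape (M,)
    # Importance-weighted partition estimate log(1/M * sum_j exp(-c_j) / q_j),
    # computed in log space with logsumexp for numerical stability.
    log_weights = -agent_cost - agent_logprob
    log_partition = torch.logsumexp(log_weights, dim=0) - torch.log(
        torch.tensor(float(agent_cost.shape[0]))
    )
    return expert_cost.mean() + log_partition
```

Computing the importance-weighted term with `logsumexp` keeps the partition estimate numerically stable even when individual costs are large.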
The regularization terms in GCL are designed to help the specific RL algorithm used in the original work (Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics). When we combine GCL with more recent DRL algorithms like DQN, these terms can be omitted, so we do not implement them in the current version.
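For completeness, a rough sketch of the two omitted regularizers from the paper, up to scaling details: g_lcr is a squared second-order difference penalty on the cost along a trajectory, and g_mono is a hinge penalty on cost increases. Here `traj_cost` is a hypothetical per-timestep cost vector, not a DI-engine object:

```python
# Rough sketch of the two GCL regularizers (not implemented in DI-engine);
# `traj_cost` is a hypothetical 1-D tensor of per-timestep costs c_theta(x_t)
# along a single trajectory, shape (T,).
import torch


def gcl_regularizers(traj_cost: torch.Tensor):
    # g_lcr: squared second-order difference of the cost, i.e.
    # sum_t [(c_{t+1} - c_t) - (c_t - c_{t-1})]^2,
    # encouraging a locally constant rate of change.
    second_diff = traj_cost[2:] - 2.0 * traj_cost[1:-1] + traj_cost[:-2]
    g_lcr = (second_diff ** 2).sum()
    # g_mono: hinge penalty on cost increases larger than a slack of 1, i.e.
    # sum_t max(0, c_t - c_{t-1} - 1)^2,
    # encouraging the cost to decrease along a demonstration.
    first_diff = traj_cost[1:] - traj_cost[:-1]
    g_mono = (torch.clamp(first_diff - 1.0, min=0.0) ** 2).sum()
    return g_lcr, g_mono
```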