Missing Regularization in GCL #627

Closed · 7 tasks · Tracked by #548
zhixiongzh opened this issue Mar 29, 2023 · 1 comment

Labels: discussion (Discussion of a typical issue)

@zhixiongzh

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • [+] new feature request
  • [+] I have visited the readme and doc
  • [+] I have searched through the issue tracker and pr tracker
  • [+] I have mentioned version numbers, operating system and environment, where applicable:
    >>> import ding, torch, sys
    >>> print(ding.__version__, torch.__version__, sys.version, sys.platform)
    v0.4.6 1.10.0 3.7.11 (default, Jul 27 2021, 14:32:16)
    [GCC 7.5.0] linux

Dear Developers,

I am looking into the implementation of the guided cost reward model. In the training process there is only the IOC loss, but not the regularization terms g_lcr and g_mono. Did I miss them, or are they just not implemented in the code?
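
For reference, here is my reading of the two regularizers from the paper as a minimal PyTorch sketch (my own illustration, not DI-engine code; cost is assumed to be a 1-D tensor of per-time-step costs c_theta(x_t) along one trajectory):

    import torch

    def g_lcr(cost: torch.Tensor) -> torch.Tensor:
        # Local constant rate: penalizes rapid changes in the slope of the
        # learned cost along the trajectory,
        # sum_t [(c[t+1] - c[t]) - (c[t] - c[t-1])]^2.
        diff = cost[1:] - cost[:-1]
        return ((diff[1:] - diff[:-1]) ** 2).sum()

    def g_mono(cost: torch.Tensor) -> torch.Tensor:
        # Monotonicity: encourages the cost to decrease along a demonstration,
        # sum_t [max(0, c[t] - c[t-1] - 1)]^2.
        diff = cost[1:] - cost[:-1]
        return (torch.clamp(diff - 1.0, min=0.0) ** 2).sum()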

In addition, in the guided cost learning paper, the IOC loss L_IOC is computed over different trajectories, each of which should be a complete episode. However, in DI-engine, the training data consists of time-steps sampled from trajectories, which means the time-steps in the training data are not from a complete episode and may also contain repeated time-steps due to sampling. Is this designed on purpose, or am I misunderstanding the paper?

Best regards
Zhixiong

@PaParaZz1 added the discussion label Mar 29, 2023
@PaParaZz1 (Member)

At the beginning of the GCL implementation, we discussed whether to use whole episodes or fixed-length, non-overlapping trajectories. We checked the theoretical details in the original paper and conducted some comparison experiments. The episode version didn't show an obvious performance gain, so we added the trajectory version to DI-engine as the default for simplicity.
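
Schematically, the trajectory version just splits each collected episode into fixed-length, non-overlapping chunks before reward-model training, roughly like this simplified sketch (illustrative names, not the actual DI-engine code):

    from typing import Dict, List

    def split_episode(episode: List[Dict], traj_len: int) -> List[List[Dict]]:
        # Cut one episode (a list of time-steps) into fixed-length,
        # non-overlapping trajectories; the short tail is dropped here,
        # though it could also be padded or sampled with replacement.
        return [
            episode[i:i + traj_len]
            for i in range(0, len(episode) - traj_len + 1, traj_len)
        ]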

The regularization terms in GCL are designed to help the specific RL algorithm used in the paper (Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics). When we combine GCL with more recent DRL algorithms like DQN, we can omit these terms, so we don't implement them in the current version.
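
What remains is then the sample-based IOC objective alone. Roughly, assuming cost_demo and cost_samp hold summed per-trajectory costs c_theta(tau) for demonstration and sampled trajectories, and omitting the importance weights for brevity (a sketch, not the exact DI-engine implementation):

    import math
    import torch

    def ioc_loss(cost_demo: torch.Tensor, cost_samp: torch.Tensor) -> torch.Tensor:
        # L_IOC = (1/N) * sum_i c(tau_demo_i)
        #         + log((1/M) * sum_j exp(-c(tau_samp_j)))
        # The second term is a sample-based estimate of the log partition
        # function; logsumexp keeps it numerically stable.
        demo_term = cost_demo.mean()
        sample_term = torch.logsumexp(-cost_samp, dim=0) - math.log(cost_samp.shape[0])
        return demo_term + sample_term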
