In which part do you implement policy decoupling? #12
Comments
Hi there, thanks for your interest. Any further concerns are welcome.
I am not the author of this algorithm, but I have read carefully through all of the code. I believe the policy decoupling happens where the agent maps each entity's transformer output to its own action values, roughly like the sketch below.
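A minimal sketch of that idea, assuming per-entity transformer outputs of shape (batch, n_entities, emb); the names `DecoupledHead`, `q_basic`, and `q_attack` are illustrative placeholders, not the repository's exact identifiers:

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Sketch of a policy-decoupled Q head (illustrative, not the repo's exact code)."""

    def __init__(self, emb: int, n_basic_actions: int):
        super().__init__()
        # Basic actions (no-op / stop / move) are read from the agent's own token.
        self.q_basic = nn.Linear(emb, n_basic_actions)
        # Each "attack enemy i" value is read from enemy i's own token.
        self.q_attack = nn.Linear(emb, 1)

    def forward(self, outputs: torch.Tensor) -> torch.Tensor:
        # outputs: (batch, n_entities, emb); token 0 = self, tokens 1.. = enemies
        q_basic = self.q_basic(outputs[:, 0, :])                  # (batch, n_basic_actions)
        q_attack = self.q_attack(outputs[:, 1:, :]).squeeze(-1)   # (batch, n_enemies)
        return torch.cat([q_basic, q_attack], dim=-1)             # (batch, n_actions)

head = DecoupledHead(emb=32, n_basic_actions=6)
q = head(torch.randn(4, 1 + 5, 32))  # 1 self token + 5 enemy tokens
print(q.shape)                       # torch.Size([4, 11])
```

The key design choice is that each "attack enemy i" value is computed from enemy i's own embedding, while movement actions come from the agent's own token, so the action space decomposes entity by entity.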
As the paper describes, a Transformer is used to process the input (the observation), so "inputs" here should be the observation entities.
The author uses a heatmap to illustrate the relationship between the self-attention matrix and the final strategy (Figure 6 in the paper).
Thanks for your detailed explanation. I have pinned this issue for people who have the same confusion.
Hello, I am very interested in your work! I have studied the code, especially the class "TransformerAggregationAgent", but I have not found where you implement the policy decoupling. The only thing I found is:
```python
q_agg = torch.mean(outputs, 1)
q = self.q_linear(q_agg)
```
I am confused that you calculate the mean along the action dimension and then map the result back to the actions. Could you please explain the motivation for this part? I really look forward to your reply.
Thanks!
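For concreteness, here is a self-contained sketch of what I understand these two lines to do; the shapes and the `q_linear` dimensions are my assumptions, not taken from the repo:

```python
import torch
import torch.nn as nn

# Stand-in for the aggregation step in question.
# Assumed shapes: `outputs` is (batch, n_entities, emb) from the transformer,
# and `q_linear` maps the pooled embedding to one value per action.
batch, n_entities, emb, n_actions = 4, 6, 32, 11
outputs = torch.randn(batch, n_entities, emb)
q_linear = nn.Linear(emb, n_actions)

q_agg = torch.mean(outputs, 1)   # pool over dim 1 -> (batch, emb)
q = q_linear(q_agg)              # pooled embedding -> Q-values, (batch, n_actions)
print(q.shape)                   # torch.Size([4, 11])
```

Under these assumed shapes, the mean is taken over dim 1, i.e. the entity/token dimension, which is part of what confuses me: the per-entity information seems to be averaged away before the action values are produced.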