Question about the output of the decision transformer #27916
Comments
Hey 🤗 thanks for opening an issue! We try to keep the GitHub issues for bugs/feature requests. Thanks!
Thank you. I have created a post here: https://discuss.huggingface.co/t/question-about-the-output-of-the-decision-transformer/65384
I don't know this model at all, so pinging @edbeeching, the author of the PR!
Hi @Pulsar110, thanks for your question. It would probably be best to reach out to the authors with this question, as our implementation aims to match the authors' codebase: https://github.com/kzl/decision-transformer/blob/e2d82e68f330c00f763507b3b01d774740bee53f/gym/decision_transformer/models/decision_transformer.py#L97 If I were to hazard a guess, I would think that there is a mistake in their implementation and that we should be indexing entry 0 at some point. Let us know what they say, and perhaps we can update our implementation with any changes they suggest. I will close the issue for now, but feel free to reopen it with more questions or if you hear back from them.
From the code here: https://github.com/huggingface/transformers/blob/v4.35.2/src/transformers/models/decision_transformer/modeling_decision_transformer.py#L920-L927
I'm not sure I understand why `self.predict_return(x[:, 2])` or `self.predict_state(x[:, 2])` is predicting the return/next state given the state and action. From the comment at the top, `x[:, 2]` is only the action? Am I missing something?
And if this code is correct, what is the use of `x[:, 0]`?
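One possible reading of the indexing, not confirmed by anyone in this thread, is that `x[:, 2]` holds the transformer output at the action token's position rather than the action embedding itself. The toy sketch below assumes the inputs are interleaved as return, state, action per timestep and that the model is causal. It uses no real model: the helper names (`interleave`, `causal_context`, `deinterleave`) are hypothetical, and the "model" only records which input tokens each output position is allowed to see.

```python
# Toy stand-in for interleaving tokens and regrouping the outputs.
# Assumption: tokens are ordered R_0, s_0, a_0, R_1, s_1, a_1, ...
# and each output position may only see tokens at or before it.

def interleave(returns, states, actions):
    """Build the sequence R_0, s_0, a_0, R_1, s_1, a_1, ..."""
    seq = []
    for r, s, a in zip(returns, states, actions):
        seq.extend([r, s, a])
    return seq

def causal_context(seq):
    """Output i 'sees' tokens 0..i, mimicking a causal attention mask."""
    return [tuple(seq[: i + 1]) for i in range(len(seq))]

def deinterleave(outputs):
    """Regroup so x[0], x[1], x[2] hold outputs at return, state, action positions."""
    return [outputs[k::3] for k in range(3)]

T = 2
returns = [f"R{t}" for t in range(T)]
states = [f"s{t}" for t in range(T)]
actions = [f"a{t}" for t in range(T)]

x = deinterleave(causal_context(interleave(returns, states, actions)))

print(x[0][1])  # context at the return position of timestep 1
print(x[1][1])  # context at the state position of timestep 1
print(x[2][1])  # context at the action position of timestep 1
```

Under these assumptions, the entry at the action position has seen the return, state, and action of the same timestep, which would be consistent with "given the state and action". The entry at the return position has not yet seen that timestep's state or action, which may be why `x[:, 0]` is not used in the linked lines. Whether this matches the authors' intent is something only they can confirm.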