Hi, I'm trying to understand the following code in get_batch() in gym/experiment.py:
( from https://github.com/kzl/decision-transformer/blob/master/gym/experiment.py#:~:text=rtg.append(discount_cumsum,1))%5D%2C%20axis%3D1) )
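Roughly, the code in question does something like this (my paraphrase as a runnable toy, not the repo's exact lines; `discount_cumsum`, `si`, `tlen` and `max_len` follow the repo's naming, the toy values are mine):

```python
import numpy as np

def discount_cumsum(x, gamma):
    # returns-to-go: discounted cumulative sum from each timestep onward
    out = np.zeros_like(x)
    out[-1] = x[-1]
    for t in reversed(range(x.shape[0] - 1)):
        out[t] = x[t] + gamma * out[t + 1]
    return out

max_len = 5                                  # context length K
rewards = np.ones(12)                        # toy reward sequence
si = 3                                       # sampled start index
tlen = min(max_len, rewards.shape[0] - si)   # timesteps in the sampled window

# take tlen + 1 returns-to-go starting at si
rtg = discount_cumsum(rewards[si:], gamma=1.0)[:tlen + 1].reshape(1, -1, 1)

# if the slice came back with <= tlen values, append one extra zero
if rtg.shape[1] <= tlen:
    rtg = np.concatenate([rtg, np.zeros((1, 1, 1))], axis=1)

print(rtg.shape)   # (1, tlen + 1, 1)
```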
As far as I can understand it, it's creating a sequence of (tlen + 1) rtg values, then checking whether the sequence length is <= tlen, and padding it with an extra value if so. (I'm struggling to see how this situation will ever arise.)
A few lines later, the padding code is applied, pre-padding with 0s to make sure everything is length `max_len`, except for rtg, which will now be length `max_len + 1`.
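The padding step I mean is roughly this (continuing the toy above; the repo also scales rtg here, which I've left out):

```python
# pre-pad along the time axis so everything lines up with max_len
# (in this toy max_len - tlen == 0, so nothing is actually prepended)
rtg = np.concatenate([np.zeros((1, max_len - tlen, 1)), rtg], axis=1)
print(rtg.shape)   # (1, max_len + 1, 1) -- one longer than states/actions
```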
I don't understand the purpose of this extra value, especially since it seems to get stripped anyway by the SequenceTrainer:
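i.e. something like this in `SequenceTrainer.train_step()` (quoting from memory, so possibly not verbatim):

```python
# the last rtg entry is dropped before the forward pass
state_preds, action_preds, reward_preds = self.model.forward(
    states, actions, rewards, rtg[:, :-1], timesteps, attention_mask=attention_mask,
)
```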
Am I missing something?
Thanks!