Confusion over shape of returns_to_go in get_batch #38

Open
DaveyBiggers opened this issue Mar 14, 2022 · 0 comments

Hi, I'm trying to understand the following code in get_batch() in gym/experiment.py:

```python
rtg.append(discount_cumsum(traj['rewards'][si:], gamma=1.)[:s[-1].shape[1] + 1].reshape(1, -1, 1))
if rtg[-1].shape[1] <= s[-1].shape[1]:
    rtg[-1] = np.concatenate([rtg[-1], np.zeros((1, 1, 1))], axis=1)
...
tlen = s[-1].shape[1]
```

( from https://github.com/kzl/decision-transformer/blob/master/gym/experiment.py#:~:text=rtg.append(discount_cumsum,1))%5D%2C%20axis%3D1) )

As far as I can understand it, it's creating a sequence of (tlen + 1) rtg values, then checking whether the sequence length is <= tlen, and padding it with an extra zero if so. (I'm struggling to see how that situation would ever arise.)
A few lines later, the generic padding code pre-pads everything with zeros to length max_len, except for rtg, which will now have length max_len + 1.
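To make my confusion concrete, here's a toy sketch of how I read the shapes (made-up numbers, not the repo's code; since gamma=1 I just use a plain reverse cumulative sum in place of discount_cumsum):

```python
import numpy as np

max_len = 3
rewards = np.ones(10)                     # toy trajectory: 10 steps of reward 1
si = 0                                    # sampled start index

s = np.ones((1, max_len, 4))              # toy state window, shape (1, tlen, state_dim)
tlen = s.shape[1]                         # 3

# returns-to-go: reverse cumulative sum of rewards[si:], sliced to tlen + 1 values
rtg = np.cumsum(rewards[si:][::-1])[::-1][:tlen + 1].reshape(1, -1, 1)
print(rtg.shape)                          # (1, 4, 1) -> already tlen + 1 long

# the padding branch only fires when rtg has <= tlen values
if rtg.shape[1] <= tlen:
    rtg = np.concatenate([rtg, np.zeros((1, 1, 1))], axis=1)

# either way, after the later pre-padding to max_len the states end up
# max_len long while rtg ends up max_len + 1 long
print(s.shape[1], rtg.shape[1])           # 3 4
```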

I don't understand the purpose of this extra value, especially since it seems to get stripped anyway by the SequenceTrainer:

```python
state_preds, action_preds, reward_preds = self.model.forward(
    states, actions, rewards, rtg[:,:-1], timesteps, attention_mask=attention_mask,
)
```
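To show what I mean about the stripping, a toy shape check (made-up batch size and dims, not the repo's tensors):

```python
import torch

batch, max_len, state_dim = 64, 20, 17         # made-up sizes

rtg = torch.zeros(batch, max_len + 1, 1)        # returns-to-go as produced by get_batch
states = torch.zeros(batch, max_len, state_dim)

# the trainer slices off the trailing value again, so the model only ever
# receives max_len return tokens, back in line with the states
print(states.shape[1], rtg[:, :-1].shape[1])    # 20 20
```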

Am I missing something?
Thanks!
