Why is the padding different for state, action, and reward? #50

CeyaoZhang · 2022-11-13T16:55:39Z

Lines 147 to 154 in c9e6ac0

    s[-1] = np.concatenate([np.zeros((1, max_len - tlen, state_dim)), s[-1]], axis=1)
    s[-1] = (s[-1] - state_mean) / state_std
    a[-1] = np.concatenate([np.ones((1, max_len - tlen, act_dim)) * -10., a[-1]], axis=1)
    r[-1] = np.concatenate([np.zeros((1, max_len - tlen, 1)), r[-1]], axis=1)
    d[-1] = np.concatenate([np.ones((1, max_len - tlen)) * 2, d[-1]], axis=1)
    rtg[-1] = np.concatenate([np.zeros((1, max_len - tlen, 1)), rtg[-1]], axis=1) / scale
    timesteps[-1] = np.concatenate([np.zeros((1, max_len - tlen)), timesteps[-1]], axis=1)
    mask.append(np.concatenate([np.zeros((1, max_len - tlen)), np.ones((1, tlen))], axis=1))

Padding the state with np.zeros(...) is easy to understand, but why is the action padded with np.ones(...) * -10 and the done flag with np.ones(...) * 2?
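To make the question concrete, here is a minimal sketch of what the quoted lines do to one short trajectory. It assumes toy sizes (max_len = 5, tlen = 3, act_dim = 2) and a hypothetical helper left_pad that is not part of the repository. It only shows that the mask marks which positions are real. One plausible reading, not confirmed anywhere in this thread, is that -10 and 2 are out-of-range sentinel values that the mask is expected to exclude.

```python
import numpy as np


def left_pad(seq, max_len, fill):
    """Left-pad a (1, tlen, ...) array along axis 1 with a constant value.

    Hypothetical helper for illustration; the repository inlines this logic.
    """
    tlen = seq.shape[1]
    pad_shape = (seq.shape[0], max_len - tlen) + seq.shape[2:]
    padding = np.full(pad_shape, fill, dtype=seq.dtype)
    return np.concatenate([padding, seq], axis=1)


# Toy sizes, chosen only for this example.
max_len, tlen, act_dim = 5, 3, 2

rng = np.random.default_rng(0)
actions = rng.uniform(-1.0, 1.0, size=(1, tlen, act_dim))  # real actions
dones = np.zeros((1, tlen))                                # real done flags

# Same fill values as in the quoted lines: -10 for actions, 2 for done flags.
a_pad = left_pad(actions, max_len, -10.0)
d_pad = left_pad(dones, max_len, 2.0)

# The mask is 0 on padded positions and 1 on real ones, as in the quoted code.
mask = np.concatenate(
    [np.zeros((1, max_len - tlen)), np.ones((1, tlen))], axis=1
)

# Indexing with the boolean mask keeps only the real timesteps, so the
# padding values never appear in the selected data.
real_actions = a_pad[mask.astype(bool)]  # shape (tlen, act_dim)
real_dones = d_pad[mask.astype(bool)]    # shape (tlen,)
```

If the training code only ever reads positions where mask is 1, the particular fill value would not affect the result, which would make the choice of -10 and 2 a matter of making padded entries easy to spot. Whether that is the authors' intent is exactly what this issue asks.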
