[BUG] It's not clear how to call an advantage module with batched envs and pixel observations. #1522
Comments
Good point there
@vmoens in some cases the env data may have an arbitrary batch size (*B) before the time dimension. Until something like pytorch/tensordict#525 lands, is the current approach to flatten all these dims into one, making sure to add terminations when doing so?
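To make the flattening question concrete, here is a minimal plain-Python sketch of why the boundaries matter when several env sequences are concatenated into one. The `gae` and `flatten` helpers and all the numbers are made up for illustration and are not TorchRL's API; the sketch assumes the usual GAE recursion where `terminated` zeroes the bootstrap value and `done` stops the recursion from crossing into the next step.

```python
def gae(rewards, values, next_values, terminated, done, gamma=0.99, lmbda=0.95):
    """Plain-Python GAE over ONE sequence (hypothetical helper, not TorchRL's API).

    terminated[t] = 1 zeroes the bootstrap value at step t;
    done[t] = 1 stops the advantage recursion from reaching past step t.
    """
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_values[t] * (1.0 - terminated[t]) - values[t]
        running = delta + gamma * lmbda * (1.0 - done[t]) * running
        adv[t] = running
    return adv


def flatten(rows):
    """Concatenate per-env sequences into one long sequence."""
    return [v for row in rows for v in row]


# Two envs, three steps each (made-up numbers). Env 1 terminates at its last step.
rewards = [[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
values = [[0.5, 0.5, 0.5], [0.1, 0.2, 0.3]]
next_values = [[0.5, 0.5, 0.4], [0.2, 0.3, 0.0]]
terminated = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
done = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# Reference: compute each env separately, then concatenate.
per_env = flatten(
    [gae(rewards[i], values[i], next_values[i], terminated[i], done[i]) for i in range(2)]
)

# Naive flattening: env 0's last step leaks into env 1's first step.
naive = gae(flatten(rewards), flatten(values), flatten(next_values),
            flatten(terminated), flatten(done))

# Flattening with the last step of every env marked as done (but not terminated,
# so the bootstrap value of a cut-off trajectory is kept).
marked_done = [row[:-1] + [1.0] for row in done]
marked = gae(flatten(rewards), flatten(values), flatten(next_values),
             flatten(terminated), flatten(marked_done))

print(marked == per_env, naive == per_env)
```

In this sketch the boundary is marked through `done` rather than `terminated`, because a rollout that is merely cut off should still bootstrap from its next value.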
I don't think so. As I said in my answer, the proper approach should be to vmap over the leading dims up to the time dim. Wdyt?
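A rough sketch of what vmapping over the leading dims could look like, using `torch.func.vmap` on a single-trajectory GAE function. `gae_single` is a hypothetical helper written for this example, not TorchRL's implementation, and it simplifies by using one `dones` flag both to zero the bootstrap value and to stop the recursion.

```python
import torch
from torch.func import vmap


def gae_single(rewards, values, next_values, dones, gamma=0.99, lmbda=0.95):
    """GAE for ONE trajectory; every input has shape [T] (hypothetical helper)."""
    not_done = 1.0 - dones
    deltas = rewards + gamma * next_values * not_done - values
    running = torch.zeros_like(deltas[0])
    out = []
    # Backward recursion over time; no data-dependent control flow, so vmap can trace it.
    for t in reversed(range(deltas.shape[0])):
        running = deltas[t] + gamma * lmbda * not_done[t] * running
        out.append(running)
    return torch.stack(out[::-1])


B, T = 3, 5
torch.manual_seed(0)
rewards = torch.randn(B, T)
values = torch.randn(B, T)
next_values = torch.randn(B, T)
dones = torch.zeros(B, T)
dones[1, 2] = 1.0  # one episode ends mid-rollout in env 1

# Map the single-trajectory function over the leading (env) dim; time stays last.
adv = vmap(gae_single)(rewards, values, next_values, dones)

# Reference: explicit Python loop over envs.
ref = torch.stack(
    [gae_single(rewards[b], values[b], next_values[b], dones[b]) for b in range(B)]
)
print(adv.shape, torch.allclose(adv, ref))
```

With more than one leading dim before time, the same idea would apply by nesting `vmap` once per leading dim.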
Somehow In the PPO example, the advantage module is called on the rollout batch shape Line 103 in 147de71
Line 341 in 147de71
I also managed to reproduce this with the I'm sending more details to compare the settings.
Okay, so the rl/torchrl/modules/models/models.py Line 479 in 147de71
Maybe this could be made clearer to the user so that when designing custom models they know that they have to do something similar. Otherwise,
@skandermoalla Looking back at this comment, I wonder why
I'm not very familiar with
Describe the bug
When you get a tensordict rollout of shape `(N_envs, N_steps, C, H, W)` out of a collector and you want to apply an advantage module that starts with `conv2d` layers:
- Calling the advantage module on the rollout directly fails, with the `conv2d` layer complaining about the input size, e.g. `RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [2, 128, 4, 84, 84]`
- Flattening the rollout first with `rollout.reshape(-1)`, so that it has shape `[B, C, H, W]`, and then calling the advantage module runs but issues the warning `torchrl/objectives/value/advantages.py:99: UserWarning: Got a tensordict without a time-marked dimension, assuming time is along the last dimension.`, leaving you unsure of whether the advantages were computed correctly.

So it's not clear how one should proceed.
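One possible way around the first failure, sketched in plain PyTorch: keep the rollout shape `(N_envs, N_steps, ...)` and let the value network fold the leading dims into a single batch dim before the convolution, then restore them afterwards. `FlattenLeadingDims` is a hypothetical wrapper written for this example (it is not a TorchRL class), and the layer sizes are arbitrary.

```python
import torch
from torch import nn


class FlattenLeadingDims(nn.Module):
    """Hypothetical wrapper: folds all leading dims into one batch dim for `net`."""

    def __init__(self, net, ndims_in=3):
        super().__init__()
        self.net = net
        self.ndims_in = ndims_in  # trailing dims consumed by `net`, here (C, H, W)

    def forward(self, x):
        lead = x.shape[: -self.ndims_in]                       # e.g. (N_envs, N_steps)
        out = self.net(x.reshape(-1, *x.shape[-self.ndims_in:]))  # [B, C, H, W] in
        return out.reshape(*lead, *out.shape[1:])              # restore leading dims


# Small value network; an 84x84 input with kernel 8 and stride 4 gives 20x20 maps.
cnn = nn.Sequential(
    nn.Conv2d(4, 8, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 20 * 20, 1),
)

pixels = torch.zeros(2, 5, 4, 84, 84)  # (N_envs, N_steps, C, H, W)

# Calling the conv net directly on the 5D input raises, as in the report above.
raised = False
try:
    cnn(pixels)
except RuntimeError:
    raised = True

# The wrapped network accepts the rollout shape and keeps (N_envs, N_steps).
value = FlattenLeadingDims(cnn)(pixels)
print(raised, value.shape)
```

Because the time dimension is never flattened away at the tensordict level, the advantage computation would still see the rollout with its original `(N_envs, N_steps)` batch shape.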