Thank you for your great work and for making your code available!
A question regarding padding tokens: they seem to be handled slightly differently in different parts of the code. When loading the data for the experiments, the padding values appear to be informed by the environment characteristics (e.g. -10 for actions in mujoco, 2 for dones, and 0 for other token types), whereas on the model side, for action prediction, all padding tokens are zeros. We were unsure about the reason for this difference in representing padding tokens. Our inference is that, since the attention mask marks the positions of the padding tokens, it ultimately overrides these slight differences in value.
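For concreteness, here is a minimal sketch of the left-padding pattern we are describing on the data-loading side. The function name `pad_trajectory` and the shapes are hypothetical, not taken from the repository; the padding values are simply the ones we observed (0 for states, -10 for actions, 2 for dones), and the attention mask is 0 at padded positions.

```python
import numpy as np

def pad_trajectory(states, actions, dones, max_len, state_dim, act_dim):
    """Left-pad a trajectory segment to max_len and build the attention mask.

    Padding values mirror the pattern described above (assumed, not copied
    verbatim from the repository): 0 for states, -10 for actions, 2 for dones.
    """
    tlen = states.shape[0]
    pad = max_len - tlen

    padded_states = np.concatenate(
        [np.zeros((pad, state_dim)), states], axis=0)        # states padded with 0
    padded_actions = np.concatenate(
        [np.ones((pad, act_dim)) * -10.0, actions], axis=0)  # actions padded with -10
    padded_dones = np.concatenate(
        [np.ones(pad) * 2, dones], axis=0)                   # dones padded with 2

    # The attention mask is 0 at padded positions and 1 at real timesteps,
    # so the transformer never attends to the padding, whatever its value is.
    attention_mask = np.concatenate([np.zeros(pad), np.ones(tlen)], axis=0)

    return padded_states, padded_actions, padded_dones, attention_mask


# Example: a 3-step mujoco-style segment padded to context length 5.
if __name__ == "__main__":
    K, state_dim, act_dim = 5, 11, 3
    s = np.random.randn(3, state_dim)
    a = np.random.randn(3, act_dim)
    d = np.zeros(3)
    ps, pa, pd, mask = pad_trajectory(s, a, d, K, state_dim, act_dim)
    print(mask)  # [0. 0. 1. 1. 1.]
```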
We noticed the same pattern in other work, such as GDT, which was built on top of this repository.
Could you please tell us more about your implementation of padding tokens and why they are represented differently? Do their actual values matter, given that their positions are reflected in the attention mask?