Question regarding RoPE implementation #10
Comments
After our discussion, we believe it is reasonable to flatten out the one-dimensional indices of RoPE to ensure permutation equivariance between variables. We are arranging the relevant experiments and will see how this affects performance. Please stay tuned, and thanks again for your insightful question.
Thanks for your response! I am looking forward to seeing the relevant experiments :)
Thank you again for the detailed response! Could you elaborate on how this flattened RoPE achieves permutation invariance between variables?
This is a very interesting and important question! Note that permutation equivariance between variables means that shuffling the input order of the variables should not affect anything other than the output order of the variables. For example, if we have
Further, to mark the variable position of these
Thank you for your response! I agree that using 1D RoPE and repeating it is a more reasonable approach compared to the flattened 2D RoPE.
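For readers who want to check the equivariance claim numerically, here is a minimal sketch. It is not the repository's code. It assumes tokens are laid out variable by variable, writes the RoPE rotation with complex numbers (equivalent to rotating consecutive pairs of real features), and uses hypothetical helper names throughout. It compares attention logits before and after shuffling the variables, once with the 1D time index repeated per variable and once with flattened indices.

```python
import cmath
import random

def rope(vec, pos, base=10000.0):
    """Rotate each complex feature pair by a position-dependent angle."""
    m = len(vec)  # number of complex pairs (half the real feature dimension)
    return [v * cmath.exp(1j * pos * base ** (-i / m)) for i, v in enumerate(vec)]

def scores(q, k, positions):
    """Attention logits Re<q_i, k_j> after applying RoPE to queries and keys."""
    rq = [rope(x, p) for x, p in zip(q, positions)]
    rk = [rope(x, p) for x, p in zip(k, positions)]
    return [[sum((a * b.conjugate()).real for a, b in zip(qi, kj)) for kj in rk]
            for qi in rq]

def max_equivariance_gap(position_fn, num_patches=4, pairs=4, seed=0):
    """Largest change in the logits caused by shuffling variables (0 = equivariant)."""
    var_order = (2, 0, 1)  # hypothetical shuffle of three variables
    num_vars = len(var_order)
    n = num_vars * num_patches
    rng = random.Random(seed)
    rand_vec = lambda: [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(pairs)]
    q = [rand_vec() for _ in range(n)]
    k = [rand_vec() for _ in range(n)]
    positions = position_fn(num_vars, num_patches)
    # Token permutation induced by reordering whole variables.
    perm = [v * num_patches + t for v in var_order for t in range(num_patches)]
    original = scores(q, k, positions)
    shuffled = scores([q[i] for i in perm], [k[i] for i in perm], positions)
    # Equivariance means the shuffled logits equal the original logits, permuted.
    return max(abs(shuffled[a][b] - original[perm[a]][perm[b]])
               for a in range(n) for b in range(n))

# 1D time index repeated for every variable vs. index in the flattened sequence.
repeated = lambda num_vars, num_patches: [t for _ in range(num_vars) for t in range(num_patches)]
flattened = lambda num_vars, num_patches: list(range(num_vars * num_patches))

gap_repeated = max_equivariance_gap(repeated)    # stays at zero
gap_flattened = max_equivariance_gap(flattened)  # clearly nonzero
```

With repeated 1D indices every variable sees the same positions, so reordering variables only reorders the logits. With flattened indices a variable's positions depend on where it sits in the input, so the logits themselves change.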
I am confused about the size of the RoPE matrix. The code implementation uses an First, I want to clarify one thing: for a token If so, the angle for
Thanks for sharing the code of this interesting work!
I noticed an inconsistency between the paper’s description of RoPE and its implementation in the code. According to the paper, the relative position should be calculated based on the temporal differences between patches,
However, the code seems to be using flattened 2D indices instead when applying RoPE:
Could you clarify the reasoning behind this discrepancy? Was this an intentional change, or might it affect the model’s performance?
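To make the distinction concrete, here is a small illustrative sketch of the two indexing schemes being contrasted. It is not taken from the repository. The function names are hypothetical, and it assumes tokens are ordered variable by variable, with each variable contributing the same number of patches.

```python
def temporal_positions(num_vars, num_patches):
    """Position of a token = its patch (time) index, repeated for each variable."""
    return [t for _ in range(num_vars) for t in range(num_patches)]

def flattened_positions(num_vars, num_patches):
    """Position of a token = its index in the flattened (variable, patch) sequence."""
    return list(range(num_vars * num_patches))

def relative_offsets(positions):
    """Pairwise offsets pos[i] - pos[j], the quantity RoPE encodes in attention."""
    return [[pi - pj for pj in positions] for pi in positions]

num_vars, num_patches = 2, 3
temporal = temporal_positions(num_vars, num_patches)    # [0, 1, 2, 0, 1, 2]
flattened = flattened_positions(num_vars, num_patches)  # [0, 1, 2, 3, 4, 5]

# Token 0 is (variable 0, patch 0); token 3 is (variable 1, patch 0).
# Both cover the same time step, so their temporal offset should be zero.
same_time_temporal = relative_offsets(temporal)[3][0]    # 0
same_time_flattened = relative_offsets(flattened)[3][0]  # 3
```

Under temporal indexing, two patches from the same time step have offset 0 regardless of which variables they belong to. Under flattened indexing, the offset also depends on how far apart the variables sit in the input order.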