I notice that your implementation uses `nn.shape[-1] // head_dimension` heads, while other implementations use a fixed 8 heads. Was this choice made through experimentation (trial and error), or is there a specific reason for it?
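For reference, here is a minimal sketch of the two conventions. It assumes a model width `model_dim` and a head size of 64. The function names and the example widths are hypothetical illustrations and do not come from either implementation. Deriving the head count from a fixed head size makes the number of heads grow with the model width. Fixing the head count at 8 makes each head wider instead.

```python
def heads_from_head_dim(model_dim: int, head_dim: int) -> int:
    """Head count derived from a fixed per-head size (hypothetical helper)."""
    # Each head keeps the same width, so wider models get more heads.
    if model_dim % head_dim != 0:
        raise ValueError("model_dim must be divisible by head_dim")
    return model_dim // head_dim


def head_dim_from_fixed_heads(model_dim: int, num_heads: int = 8) -> int:
    """Per-head size when the head count is fixed (hypothetical helper)."""
    # The number of heads stays at num_heads, so wider models get wider heads.
    if model_dim % num_heads != 0:
        raise ValueError("model_dim must be divisible by num_heads")
    return model_dim // num_heads


if __name__ == "__main__":
    # Compare both conventions at a few example model widths.
    for dim in (256, 512, 1024):
        print(
            f"dim={dim}: "
            f"{heads_from_head_dim(dim, 64)} heads of size 64 "
            f"vs 8 heads of size {head_dim_from_fixed_heads(dim)}"
        )
```

Both conventions agree only at `model_dim = 512` with a head size of 64, where each gives 8 heads of size 64. At other widths they produce different numbers of heads.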