-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dynamic dimension added #20
base: performance
Are you sure you want to change the base?
Conversation
EmbeddingBag UTs passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not say this is a good change to support dynamic emb-dim. A good convention is to implement some granularity in emb-dim (like 64, 128, 192, 256, 512...), and if the input dim is, e.g. 269, just choose the kernel of 512.
Another approach is to implement a emb=64 kernel with arbitrary loop along the row of such emb=64, so it can support 64, 128, 192, etc. And then when loading data inside current 64-wide row, always check if current index is out of range.
return; | ||
case 256: | ||
LAST_AND_STORE(256); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code size is enlarged 4x due to you expand everything in this loop, potentially will damage the I-CACHE and cost more power. Beside, emb=64 will lose the occupancy, because it will have the same register usage of emb=256. How would you gonna to support any other emb_dim not 64x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlushuang , I would agree that your first approach is more straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weihanmines to the 2 macro functions, we may inline device function them, with or without inline depends on the performance.
Supports four values of dynamic dimension. They are
[64, 128, 192, 256]
.