Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dynamic dimension added #20

Open
wants to merge 1 commit into
base: performance
Choose a base branch
from
Open

Conversation

weihanmines
Copy link

Supports four values of dynamic dimension. They are [64, 128, 192, 256].

@liligwu
Copy link
Collaborator

liligwu commented Sep 23, 2022

EmbeddingBag UTs passed.
tbe-fwd-dyn-dim_test_log.txt

Copy link

@carlushuang carlushuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not say this is a good change to support dynamic emb-dim. A good convention is to implement some granularity in emb-dim (like 64, 128, 192, 256, 512...), and if the input dim is, e.g. 269, just choose the kernel of 512.
Another approach is to implement a emb=64 kernel with arbitrary loop along the row of such emb=64, so it can support 64, 128, 192, etc. And then when loading data inside current 64-wide row, always check if current index is out of range.

return;
case 256:
LAST_AND_STORE(256);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code size is enlarged 4x due to you expand everything in this loop, potentially will damage the I-CACHE and cost more power. Beside, emb=64 will lose the occupancy, because it will have the same register usage of emb=256. How would you gonna to support any other emb_dim not 64x?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlushuang , I would agree that your first approach is more straightforward.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weihanmines to the 2 macro functions, we may inline device function them, with or without inline depends on the performance.

liligwu added a commit that referenced this pull request Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants