webgpu: enlarge the splitted dimInner size #6755
LGTM
let numTiles = ${splitK ? '1' : '(uniforms.dimInner - 1) / TileInner + 1'};
var kStart = ${splitK ? 'i32(globalId.z) * TileInner' : '0'};
let numTiles = ${
splitK ? `${splitedDimInner / tileInner}` :
Nit: do we need to change it to (splitedDimInner - 1) / TileInner + 1, in case splitedDimInner % tileInner is not equal to 0?
@@ -323,8 +326,9 @@ export function makeMatMulPackedSource(
let globalRowStart = i32(workgroupId.y) * ${tileAOuter};

let numTiles = ${
splitK ? '1' : '(uniforms.dimInner - 1) / TileInner + 1'};
var kStart = ${splitK ? 'i32(globalId.z) * TileInner' : '0'};
splitK ? `${splitedDimInner / tileInner}` :
Same here
Changed it to Math.ceil(splitedDimInner / tileInner). Thanks.
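To make the rounding concern in this thread concrete, here is a minimal sketch comparing the two suggested forms. The helper names `numTilesCeil` and `numTilesIntegerForm` are hypothetical and do not appear in the PR. The sample sizes are illustrative only.

```typescript
// Hypothetical helpers (not part of the PR) that compute how many tiles of
// width `tileInner` are needed to cover `splitedDimInner` elements along K.

// Form suggested in the final comment: round the quotient up on the JS side.
function numTilesCeil(splitedDimInner: number, tileInner: number): number {
  return Math.ceil(splitedDimInner / tileInner);
}

// Form suggested in the first comment: the usual integer round-up idiom.
// Math.floor stands in for truncating integer division.
function numTilesIntegerForm(splitedDimInner: number, tileInner: number): number {
  return Math.floor((splitedDimInner - 1) / tileInner) + 1;
}

// Illustrative sizes: one exact multiple and one that leaves a remainder.
for (const [k, tile] of [[128, 32], [100, 32]]) {
  console.log(k, tile, k / tile, numTilesCeil(k, tile), numTilesIntegerForm(k, tile));
}
```

For positive integer inputs the two forms agree. The plain quotient `splitedDimInner / tileInner` only yields a usable tile count when the remainder is zero.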
This PR enlarges the split dimInner size so that each workgroup covers a larger slice of the inner dimension, which reduces the number of atomic operations.
Before: conv2d 0.28 ms, input0: [5,5,2048], input1: [1,1,2048,32]
After: conv2d 0.12 ms, input0: [5,5,2048], input1: [1,1,2048,32]
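As a rough illustration of why a larger split size helps, the sketch below counts atomic accumulations under a simplified split-K model. It assumes that every slice along the inner dimension adds its partial result to every output element once. The function `atomicAddCount`, the 5x5x32 output shape (a 1x1 filter with stride 1), and the split sizes 32 and 128 are all assumptions for illustration, not values taken from the PR.

```typescript
// Simplified model (an assumption, not the PR's actual shader logic):
// each K-slice performs one atomic add per output element.
function atomicAddCount(
    outputSize: number, dimInner: number, splitedDimInner: number): number {
  const numSlices = Math.ceil(dimInner / splitedDimInner);
  return outputSize * numSlices;
}

// Shapes from the benchmark above: input [5,5,2048], filter [1,1,2048,32].
const outputSize = 5 * 5 * 32;  // assumed output shape [5,5,32]
const dimInner = 1 * 1 * 2048;  // filterHeight * filterWidth * inChannels

// Hypothetical split sizes, chosen only to show the trend.
const smallSplit = atomicAddCount(outputSize, dimInner, 32);
const largeSplit = atomicAddCount(outputSize, dimInner, 128);
console.log(smallSplit, largeSplit);
```

Under this model, quadrupling the split size cuts the number of atomic adds by a factor of four. The trade-off is that each workgroup loops over more tiles.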