Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webgpu: Fix bank conflicts #6152

Closed
wants to merge 6 commits into from
Closed

Conversation

qjia7
Copy link
Contributor

@qjia7 qjia7 commented Feb 15, 2022

This PR fixed the shared memory bank conflicts issue on MatMulPackedProgram. Previously, it used interleaved loads, which resulted the bank conflict. This PR uses sequential accessing to resolve it.
Before:

shared[2*localId.x] = global[2*localId.x];
shared[2*localId.x + 1] = global[2*localId.x + 1];

After:

shared[localId.x] = global[localId.x];
shared[localId.x+ workGroupSizeX] = global[localId.x+ workGroupSizeX];

With this change, Conv2DDerInputMMProgram/Conv2DMMProgram/MatMulPackedProgram get about >20% improvement. For example, the total time of Conv2DBackpropInput in hand_detector becomes 7.46ms from 11.26ms. The total time Conv2D in AutoML Image becomes 4.05ms from 5.61ms.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

@qjia7 qjia7 force-pushed the fix_bank_conflicts branch from 2159f25 to c685ff5 Compare February 15, 2022 08:23
Copy link
Contributor Author

@qjia7 qjia7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xhcao @axinging @haoyunfeix @gyagp Please take a look, thanks.

@@ -31,7 +31,7 @@ ENV.registerFlag('WEBGPU_CPU_FORWARD', () => true);
/**
* Thread register block size for matmul kernel.
*/
ENV.registerFlag('WEBGPU_MATMUL_WORK_PER_THREAD', () => 4);
ENV.registerFlag('WEBGPU_MATMUL_WORK_PER_THREAD', () => 2);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the workgroup size from (8, 8, 1) to (16, 16, 1) to better align with bank size 16. However, due to the maximum shared memory size limitation, I have to change the work per thread size to 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this part of changes to make sure this PR only focuses on fixing bank conflicts. Please take another look, thanks.

@qjia7 qjia7 force-pushed the fix_bank_conflicts branch from c685ff5 to 0b35bec Compare February 24, 2022 02:15
@qjia7
Copy link
Contributor Author

qjia7 commented Oct 10, 2022

close it by #6862

@qjia7 qjia7 closed this Oct 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant