webgpu: Fix bank conflicts #6152
Conversation
Force-pushed from 2159f25 to c685ff5
@xhcao @axinging @haoyunfeix @gyagp Please take a look, thanks.
@@ -31,7 +31,7 @@ ENV.registerFlag('WEBGPU_CPU_FORWARD', () => true);
 /**
  * Thread register block size for matmul kernel.
  */
-ENV.registerFlag('WEBGPU_MATMUL_WORK_PER_THREAD', () => 4);
+ENV.registerFlag('WEBGPU_MATMUL_WORK_PER_THREAD', () => 2);
@xhcao @axinging @haoyunfeix @gyagp I changed the workgroup size from (8, 8, 1) to (16, 16, 1) to better align with the bank size of 16. However, due to the maximum shared memory size limitation, I had to reduce the work-per-thread size to 2.
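For context, a rough back-of-the-envelope check of the workgroup shared-memory budget. This is only a sketch: the two-square-tile layout and the 16 KB per-workgroup limit are assumptions for illustration, not taken from the actual MatMulPackedProgram source.

```ts
// Rough shared-memory budget for a tiled matmul workgroup.
// Assumptions (not from the actual kernel): two square tiles of 32-bit
// floats (one per input operand) and a 16 KB per-workgroup memory limit.
const BYTES_PER_FLOAT = 4;

function sharedMemoryBytes(workgroupDim: number, workPerThread: number): number {
  const tileDim = workgroupDim * workPerThread;         // tile edge in elements
  const oneTile = tileDim * tileDim * BYTES_PER_FLOAT;  // one square tile
  return 2 * oneTile;                                   // tiles for both inputs
}

console.log(sharedMemoryBytes(8, 4));   // 8192 bytes: old (8, 8, 1) workgroup, WPT 4
console.log(sharedMemoryBytes(16, 4));  // 32768 bytes: (16, 16, 1) workgroup, WPT 4 -> over a 16 KB limit
console.log(sharedMemoryBytes(16, 2));  // 8192 bytes: (16, 16, 1) workgroup, WPT 2 -> fits again
```

Under those assumptions, keeping work-per-thread at 4 with the larger workgroup would need roughly 32 KB of workgroup memory, which is why the flag's default drops to 2 in this revision.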
Removed this part of the changes so that this PR only focuses on fixing bank conflicts. Please take another look, thanks.
Force-pushed from c685ff5 to 0b35bec
Closing this in favor of #6862.
This PR fixes the shared memory bank conflict issue in MatMulPackedProgram. Previously, it used interleaved loads, which caused bank conflicts; this PR switches to sequential accesses to resolve them.
Before:
After:
With this change, Conv2DDerInputMMProgram, Conv2DMMProgram, and MatMulPackedProgram each improve by roughly 20% or more. For example, the total time of Conv2DBackpropInput in hand_detector drops from 11.26 ms to 7.46 ms, and the total time of Conv2D in AutoML Image drops from 5.61 ms to 4.05 ms.
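To illustrate the access-pattern difference being described, here is a simplified, hypothetical model of shared-memory banking. The 16-bank figure follows the review comment above, and the index expressions are stand-ins for illustration, not the real MatMulPackedProgram indexing.

```ts
// Hypothetical bank-conflict model: consecutive 32-bit words map to
// consecutive banks, and lanes that hit the same bank in one load are
// serialized. (16 banks assumed, per the review comment; indexing is
// simplified and not taken from the actual kernel.)
const NUM_BANKS = 16;

function bank(wordIndex: number): number {
  return wordIndex % NUM_BANKS;
}

// Worst-case number of lanes hitting the same bank when each of the 16
// lanes in a row issues one load described by `indexForLane`.
function maxConflict(indexForLane: (lane: number) => number): number {
  const hits = new Map<number, number>();
  for (let lane = 0; lane < 16; lane++) {
    const b = bank(indexForLane(lane));
    hits.set(b, (hits.get(b) ?? 0) + 1);
  }
  return Math.max(...hits.values());
}

// Interleaved: lane i reads element i * 4 (stride = work-per-thread).
// Lanes 0, 4, 8, 12 all land on bank 0 -> 4-way conflict, loads serialized.
console.log(maxConflict(lane => lane * 4)); // 4

// Sequential: lane i reads element base + i -> every lane hits a distinct bank.
console.log(maxConflict(lane => lane));     // 1
```

With a stride equal to the work-per-thread size, several lanes map to the same bank and their loads are serialized; consecutive indices spread across banks, which is the sequential accessing this PR switches to.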