[Fix issue #1956] use 64bit index in im2col3d.cl #1966

zjing14 · 2023-02-06T01:46:41Z

Full-blown fix for issue #1956.

atamazov

@zjing14 I suspect this affects performance. Can we introduce a macro, like MIOPEN_USE_64BIT_INDEX, define it as 1 when workspace size > 4 GiB or as 0 otherwise, and use it in the kernel? If you agree but do not have time, then I can do it for you.

Perhaps we shall use 2 GiB threshold because im_offset and ch_offset in the original code are of type int?

zjing14 · 2023-02-07T01:42:15Z

@atamazov Yes, it may impact performance for small problem size. Let me fix it.

atamazov · 2023-02-07T09:18:27Z

@zjing14

@atamazov Yes, it may impact performance for small problem size. Let me fix it.

[Recommendation] Use the following code in GemmBwdRest::GetSolution:

const bool use_64_bit_index =
    problem.GetConv().GetSpatialDimension() == 3
    && GetWorkspaceSize(context, problem) > 0xffffffffULL);

Then pass use_64_bit_index via Col2ImGPU() to both Col2Im3dGPU() and Col2Im2dGPU(). In the latter two, pass it to GetDataTypeKernelParams() and implement there support for generating the MIOPEN_USE_64BIT_INDEX macro. Then use the macro in MIOpenCol2Im3d.cl.

AFAICS the current implementation of 2D Col2Im does not use it, but that's not a problem; we can engage it in the future if need arise.

Hope this helps.

atamazov · 2023-02-07T09:21:13Z

@zjing14 Can you please update the PR description with:

Full-blown fix for issue #1956.

Please also set the following labels on this PR: https://github.com/ROCmSoftwarePlatform/MIOpen/labels/bug https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_high

zjing14 · 2023-02-07T19:09:54Z

@atamazov Thanks very much for your suggestions. I will fix it ASAP.

zjing14 · 2023-02-08T02:02:38Z

@atamazov Added a switch that only enables 64bit for tensor larger than 4GB. Please review it.

junliume · 2023-02-09T04:44:35Z

@zjing14 @atamazov some other (potential) issues listed here: #1974 the original issue mentioned in #1956 is indeed fixed by this PR.

atamazov · 2023-02-09T23:52:51Z

Will review/test tomorrow

junliume

Ping @atamazov this issue and #1974 are for the same model, and #1974 might be different since 5D tensor is not supported in batchnorm OCL code

src/kernels/MIOpenCol2Im3d.cl

atamazov · 2023-02-14T20:29:31Z

Confirmed that this PR resolves issue #1956.

atamazov · 2023-02-14T20:31:37Z

@junliume Unfortunately I am unable to edit this branch directly. But I can copy it into my fork, update there, and open a new PR, if necessary due to urgency reasons.

junliume · 2023-02-15T00:13:32Z

@junliume Unfortunately I am unable to edit this branch directly. But I can copy it into my fork, update there, and open a new PR, if necessary due to urgency reasons.

If the suggested change above is the only place, I can commit it directly :)

Co-authored-by: Artem Tamazov <[email protected]>

atamazov · 2023-02-15T10:18:32Z

src/ocl/utilocl.cpp

+        std::size_t index_size = static_cast<size_t>(in_c) * out_d * out_h * out_w * wei_d * wei_w *
+                                 wei_h * sizeof(ConstData_t);
+
+        const bool use_64_bit_index = index_size > 0xffffffffULL;


[Performance]

Hmm, why * sizeof(ConstData_t), which is sizeof(void*) == 8 for HIP backend? In the shader,

ch_offset = im_ch * col_d * col_w * col_h * wei_d * wei_w * wei_h

where max value of im_ch is in_c, right?

Multiplying this to 8 looks like an overkill.

Another question is about 0xffffffffULL (4 GiB - 1). In the kernel, computations are performed as int, which has max value of 0x7fffffffULL (2 GiB - 1).

@zjing14 please comment. Thanks!

@zjing14 WRT question 1. ~~AFAICS sizeof(ConstData_t) it should be GetTypeSize(type), which matches sizeof(_FLOAT) in the kernel, right? It should be either 4 (fp32) or 2 (fp16), I think.~~

UPDATE: Please ignore point 1. We do not need to take the size of element into account because col_off is actually not an offset but an index in the array col. Index is multiplied by the sizeof(col[0]) automatically.

Question 2 remains the same.

[Performance]

Hmm, why * sizeof(ConstData_t), which is sizeof(void*) == 8 for HIP backend? In the shader,

ch_offset = im_ch * col_d * col_w * col_h * wei_d * wei_w * wei_h

where max value of im_ch is in_c, right?

Multiplying this to 8 looks like an overkill.

Yes, it should be sizeof(ConstData_t).

Another question is about 0xffffffffULL (4 GiB - 1). In the kernel, computations are performed as int, which has max value of 0x7fffffffULL (2 GiB - 1).

Yes, 0x7fffffffULL (2 GiB - 1) makes sense or we change it to unsigned int.

@zjing14 Thanks for answering!

Yes, it should be sizeof(ConstData_t).

Excuse me, but I do not understand why. AFAIU the goal is to prevent overflows in the following computations in the shader:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/7ad167d2ecdc29ccebdf965055e1b1a8b9a835a8/src/kernels/MIOpenCol2Im3d.cl#L87-L91

and
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/7ad167d2ecdc29ccebdf965055e1b1a8b9a835a8/src/kernels/MIOpenCol2Im3d.cl#L111-L119
For computation of ch_offset we do not need the * sizeof(ConstData_t) multiplier at all.

Is this multiplier necessary to prevent overflow during computation of col_off?

Yes. You are right, the goal of int64_t is to prevent index overflow. We need to make sure all indices ch_offset and col_off do not overflow.

We may change indices in the kernel to unsigned int for 32bit. It may improve performance for some cases where indices > 16bit but < 32bit.

@zjing14 Thanks for feedback. And yes, I'll change to unsigned. But what about removal of * sizeof(ConstData_t) from the computation of index_size? Is it Ok to do that or I am missing something?

junliume · 2023-02-15T18:53:17Z

@zjing14 @atamazov Changing urgency level since we are in release mode now.

atamazov

🟢 This can be merged as is, provided that #1966 (comment) will be fixed later. The impact of unresolved comment is small: it affects performance for the GemmBwdRest in some relatively narrow range or FP16 problem configs (and also affects clarity/quality of the code).

* use 64bit index * add a switch that only enables 64bit index for large than 4GB * Update src/kernels/MIOpenCol2Im3d.cl Co-authored-by: Artem Tamazov <[email protected]> --------- Co-authored-by: Jun Liu <[email protected]> Co-authored-by: Artem Tamazov <[email protected]>

atamazov · 2023-04-04T20:52:53Z

@zjing14 What do you think about #1966 (review)? You can delegate the actual work to somebody else (even to me), if you do not have time, but we need your opinion prior starting the actual work.

junliume · 2023-04-07T00:52:28Z

@zjing14 What do you think about #1966 (review)? You can delegate the actual work to somebody else (even to me), if you do not have time, but we need your opinion prior starting the actual work.

Ping @zjing14 ...

@atamazov I briefly discussed with Jing on this issue and the main concern is about performance impacts.

zjing14 · 2023-04-07T00:56:50Z

@atamazov Sorry for the late reply. Yes, that will be great if someone else can take care of it. If any help is needed, just ping me.

atamazov · 2023-04-07T13:44:03Z

Okay, let's track this in #1985

atamazov · 2023-04-07T19:19:15Z

@zjing14 There is something that is not clear to me, pls look at #1966 (comment). Thanks!

use 64bit index

6e825a6

zjing14 requested a review from junliume February 6, 2023 01:46

formatting

5d67548

atamazov reviewed Feb 6, 2023

View reviewed changes

atamazov mentioned this pull request Feb 7, 2023

[tests] Regression test for #1956. [debugging] Logging improvements. #1965

Merged

junliume added bug urgency_high labels Feb 7, 2023

zjing14 changed the title ~~use 64bit index in im2col3d.cl~~ use 64bit index in im2col3d.cl (Full-blown fix for issue #1956.) Feb 7, 2023

Jing Zhang added 2 commits February 8, 2023 01:59

add a switch that only enables 64bit index for large than 4GB

9029877

formatting

7093e53

zjing14 requested a review from atamazov February 8, 2023 02:02

clean

fd03a75

junliume added the TESTING_CI_PASSED label Feb 9, 2023

junliume changed the title ~~use 64bit index in im2col3d.cl (Full-blown fix for issue #1956.)~~ [Fix issue #1956] use 64bit index in im2col3d.cl Feb 14, 2023

junliume previously approved these changes Feb 14, 2023

View reviewed changes

atamazov suggested changes Feb 14, 2023

View reviewed changes

src/kernels/MIOpenCol2Im3d.cl Outdated Show resolved Hide resolved

Update src/kernels/MIOpenCol2Im3d.cl

2cc4bf4

Co-authored-by: Artem Tamazov <[email protected]>

junliume dismissed their stale review via 2cc4bf4 February 15, 2023 06:58

atamazov suggested changes Feb 15, 2023

View reviewed changes

junliume removed the urgency_high label Feb 15, 2023

junliume added the urgency_blocker label Feb 15, 2023

junliume added this to the ROCm 5.5 milestone Feb 15, 2023

atamazov approved these changes Feb 16, 2023

View reviewed changes

junliume mentioned this pull request Feb 16, 2023

[Post-merge #1966] Code Quality enhancement #1985

Open

junliume approved these changes Feb 16, 2023

View reviewed changes

junliume merged commit 7ad167d into develop Feb 16, 2023

atamazov mentioned this pull request Feb 19, 2023

MIOpendriver Config Failure: GemmBwdRest MIOpenCol2Im3d.cl #1956

Open

junliume deleted the im2col3d_64bit_index branch April 17, 2023 14:50

atamazov mentioned this pull request Jun 5, 2023

[TensorFlow] Mysterious Multi-GPU ONLY perf regressions #2186

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fix issue #1956] use 64bit index in im2col3d.cl #1966

[Fix issue #1956] use 64bit index in im2col3d.cl #1966

zjing14 commented Feb 6, 2023 •

edited

Loading

atamazov left a comment •

edited

Loading

zjing14 commented Feb 7, 2023

atamazov commented Feb 7, 2023

atamazov commented Feb 7, 2023

zjing14 commented Feb 7, 2023

zjing14 commented Feb 8, 2023

junliume commented Feb 9, 2023

atamazov commented Feb 9, 2023

junliume left a comment

atamazov commented Feb 14, 2023

atamazov commented Feb 14, 2023

junliume commented Feb 15, 2023

atamazov Feb 15, 2023 •

edited

Loading

junliume Feb 16, 2023

atamazov Feb 16, 2023 •

edited

Loading

zjing14 Apr 7, 2023

atamazov Apr 7, 2023

zjing14 Apr 7, 2023

zjing14 Apr 7, 2023

atamazov Apr 8, 2023

junliume commented Feb 15, 2023

atamazov left a comment •

edited

Loading

atamazov commented Apr 4, 2023

junliume commented Apr 7, 2023

zjing14 commented Apr 7, 2023

atamazov commented Apr 7, 2023

atamazov commented Apr 7, 2023 •

edited

Loading

[Fix issue #1956] use 64bit index in im2col3d.cl #1966

[Fix issue #1956] use 64bit index in im2col3d.cl #1966

Conversation

zjing14 commented Feb 6, 2023 • edited Loading

atamazov left a comment • edited Loading

Choose a reason for hiding this comment

zjing14 commented Feb 7, 2023

atamazov commented Feb 7, 2023

atamazov commented Feb 7, 2023

zjing14 commented Feb 7, 2023

zjing14 commented Feb 8, 2023

junliume commented Feb 9, 2023

atamazov commented Feb 9, 2023

junliume left a comment

Choose a reason for hiding this comment

atamazov commented Feb 14, 2023

atamazov commented Feb 14, 2023

junliume commented Feb 15, 2023

atamazov Feb 15, 2023 • edited Loading

Choose a reason for hiding this comment

junliume Feb 16, 2023

Choose a reason for hiding this comment

atamazov Feb 16, 2023 • edited Loading

Choose a reason for hiding this comment

zjing14 Apr 7, 2023

Choose a reason for hiding this comment

atamazov Apr 7, 2023

Choose a reason for hiding this comment

zjing14 Apr 7, 2023

Choose a reason for hiding this comment

zjing14 Apr 7, 2023

Choose a reason for hiding this comment

atamazov Apr 8, 2023

Choose a reason for hiding this comment

junliume commented Feb 15, 2023

atamazov left a comment • edited Loading

Choose a reason for hiding this comment

atamazov commented Apr 4, 2023

junliume commented Apr 7, 2023

zjing14 commented Apr 7, 2023

atamazov commented Apr 7, 2023

atamazov commented Apr 7, 2023 • edited Loading

zjing14 commented Feb 6, 2023 •

edited

Loading

atamazov left a comment •

edited

Loading

atamazov Feb 15, 2023 •

edited

Loading

atamazov Feb 16, 2023 •

edited

Loading

atamazov left a comment •

edited

Loading

atamazov commented Apr 7, 2023 •

edited

Loading