Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix issue #1956] use 64bit index in im2col3d.cl #1966

Merged
merged 6 commits into from
Feb 16, 2023

Conversation

zjing14
Copy link
Contributor

@zjing14 zjing14 commented Feb 6, 2023

Full-blown fix for issue #1956.

@zjing14 zjing14 requested a review from junliume February 6, 2023 01:46
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 I suspect this affects performance. Can we introduce a macro, like MIOPEN_USE_64BIT_INDEX, define it as 1 when workspace size > 4 GiB or as 0 otherwise, and use it in the kernel? If you agree but do not have time, then I can do it for you.

Perhaps we shall use 2 GiB threshold because im_offset and ch_offset in the original code are of type int?

@zjing14
Copy link
Contributor Author

zjing14 commented Feb 7, 2023

@atamazov Yes, it may impact performance for small problem size. Let me fix it.

@atamazov
Copy link
Contributor

atamazov commented Feb 7, 2023

@zjing14

@atamazov Yes, it may impact performance for small problem size. Let me fix it.

[Recommendation] Use the following code in GemmBwdRest::GetSolution:

const bool use_64_bit_index =
    problem.GetConv().GetSpatialDimension() == 3
    && GetWorkspaceSize(context, problem) > 0xffffffffULL);

Then pass use_64_bit_index via Col2ImGPU() to both Col2Im3dGPU() and Col2Im2dGPU(). In the latter two, pass it to GetDataTypeKernelParams() and implement there support for generating the MIOPEN_USE_64BIT_INDEX macro. Then use the macro in MIOpenCol2Im3d.cl.

AFAICS the current implementation of 2D Col2Im does not use it, but that's not a problem; we can engage it in the future if need arise.

Hope this helps.

@atamazov
Copy link
Contributor

atamazov commented Feb 7, 2023

@zjing14 Can you please update the PR description with:

Full-blown fix for issue #1956.

Please also set the following labels on this PR: https://github.com/ROCmSoftwarePlatform/MIOpen/labels/bug https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_high

@zjing14 zjing14 changed the title use 64bit index in im2col3d.cl use 64bit index in im2col3d.cl (Full-blown fix for issue #1956.) Feb 7, 2023
@zjing14
Copy link
Contributor Author

zjing14 commented Feb 7, 2023

@atamazov Thanks very much for your suggestions. I will fix it ASAP.

@zjing14
Copy link
Contributor Author

zjing14 commented Feb 8, 2023

@atamazov Added a switch that only enables 64bit for tensor larger than 4GB. Please review it.

@zjing14 zjing14 requested a review from atamazov February 8, 2023 02:02
@junliume
Copy link
Contributor

junliume commented Feb 9, 2023

@zjing14 @atamazov some other (potential) issues listed here: #1974 the original issue mentioned in #1956 is indeed fixed by this PR.

@atamazov
Copy link
Contributor

atamazov commented Feb 9, 2023

Will review/test tomorrow

@junliume junliume changed the title use 64bit index in im2col3d.cl (Full-blown fix for issue #1956.) [Fix issue #1956] use 64bit index in im2col3d.cl Feb 14, 2023
junliume
junliume previously approved these changes Feb 14, 2023
Copy link
Contributor

@junliume junliume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping @atamazov this issue and #1974 are for the same model, and #1974 might be different since 5D tensor is not supported in batchnorm OCL code

src/kernels/MIOpenCol2Im3d.cl Outdated Show resolved Hide resolved
@atamazov
Copy link
Contributor

Confirmed that this PR resolves issue #1956.

@atamazov
Copy link
Contributor

@junliume Unfortunately I am unable to edit this branch directly. But I can copy it into my fork, update there, and open a new PR, if necessary due to urgency reasons.

@junliume
Copy link
Contributor

@junliume Unfortunately I am unable to edit this branch directly. But I can copy it into my fork, update there, and open a new PR, if necessary due to urgency reasons.

If the suggested change above is the only place, I can commit it directly :)

Comment on lines +568 to +571
std::size_t index_size = static_cast<size_t>(in_c) * out_d * out_h * out_w * wei_d * wei_w *
wei_h * sizeof(ConstData_t);

const bool use_64_bit_index = index_size > 0xffffffffULL;
Copy link
Contributor

@atamazov atamazov Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Performance]

  1. Hmm, why * sizeof(ConstData_t), which is sizeof(void*) == 8 for HIP backend? In the shader,
ch_offset = im_ch * col_d * col_w * col_h * wei_d * wei_w * wei_h

where max value of im_ch is in_c, right?

Multiplying this to 8 looks like an overkill.

  1. Another question is about 0xffffffffULL (4 GiB - 1). In the kernel, computations are performed as int, which has max value of 0x7fffffffULL (2 GiB - 1).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 please comment. Thanks!

Copy link
Contributor

@atamazov atamazov Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 WRT question 1. AFAICS sizeof(ConstData_t) it should be GetTypeSize(type), which matches sizeof(_FLOAT) in the kernel, right? It should be either 4 (fp32) or 2 (fp16), I think.

UPDATE: Please ignore point 1. We do not need to take the size of element into account because col_off is actually not an offset but an index in the array col. Index is multiplied by the sizeof(col[0]) automatically.

Question 2 remains the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Performance]

  1. Hmm, why * sizeof(ConstData_t), which is sizeof(void*) == 8 for HIP backend? In the shader,
ch_offset = im_ch * col_d * col_w * col_h * wei_d * wei_w * wei_h

where max value of im_ch is in_c, right?

Multiplying this to 8 looks like an overkill.

Yes, it should be sizeof(ConstData_t).

  1. Another question is about 0xffffffffULL (4 GiB - 1). In the kernel, computations are performed as int, which has max value of 0x7fffffffULL (2 GiB - 1).

Yes, 0x7fffffffULL (2 GiB - 1) makes sense or we change it to unsigned int.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 Thanks for answering!

Yes, it should be sizeof(ConstData_t).

Excuse me, but I do not understand why. AFAIU the goal is to prevent overflows in the following computations in the shader:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/7ad167d2ecdc29ccebdf965055e1b1a8b9a835a8/src/kernels/MIOpenCol2Im3d.cl#L87-L91

and
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/7ad167d2ecdc29ccebdf965055e1b1a8b9a835a8/src/kernels/MIOpenCol2Im3d.cl#L111-L119
For computation of ch_offset we do not need the * sizeof(ConstData_t) multiplier at all.

Is this multiplier necessary to prevent overflow during computation of col_off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. You are right, the goal of int64_t is to prevent index overflow. We need to make sure all indices ch_offset and col_off do not overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may change indices in the kernel to unsigned int for 32bit. It may improve performance for some cases where indices > 16bit but < 32bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 Thanks for feedback. And yes, I'll change to unsigned. But what about removal of * sizeof(ConstData_t) from the computation of index_size? Is it Ok to do that or I am missing something?

@junliume
Copy link
Contributor

@zjing14 @atamazov Changing urgency level since we are in release mode now.

@junliume junliume added this to the ROCm 5.5 milestone Feb 15, 2023
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 This can be merged as is, provided that #1966 (comment) will be fixed later. The impact of unresolved comment is small: it affects performance for the GemmBwdRest in some relatively narrow range or FP16 problem configs (and also affects clarity/quality of the code).

@junliume junliume merged commit 7ad167d into develop Feb 16, 2023
junliume added a commit that referenced this pull request Feb 18, 2023
* use 64bit index

* add a switch that only enables 64bit index for large than 4GB

* Update src/kernels/MIOpenCol2Im3d.cl

Co-authored-by: Artem Tamazov <[email protected]>

---------

Co-authored-by: Jun Liu <[email protected]>
Co-authored-by: Artem Tamazov <[email protected]>
@atamazov
Copy link
Contributor

atamazov commented Apr 4, 2023

@zjing14 What do you think about #1966 (review)? You can delegate the actual work to somebody else (even to me), if you do not have time, but we need your opinion prior starting the actual work.

@junliume
Copy link
Contributor

junliume commented Apr 7, 2023

@zjing14 What do you think about #1966 (review)? You can delegate the actual work to somebody else (even to me), if you do not have time, but we need your opinion prior starting the actual work.

Ping @zjing14 ...

@atamazov I briefly discussed with Jing on this issue and the main concern is about performance impacts.

@zjing14
Copy link
Contributor Author

zjing14 commented Apr 7, 2023

@atamazov Sorry for the late reply. Yes, that will be great if someone else can take care of it. If any help is needed, just ping me.

@atamazov
Copy link
Contributor

atamazov commented Apr 7, 2023

Okay, let's track this in #1985

@atamazov
Copy link
Contributor

atamazov commented Apr 7, 2023

@zjing14 There is something that is not clear to me, pls look at #1966 (comment). Thanks!

@junliume junliume deleted the im2col3d_64bit_index branch April 17, 2023 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants