Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix issue #1956] use 64bit index in im2col3d.cl #1966

Merged
merged 6 commits into from
Feb 16, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion src/kernels/MIOpenCol2Im3d.cl
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ __kernel void Col2Im3d(global _FLOAT* col,
const int height,
const int width,
global _FLOAT* im,
const int im_offset)
const unsigned long im_offset)
{
global _FLOAT* im_off = im + im_offset;
int gid = (int)get_global_id(0);
Expand Down Expand Up @@ -84,7 +84,12 @@ __kernel void Col2Im3d(global _FLOAT* col,
: (im_w - (dilation_w * (wei_w - 1) + 1)) / stride_w + 1;
int end_w = min(col_w, im_w / stride_w + 1);

#if MIOPEN_USE_64BIT_INDEX
long ch_offset = (long)im_ch * col_d * col_w * col_h * wei_d * wei_w * wei_h;
#else
int ch_offset = im_ch * col_d * col_w * col_h * wei_d * wei_w * wei_h;
#endif

col += ch_offset;

_FLOAT_ACCUM tmp = (_FLOAT_ACCUM)0;
Expand All @@ -103,8 +108,15 @@ __kernel void Col2Im3d(global _FLOAT* col,
int y = (im_h - cy * stride_h) / dilation_h;
int x = (im_w - cx * stride_w) / dilation_w;

#if MIOPEN_USE_64BIT_INDEX
long col_off =
((((((long)z * wei_h) + y) * wei_w + x) * col_d + cz) * col_h + cy) *
col_w +
cx;
#else
int col_off =
(((((z * wei_h) + y) * wei_w + x) * col_d + cz) * col_h + cy) * col_w + cx;
#endif

tmp += CVT_FLOAT2ACCUM(col[col_off]);
}
Expand Down
9 changes: 8 additions & 1 deletion src/ocl/utilocl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -509,7 +509,7 @@ float Col2Im3dGPU(const Handle& handle,
const int in_h,
const int in_w,
Data_t im,
int im_offset,
std::size_t im_offset,
miopenDataType_t type)
{
std::string program_name = "MIOpenCol2Im3d.cl";
Expand Down Expand Up @@ -565,8 +565,15 @@ float Col2Im3dGPU(const Handle& handle,
}
else
{
std::size_t index_size = static_cast<size_t>(in_c) * out_d * out_h * out_w * wei_d * wei_w *
wei_h * sizeof(ConstData_t);

const bool use_64_bit_index = index_size > 0xffffffffULL;
Comment on lines +568 to +571
Copy link
Contributor

@atamazov atamazov Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Performance]

  1. Hmm, why * sizeof(ConstData_t), which is sizeof(void*) == 8 for HIP backend? In the shader,
ch_offset = im_ch * col_d * col_w * col_h * wei_d * wei_w * wei_h

where max value of im_ch is in_c, right?

Multiplying this to 8 looks like an overkill.

  1. Another question is about 0xffffffffULL (4 GiB - 1). In the kernel, computations are performed as int, which has max value of 0x7fffffffULL (2 GiB - 1).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 please comment. Thanks!

Copy link
Contributor

@atamazov atamazov Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 WRT question 1. AFAICS sizeof(ConstData_t) it should be GetTypeSize(type), which matches sizeof(_FLOAT) in the kernel, right? It should be either 4 (fp32) or 2 (fp16), I think.

UPDATE: Please ignore point 1. We do not need to take the size of element into account because col_off is actually not an offset but an index in the array col. Index is multiplied by the sizeof(col[0]) automatically.

Question 2 remains the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Performance]

  1. Hmm, why * sizeof(ConstData_t), which is sizeof(void*) == 8 for HIP backend? In the shader,
ch_offset = im_ch * col_d * col_w * col_h * wei_d * wei_w * wei_h

where max value of im_ch is in_c, right?

Multiplying this to 8 looks like an overkill.

Yes, it should be sizeof(ConstData_t).

  1. Another question is about 0xffffffffULL (4 GiB - 1). In the kernel, computations are performed as int, which has max value of 0x7fffffffULL (2 GiB - 1).

Yes, 0x7fffffffULL (2 GiB - 1) makes sense or we change it to unsigned int.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 Thanks for answering!

Yes, it should be sizeof(ConstData_t).

Excuse me, but I do not understand why. AFAIU the goal is to prevent overflows in the following computations in the shader:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/7ad167d2ecdc29ccebdf965055e1b1a8b9a835a8/src/kernels/MIOpenCol2Im3d.cl#L87-L91

and
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/7ad167d2ecdc29ccebdf965055e1b1a8b9a835a8/src/kernels/MIOpenCol2Im3d.cl#L111-L119
For computation of ch_offset we do not need the * sizeof(ConstData_t) multiplier at all.

Is this multiplier necessary to prevent overflow during computation of col_off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. You are right, the goal of int64_t is to prevent index overflow. We need to make sure all indices ch_offset and col_off do not overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may change indices in the kernel to unsigned int for 32bit. It may improve performance for some cases where indices > 16bit but < 32bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjing14 Thanks for feedback. And yes, I'll change to unsigned. But what about removal of * sizeof(ConstData_t) from the computation of index_size? Is it Ok to do that or I am missing something?


std::string params = GetDataTypeKernelParams(type);

params += use_64_bit_index ? " -DMIOPEN_USE_64BIT_INDEX=1" : " -DMIOPEN_USE_64BIT_INDEX=0";

const std::vector<size_t> vld{256, 1, 1};
size_t global_threads = static_cast<size_t>(in_c) * in_d * in_h * in_w;
const std::vector<size_t> vgd{global_threads, 1, 1};
Expand Down