[Tensor reorder] Universal tensor transform feature, a fallback of batched transpose kernel #1419

Merged (75 commits) on Mar 21, 2022

Commits
c52547a
test_file commit
aska-0096 Jan 24, 2022
60d4564
add all files
aska-0096 Jan 26, 2022
569044f
fix some bugs and try
aska-0096 Jan 27, 2022
682a725
fix bug
aska-0096 Jan 27, 2022
7fc0de7
fix bug
aska-0096 Jan 27, 2022
b1f5c89
fix bugs
aska-0096 Jan 27, 2022
9573861
fix bugs
aska-0096 Jan 27, 2022
ca1bb57
fix bug
aska-0096 Jan 27, 2022
b0c188c
fix bugs
aska-0096 Jan 27, 2022
57dab09
fix bug
aska-0096 Jan 27, 2022
45894a7
fixbug
aska-0096 Jan 27, 2022
84863c4
fixbug
aska-0096 Jan 27, 2022
54d1f2e
test 1
aska-0096 Jan 27, 2022
e5f8617
General test, (Batched passed)
aska-0096 Jan 27, 2022
4dba45c
0321 test
aska-0096 Jan 27, 2022
c3c5303
explicit template instance
aska-0096 Jan 27, 2022
b539c9c
fix bug
aska-0096 Jan 27, 2022
b9e8684
fix bug
aska-0096 Jan 27, 2022
c766a69
move instantiation into sol.hpp
aska-0096 Jan 27, 2022
37b1926
fix bug
aska-0096 Jan 27, 2022
a36ce98
fixbug
aska-0096 Jan 27, 2022
923e4b3
fix bug
aska-0096 Jan 27, 2022
7802205
fix bug
aska-0096 Jan 27, 2022
541a1e7
fixbug
aska-0096 Jan 27, 2022
45f1a6f
fix bug
aska-0096 Jan 27, 2022
3cc7c61
fixbug
aska-0096 Jan 27, 2022
08a9c82
fixbug
aska-0096 Jan 27, 2022
3374fa6
fixbug
aska-0096 Jan 27, 2022
0dfac32
fixbug
aska-0096 Jan 27, 2022
e9ac702
batched test
aska-0096 Jan 27, 2022
879694f
test batch
aska-0096 Jan 27, 2022
183e728
test
aska-0096 Jan 27, 2022
978f8e9
add kernel
aska-0096 Jan 27, 2022
ff5e47e
fixbugs
aska-0096 Jan 27, 2022
7845771
fixtypo
aska-0096 Jan 27, 2022
16dfe07
fixtypo
aska-0096 Jan 27, 2022
9ef53c0
addkerneltest
aska-0096 Jan 27, 2022
aa6a09d
try separated solution
aska-0096 Jan 27, 2022
108b80c
fixbug
aska-0096 Jan 27, 2022
47d4b3d
fix bug
aska-0096 Jan 27, 2022
74a7545
elimate some warnings
aska-0096 Jan 27, 2022
86c21af
fix some warnings
aska-0096 Jan 27, 2022
096c661
fix some warnings
aska-0096 Jan 27, 2022
32f21b0
fork should not call CI
aska-0096 Jan 28, 2022
eff9686
push & pull test on forked repo
aska-0096 Jan 28, 2022
c81cb86
try
aska-0096 Jan 28, 2022
6b9f145
try
aska-0096 Jan 28, 2022
ff7a441
fix typo
aska-0096 Jan 28, 2022
5d223eb
debug
aska-0096 Jan 28, 2022
acd2877
add debug points
aska-0096 Jan 28, 2022
3453bba
add checkpoints
aska-0096 Jan 28, 2022
0651536
add check point
aska-0096 Jan 28, 2022
a08f7ce
fixbugs
aska-0096 Jan 28, 2022
3196935
fixbug try
aska-0096 Jan 28, 2022
e1244fd
debug
aska-0096 Jan 29, 2022
8652207
cmake debug
aska-0096 Jan 29, 2022
6e98bfb
Before warning fixed
aska-0096 Feb 8, 2022
ad66328
Merge pull request #1411 from aska-0096/tensor_reorder
aska-0096 Feb 8, 2022
d97fc28
test all cases
aska-0096 Feb 8, 2022
1fe9254
Merge pull request #1417 from aska-0096/tensor_reorder
aska-0096 Feb 10, 2022
a3aab19
local analyze passed
aska-0096 Feb 10, 2022
35aa269
fix typo
aska-0096 Feb 10, 2022
1072ac1
Merge pull request #1418 from aska-0096/tensor_reorder
aska-0096 Feb 10, 2022
2870b32
fix typo
aska-0096 Feb 10, 2022
b6aa19b
Merge branch 'ROCmSoftwarePlatform:tensor_reorder' into tensor_reorder
aska-0096 Feb 10, 2022
1080341
Merge pull request #1420 from aska-0096/tensor_reorder
aska-0096 Feb 10, 2022
24d8916
fix bug in order.hpp
aska-0096 Feb 10, 2022
0997915
Merge pull request #1421 from aska-0096/tensor_reorder
aska-0096 Feb 10, 2022
8d6f995
fix bug in order.hpp to satisfy cxx11
aska-0096 Feb 10, 2022
41f2f35
Merge branch 'tensor_reorder' of https://github.com/aska-0096/MIOpen …
aska-0096 Feb 10, 2022
03e5c48
Merge pull request #1422 from aska-0096/tensor_reorder
aska-0096 Feb 11, 2022
04f48d6
fix format: add a new line
aska-0096 Feb 11, 2022
e42f13f
[skip ci] Update: add double data type suppport.
aska-0096 Feb 14, 2022
d0198e2
Merge branch 'tensor_reorder' of https://github.com/ROCmSoftwarePlatf…
aska-0096 Feb 14, 2022
a5099b0
Update: add explanation comments on specific order.
aska-0096 Feb 14, 2022
6 changes: 6 additions & 0 deletions src/CMakeLists.txt
@@ -230,11 +230,15 @@ if( MIOPEN_BACKEND MATCHES "OpenCL" OR MIOPEN_BACKEND STREQUAL "HIPOC" OR MIOPEN
file(GLOB_RECURSE GPU_REFERENCE_KERNEL_HIP "kernels/gpu_reference_kernel/*.cpp")
file(GLOB_RECURSE GPU_REFERENCE_KERNEL_ASM "kernels/gpu_reference_kernel/*.s")
file(GLOB_RECURSE GPU_BATCHED_TRANSPOSE_KERNEL_HIP "kernels/gpu_batched_transpose_kernel/*.cpp")
file(GLOB_RECURSE GPU_GENERAL_TENSOR_REORDER_KERNEL_HIP_INCLUDE "kernels/gpu_general_tensor_reorder_kernel/*.hpp")
file(GLOB_RECURSE GPU_GENERAL_TENSOR_REORDER_KERNEL_HIP_SOURCE "kernels/gpu_general_tensor_reorder_kernel/*.cpp")


set(MIOPEN_KERNEL_INCLUDES
${STATIC_COMPOSABLE_KERNEL_INCLUDE}
${COMPOSABLE_KERNEL_INCLUDE}
${COMPOSABLE_KERNEL_DYNAMIC_ASM_INCLUDE}
${GPU_GENERAL_TENSOR_REORDER_KERNEL_HIP_INCLUDE}
include/miopen/implicitgemm_params.hpp
kernels/Conv_Winograd_v13_3_12_fp16dot_stride1.inc
kernels/Conv_Winograd_v13_3_12_fp16dot_stride2_dec.inc
@@ -331,6 +335,7 @@ if( MIOPEN_BACKEND MATCHES "OpenCL" OR MIOPEN_BACKEND STREQUAL "HIPOC" OR MIOPEN
${GPU_REFERENCE_KERNEL_HIP}
${GPU_REFERENCE_KERNEL_ASM}
${GPU_BATCHED_TRANSPOSE_KERNEL_HIP}
${GPU_GENERAL_TENSOR_REORDER_KERNEL_HIP_SOURCE}
kernels/detect_llvm_amdgcn_buffer_atomic_fadd_f32_float.cpp
kernels/MIOpenCheckNumerics.cl
kernels/MIOpenBatchNormActivBwdPerAct.cl
@@ -456,6 +461,7 @@ if( MIOPEN_BACKEND MATCHES "OpenCL" OR MIOPEN_BACKEND STREQUAL "HIPOC" OR MIOPEN
ocl/rnn_util_ocl.cpp
hip/hip_build_utils.cpp
hip/batched_transpose_sol.cpp
hip/general_tensor_reorder_sol.cpp
pooling.cpp
ocl/fusionopconvocl.cpp
ocl/fusionopbiasbnactivocl.cpp
214 changes: 214 additions & 0 deletions src/hip/general_tensor_reorder_sol.cpp
@@ -0,0 +1,214 @@
/*******************************************************************************
*
* MIT License
*
* Copyright (c) 2020-2022 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
*******************************************************************************/

#include <miopen/general_tensor_reorder_sol.hpp>
#include <miopen/tensor.hpp>
#include <miopen/magic_div.hpp>
#include <miopen/float_equal.hpp>
#include <string>
#include <vector>
#include <limits>
#include <iostream>
#include <sstream>

#define TENSOR_REORDER_BLOCK_SIZE 256

namespace miopen {
namespace tensor_reorder {

static inline std::string GetNameTrait(std::size_t type_size)
{
if(type_size == 1)
return "byte";
if(type_size == 2)
return "half";
if(type_size == 4)
return "dword";
if(type_size == 8)
return "dwordx2";
MIOPEN_THROW("data type not supported");
}

static inline std::string GetKernelName(std::size_t data_size,
uint32_t order_0,
uint32_t order_1,
uint32_t order_2,
uint32_t order_3,
const GeneralReorderParam* kparam)
{
std::ostringstream kernel_name;
std::string type_trait = GetNameTrait(data_size);
kernel_name << "general_4d_reorder_" << kparam->tile_x << "x" << kparam->tile_y << "_";
if(!(kparam->pack_x == 1 && kparam->pack_y == 1 && kparam->ediv_x == 1 && kparam->ediv_y == 1))
{
kernel_name << "pack_" << kparam->pack_x << "x" << kparam->pack_y << "_ediv_"
<< kparam->ediv_x << "x" << kparam->ediv_y << "_";
}
kernel_name << type_trait << "_r" << order_0 << order_1 << order_2 << order_3;
return kernel_name.str();
}
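As a concrete illustration of the naming scheme (the parameter values below are hypothetical, not taken from this PR): with the default 16x256 tile, all pack/ediv factors equal to 1, a 4-byte element type and order (0,2,3,1), the helper produces the name shown in the comment; the pack_/ediv_ segment is emitted only when some factor differs from 1.

GeneralReorderParam p{16, 256, 1, 1, 1, 1};           // illustrative tile parameters
std::string name = GetKernelName(4, 0, 2, 3, 1, &p);  // "general_4d_reorder_16x256_dword_r0231"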

static inline GeneralReorderParam
HeuristicGet(std::size_t data_size, uint32_t dim_0, uint32_t dim_1, uint32_t dim_2, uint32_t dim_3)
{
/*
* TODO:
* Design an algorithm to determine the general tensor reorder tile size.
*/
GeneralReorderParam default_kernel;
if(data_size <= 8 && dim_0 >= 1 && dim_1 >= 1 && dim_2 >= 1 && dim_3 >= 1)
{
if(dim_3 >= 16)
{
return GeneralReorderParam{16, 256, 1, 1, 1, 1};
}
else if(dim_3 >= 8)
{
return GeneralReorderParam{8, 256, 1, 1, 1, 1};
}
else if(dim_3 >= 4)
{
return GeneralReorderParam{4, 256, 1, 1, 1, 1};
}
else if(dim_3 >= 2)
{
return GeneralReorderParam{2, 256, 1, 1, 1, 1};
}
else
{
return GeneralReorderParam{1, 256, 1, 1, 1, 1};
}
}
else
{
return default_kernel;
}
}
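A few hedged examples of what the heuristic above selects (the shapes are illustrative only):

// HeuristicGet(4, 64, 32, 56, 56) -> {16, 256, 1, 1, 1, 1}   // dim_3 >= 16
// HeuristicGet(4, 64, 32, 56,  3) -> { 2, 256, 1, 1, 1, 1}   // 2 <= dim_3 < 4
// HeuristicGet(16, ...)           -> default_kernel (zero-initialized): data_size > 8 is unsupported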

} // namespace tensor_reorder
GeneralReorderSolution::GeneralReorderSolution(const ExecutionContext& ctx,
miopenDataType_t data_type_,
uint32_t dim_0_,
uint32_t dim_1_,
uint32_t dim_2_,
uint32_t dim_3_,
uint32_t order_0_,
uint32_t order_1_,
uint32_t order_2_,
uint32_t order_3_)
: data_type(data_type_),
dim_0(dim_0_),
dim_1(dim_1_),
dim_2(dim_2_),
dim_3(dim_3_),
order_0(order_0_),
order_1(order_1_),
order_2(order_2_),
order_3(order_3_)
{
if(data_type == miopenInt8x4)
MIOPEN_THROW("These data type are not supported");
num_cu = ctx.GetStream().GetMaxComputeUnits();
std::size_t data_size = miopen::GetTypeSize(data_type);
kernel_param_heuristic = tensor_reorder::HeuristicGet(data_size, dim_0, dim_1, dim_2, dim_3);
}

solver::KernelInfo GeneralReorderSolution::GetKernel() const
{
std::size_t block_size = TENSOR_REORDER_BLOCK_SIZE;
uint32_t pixel_total = dim_0 * dim_1 * dim_2 * dim_3;
uint32_t dim_total = (pixel_total + block_size * kernel_param_heuristic.tile_x - 1) /
(block_size * kernel_param_heuristic.tile_x);
std::size_t grid_size = dim_total;

std::string kernel_name = GetKernelName();
solver::KernelInfo kernel;
kernel.kernel_file = "general_tensor_reorder.cpp";
kernel.kernel_name = kernel_name;
kernel.g_wk.clear();
kernel.g_wk.push_back(grid_size * block_size);
kernel.g_wk.push_back(1);
kernel.g_wk.push_back(1);
kernel.l_wk.clear();
kernel.l_wk.push_back(block_size);
kernel.l_wk.push_back(1);
kernel.l_wk.push_back(1);

MIOPEN_LOG_I2("GeneralReorderSolution use kernel: " + kernel_name);

return kernel;
}
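Worked arithmetic for the launch configuration above, using an illustrative 64x32x56x56 tensor with tile_x = 16 chosen by the heuristic:

// pixel_total = 64 * 32 * 56 * 56              = 6,422,528
// dim_total   = ceil(6,422,528 / (256 * 16))   = 1,568  -> grid_size
// g_wk        = {1,568 * 256, 1, 1} = {401,408, 1, 1};   l_wk = {256, 1, 1}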

std::vector<OpKernelArg> GeneralReorderSolution::GetKernelArg() const
{
std::size_t block_size = TENSOR_REORDER_BLOCK_SIZE;
uint32_t pixel_total = dim_0 * dim_1 * dim_2 * dim_3;
uint32_t dim_total = (pixel_total + block_size * kernel_param_heuristic.tile_x - 1) /
(block_size * kernel_param_heuristic.tile_x);
std::size_t grid_size = dim_total;

magic_div_u32_t magic_stride0 = magic_div_u32_gen(dim_1 * dim_2 * dim_3);
magic_div_u32_t magic_stride1 = magic_div_u32_gen(dim_2 * dim_3);
magic_div_u32_t magic_stride2 = magic_div_u32_gen(dim_3);

std::vector<OpKernelArg> opArgs;
opArgs.emplace_back(0); // placeholder
opArgs.emplace_back(0); // placeholder
opArgs.emplace_back(dim_0);
opArgs.emplace_back(dim_1);
opArgs.emplace_back(dim_2);
opArgs.emplace_back(dim_3);
opArgs.emplace_back(static_cast<uint32_t>(grid_size));
opArgs.emplace_back(dim_total);
opArgs.emplace_back(magic_stride0.magic);
opArgs.emplace_back(static_cast<uint32_t>(magic_stride0.shift));
opArgs.emplace_back(magic_stride1.magic);
opArgs.emplace_back(static_cast<uint32_t>(magic_stride1.shift));
opArgs.emplace_back(magic_stride2.magic);
opArgs.emplace_back(static_cast<uint32_t>(magic_stride2.shift));

return opArgs;
}
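The magic_stride constants let the kernel turn a flat pixel index back into 4-D coordinates without hardware integer division. As a hedged host-side reference of the computation they stand in for (the actual magic-division encoding lives in miopen/magic_div.hpp and is not reproduced here):

// Sketch only: s0 = dim_1*dim_2*dim_3, s1 = dim_2*dim_3, s2 = dim_3 (requires <cstdint>).
inline void UnpackIndex(uint32_t idx, uint32_t s0, uint32_t s1, uint32_t s2,
                        uint32_t& i0, uint32_t& i1, uint32_t& i2, uint32_t& i3)
{
    i0 = idx / s0; idx -= i0 * s0;
    i1 = idx / s1; idx -= i1 * s1;
    i2 = idx / s2;
    i3 = idx - i2 * s2;
}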

std::string GeneralReorderSolution::GetKernelName() const
{
std::size_t data_size = miopen::GetTypeSize(data_type);
return tensor_reorder::GetKernelName(
data_size, order_0, order_1, order_2, order_3, &kernel_param_heuristic);
}

bool GeneralReorderSolution::IsSkippable() const
{
// Disable the IsSkippable function
return dim_0 == 0 || dim_1 == 0 || dim_2 == 0 || dim_3 == 0;
}

size_t GeneralReorderSolution::GetSize() const
{
return miopen::GetTypeSize(data_type) * dim_0 * dim_1 * dim_2 * dim_3;
}

} // namespace miopen
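For context, a hypothetical caller-side sketch of how this class is intended to be driven; ctx is assumed to be a valid ExecutionContext, the tensor shape and permutation are illustrative, and the actual kernel dispatch is elided:

miopen::GeneralReorderSolution reorder(ctx, miopenFloat,
                                       64, 32, 56, 56, // dim_0..dim_3 (illustrative)
                                       0, 2, 3, 1);    // order_0..order_3 (illustrative)
if(!reorder.IsSkippable())
{
    auto kinfo = reorder.GetKernel();    // kernel file, kernel name, and work-group sizes
    auto args  = reorder.GetKernelArg(); // args[0]/args[1] are dst/src buffer placeholders
    // Hand kinfo and args to MIOpen's usual kernel-invocation path; the destination
    // buffer must be at least reorder.GetSize() bytes.
}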
80 changes: 80 additions & 0 deletions src/include/miopen/general_tensor_reorder_sol.hpp
@@ -0,0 +1,80 @@
/*******************************************************************************
*
* MIT License
*
* Copyright (c) 2020-2022 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
*******************************************************************************/
#ifndef GUARD_GENERAL_MIOPEN_TENSOR_REORDER_SOL_HPP
#define GUARD_GENERAL_MIOPEN_TENSOR_REORDER_SOL_HPP

#include <miopen/miopen.h>
#include <miopen/kernel_info.hpp>
#include <miopen/op_kernel_args.hpp>
#include <miopen/execution_context.hpp>
#include <vector>

namespace miopen {

struct GeneralReorderParam
{
int tile_x{0};
int tile_y{0};
int pack_x{0};
int pack_y{0};
int ediv_x{0};
int ediv_y{0};
};

struct GeneralReorderSolution
{
GeneralReorderSolution(const ExecutionContext& ctx_,
miopenDataType_t data_type_,
uint32_t dim_0_,
Review thread on uint32_t dim_0_:

@atamazov (Contributor, Mar 24, 2022): Why do we need uint32_t here?

@atamazov (Contributor): Answer required.

@aska-0096 (Collaborator, author, Apr 9, 2022): dim_0_ is one of the dimensions of the tensor; I think 2^31 is safe to cover its range.

@atamazov (Contributor): 4 GiB tensors are not rare these days. IIRC tensors use 64 bits for each dimension. Can you please change this to a suitable 64-bit type (better), or MIOPEN_THROW if dim0 exceeds MAX_UINT (at the place where 64 bits are being cut to 32 bits)? Thanks.

@aska-0096 (Collaborator, author): I want to choose the latter solution for the moment: 32-bit magic division is used in the address computation, and converting it to 64-bit would take a lot of time, while adding a uint32_t range guard is much easier. I'll add 64-bit reorder as a TODO.

@atamazov (Contributor, Apr 11, 2022): Can you make MakeTensorReorderAttributes return an empty std::unique_ptr<TensorReorderAttributesBase> when dim0 exceeds MAX_UINT? That is better than THROW. The semantics of this return value are "tensor transform is not applicable", which is indeed the case; the caller can then use a different implementation of the transform (if one exists).

@atamazov (Contributor, Apr 11, 2022), replying to "I'll add 64-bit type reorder as a TODO.": Ok!

@aska-0096 (Collaborator, author): Thanks for the advice, I'll add this feature in the next PR later this week.

uint32_t dim_1_,
uint32_t dim_2_,
uint32_t dim_3_,
uint32_t order_0_,
uint32_t order_1_,
uint32_t order_2_,
uint32_t order_3_);
// TODO batched transpose API
solver::KernelInfo GetKernel() const;
std::vector<OpKernelArg> GetKernelArg() const;
std::string GetKernelName() const;
bool IsSkippable() const;
size_t GetSize() const;

miopenDataType_t data_type;
uint32_t dim_0;
uint32_t dim_1;
uint32_t dim_2;
uint32_t dim_3;
uint32_t order_0;
uint32_t order_1;
uint32_t order_2;
uint32_t order_3;
int num_cu;

GeneralReorderParam kernel_param_heuristic;
};
} // namespace miopen
#endif
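Following the review thread above, a minimal sketch of the interim 32-bit range guard that was discussed; the helper name is hypothetical and MakeTensorReorderAttributes itself is outside this PR's diff:

#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical guard: reject shapes that do not fit the kernel's 32-bit
// magic-division arithmetic instead of silently truncating 64-bit extents.
inline bool DimsFitUint32(std::size_t d0, std::size_t d1, std::size_t d2, std::size_t d3)
{
    const auto fits = [](std::size_t d) { return d <= std::numeric_limits<uint32_t>::max(); };
    return fits(d0) && fits(d1) && fits(d2) && fits(d3);
}
// Caller side (sketch): on failure, either MIOPEN_THROW or report "transform not
// applicable" so a different reorder implementation can be selected.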