
[Tensor reorder][Quality][#issue 1476] Improve naming style and CTest design #1481

Merged 88 commits on Apr 7, 2022
Changes from 86 commits
c52547a
test_file commit
aska-0096 Jan 24, 2022
60d4564
add all files
aska-0096 Jan 26, 2022
569044f
fix some bugs and try
aska-0096 Jan 27, 2022
682a725
fix bug
aska-0096 Jan 27, 2022
7fc0de7
fix bug
aska-0096 Jan 27, 2022
b1f5c89
fix bugs
aska-0096 Jan 27, 2022
9573861
fix bugs
aska-0096 Jan 27, 2022
ca1bb57
fix bug
aska-0096 Jan 27, 2022
b0c188c
fix bugs
aska-0096 Jan 27, 2022
57dab09
fix bug
aska-0096 Jan 27, 2022
45894a7
fixbug
aska-0096 Jan 27, 2022
84863c4
fixbug
aska-0096 Jan 27, 2022
54d1f2e
test 1
aska-0096 Jan 27, 2022
e5f8617
General test, (Batched passed)
aska-0096 Jan 27, 2022
4dba45c
0321 test
aska-0096 Jan 27, 2022
c3c5303
explicit template instance
aska-0096 Jan 27, 2022
b539c9c
fix bug
aska-0096 Jan 27, 2022
b9e8684
fix bug
aska-0096 Jan 27, 2022
c766a69
move instantiation into sol.hpp
aska-0096 Jan 27, 2022
37b1926
fix bug
aska-0096 Jan 27, 2022
a36ce98
fixbug
aska-0096 Jan 27, 2022
923e4b3
fix bug
aska-0096 Jan 27, 2022
7802205
fix bug
aska-0096 Jan 27, 2022
541a1e7
fixbug
aska-0096 Jan 27, 2022
45f1a6f
fix bug
aska-0096 Jan 27, 2022
3cc7c61
fixbug
aska-0096 Jan 27, 2022
08a9c82
fixbug
aska-0096 Jan 27, 2022
3374fa6
fixbug
aska-0096 Jan 27, 2022
0dfac32
fixbug
aska-0096 Jan 27, 2022
e9ac702
batched test
aska-0096 Jan 27, 2022
879694f
test batch
aska-0096 Jan 27, 2022
183e728
test
aska-0096 Jan 27, 2022
978f8e9
add kernel
aska-0096 Jan 27, 2022
ff5e47e
fixbugs
aska-0096 Jan 27, 2022
7845771
fixtypo
aska-0096 Jan 27, 2022
16dfe07
fixtypo
aska-0096 Jan 27, 2022
9ef53c0
addkerneltest
aska-0096 Jan 27, 2022
aa6a09d
try separated solution
aska-0096 Jan 27, 2022
108b80c
fixbug
aska-0096 Jan 27, 2022
47d4b3d
fix bug
aska-0096 Jan 27, 2022
74a7545
eliminate some warnings
aska-0096 Jan 27, 2022
86c21af
fix some warnings
aska-0096 Jan 27, 2022
096c661
fix some warnings
aska-0096 Jan 27, 2022
32f21b0
fork should not call CI
aska-0096 Jan 28, 2022
eff9686
push & pull test on forked repo
aska-0096 Jan 28, 2022
c81cb86
try
aska-0096 Jan 28, 2022
6b9f145
try
aska-0096 Jan 28, 2022
ff7a441
fix typo
aska-0096 Jan 28, 2022
5d223eb
debug
aska-0096 Jan 28, 2022
acd2877
add debug points
aska-0096 Jan 28, 2022
3453bba
add checkpoints
aska-0096 Jan 28, 2022
0651536
add check point
aska-0096 Jan 28, 2022
a08f7ce
fixbugs
aska-0096 Jan 28, 2022
3196935
fixbug try
aska-0096 Jan 28, 2022
e1244fd
debug
aska-0096 Jan 29, 2022
8652207
cmake debug
aska-0096 Jan 29, 2022
6e98bfb
Before warning fixed
aska-0096 Feb 8, 2022
ad66328
Merge pull request #1411 from aska-0096/tensor_reorder
aska-0096 Feb 8, 2022
d97fc28
test all cases
aska-0096 Feb 8, 2022
1fe9254
Merge pull request #1417 from aska-0096/tensor_reorder
aska-0096 Feb 10, 2022
a3aab19
local analyze passed
aska-0096 Feb 10, 2022
35aa269
fix typo
aska-0096 Feb 10, 2022
1072ac1
Merge pull request #1418 from aska-0096/tensor_reorder
aska-0096 Feb 10, 2022
2870b32
fix typo
aska-0096 Feb 10, 2022
b6aa19b
Merge branch 'ROCmSoftwarePlatform:tensor_reorder' into tensor_reorder
aska-0096 Feb 10, 2022
1080341
Merge pull request #1420 from aska-0096/tensor_reorder
aska-0096 Feb 10, 2022
24d8916
fix bug in order.hpp
aska-0096 Feb 10, 2022
0997915
Merge pull request #1421 from aska-0096/tensor_reorder
aska-0096 Feb 10, 2022
8d6f995
fix bug in order.hpp to satisfy cxx11
aska-0096 Feb 10, 2022
41f2f35
Merge branch 'tensor_reorder' of https://github.com/aska-0096/MIOpen …
aska-0096 Feb 10, 2022
03e5c48
Merge pull request #1422 from aska-0096/tensor_reorder
aska-0096 Feb 11, 2022
04f48d6
fix format: add a new line
aska-0096 Feb 11, 2022
e42f13f
[skip ci] Update: add double data type support.
aska-0096 Feb 14, 2022
d0198e2
Merge branch 'tensor_reorder' of https://github.com/ROCmSoftwarePlatf…
aska-0096 Feb 14, 2022
a5099b0
Update: add explanation comments on specific order.
aska-0096 Feb 14, 2022
2c44205
Respond to review suggestions
aska-0096 Mar 25, 2022
c223c94
Merge remote-tracking branch 'origin/develop' into tensor_reorder
aska-0096 Mar 25, 2022
209cd1c
Delete .gitignore
aska-0096 Mar 26, 2022
4dd545e
Delete settings.json
aska-0096 Mar 26, 2022
ffb5a10
clang-format check
aska-0096 Mar 26, 2022
d124707
post review
aska-0096 Mar 26, 2022
a2c3202
update on ctest
aska-0096 Mar 27, 2022
edfbbba
resolve M/D conflict
aska-0096 Mar 27, 2022
21c3ec5
Merge branch 'tensor_reorder' of https://github.com/ROCmSoftwarePlatf…
aska-0096 Mar 27, 2022
5fb50b7
re-clang format check
aska-0096 Mar 27, 2022
5cc064c
fix opencl tidy
aska-0096 Mar 28, 2022
f5411ca
bug fix
aska-0096 Mar 29, 2022
3210867
header file fix
aska-0096 Mar 30, 2022
12 changes: 6 additions & 6 deletions src/conv/invokers/impl_gemm_dynamic.cpp
@@ -569,9 +569,9 @@ InvokerFactory MakeImplGemmDynamicForwardXdlopsNHWCInvokerFactory(
if(!trans_output_skippable)
opArgsTrans.emplace_back(trans_output.GetKernelArg());

trans_input_size = trans_input_skippable ? 0 : trans_input.GetSize();
trans_weight_size = trans_weight_skippable ? 0 : trans_weight.GetSize();
trans_output_size = trans_output_skippable ? 0 : trans_output.GetSize();
trans_input_size = trans_input_skippable ? 0 : trans_input.GetOutputTensorSize();
trans_weight_size = trans_weight_skippable ? 0 : trans_weight.GetOutputTensorSize();
trans_output_size = trans_output_skippable ? 0 : trans_output.GetOutputTensorSize();

int idx = 0;
if(!trans_input_skippable)
@@ -887,9 +887,9 @@ InvokerFactory MakeImplGemmDynamicBackwardDataXdlopsNHWCInvokerFactory(
if(!trans_output_skippable)
opArgsTrans.emplace_back(trans_output.GetKernelArg());

trans_input_size = trans_input_skippable ? 0 : trans_input.GetSize();
trans_weight_size = trans_weight_skippable ? 0 : trans_weight.GetSize();
trans_output_size = trans_output_skippable ? 0 : trans_output.GetSize();
trans_input_size = trans_input_skippable ? 0 : trans_input.GetOutputTensorSize();
trans_weight_size = trans_weight_skippable ? 0 : trans_weight.GetOutputTensorSize();
trans_output_size = trans_output_skippable ? 0 : trans_output.GetOutputTensorSize();

int idx = 0;
if(!trans_input_skippable)
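The hunks above replace `GetSize()` with the more descriptive `GetOutputTensorSize()` at every call site that computes workspace requirements. A minimal sketch of the pattern, using a hypothetical `TransposeStub` stand-in (the real type is `BatchedTransposeSolution`): a skippable transpose step contributes zero workspace, otherwise the renamed accessor reports the bytes its output tensor needs.

```cpp
#include <cstddef>

// Hypothetical stand-in for BatchedTransposeSolution, reduced to the one
// accessor this hunk exercises.
struct TransposeStub
{
    std::size_t bytes;
    // Renamed from GetSize() in the PR to state what the size refers to.
    std::size_t GetOutputTensorSize() const { return bytes; }
};

// Mirrors the ternary pattern in the diff: skippable steps need no workspace.
inline std::size_t WorkspaceBytes(bool skippable, const TransposeStub& t)
{
    return skippable ? 0 : t.GetOutputTensorSize();
}
```

The rename carries no behavior change; it only disambiguates "size of what" at call sites that sum several workspace contributions.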
8 changes: 5 additions & 3 deletions src/hip/batched_transpose_sol.cpp
@@ -304,7 +304,7 @@ BatchedTransposeSolution::BatchedTransposeSolution(const ExecutionContext& ctx,
kernel_param_heuristic = batched_transpose::HeuristicGet(data_size, batch, height, width);
}

solver::KernelInfo BatchedTransposeSolution::GetKernel() const
solver::KernelInfo BatchedTransposeSolution::GetKernelInfo() const
{
std::size_t block_size = BATCHED_TRANSPOSE_BLOCK_SIZE;
#if BATCHED_TRANSPOSE_PERSISTENT
@@ -327,7 +327,7 @@ solver::KernelInfo BatchedTransposeSolution::GetKernel() const
kernel.l_wk.push_back(1);
kernel.l_wk.push_back(1);

MIOPEN_LOG_I2("BatchedTransposeSolution use kernel: " + kernel_name);
MIOPEN_LOG_T(kernel_name);

return kernel;
}
@@ -351,6 +351,8 @@ std::vector<OpKernelArg> BatchedTransposeSolution::GetKernelArg() const
opArgs.emplace_back(0); // placeholder
opArgs.emplace_back(height);
opArgs.emplace_back(width);
if(grid_size != static_cast<uint32_t>(grid_size))
MIOPEN_THROW("Variable grid size can't be cast to uint32_t safely");
opArgs.emplace_back(static_cast<uint32_t>(grid_size));
opArgs.emplace_back(dim_total);
opArgs.emplace_back(magic_h.magic);
@@ -374,7 +376,7 @@ bool BatchedTransposeSolution::IsSkippable() const
return height == 1 || width == 1;
}

size_t BatchedTransposeSolution::GetSize() const
size_t BatchedTransposeSolution::GetOutputTensorSize() const
{
return miopen::GetTypeSize(data_type) * batch * height * width;
}
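The new guard in `GetKernelArg()` protects the narrowing of a `size_t` grid size to the `uint32_t` kernel argument. A self-contained sketch of the same round-trip check, under the assumption of a 64-bit `size_t`; `CheckedGridSize` is a hypothetical helper name and `std::runtime_error` stands in for `MIOPEN_THROW`:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper mirroring the guard added in this PR: a size_t value
// is safe to pass as a 32-bit kernel argument only if it survives a
// round-trip through uint32_t unchanged.
inline uint32_t CheckedGridSize(std::size_t grid_size)
{
    if(grid_size != static_cast<std::size_t>(static_cast<uint32_t>(grid_size)))
        throw std::runtime_error("grid size can't be cast to uint32_t safely");
    return static_cast<uint32_t>(grid_size);
}
```

Without the check, a grid size above `UINT32_MAX` would be silently truncated and the kernel would launch with a wrong dimension; throwing turns that silent corruption into an immediate, diagnosable failure.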
65 changes: 33 additions & 32 deletions src/hip/general_tensor_reorder_sol.cpp
@@ -39,7 +39,7 @@
namespace miopen {
namespace tensor_reorder {

static inline std::string GetNameTrait(std::size_t type_size)
static inline std::string GetKernelNameType(std::size_t type_size)
{
if(type_size == 1)
return "byte";
@@ -59,47 +59,45 @@ static inline std::string GetKernelName(std::size_t data_size,
uint32_t order_3,
const GeneralReorderParam* kparam)
{
if(kparam == nullptr)
MIOPEN_THROW("Memory access fault, kparam is a nullptr");
std::ostringstream kernel_name;
std::string type_trait = GetNameTrait(data_size);
kernel_name << "general_4d_reorder_" << kparam->tile_x << "x" << kparam->tile_y << "_";
if(!(kparam->pack_x == 1 && kparam->pack_y == 1 && kparam->ediv_x == 1 && kparam->ediv_y == 1))
{
kernel_name << "pack_" << kparam->pack_x << "x" << kparam->pack_y << "_ediv_"
<< kparam->ediv_x << "x" << kparam->ediv_y << "_";
}
kernel_name << type_trait << "_r" << order_0 << order_1 << order_2 << order_3;
kernel_name << GetKernelNameType(data_size) << "_r" << order_0 << order_1 << order_2 << order_3;
return kernel_name.str();
}

static inline GeneralReorderParam
HeuristicGet(std::size_t data_size, uint32_t dim_0, uint32_t dim_1, uint32_t dim_2, uint32_t dim_3)
{
/*
* TODO:
* Design a algorithm to determine general tensor reorder tile size.
*/
///\todo Design an algorithm to determine the general tensor reorder tile size.
GeneralReorderParam default_kernel;
if(data_size <= 8 && dim_0 >= 1 && dim_1 >= 1 && dim_2 >= 1 && dim_3 >= 1)
{
if(dim_3 >= 16)
{
return GeneralReorderParam{16, 256, 1, 1, 1, 1};
return GeneralReorderParam{16, TENSOR_REORDER_BLOCK_SIZE, 1, 1, 1, 1};
}
else if(dim_3 >= 8)
{
return GeneralReorderParam{8, 256, 1, 1, 1, 1};
return GeneralReorderParam{8, TENSOR_REORDER_BLOCK_SIZE, 1, 1, 1, 1};
}
else if(dim_3 >= 4)
{
return GeneralReorderParam{4, 256, 1, 1, 1, 1};
return GeneralReorderParam{4, TENSOR_REORDER_BLOCK_SIZE, 1, 1, 1, 1};
}
else if(dim_3 >= 2)
{
return GeneralReorderParam{2, 256, 1, 1, 1, 1};
return GeneralReorderParam{2, TENSOR_REORDER_BLOCK_SIZE, 1, 1, 1, 1};
}
else
{
return GeneralReorderParam{1, 256, 1, 1, 1, 1};
return GeneralReorderParam{1, TENSOR_REORDER_BLOCK_SIZE, 1, 1, 1, 1};
}
}
else
@@ -109,16 +107,15 @@ HeuristicGet(std::size_t data_size, uint32_t dim_0, uint32_t dim_1, uint32_t dim
}
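The cascade of `dim_3` comparisons in `HeuristicGet` reduces to one rule: pick the largest power-of-two tile in {16, 8, 4, 2, 1} that fits the innermost dimension. A compact sketch under that reading; `PickTileX` is a hypothetical helper, not MIOpen API:

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical distillation of the tile-size branch in HeuristicGet():
// the innermost-dimension tile is the largest of 16/8/4/2 not exceeding
// dim_3, falling back to 1.
inline uint32_t PickTileX(uint32_t dim_3)
{
    for(uint32_t tile : {16u, 8u, 4u, 2u})
        if(dim_3 >= tile)
            return tile;
    return 1;
}
```

The PR's related change in this hunk replaces the magic block size `256` with the `TENSOR_REORDER_BLOCK_SIZE` macro, so tile choice and block size vary independently.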

} // namespace tensor_reorder
GeneralReorderSolution::GeneralReorderSolution(const ExecutionContext& ctx,
miopenDataType_t data_type_,
uint32_t dim_0_,
uint32_t dim_1_,
uint32_t dim_2_,
uint32_t dim_3_,
uint32_t order_0_,
uint32_t order_1_,
uint32_t order_2_,
uint32_t order_3_)
GenericReorderSolutionImpl::GenericReorderSolutionImpl(miopenDataType_t data_type_,
uint32_t dim_0_,
uint32_t dim_1_,
uint32_t dim_2_,
uint32_t dim_3_,
uint32_t order_0_,
uint32_t order_1_,
uint32_t order_2_,
uint32_t order_3_)
: data_type(data_type_),
dim_0(dim_0_),
dim_1(dim_1_),
@@ -131,12 +128,11 @@ GeneralReorderSolution::GeneralReorderSolution(const ExecutionContext& ctx,
{
if(data_type == miopenInt8x4)
MIOPEN_THROW("This data type is not supported");
num_cu = ctx.GetStream().GetMaxComputeUnits();
std::size_t data_size = miopen::GetTypeSize(data_type);
kernel_param_heuristic = tensor_reorder::HeuristicGet(data_size, dim_0, dim_1, dim_2, dim_3);
}

solver::KernelInfo GeneralReorderSolution::GetKernel() const
solver::KernelInfo GenericReorderSolutionImpl::GetKernelInfo() const
{
std::size_t block_size = TENSOR_REORDER_BLOCK_SIZE;
uint32_t pixel_total = dim_0 * dim_1 * dim_2 * dim_3;
@@ -157,12 +153,12 @@ solver::KernelInfo GeneralReorderSolution::GetKernel() const
kernel.l_wk.push_back(1);
kernel.l_wk.push_back(1);

MIOPEN_LOG_I2("GeneralReorderSolution use kernel: " + kernel_name);
MIOPEN_LOG_T(kernel_name);

return kernel;
}

std::vector<OpKernelArg> GeneralReorderSolution::GetKernelArg() const
std::vector<OpKernelArg> GenericReorderSolutionImpl::GetKernelArg() const
{
std::size_t block_size = TENSOR_REORDER_BLOCK_SIZE;
uint32_t pixel_total = dim_0 * dim_1 * dim_2 * dim_3;
@@ -181,6 +177,8 @@ std::vector<OpKernelArg> GeneralReorderSolution::GetKernelArg() const
opArgs.emplace_back(dim_1);
opArgs.emplace_back(dim_2);
opArgs.emplace_back(dim_3);
if(grid_size != static_cast<uint32_t>(grid_size))
MIOPEN_THROW("Variable grid size can't be cast to uint32_t safely");
opArgs.emplace_back(static_cast<uint32_t>(grid_size));
opArgs.emplace_back(dim_total);
opArgs.emplace_back(magic_stride0.magic);
@@ -193,20 +191,23 @@ std::vector<OpKernelArg> GeneralReorderSolution::GetKernelArg() const
return opArgs;
}

std::string GeneralReorderSolution::GetKernelName() const
std::string GenericReorderSolutionImpl::GetKernelName() const
{
std::size_t data_size = miopen::GetTypeSize(data_type);
return tensor_reorder::GetKernelName(
data_size, order_0, order_1, order_2, order_3, &kernel_param_heuristic);
return tensor_reorder::GetKernelName(miopen::GetTypeSize(data_type),
order_0,
order_1,
order_2,
order_3,
&kernel_param_heuristic);
}

bool GeneralReorderSolution::IsSkippable() const
bool GenericReorderSolutionImpl::IsSkippable() const
{
// Disable the IsSkippable function
return dim_0 == 0 || dim_1 == 0 || dim_2 == 0 || dim_3 == 0;
}

size_t GeneralReorderSolution::GetSize() const
size_t GenericReorderSolutionImpl::GetOutputTensorSize() const
{
return miopen::GetTypeSize(data_type) * dim_0 * dim_1 * dim_2 * dim_3;
}
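`GetOutputTensorSize()` is a plain product: element size in bytes times the four dimensions. A free-function sketch of the same arithmetic; `OutputTensorSizeBytes` is a hypothetical name and its first parameter stands in for `miopen::GetTypeSize(data_type)`:

```cpp
#include <cstddef>
#include <cstdint>

// Hedged sketch of GenericReorderSolutionImpl::GetOutputTensorSize():
// bytes required for the reordered output tensor. A reorder is a
// permutation, so input and output tensors occupy the same number of bytes.
inline std::size_t OutputTensorSizeBytes(std::size_t type_size_bytes,
                                         uint32_t dim_0, uint32_t dim_1,
                                         uint32_t dim_2, uint32_t dim_3)
{
    return type_size_bytes * dim_0 * dim_1 * dim_2 * dim_3;
}
```

For example, a float (4-byte) tensor of shape 2×3×4×5 needs 4 × 2 × 3 × 4 × 5 = 480 bytes of output workspace.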
4 changes: 2 additions & 2 deletions src/include/miopen/batched_transpose_sol.hpp
@@ -51,11 +51,11 @@ struct BatchedTransposeSolution
uint32_t batch_,
uint32_t height_,
uint32_t width_);
solver::KernelInfo GetKernel() const;
solver::KernelInfo GetKernelInfo() const;
std::vector<OpKernelArg> GetKernelArg() const;
std::string GetKernelName() const;
bool IsSkippable() const;
size_t GetSize() const;
size_t GetOutputTensorSize() const;

miopenDataType_t data_type;
uint32_t batch;
28 changes: 13 additions & 15 deletions src/include/miopen/general_tensor_reorder_sol.hpp
@@ -26,10 +26,10 @@
#ifndef GUARD_GENERAL_MIOPEN_TENSOR_REORDER_SOL_HPP
#define GUARD_GENERAL_MIOPEN_TENSOR_REORDER_SOL_HPP

#include <miopen/miopen.h>
#include <miopen/kernel_info.hpp>
#include <miopen/op_kernel_args.hpp>
#include <miopen/execution_context.hpp>
#include <cstdlib>
#include <vector>

namespace miopen {
@@ -44,24 +44,23 @@ struct GeneralReorderParam
int ediv_y{0};
};

struct GeneralReorderSolution
struct GenericReorderSolutionImpl
{
GeneralReorderSolution(const ExecutionContext& ctx_,
miopenDataType_t data_type_,
uint32_t dim_0_,
uint32_t dim_1_,
uint32_t dim_2_,
uint32_t dim_3_,
uint32_t order_0_,
uint32_t order_1_,
uint32_t order_2_,
uint32_t order_3_);
GenericReorderSolutionImpl(miopenDataType_t data_type_,
uint32_t dim_0_,
uint32_t dim_1_,
uint32_t dim_2_,
uint32_t dim_3_,
uint32_t order_0_,
uint32_t order_1_,
uint32_t order_2_,
uint32_t order_3_);
// TODO batched transpose API
solver::KernelInfo GetKernel() const;
solver::KernelInfo GetKernelInfo() const;
std::vector<OpKernelArg> GetKernelArg() const;
std::string GetKernelName() const;
bool IsSkippable() const;
size_t GetSize() const;
size_t GetOutputTensorSize() const;

miopenDataType_t data_type;
uint32_t dim_0;
@@ -72,7 +71,6 @@ struct GeneralReorderSolution
uint32_t order_1;
uint32_t order_2;
uint32_t order_3;
int num_cu;

GeneralReorderParam kernel_param_heuristic;
};