-
Notifications
You must be signed in to change notification settings - Fork 631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
E2e gpu convolution test #18125
base: main
Are you sure you want to change the base?
E2e gpu convolution test #18125
Conversation
Signed-off-by: Bangtian Liu <[email protected]>
Signed-off-by: Bangtian Liu <[email protected]>
Signed-off-by: Bangtian Liu <[email protected]>
### BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_BELOW_THIS_LINE ### | ||
|
||
# To distinguish between CDNA(gfx9) and RDNA3(gfx11) | ||
if(IREE_HIP_TEST_TARGET_CHIP MATCHES "^gfx9") | ||
|
||
unset(IREE_HIP_TEST_COMPILER_FLAGS) | ||
list(APPEND IREE_HIP_TEST_COMPILER_FLAGS | ||
"--iree-rocm-target-chip=${IREE_HIP_TEST_TARGET_CHIP}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I made it possible for some of this logic to be handled in the BUILD.bazel file in #17843. I didn't look specifically at folders like tests/e2e/matmul though. Generally speaking, while still using Bazel and CMake for tests, we should avoid using the BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_BELOW_THIS_LINE
escape hatch if possible.
# Enumerates of the collections of compilation info that we can generate tests | ||
# for. The values are the accepted values for the --compilation_info= flag. | ||
@enum.unique | ||
class CompilationInfoId(enum.Enum): | ||
NONE = "" | ||
LLVMGPUVectorDistributeMFMA = "LLVMGPUVectorDistributeMFMA" | ||
LLVMGPUVectorDistributeWMMA = "LLVMGPUVectorDistributeWMMA" | ||
LLVMGPUVectorizeCDNA = "LLVMGPUVectorizeCDNA" | ||
LLVMGPUVectorizeRDNA = "LLVMGPUVectorizeRDNA" | ||
|
||
|
||
# Describes how to construct compilation info for the testcase. | ||
@dataclasses.dataclass | ||
class CompilationInfo: | ||
# Lowering Config | ||
tile_sizes: typing.List[typing.List[int]] | ||
# Translation Info | ||
dispatch_lowering_pass_pipeline: str | ||
software_pipeline_depth: int | ||
mma_schedule: typing.Optional[MMASchedule] | ||
# Compilation info | ||
workgroup_size: typing.List[int] | ||
subgroup_size: Optional[int] = None | ||
|
||
# Prints the workgroup size | ||
def workgroup_size_str(self): | ||
return "workgroup_size = [" + ", ".join(map(str, self.workgroup_size)) + "]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these "compilation infos" part of the public compiler API? How are users expected to make use of these vectorize/vectordistribute MFMA/WMMA/CDNA/RDNA pipelines? I'm worried that this is reaching too deeply into implementation details, when test suites should be modeled more closely after how developers are realistically expected to use the tools.
I would expect:
- program definitions to have computations in them: "multiply these two tensors/matrices with these shapes, then do this other math"
- users/developers to enumerate the explicit devices they want to target, including which features those devices support (e.g. cpu intrinsics, gpu API extensions)
- users/developers to provide tuning configurations per each device that give hints about branches to take during compilation, heuristics to use, etc. (maybe that's this "compilation info"?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also sketched out by get_rocm_test_compilation_infos()
needing to exist at all here - we shouldn't need anything specific to a single target in a test generator like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on this. I would expect these e2e tests to look nearly the same as what we have done for winograd convolutions:
iree/tests/e2e/convolution/CMakeLists.txt
Lines 163 to 189 in 31bfc93
iree_generated_e2e_runner_test( | |
NAME | |
e2e_winograd_conv2d_cpu_f32_f32_f32_small | |
TEST_TYPE | |
conv2d | |
GENERATOR | |
"generate_e2e_conv2d_tests.py" | |
GENERATOR_ARGS | |
"--input_type=f32" | |
"--kernel_type=f32" | |
"--acc_type=f32" | |
"--shapes=small" | |
TEST_RUNNER | |
iree_tools_testing_e2e_iree-e2e-conv2d-test | |
TARGET_BACKENDS | |
"llvm-cpu" | |
DRIVERS | |
"local-task" | |
COMPILER_FLAGS | |
"--iree-preprocessing-pass-pipeline=builtin.module\(func.func\(iree-linalg-ext-convert-conv2d-to-winograd{replace-all-convs=true}\)\)" | |
LABELS | |
"hostonly" | |
"local" | |
TARGET_CPU_FEATURES_VARIANTS | |
"default" | |
) | |
You should be able to use the same test generator but choose a different target, and you can add additional compiler flags as needed with COMPILER_FLAGS
. The test generator should simply create some source IR that performs a convolution, which already exists for winograd. Adding new e2e convolution tests should be able to use that same input IR, but with different compiler specifications. Setting the compilation info is the job of the specific backend codegen, and baking that into the test generator seems like duplicating logic unnecessarily.
for (iree_hal_dim_t ic = 0; ic < c_size; ++ic) { | ||
for (iree_hal_dim_t kh = 0; kh < kh_size; ++kh) { | ||
for (iree_hal_dim_t kw = 0; kw < kw_size; ++kw) { | ||
iree_hal_dim_t inp_idx = convert_to_1d_index( | ||
c_size, h_size, w_size, n, ic, (oh * sh_size + kh * dh_size), | ||
(ow * sw_size + kw * dw_size)); | ||
iree_hal_dim_t krnl_idx = | ||
convert_to_1d_index(c_size, kh_size, kw_size, oc, ic, kh, kw); | ||
|
||
acc += iree_math_f16_to_f32(input_data[inp_idx]) * | ||
iree_math_f16_to_f32(kernel_data[krnl_idx]); | ||
} | ||
} | ||
result_data[out_idx] = acc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this math in C++ scares me, especially for lower precision types. Would like to at least validate the C++ against reference python framework code then compare our own compiler/runtime/codegen systems against the C++ once we trust it
Convert this PR to draft now, I can work on this after we are clear about the design of the e2e test. |
Progress on #2. This PR refactors, adds, and migrates conv2d e2e tests from the PRs here: iree-org/iree#16849 and iree-org/iree#18125 --------- Signed-off-by: erman-gurses <[email protected]>
acc_type
is as same asinput_type
, thekernel_layout
remains same while it should be different, etc.Note: The numerical behavior of the VectorDistribute pipeline is unstable. Occasionally, tests related to this pipeline may fail.