Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

E2e gpu convolution test #18125

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

bangtianliu
Copy link
Contributor

  • Added end-to-end (e2e) tests for the VectorDistribute pipeline.
  • Added end-to-end (e2e) tests for the GPUVectorize pipeline.
  • Updated the correctness check to be layout-based, considering that the output tensor can have different layouts.
  • Correct some hidden bugs in the existing code base like acc_type is as same as input_type, the kernel_layout remains same while it should be different, etc.

Note: The numerical behavior of the VectorDistribute pipeline is unstable. Occasionally, tests related to this pipeline may fail.

@kuhar kuhar requested a review from ScottTodd August 6, 2024 18:43
Signed-off-by: Bangtian Liu <[email protected]>
Comment on lines 325 to +333
### BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_BELOW_THIS_LINE ###

# To distinguish between CDNA(gfx9) and RDNA3(gfx11)
if(IREE_HIP_TEST_TARGET_CHIP MATCHES "^gfx9")

unset(IREE_HIP_TEST_COMPILER_FLAGS)
list(APPEND IREE_HIP_TEST_COMPILER_FLAGS
"--iree-rocm-target-chip=${IREE_HIP_TEST_TARGET_CHIP}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I made it possible for some of this logic to be handled in the BUILD.bazel file in #17843. I didn't look specifically at folders like tests/e2e/matmul though. Generally speaking, while still using Bazel and CMake for tests, we should avoid using the BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_BELOW_THIS_LINE escape hatch if possible.

Comment on lines +76 to +102
# Enumerates of the collections of compilation info that we can generate tests
# for. The values are the accepted values for the --compilation_info= flag.
@enum.unique
class CompilationInfoId(enum.Enum):
NONE = ""
LLVMGPUVectorDistributeMFMA = "LLVMGPUVectorDistributeMFMA"
LLVMGPUVectorDistributeWMMA = "LLVMGPUVectorDistributeWMMA"
LLVMGPUVectorizeCDNA = "LLVMGPUVectorizeCDNA"
LLVMGPUVectorizeRDNA = "LLVMGPUVectorizeRDNA"


# Describes how to construct compilation info for the testcase.
@dataclasses.dataclass
class CompilationInfo:
# Lowering Config
tile_sizes: typing.List[typing.List[int]]
# Translation Info
dispatch_lowering_pass_pipeline: str
software_pipeline_depth: int
mma_schedule: typing.Optional[MMASchedule]
# Compilation info
workgroup_size: typing.List[int]
subgroup_size: Optional[int] = None

# Prints the workgroup size
def workgroup_size_str(self):
return "workgroup_size = [" + ", ".join(map(str, self.workgroup_size)) + "]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these "compilation infos" part of the public compiler API? How are users expected to make use of these vectorize/vectordistribute MFMA/WMMA/CDNA/RDNA pipelines? I'm worried that this is reaching too deeply into implementation details, when test suites should be modeled more closely after how developers are realistically expected to use the tools.

I would expect:

  • program definitions to have computations in them: "multiply these two tensors/matrices with these shapes, then do this other math"
  • users/developers to enumerate the explicit devices they want to target, including which features those devices support (e.g. cpu intrinsics, gpu API extensions)
  • users/developers to provide tuning configurations per each device that give hints about branches to take during compilation, heuristics to use, etc. (maybe that's this "compilation info"?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also sketched out by get_rocm_test_compilation_infos() needing to exist at all here - we shouldn't need anything specific to a single target in a test generator like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this. I would expect these e2e tests to look nearly the same as what we have done for winograd convolutions:

iree_generated_e2e_runner_test(
NAME
e2e_winograd_conv2d_cpu_f32_f32_f32_small
TEST_TYPE
conv2d
GENERATOR
"generate_e2e_conv2d_tests.py"
GENERATOR_ARGS
"--input_type=f32"
"--kernel_type=f32"
"--acc_type=f32"
"--shapes=small"
TEST_RUNNER
iree_tools_testing_e2e_iree-e2e-conv2d-test
TARGET_BACKENDS
"llvm-cpu"
DRIVERS
"local-task"
COMPILER_FLAGS
"--iree-preprocessing-pass-pipeline=builtin.module\(func.func\(iree-linalg-ext-convert-conv2d-to-winograd{replace-all-convs=true}\)\)"
LABELS
"hostonly"
"local"
TARGET_CPU_FEATURES_VARIANTS
"default"
)

You should be able to use the same test generator but choose a different target, and you can add additional compiler flags as needed with COMPILER_FLAGS. The test generator should simply create some source IR that performs a convolution, which already exists for winograd. Adding new e2e convolution tests should be able to use that same input IR, but with different compiler specifications. Setting the compilation info is the job of the specific backend codegen, and baking that into the test generator seems like duplicating logic unnecessarily.

Comment on lines +108 to +121
for (iree_hal_dim_t ic = 0; ic < c_size; ++ic) {
for (iree_hal_dim_t kh = 0; kh < kh_size; ++kh) {
for (iree_hal_dim_t kw = 0; kw < kw_size; ++kw) {
iree_hal_dim_t inp_idx = convert_to_1d_index(
c_size, h_size, w_size, n, ic, (oh * sh_size + kh * dh_size),
(ow * sw_size + kw * dw_size));
iree_hal_dim_t krnl_idx =
convert_to_1d_index(c_size, kh_size, kw_size, oc, ic, kh, kw);

acc += iree_math_f16_to_f32(input_data[inp_idx]) *
iree_math_f16_to_f32(kernel_data[krnl_idx]);
}
}
result_data[out_idx] = acc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this math in C++ scares me, especially for lower precision types. Would like to at least validate the C++ against reference python framework code then compare our own compiler/runtime/codegen systems against the C++ once we trust it

@bangtianliu
Copy link
Contributor Author

Convert this PR to draft now, I can work on this after we are clear about the design of the e2e test.

@bangtianliu bangtianliu marked this pull request as draft August 7, 2024 18:36
erman-gurses added a commit to iree-org/iree-test-suites that referenced this pull request Oct 30, 2024
Progress on #2.
This PR refactors, adds, and migrates conv2d e2e tests from the PRs
here: iree-org/iree#16849 and
iree-org/iree#18125

---------

Signed-off-by: erman-gurses <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants