[LLVMGPU] Set prefetching on translation info #17744

Groverkss · 2024-06-26T14:42:41Z

This patch makes `prefetch_shared_memory` part of the `translation_info` config dictionary, allowing us to control prefetching at the dispatch level instead of turning it on or off globally. Prefetching is still off by default; the flag makes KernelConfig add the `prefetch_shared_memory` unit attribute to the config dictionary.
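As a rough illustration of what per-dispatch control means for a script that generates configurations, here is a minimal Python sketch. The helper `render_translation_info` is hypothetical and not part of IREE, and the attribute text it produces is an assumption about the printed form of `translation_info`, not a verified syntax. The point it shows is that a unit attribute carries no value: the config dictionary either contains `prefetch_shared_memory` or it does not.

```python
from typing import Dict, Optional, Sequence


def render_translation_info(
    pipeline: str,
    workgroup_size: Sequence[int],
    subgroup_size: int,
    prefetch_shared_memory: bool = False,
    extra_config: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical helper: render a translation_info-like attribute string.

    The textual layout is an assumption made for illustration only.
    """
    # Regular config entries are printed as `key = value`.
    entries = [f"{key} = {value}" for key, value in (extra_config or {}).items()]
    # A unit attribute has no value; its presence alone enables prefetching
    # for this one dispatch.
    if prefetch_shared_memory:
        entries.append("prefetch_shared_memory")

    dims = ", ".join(str(d) for d in workgroup_size)
    text = (
        f"#iree_codegen.translation_info<{pipeline} "
        f"workgroup_size = [{dims}] subgroup_size = {subgroup_size}"
    )
    # Only print the config dictionary when it has at least one entry.
    if entries:
        text += ", {" + ", ".join(entries) + "}"
    return text + ">"


# Two dispatches can now differ in prefetching without any global switch.
with_prefetch = render_translation_info(
    "LLVMGPUVectorDistribute", [64, 2, 1], 64, prefetch_shared_memory=True
)
without_prefetch = render_translation_info("LLVMGPUVectorDistribute", [64, 2, 1], 64)
print(with_prefetch)
print(without_prefetch)
```

A tuning script built along these lines would have to pass `prefetch_shared_memory=True` explicitly for every candidate that should prefetch, since the global flag only affects configurations chosen by KernelConfig.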

Groverkss · 2024-06-26T16:27:56Z

@kuhar You will need to change your tuning script to explicitly put prefetching in the tuning config after this patch.

kuhar · 2024-06-26T16:28:32Z

cc: @RattataKing ☝️

ScottTodd · 2024-07-01T15:11:16Z

FYI, this improved the SDXL benchmarks we run in this repo (on MI250 using ROCm):

E2E Benchmark Time: 1330.0 ms (golden time 1661.5 ms)
Scheduled Unet Benchmark Time: 340.0 ms (golden time 450.5 ms)
Prompt Encoder Benchmark Time: 16.8 ms (golden time 17.5 ms)
VAE Decode Benchmark Time: 288.0 ms (golden time 288.5 ms)

(we set --iree-llvmgpu-enable-prefetch=true in https://github.com/iree-org/iree/blob/main/build_tools/pkgci/external_test_suite/sdxl_vae_decode_gpu_rocm_gfx90a.json)

Results were visible on presubmit here: https://github.com/iree-org/iree/actions/runs/9681801742
"Golden" times can be updated at the bottom of this file to pin down the improvements: https://github.com/iree-org/iree/blob/main/.github/workflows/pkgci_regression_test.yml
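To put the four results above on a common scale, the reduction relative to each golden time can be computed with a few lines of Python. This is only a sketch: the numbers are copied from the benchmark summary in this comment, and the function name `percent_reduction` is made up for illustration.

```python
# (measured_ms, golden_ms) pairs copied from the benchmark summary above.
results = {
    "E2E": (1330.0, 1661.5),
    "Scheduled Unet": (340.0, 450.5),
    "Prompt Encoder": (16.8, 17.5),
    "VAE Decode": (288.0, 288.5),
}


def percent_reduction(measured_ms: float, golden_ms: float) -> float:
    """Return how much lower the measured time is than the golden time, in percent."""
    return 100.0 * (golden_ms - measured_ms) / golden_ms


for name, (measured, golden) in results.items():
    print(f"{name}: {percent_reduction(measured, golden):.1f}% below golden time")
```

This gives roughly 20% for E2E and 24.5% for the scheduled Unet, against about 4% for the prompt encoder and 0.2% for VAE decode, so most of the gain comes from the Unet.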

Groverkss · 2024-07-01T17:56:10Z

> FYI this improved the SDXL benchmarks we run in this repo (on mi250 using rocm):
>
> E2E Benchmark Time: 1330.0 ms (golden time 1661.5 ms)
> Scheduled Unet Benchmark Time: 340.0 ms (golden time 450.5 ms)
> Prompt Encoder Benchmark Time: 16.8 ms (golden time 17.5 ms)
> VAE Decode Benchmark Time: 288.0 ms (golden time 288.5 ms)
>
> (we set --iree-llvmgpu-enable-prefetch=true in https://github.com/iree-org/iree/blob/main/build_tools/pkgci/external_test_suite/sdxl_vae_decode_gpu_rocm_gfx90a.json)
>
> Results were visible on presubmit here: https://github.com/iree-org/iree/actions/runs/9681801742
>
> "Golden" times can be updated at the bottom of this file to pin down the improvements: https://github.com/iree-org/iree/blob/main/.github/workflows/pkgci_regression_test.yml

This would mean that not using prefetching on the tuned configurations is actually better. Adding a parameter for prefetching to the tuning script might be a good idea, @kuhar @RattataKing.

Benchmark metrics improved (when using `--iree-llvmgpu-enable-prefetch=true`), so locking in the improvements.

Context: #17744 (comment)

Presubmit results: https://github.com/iree-org/iree/actions/runs/9765047731/attempts/1#summary-26955236756


[LLVMGPU] Set prefetching on translation info

d2e270a

Groverkss requested review from MaheshRavishankar, qedawkins and kuhar as code owners June 26, 2024 14:42

Groverkss requested review from antiagainst and raikonenfnu June 26, 2024 14:43

Groverkss mentioned this pull request Jun 26, 2024

[LLVMGPU] VectorDistribution pipeline for attention #17716

Closed

qedawkins approved these changes Jun 26, 2024


Groverkss merged commit 9da0309 into iree-org:main Jun 26, 2024
51 checks passed

ScottTodd mentioned this pull request Jul 2, 2024

Sync SDXL benchmark metrics with latest values. #17795

Merged
