Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVMGPU] Set prefetching on translation info #17744

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

Groverkss
Copy link
Contributor

This patch makes prefetch_shared_memory part of translation_info config dictionary, allowing us to control prefetching at dispatch level, instead of globally turning it on/off. Prefetching is still off by default, the flag makes KernelConfig add prefetch_shared_memory unit attribute to config dictionary.

@Groverkss Groverkss merged commit 9da0309 into iree-org:main Jun 26, 2024
51 checks passed
@Groverkss
Copy link
Contributor Author

@kuhar You would need to change your tuning script to explicitly put prefetching in tuning config after this patch

@kuhar
Copy link
Member

kuhar commented Jun 26, 2024

cc: @RattataKing ☝️

@ScottTodd
Copy link
Member

FYI this improved the SDXL benchmarks we run in this repo (on mi250 using rocm):

E2E Benchmark Time: 1330.0 ms (golden time 1661.5 ms)
Scheduled Unet Benchmark Time: 340.0 ms (golden time 450.5 ms)
Prompt Encoder Benchmark Time: 16.8 ms (golden time 17.5 ms)
VAE Decode Benchmark Time: 288.0 ms (golden time 288.5 ms)

(we set --iree-llvmgpu-enable-prefetch=true in https://github.com/iree-org/iree/blob/main/build_tools/pkgci/external_test_suite/sdxl_vae_decode_gpu_rocm_gfx90a.json)

@Groverkss
Copy link
Contributor Author

FYI this improved the SDXL benchmarks we run in this repo (on mi250 using rocm):

E2E Benchmark Time: 1330.0 ms (golden time 1661.5 ms)
Scheduled Unet Benchmark Time: 340.0 ms (golden time 450.5 ms)
Prompt Encoder Benchmark Time: 16.8 ms (golden time 17.5 ms)
VAE Decode Benchmark Time: 288.0 ms (golden time 288.5 ms)

(we set --iree-llvmgpu-enable-prefetch=true in https://github.com/iree-org/iree/blob/main/build_tools/pkgci/external_test_suite/sdxl_vae_decode_gpu_rocm_gfx90a.json)

This would mean that not using prefetching on the tuned configurations is actually better. Adding a parameter for prefetching to the tuning script might be a good idea @kuhar @RattataKing

ScottTodd added a commit that referenced this pull request Jul 2, 2024
Benchmark metrics improved (when using
`--iree-llvmgpu-enable-prefetch=true`), so locking in the improvements.
Context:
#17744 (comment)

Presubmit results:
https://github.com/iree-org/iree/actions/runs/9765047731/attempts/1#summary-26955236756
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
This patch makes prefetch_shared_memory part of translation_info config
dictionary, allowing us to control prefetching at dispatch level,
instead of globally turning it on/off. Prefetching is still off by
default, the flag makes KernelConfig add prefetch_shared_memory unit
attribute to config dictionary.

Signed-off-by: Lubo Litchev <[email protected]>
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
Benchmark metrics improved (when using
`--iree-llvmgpu-enable-prefetch=true`), so locking in the improvements.
Context:
iree-org#17744 (comment)

Presubmit results:
https://github.com/iree-org/iree/actions/runs/9765047731/attempts/1#summary-26955236756

Signed-off-by: Lubo Litchev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants