Add @hongxiayang updates to MI300X workload tuning guide (ROCm#4123)

minor fixes to formatting fix spelling errors more spelling fixes quantization update fix format simplify wording in tunableops and format fix
Apply suggestions from code review review feedback by Peter
Co-authored-by: Peter Park <[email protected]>
Apply suggestions from code review addressing feedback
Co-authored-by: Peter Park <[email protected]>
Apply suggestions from code review feedback again
Co-authored-by: Peter Park <[email protected]>
add hipblaslt yaml file figure feedback and minor formatting formatting update wordlist.txt remove outdated sentence regarding fsdp and rccl
(cherry picked from commit 87fa9fd)
update wordlist
Co-authored-by: hongxyan <[email protected]>
(cherry picked from commit b0722b3)
peterjunpark · Dec 6, 2024 · 9f6757b
1 parent 7b57247

Showing 5 changed files with 617 additions and 298 deletions.
diff --git a/.wordlist.txt b/.wordlist.txt
@@ -159,6 +159,7 @@ HWS
 Haswell
 Higgs
 Hyperparameters
+Huggingface
 ICD
 ICV
 IDE
@@ -381,6 +382,7 @@ TCR
 TF
 TFLOPS
 TP
+TPS
 TPU
 TPUs
 TSME
@@ -457,10 +459,12 @@ api
 atmi
 atomics
 autogenerated
+autotune
 avx
 awk
 backend
 backends
+benchmarked
 benchmarking
 bfloat
 bilinear
@@ -530,6 +534,7 @@ disambiguates
 distro
 distros
 dkms
+dtype
 el
 embeddings
 enablement
@@ -562,6 +567,7 @@ heterogenous
 hipBLAS
 hipBLASLt
 hipBLASLt's
+hipblaslt
 hipCUB
 hipFFT
 hipLIB
@@ -605,7 +611,9 @@ ipo
 jax
 kdb
 kfd
+kv
 latencies
+len
 libfabric
 libjpeg
 libs
@@ -631,6 +639,7 @@ mutex
 mvffr
 namespace
 namespaces
+num
 numref
 ocl
 opencl
@@ -726,7 +735,9 @@ runtimes
 sL
 scalability
 scalable
+seealso
 sendmsg
+seqs
 serializers
 shader
 sharding
@@ -767,6 +778,7 @@ txt
 uarch
 uncached
 uncorrectable
+underoptimized
 unhandled
 uninstallation
 unmapped

diff --git a/docs/data/how-to/tuning-guides/hipblaslt_auto_tuning_output_files.png b/docs/data/how-to/tuning-guides/hipblaslt_auto_tuning_output_files.png
diff --git a/docs/data/how-to/tuning-guides/hipblaslt_yaml_template.png b/docs/data/how-to/tuning-guides/hipblaslt_yaml_template.png
diff --git a/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst b/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
@@ -135,11 +135,13 @@ Installing vLLM
 
             {"text":["What is AMD Instinct?\nAmd Instinct is a brand new line of high-performance computing (HPC) processors from Advanced Micro Devices (AMD). These processors are designed to deliver unparalleled performance for HPC workloads, including scientific simulations, data analytics, and machine learning.\nThe Instinct lineup includes a range of processors, from the entry-level Inst"]}
 
-Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
+.. seealso::
 
-ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM 
-on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV 
-format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.
+   See :ref:`mi300x-vllm-optimization` for performance optimization tips.
+
+   ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+   on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in CSV
+   format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.
 
 .. _fine-tuning-llms-tgi: