Add @hongxiayang updates to MI300X workload tuning guide (ROCm#4123)
minor fixes to formatting

fix spelling errors

more spelling

fixes

quantization update

fix format

simplify wording in tunableops and format fix

Apply suggestions from code review

review feedback by Peter

Co-authored-by: Peter Park <[email protected]>

Apply suggestions from code review

addressing feedback

Co-authored-by: Peter Park <[email protected]>

Apply suggestions from code review

feedback again

Co-authored-by: Peter Park <[email protected]>

add hipblaslt yaml file figure

feedback and minor formatting

formatting

update wordlist.txt

remove outdated sentence regarding fsdp and rccl

(cherry picked from commit 87fa9fd)

update wordlist

Co-authored-by: hongxyan <[email protected]>
(cherry picked from commit b0722b3)
peterjunpark committed Dec 6, 2024
1 parent 7b57247 commit 9f6757b
Showing 5 changed files with 617 additions and 298 deletions.
12 changes: 12 additions & 0 deletions .wordlist.txt
@@ -159,6 +159,7 @@ HWS
 Haswell
 Higgs
 Hyperparameters
+Huggingface
 ICD
 ICV
 IDE
@@ -381,6 +382,7 @@ TCR
 TF
 TFLOPS
 TP
+TPS
 TPU
 TPUs
 TSME
@@ -457,10 +459,12 @@ api
 atmi
 atomics
 autogenerated
+autotune
 avx
 awk
 backend
 backends
+benchmarked
 benchmarking
 bfloat
 bilinear
@@ -530,6 +534,7 @@ disambiguates
 distro
 distros
 dkms
+dtype
 el
 embeddings
 enablement
@@ -562,6 +567,7 @@ heterogenous
 hipBLAS
 hipBLASLt
 hipBLASLt's
+hipblaslt
 hipCUB
 hipFFT
 hipLIB
@@ -605,7 +611,9 @@ ipo
 jax
 kdb
 kfd
+kv
 latencies
+len
 libfabric
 libjpeg
 libs
@@ -631,6 +639,7 @@ mutex
 mvffr
 namespace
 namespaces
+num
 numref
 ocl
 opencl
@@ -726,7 +735,9 @@ runtimes
 sL
 scalability
 scalable
+seealso
 sendmsg
+seqs
 serializers
 shader
 sharding
@@ -767,6 +778,7 @@ txt
 uarch
 uncached
 uncorrectable
+underoptimized
 unhandled
 uninstallation
 unmapped
@@ -135,11 +135,13 @@ Installing vLLM
 {"text":["What is AMD Instinct?\nAmd Instinct is a brand new line of high-performance computing (HPC) processors from Advanced Micro Devices (AMD). These processors are designed to deliver unparalleled performance for HPC workloads, including scientific simulations, data analytics, and machine learning.\nThe Instinct lineup includes a range of processors, from the entry-level Inst"]}
-Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
+.. seealso::

-ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
-on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
-format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.
+   See :ref:`mi300x-vllm-optimization` for performance optimization tips.
+
+   ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+   on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in CSV
+   format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.

 .. _fine-tuning-llms-tgi:
