Add mark steps to prevent OOM in static moe op #65

jkaniecki · 2024-06-19T11:57:25Z

Adds mark steps inside the static MoE op to prevent out-of-memory (OOM) errors at higher batch sizes.
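To illustrate why mark steps help, here is a minimal sketch in plain Python. It is a toy model, not the code from this PR and not the Habana runtime: `ToyLazyQueue`, `static_moe`, and the `mark_every` parameter are hypothetical names invented for the illustration. The assumption is that in lazy execution, work queued for every expert stays pending until a step boundary, so a static MoE loop that visits all experts can accumulate one intermediate per expert. On Gaudi the real boundary is `htcore.mark_step()` from `habana_frameworks.torch.core`; the toy `mark_step()` below only imitates that behaviour.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyLazyQueue:
    """Toy stand-in for a lazy backend: ops pile up until mark_step() runs them."""
    pending: List[Callable[[], List[float]]] = field(default_factory=list)
    peak_pending: int = 0  # largest number of ops held at once

    def enqueue(self, op: Callable[[], List[float]]) -> None:
        self.pending.append(op)
        self.peak_pending = max(self.peak_pending, len(self.pending))

    def mark_step(self) -> List[List[float]]:
        # Execute everything queued so far and release it.
        results = [op() for op in self.pending]
        self.pending.clear()
        return results


def static_moe(tokens: List[float],
               expert_scales: List[float],
               routing: List[List[float]],
               queue: ToyLazyQueue,
               mark_every: Optional[int] = None) -> List[float]:
    """Static MoE: every expert processes every token, weighted by routing.

    Each "expert" is just a scalar multiply (an assumption made for brevity).
    If mark_every is set, the queue is flushed after that many experts.
    """
    out = [0.0] * len(tokens)

    def accumulate(partials: List[List[float]]) -> None:
        for partial in partials:
            for i, value in enumerate(partial):
                out[i] += value

    for e, scale in enumerate(expert_scales):
        def expert_op(e: int = e, scale: float = scale) -> List[float]:
            return [routing[t][e] * scale * x for t, x in enumerate(tokens)]

        queue.enqueue(expert_op)
        if mark_every and (e + 1) % mark_every == 0:
            accumulate(queue.mark_step())

    accumulate(queue.mark_step())  # flush whatever is still pending
    return out


tokens = [1.0, 2.0]
expert_scales = [1.0, 2.0, 3.0, 4.0]
routing = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.25, 0.75]]

unmarked_q = ToyLazyQueue()
marked_q = ToyLazyQueue()
out_unmarked = static_moe(tokens, expert_scales, routing, unmarked_q)
out_marked = static_moe(tokens, expert_scales, routing, marked_q, mark_every=1)
```

Both calls produce the same output, but `unmarked_q.peak_pending` grows with the number of experts while `marked_q.peak_pending` stays at one. In this toy model the peak stands in for memory held by the pending graph, which is the quantity the mark steps are meant to bound.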

remove expert_max hard code (#47)
vLLM-Ext: Full enabling of ALiBi (#34)
Add version inference via setuptools-scm (#58)
Revert "vLLM-Ext: Full enabling of ALiBi (#34)" (#59)
Remove punica_hpu.py from vllm_hpu_extension (#66)
Removed previous (not-pipelined) pa implementation (#72)
Add flag to enable running softmax in fp32 (#71)
Update calibration readme link (#73)
allow lm_head quantization in calibration process (#65)
Pad to bmin if value is less (#67)
Update pyproject.toml (#75)
---------
Co-authored-by: Michał Kuligowski <[email protected]>

Add mark steps to prevent oom in static moe op

a78343c

szutenberg approved these changes Jun 21, 2024

szutenberg requested a review from kzawora-intel June 21, 2024 08:37

kzawora-intel approved these changes Jun 24, 2024

kzawora-intel merged commit 11f047c into HabanaAI:habana_main Jun 24, 2024

mfylcek mentioned this pull request Jan 14, 2025

Set vllm-hpu-extension to 6ac93fb #684

Merged

michalkuligowski mentioned this pull request Jan 15, 2025

Update requirements-hpu.txt #685

Closed
