🐛 Bug

When serving a model with tracing enabled, the softmax step of the sampler takes over 65% of the total per-token time, more than the decode forward pass itself.

To Reproduce

Steps to reproduce the behavior:

1. Run any model with mlc_llm serve and pass the --enable-tracing --enable-debug arguments, for example:
mlc_llm serve /workdir/Qwen2-1.5B-Instruct-mlc/ --device cuda --model-lib /workdir/Qwen2-1.5B-Instruct-mlc/qwen2-1.5b.so --port 8090 --host 0.0.0.0 --enable-tracing --enable-debug
2. Dump the Chrome trace (the serve command above listens on port 8090):
curl -X POST http://127.0.0.1:8090/debug/dump_event_trace -H "Content-Type: application/json" -d '{"model": "dist/llama"}'
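For the dumped trace to contain decode and sampler events, at least one generation request has to be served before the dump. A minimal sketch of such a request, assuming the OpenAI-compatible /v1/chat/completions endpoint exposed by mlc_llm serve on port 8090 and assuming the model id matches the path passed to serve:

```python
# Hypothetical warm-up request so the event trace contains decode/sampler events.
# Assumptions: the server above is listening on port 8090 and registers the model
# under the path given to `mlc_llm serve`.
import json
import urllib.request

payload = {
    "model": "/workdir/Qwen2-1.5B-Instruct-mlc/",  # assumed model id
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 16,
}
req = urllib.request.Request(
    "http://127.0.0.1:8090/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```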
3. Parse the tracing log (a parsing sketch follows the timing list below); the softmax operator takes over 65% of the total time:
embedding (12) time cost: 0.129 ms
apply logit bias (12) time cost: 0.004 ms
apply penalty (12) time cost: 0.005 ms
apply logit mask (12) time cost: 0.004 ms
update logits (12) time cost: 0.024 ms
softmax (12) time cost: 6.229 ms
renormalization by top p (12) time cost: 0.21 ms
sampling (12) time cost: 0.114 ms
detokenization (12) time cost: 0.052 ms
callback (12) time cost: 0.104 ms
decode (12) time cost: 2.51 ms
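The figure checks out: the eleven steps above sum to roughly 9.39 ms per token, of which softmax's 6.229 ms is about 66%, more than twice the decode forward pass (2.51 ms). A minimal summarizer for the dumped trace, assuming it follows the standard Chrome trace event format (a JSON array or a {"traceEvents": [...]} object with microsecond timestamps), might look like:

```python
# Hypothetical trace summarizer: aggregates time per event name from a Chrome
# trace dump and prints each event's share of the total. Field names ("ph",
# "dur", "ts") are those of the standard Chrome trace event format and are an
# assumption about what mlc_llm's /debug/dump_event_trace returns.
import json
import sys
from collections import defaultdict

with open(sys.argv[1]) as f:
    trace = json.load(f)
events = trace["traceEvents"] if isinstance(trace, dict) else trace

totals = defaultdict(float)  # event name -> accumulated microseconds
pending = {}                 # (pid, tid, name) -> start timestamp of an open B/E pair

for ev in events:
    name = ev.get("name", "")
    if ev.get("ph") == "X":      # complete event: duration is given directly
        totals[name] += float(ev.get("dur", 0))
    elif ev.get("ph") == "B":    # begin of a begin/end pair
        pending[(ev.get("pid"), ev.get("tid"), name)] = float(ev["ts"])
    elif ev.get("ph") == "E":    # end: close the matching begin, if any
        start = pending.pop((ev.get("pid"), ev.get("tid"), name), None)
        if start is not None:
            totals[name] += float(ev["ts"]) - start

grand_total = sum(totals.values()) or 1.0
for name, us in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:<40s} {us / 1000.0:10.3f} ms  {100.0 * us / grand_total:5.1f}%")
```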
Expected behavior
Environment
Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): CUDA
Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
How you installed MLC-LLM (conda, source): source
How you installed TVM-Unity (pip, source): source
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
Any other relevant information:
Additional context