🐛 Bug

I started an mlc_llm CLI server with

mlc_llm serve --host localhost HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC

on an EC2 instance. In parallel, I started an LM-Evaluation-Harness run with [MMLU_PRO, IFEVAL, HELLASWAG]. After 1957 requests, both the server and the lm-eval output simply stopped. While trying to find the cause, I noticed the following:

The moment the problem appears, mlc-llm sits at 100% CPU, whereas during normal execution it only shows up occasionally at a much smaller percentage. When I use cached requests and restart lm-eval after it hangs, it continues for a while and then gets stuck again after some time.

I am on Ubuntu 24.04 using a V100. The same problem also appeared on a local Linux machine with an NVIDIA graphics card.
To Reproduce

Steps to reproduce the behavior:

1. Start mlc_llm serve
2. Start an lm_eval run against the local server
3. Wait
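Concretely, the two commands look roughly like this. The `local-completions` backend, `base_url` argument, and task names are assumptions about a recent lm-evaluation-harness (>= 0.4) and mlc_llm's default port 8000, so they may need adjusting to your versions:

```shell
# Terminal 1: start the OpenAI-compatible MLC-LLM server
mlc_llm serve --host localhost HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC

# Terminal 2: run the benchmark tasks against the local endpoint
lm_eval --model local-completions \
    --model_args model=mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC,base_url=http://localhost:8000/v1/completions \
    --tasks mmlu_pro,ifeval,hellaswag
```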
Expected behavior

The server should keep serving requests until the evaluation run completes, instead of hanging after a fixed number of requests.
Environment

- Platform: CUDA
- Operating system: Ubuntu 24.04
- Device: EC2 V100
- How you installed MLC-LLM (conda, source): tried both
- How you installed TVM-Unity (pip, source): source
- Python version: 3.12.0
- GPU driver version: NVIDIA 535.183.01 (from nvidia-smi)
- CUDA/cuDNN version: CUDA 12.6