
[Bug] Mlc cli server gets stuck #3145

Open
falkbene opened this issue Feb 26, 2025 · 0 comments
Labels
bug Confirmed bugs

Comments

@falkbene

🐛 Bug

I started an mlc_llm CLI server with:

mlc_llm serve --host localhost HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC

on an EC2 instance. In parallel, I started an LM-Evaluation-Harness run with [MMLU_PRO, IFEVAL, HELLASWAG]. After 1957 requests, both the server and the lm-eval output simply stopped. I tried to figure out what causes this, and this is what I noticed:

The moment the problem appears, mlc-llm uses 100% CPU, whereas during normal execution its CPU usage only spikes occasionally and at much lower percentages. When I use cached requests and restart lm-eval after it hangs, it continues for a while and then gets stuck again after some time.

I am on Ubuntu 24.04 using a V100. The same problem also appeared on a local Linux machine with an NVIDIA graphics card.

To Reproduce

Steps to reproduce the behavior:

  1. Start mlc_llm serve
  2. Start lm_eval run against local server
  3. Wait
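To make step 3 more precise, each request in the evaluation run can be wrapped with a client-side timeout, so the exact request index at which the server stalls gets logged instead of the run hanging silently. This is a minimal sketch of my own, not part of lm-eval or MLC-LLM; the helper name, endpoint path, and timeout value are assumptions:

```python
import concurrent.futures

def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread; raise a TimeoutError if it takes longer
    than timeout_s seconds (i.e., the server appears to be stuck)."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Don't block waiting for a worker thread that may be hung
        # on a stalled server response.
        pool.shutdown(wait=False)

# Hypothetical usage against the server started above (endpoint path
# and port are assumptions):
#   import requests
#   for i, prompt in enumerate(prompts):
#       try:
#           call_with_timeout(requests.post, 60.0,
#                             "http://localhost:8000/v1/completions",
#                             json={"prompt": prompt})
#       except concurrent.futures.TimeoutError:
#           print(f"server stalled on request {i}")
#           break
```

With this wrapper, a stall surfaces as a `TimeoutError` on a specific request index rather than an indefinite hang, which makes it easier to correlate with the 100% CPU spike on the server side.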

Expected behavior

Environment

  • Platform: CUDA
  • Operating system: Ubuntu
  • Device: EC2 V100
  • How you installed MLC-LLM (conda, source): tried both
  • How you installed TVM-Unity (pip, source): source
  • Python version (e.g. 3.10): 3.12.0
  • GPU driver version (if applicable): Nvidia-smi 535.183.01
  • CUDA/cuDNN version (if applicable): Cuda 12.6
falkbene added the bug (Confirmed bugs) label on Feb 26, 2025