🐛 Bug

I started an mlc_llm CLI server with

mlc_llm serve --host localhost HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC

on an EC2 instance. In parallel, I started an LM-Evaluation-Harness run with [MMLU_PRO, IFEVAL, HELLASWAG]. After 1957 requests, both the server and the lm-eval output simply stopped. While trying to find the cause, I noticed the following:

The moment the problem appears, mlc-llm sits at 100% CPU, whereas during normal execution it only shows up occasionally at a much smaller percentage. When I use cached requests and restart lm-eval after it hangs, it continues for a while and then gets stuck again after some time.

I am on Ubuntu 24.04 using a V100. The same problem also appeared on a local Linux machine with an NVIDIA graphics card.
To Reproduce

Steps to reproduce the behavior:

1. Start mlc_llm serve
2. Start an lm_eval run against the local server
3. Wait
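Concretely, the two commands look roughly like this. The `local-completions` backend, `base_url` argument, and task names are assumptions about a recent lm-evaluation-harness (>= 0.4) and mlc_llm's default port 8000, so they may need adjusting to your versions:

```shell
# Terminal 1: start the OpenAI-compatible MLC-LLM server
mlc_llm serve --host localhost HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC

# Terminal 2: run the benchmark tasks against the local endpoint
lm_eval --model local-completions \
    --model_args model=mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC,base_url=http://localhost:8000/v1/completions \
    --tasks mmlu_pro,ifeval,hellaswag
```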
Expected behavior

The server should keep serving requests until the evaluation run completes, instead of hanging after a fixed number of requests.
Environment

- Platform: CUDA
- Operating system: Ubuntu 24.04
- Device: EC2 V100
- How you installed MLC-LLM (conda, source): tried both
- How you installed TVM-Unity (pip, source): source
- Python version: 3.12.0
- GPU driver version: NVIDIA 535.183.01 (from nvidia-smi)
- CUDA/cuDNN version: CUDA 12.6