Issues: vllm-project/vllm
[Bug]: vllm infer for Qwen2-VL-72B-Instruct-GPTQ-Int8 (bug) #10650, opened Nov 26, 2024 by DoctorTar
[Feature]: Mixtral manual head_dim (feature request) #10649, opened Nov 26, 2024 by wavy-jung
[Bug]: Llama 3.2 90B crash (bug) #10648, opened Nov 26, 2024 by yessenzhar
[Bug]: GPU Memory Accounting Issue with Multiple vLLM Instances (bug) #10643, opened Nov 25, 2024 by brokenlander
[Feature]: Support explicitly specifying GPU devices for a model instance (feature request) #10638, opened Nov 25, 2024 by wlll123456
[Bug]: The parameter gpu_memory_utilization does not take effect (bug) #10637, opened Nov 25, 2024 by liutao053877
[Feature]: Initial Idea and Design for Asynchronous Scheduling (feature request) #10634, opened Nov 25, 2024 by lixiaolx
[Bug]: GPU memory leak when using bad_words feature (bug) #10630, opened Nov 25, 2024 by wsp317
[Performance]: Using bge-m3 for performance acceleration did not achieve the expected results (performance) #10628, opened Nov 25, 2024 by Jay-ju
[Bug]: Crash with Qwen2-Audio Model in vLLM During Audio Processing (bug) #10627, opened Nov 25, 2024 by jiahansu
Tracking torch.compile compatibility with LoRA serving (bug) #10617, opened Nov 25, 2024 by youkaichao
[Usage]: Does speculative decoding support pipeline parallelism? (usage) #10615, opened Nov 25, 2024 by wanghongyu2001
Tracking torch.compile compatibility with CPU offloading (bug) #10612, opened Nov 25, 2024 by youkaichao
[Feature]: Load and save KV cache from disk (feature request) #10611, opened Nov 25, 2024 by duyongtju
[Feature]: When applying prompt_logprobs with the OpenAI server, the prompt_logprobs field in the response does not show which token was chosen (feature request) #10607, opened Nov 24, 2024 by DIYer22
[Usage]: How to make model response information appear in the vLLM backend logs (usage) #10602, opened Nov 24, 2024 by nora647
[Bug]: GGUF Model Output Repeats Nonsensically (bug) #10600, opened Nov 24, 2024 by Mayflyyh
[Usage]: While loading the model, get 'layers.0.mlp.down_proj.weight' after merge_and_unload() (usage) #10598, opened Nov 24, 2024 by alex2romanov
[Bug]: Memory allocation with echo=True (bug) #10596, opened Nov 23, 2024 by ArtemBiliksin
[Performance]: Cannot use FlashAttention-2 backend for Volta and Turing GPUs (performance) #10592, opened Nov 23, 2024 by Weishaoya
[Bug]: Error loading bitsandbytes 4-bit model when quant_storage is torch.bfloat16 (bug) #10590, opened Nov 23, 2024 by AaronZLT
[Bug] Streaming output error in tool calling has still not been resolved #10589, opened Nov 23, 2024 by Sala8888
[Bug]: Qwen2-VL-7B with sglang (vLLM backend) Performance Degradation on MME benchmark (bug) #10588, opened Nov 23, 2024 by thusinh1969
[Bug]: Duplicate request_id breaks the engine (bug) #10583, opened Nov 22, 2024 by tjohnson31415