Issues: vllm-project/vllm
[Bug]: vllm infer for Qwen2-VL-72B-Instruct-GPTQ-Int8 (bug) #10650, opened Nov 26, 2024 by DoctorTar
[Feature]: Mixtral manual head_dim (feature request) #10649, opened Nov 26, 2024 by wavy-jung
[Bug]: Llama 3.2 90B crash (bug) #10648, opened Nov 26, 2024 by yessenzhar
[Bug]: GPU Memory Accounting Issue with Multiple vLLM Instances (bug) #10643, opened Nov 25, 2024 by brokenlander
[Feature]: Support explicitly specifying GPU devices for a model instance (feature request) #10638, opened Nov 25, 2024 by wlll123456
[Bug]: The parameter gpu_memory_utilization does not take effect (bug) #10637, opened Nov 25, 2024 by liutao053877
[Feature]: Initial Idea and Design for Asynchronous Scheduling (feature request) #10634, opened Nov 25, 2024 by lixiaolx
[Bug]: GPU memory leak when using bad_words feature (bug) #10630, opened Nov 25, 2024 by wsp317
[Performance]: Using bge-m3 for performance acceleration did not achieve the expected results (performance) #10628, opened Nov 25, 2024 by Jay-ju
[Bug]: Crash with Qwen2-Audio Model in vLLM During Audio Processing (bug) #10627, opened Nov 25, 2024 by jiahansu
Tracking torch.compile compatibility with LoRA serving (bug) #10617, opened Nov 25, 2024 by youkaichao
[Usage]: Does speculative decoding support pipeline parallelism? (usage) #10615, opened Nov 25, 2024 by wanghongyu2001
Tracking torch.compile compatibility with CPU offloading (bug) #10612, opened Nov 25, 2024 by youkaichao
[Feature]: Load and save KV cache from disk (feature request) #10611, opened Nov 25, 2024 by duyongtju
[Feature]: When applying prompt_logprobs with the OpenAI server, the prompt_logprobs field in the response does not show which token was chosen (feature request) #10607, opened Nov 24, 2024 by DIYer22
[Usage]: How to make model response information appear in the vLLM backend logs (usage) #10602, opened Nov 24, 2024 by nora647
[Bug]: GGUF Model Output Repeats Nonsensically (bug) #10600, opened Nov 24, 2024 by Mayflyyh
[Usage]: While loading the model, get 'layers.0.mlp.down_proj.weight' after merge_and_unload() (usage) #10598, opened Nov 24, 2024 by alex2romanov
[Bug]: Memory allocation with echo=True (bug) #10596, opened Nov 23, 2024 by ArtemBiliksin
[Performance]: Cannot use FlashAttention-2 backend for Volta and Turing GPUs (performance) #10592, opened Nov 23, 2024 by Weishaoya
[Bug]: Error loading bitsandbytes 4-bit model when quant_storage is torch.bfloat16 (bug) #10590, opened Nov 23, 2024 by AaronZLT
[Bug] Streaming output error in tool calling has still not been resolved #10589, opened Nov 23, 2024 by Sala8888
[Bug]: Qwen2-VL-7B with sglang (vLLM backend) Performance Degradation on MME benchmark (bug) #10588, opened Nov 23, 2024 by thusinh1969
[Bug]: Duplicate request_id breaks the engine (bug) #10583, opened Nov 22, 2024 by tjohnson31415