[Feature]: MLA Support #4625
Comments
mark
8 similar comments
ref #4650 (comment)
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
🚀 The feature, motivation and pitch
DeepSeek-V2 introduces MLA (Multi-head Latent Attention), which uses low-rank joint compression of keys and values to eliminate the inference-time key-value cache bottleneck, enabling efficient inference.
Can vLLM support MLA for accelerated inference?
```bibtex
@misc{deepseek-v2,
  author = {DeepSeek-AI},
  title  = {DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
  year   = {2024},
  note   = {GitHub repository},
  url    = {https://github.com/deepseek-ai/DeepSeek-V2}
}
```
https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat
https://github.com/deepseek-ai/DeepSeek-V2/blob/main/deepseek-v2-tech-report.pdf
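For reference, here is a minimal PyTorch sketch of the low-rank key-value joint compression that MLA is built around, following the tech report linked above. All names and dimensions (`MLAKVCompression`, `d_model`, `d_latent`, `n_heads`, `d_head`) are illustrative assumptions rather than vLLM or DeepSeek-V2 code, and the sketch omits MLA's decoupled RoPE keys and query compression:

```python
import torch
import torch.nn as nn

class MLAKVCompression(nn.Module):
    """Illustrative sketch of MLA-style low-rank KV compression (not vLLM/DeepSeek code)."""

    def __init__(self, d_model=4096, d_latent=512, n_heads=32, d_head=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        # Down-projection: hidden states -> one shared low-rank KV latent.
        self.w_down_kv = nn.Linear(d_model, d_latent, bias=False)
        # Up-projections: latent -> per-head keys and values.
        self.w_up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.w_up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

    def compress(self, h):
        # Only this latent is cached at inference time: d_latent values
        # per token instead of 2 * n_heads * d_head for plain MHA.
        return self.w_down_kv(h)  # (batch, seq, d_latent)

    def expand(self, c_kv):
        # Reconstruct per-head keys/values from the cached latent.
        b, s, _ = c_kv.shape
        k = self.w_up_k(c_kv).view(b, s, self.n_heads, self.d_head)
        v = self.w_up_v(c_kv).view(b, s, self.n_heads, self.d_head)
        return k, v

mla = MLAKVCompression()
h = torch.randn(1, 16, 4096)     # (batch, seq, d_model)
c_kv = mla.compress(h)           # cached latent: (1, 16, 512)
k, v = mla.expand(c_kv)          # each (1, 16, 32, 128)
```

With these example sizes the cache holds 512 values per token instead of 2 × 32 × 128 = 8192 for conventional multi-head KV caching, roughly a 16× reduction, which is exactly the bottleneck this request asks vLLM to exploit.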