Support DeepSeek-VL2 models with MoE and MLA in ExecuTorch #8132
Labels: module: llm, triaged
🚀 The feature, motivation and pitch
DeepSeek recently released DeepSeek-VL2, their family of Mixture-of-Experts vision-language models.
The Tiny and Small variants are suitable for on-device usage. The unique features of this model family include:

- A Mixture-of-Experts (MoE) language backbone: only a few experts are activated per token, so inference compute scales with activated rather than total parameters.
- Multi-head Latent Attention (MLA): the KV cache is compressed into compact latent vectors, cutting inference-time memory (see the sketch below).
- A dynamic tiling vision encoding strategy for high-resolution images with varying aspect ratios.
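A minimal PyTorch sketch of the MLA caching idea (dimensions, layer names, and the omission of masking and RoPE are simplifications for illustration, not DeepSeek-VL2's actual implementation): instead of full per-head K/V tensors, the cache holds one small latent vector per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Sketch of MLA-style KV compression: cache a small latent, not full K/V."""

    def __init__(self, d_model=2048, n_heads=16, d_latent=256):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # down-project: this output is cached
        self.k_up = nn.Linear(d_latent, d_model)     # reconstruct keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)     # reconstruct values from the latent
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        latent = self.kv_down(x)                     # (b, t, d_latent) -- small
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)  # masking omitted for brevity
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out), latent            # caller caches `latent`
```

At these illustrative sizes each cached token costs d_latent = 256 floats instead of 2 × d_model = 4096 for full K/V, a 16× reduction, which is exactly the property that matters for memory-constrained devices.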
Alternatives
There are distilled reasoning models for DeepSeek-R1 (mentioned in #7981). However, those models reuse the architectures of their target models (Llama and Qwen), so they lack the three unique features listed above. MoE and MLA in particular look promising for on-device inference efficiency; a rough sketch of the routing idea follows.
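To make the efficiency argument concrete, here is a hedged sketch of top-k expert routing (the sizes and the per-expert Python loop are illustrative; production kernels batch tokens by expert): each token runs through only k of n expert FFNs.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Sketch of top-k routing: each token runs only k of n expert FFNs."""

    def __init__(self, d_model=2048, d_ff=4096, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)          # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # each expert sees only its tokens
            for slot in range(self.k):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

With n_experts=8 and k=2, each token touches only a quarter of the FFN weights, which is why the activated parameter count stays far below the total.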
Additional context
No response
RFC (Optional)
Suggested process:
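As a starting point, a minimal sketch of the generic ExecuTorch lowering flow that enablement work would presumably build on (`DeepSeekVL2Backbone`, the vocab size, and the example inputs are hypothetical placeholders; the MoE routing and MLA cache would likely need export-friendly rewrites first):

```python
import torch
from executorch.exir import to_edge

# Hypothetical placeholder for an eager-mode DeepSeek-VL2 language backbone.
model = DeepSeekVL2Backbone().eval()
example_inputs = (torch.randint(0, 32000, (1, 64)),)  # (batch, seq) token ids

# Standard ExecuTorch flow: torch.export -> edge dialect -> serialized .pte program.
exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch()

with open("deepseek_vl2.pte", "wb") as f:
    f.write(et_program.buffer)
```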
cc @mergennachin @cccclai @helunwencser @dvorjackz