[VLM] Merged multi-modal processor for Pixtral #12211

Flechman · 2025-01-20T09:25:12Z

This PR implements the merged multi-modal processor for Pixtral, as a contribution to the V1 re-architecture for multi-modal models.

Signed-off-by: remi <[email protected]>

github-actions · 2025-01-20T09:25:23Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR
Enable auto-merge.

🚀

mgoin · 2025-01-20T17:38:29Z

vllm/model_executor/models/pixtral.py

+@MULTIMODAL_REGISTRY.register_processor(PixtralHFMultiModalProcessor,
+                                        info=PixtralHFProcessingInfo,
+                                        dummy_inputs=PixtralHFDummyInputBuilder
+                                        )
 class PixtralForConditionalGeneration(nn.Module, SupportsMultiModal,
                                      SupportsPP):


We can use the same processor for the mistral-format and the hf-format now?
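
For readers unfamiliar with the decorator in the diff above, here is a minimal sketch of how a registry-style class decorator of this shape can work in general. It is a toy under stated assumptions, not vLLM's implementation: `ToyRegistry`, `ProcessorEntry`, `lookup`, and all `Toy*` classes are hypothetical names introduced only for illustration. Only the call shape (`register_processor(processor, info=..., dummy_inputs=...)`) mirrors the diff.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ProcessorEntry:
    """Hypothetical record of what was registered for one model class."""
    processor: type
    info: type
    dummy_inputs: type


class ToyRegistry:
    """Toy stand-in for a multi-modal registry (NOT vLLM's implementation)."""

    def __init__(self) -> None:
        self._entries: Dict[type, ProcessorEntry] = {}

    def register_processor(self, processor: type, *, info: type,
                           dummy_inputs: type) -> Callable[[type], type]:
        # Build a class decorator that records the entry and then
        # returns the model class unchanged.
        def wrapper(model_cls: type) -> type:
            self._entries[model_cls] = ProcessorEntry(
                processor, info, dummy_inputs)
            return model_cls

        return wrapper

    def lookup(self, model_cls: type) -> ProcessorEntry:
        # Hypothetical helper: fetch whatever was registered for the class.
        return self._entries[model_cls]


TOY_REGISTRY = ToyRegistry()


class ToyProcessor:
    """Placeholder for a processor class."""


class ToyInfo:
    """Placeholder for a processing-info class."""


class ToyDummyInputs:
    """Placeholder for a dummy-inputs builder class."""


@TOY_REGISTRY.register_processor(ToyProcessor,
                                 info=ToyInfo,
                                 dummy_inputs=ToyDummyInputs)
class ToyModel:
    """Placeholder for a model class."""


entry = TOY_REGISTRY.lookup(ToyModel)
```

In this sketch, one model class maps to exactly one registered processor. If a real registry behaves the same way, a single processor would have to handle both input formats internally, which is the substance of the question above.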

DarkLight1337 · 2025-02-05T08:11:18Z

#12767 should make it easier to pass the image token ID

Adjustment first version

fbe6a9d

Signed-off-by: remi <[email protected]>

This was referenced Jan 20, 2025

[RFC]: Multi-modality Support on vLLM #4194

Open

[RFC]: Merge input processor and input mapper for multi-modal models #10114

Open

mgoin reviewed Jan 20, 2025

ywang96 assigned ywang96 and DarkLight1337 Jan 20, 2025

Flechman added 2 commits January 22, 2025 14:59

Merge with main

46c142f

Revert changes

4af1716

Signed-off-by: remi <[email protected]>

Flechman force-pushed the pixtral-mm-processor branch from 41c423a to 4af1716 January 26, 2025 12:19

Flechman added 5 commits January 26, 2025 12:32

Add pixtral dummy inputs builder

8a75f3a

Signed-off-by: remi <[email protected]>

Fix naming

2e346d3

Signed-off-by: remi <[email protected]>

HF processor not supported

c9c082b

Signed-off-by: remi <[email protected]>

Add tokenizer mode

869a620

Signed-off-by: remi <[email protected]>

Override pixtral processor apply

a6392cb

Signed-off-by: remi <[email protected]>
