
[VLM] Fully dynamic prompt replacement in merged input processor #11199

Merged — 14 commits merged into vllm-project:main from dynamic-repl-unit on Dec 14, 2024

Conversation

@DarkLight1337 (Member) commented on Dec 14, 2024:

This PR enables the prompt replacement sequence (which can be text or a list of token IDs) to be fully computed from the input. It also improves the placeholder search logic so that it can match the exact placeholder tokens for each multi-modal input item. Furthermore, the input processor is now applied automatically when generating dummy data, so developers only need to specify the raw multi-modal data instead of the processed data.

This also fixes an issue in Pixtral-HF preprocessing where the PromptReplacement was incorrect.
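
To make the idea concrete, here is a minimal sketch of a dynamic replacement. The class and helper below are illustrative simplifications, not the exact API introduced in this PR:

```python
# Illustrative sketch only -- simplified stand-ins, not the exact
# vLLM API introduced in this PR.
from dataclasses import dataclass
from typing import Callable, Union

from PIL import Image

# A replacement may be static text, static token IDs, or a callable
# that computes the exact token IDs for the item at a given index.
Replacement = Union[str, list, Callable[[int], list]]


@dataclass
class PromptReplacement:
    modality: str              # e.g. "image"
    target: Union[str, list]   # placeholder to search for in the prompt
    replacement: Replacement   # static, or computed per item


def make_image_replacement(images: list,
                           image_token_id: int,
                           patch_size: int = 14) -> Callable[[int], list]:
    """Build a callable that derives the placeholder tokens for the
    item at ``item_idx`` from that image's actual size (toy logic)."""
    def repl(item_idx: int) -> list:
        image = images[item_idx]
        num_patches = (image.width // patch_size) * (image.height // patch_size)
        return [image_token_id] * num_patches
    return repl


# Every "<image>" in the prompt gets replaced by exactly as many
# placeholder tokens as this particular image will produce.
images = [Image.new("RGB", (336, 336))]
repl = PromptReplacement(
    modality="image",
    target="<image>",
    replacement=make_image_replacement(images, image_token_id=32000),
)
```

Because the replacement is computed per item, the placeholder search can match the exact token run each input expands to, rather than assuming a fixed count. The same mechanism is what lets the dummy-data path accept raw multi-modal inputs: the merged input processor is applied to them automatically.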


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the `ready` label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 changed the title from "[VLM] Enable fully dynamic prompt replacement" to "[VLM] Enable fully dynamic prompt replacement sequence" on Dec 14, 2024
Comment on lines +1533 to +1534:

```python
if strict:
    return key in self.data
```
@DarkLight1337 (Member, Author) commented on Dec 14, 2024:

This fixes false-positive warnings emitted when using the Mantis model, which arise because both Mantis and its base class (Llava) have corresponding items in the registry.
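
Roughly, the fix distinguishes a strict (exact-key) lookup from the lenient lookup that falls back to base classes. The sketch below assumes an MRO-walking registry and is illustrative, not the actual vLLM registry code:

```python
# Illustrative sketch, assuming the registry resolves missing keys via
# the model class's MRO; not the actual vLLM registry implementation.
class ProcessorRegistry:
    def __init__(self) -> None:
        self.data: dict = {}

    def contains(self, key: type, *, strict: bool = False) -> bool:
        if strict:
            # Exact match only: a Mantis query does not "see" the Llava
            # entry, so no false-positive duplicate warning is emitted.
            return key in self.data
        # Lenient lookup: fall back to base classes, so a Mantis query
        # still resolves to Llava's entry when Mantis has none of its own.
        return any(base in self.data for base in key.__mro__)
```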

@DarkLight1337 changed the title from "[VLM] Enable fully dynamic prompt replacement sequence" to "[VLM] Enable fully dynamic prompt replacement sequence in merged input processor" on Dec 14, 2024
@DarkLight1337 changed the title from "[VLM] Enable fully dynamic prompt replacement sequence in merged input processor" to "[VLM] Fully dynamic prompt replacement sequence in merged input processor" on Dec 14, 2024
@Isotr0py (Collaborator) left a comment:

LGTM! Thanks for fixing this!

Review threads on tests/multimodal/test_processing.py and vllm/model_executor/models/llava.py were marked resolved.
@DarkLight1337 (Member, Author) commented on Dec 14, 2024:

I found that the processor in this PR doesn't work for Phi-3-Vision (though it passes for Phi-3.5-Vision). It looks like there is still a need to compute the number of image tokens from the image itself...
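
For context: Phi-3-Vision's HD transform tiles each image into a size-dependent number of crops, so the placeholder length cannot be a constant. Schematically (illustrative tile and token counts, not the exact formula from the HF processor):

```python
# Illustrative sketch of a size-dependent image token count, in the
# spirit of Phi-3-Vision's HD transform; the real formula lives in the
# Hugging Face processor and differs in its details.
import math


def num_image_tokens(width: int, height: int,
                     tile: int = 336, tokens_per_tile: int = 144) -> int:
    # The image is padded up to a whole number of tiles, and each tile
    # contributes a fixed number of tokens, plus one global-view tile.
    tiles_w = math.ceil(width / tile)
    tiles_h = math.ceil(height / tile)
    return (tiles_w * tiles_h + 1) * tokens_per_tile


# Two images of different sizes need different placeholder lengths:
print(num_image_tokens(336, 336))    # (1*1 + 1) * 144 = 288
print(num_image_tokens(1344, 672))   # (4*2 + 1) * 144 = 1296
```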

@DarkLight1337 changed the title from "[VLM] Fully dynamic prompt replacement sequence in merged input processor" to "[VLM] Fully dynamic prompt replacement in merged input processor" on Dec 14, 2024
@DarkLight1337 added the `ready` label (ONLY add when PR is ready to merge / full CI is needed) on Dec 14, 2024
@DarkLight1337 enabled auto-merge (squash) on December 14, 2024 at 15:57
@DarkLight1337 force-pushed the dynamic-repl-unit branch 4 times, most recently from 1b3b22d to 653424e, on December 14, 2024 at 16:07
@DarkLight1337 merged commit 93abf23 into vllm-project:main on Dec 14, 2024 (57 checks passed)
@DarkLight1337 deleted the dynamic-repl-unit branch on December 14, 2024 at 17:54
@mgoin (Member) left a comment:

Thanks so much for the fix!

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025
abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025