[VLM] Fully dynamic prompt replacement in merged input processor #11199
Conversation
Signed-off-by: DarkLight1337 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
if strict:
    return key in self.data
This fixes false-positive warnings emitted when using the Mantis model, which arose because both Mantis and its base class (Llava) have corresponding items in the registry.
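A minimal sketch of the idea behind the `strict` check, assuming a registry keyed by model class (names here are illustrative, not vLLM's actual API): in strict mode, membership is an exact-class lookup, so a subclass such as Mantis does not also match the entry registered for its base class (Llava) and trigger a spurious duplicate warning.

```python
class ProcessorRegistry:
    """Hypothetical registry mapping a model class to a processor factory."""

    def __init__(self):
        self.data = {}  # model class -> processor factory

    def register(self, model_cls, factory):
        self.data[model_cls] = factory

    def contains(self, model_cls, *, strict=False):
        if strict:
            # Exact match only: no walk over the class's MRO, so an entry
            # registered for a base class is not reported for a subclass.
            return model_cls in self.data
        # Lenient mode: also match any registered base class.
        return any(base in self.data for base in model_cls.__mro__)
```

With only `Llava` registered, `contains(Mantis)` is true (lenient fallback to the base class) while `contains(Mantis, strict=True)` is false, so a duplicate-entry warning fires only when the exact class really has its own entry.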
LGTM! Thanks for fixing this!
I found that the processor in this PR doesn't work for Phi-3-Vision (though it passes for Phi-3.5-Vision). Looks like there is still a need to compute the number of image tokens according to the image...
Thanks so much for the fix
This PR enables the prompt replacement sequence (which can be text or a list of token IDs) to be fully computed based on the input. It also improves the placeholder search logic to be able to match the exact placeholder tokens for each multi-modal input. Furthermore, the input processor is now applied automatically when generating the dummy data, so developers only need to specify the raw input multi-modal data instead of the processed data.
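The "fully computed based on the input" part can be sketched as follows. This is an illustrative simplification, not vLLM's actual `PromptReplacement` API: the key point is that the replacement may be a callable invoked per item, so the number of placeholder tokens can depend on each image rather than being a fixed constant.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class PromptReplacement:
    """Replace `target` with a fixed string, or with the result of a
    callable that receives the item index at processing time."""
    target: str
    replacement: Union[str, Callable[[int], str]]


def apply_replacement(prompt: str, repl: PromptReplacement) -> str:
    parts = prompt.split(repl.target)
    out = parts[0]
    for i, part in enumerate(parts[1:]):
        # Compute the replacement dynamically for the i-th occurrence.
        r = repl.replacement(i) if callable(repl.replacement) else repl.replacement
        out += r + part
    return out


# Each <image> expands to a different number of tokens, e.g. based on size:
sizes = [2, 5]
repl = PromptReplacement("<image>", lambda i: "<tok>" * sizes[i])
apply_replacement("a <image> b <image> c", repl)
# -> "a <tok><tok> b <tok><tok><tok><tok><tok> c"
```

This is what lets models whose per-image token count varies with the input (e.g. Pixtral-HF, Phi-3-Vision) be handled by the same merged processor.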
This fixes an issue in Pixtral-HF preprocessing where the `PromptReplacement` is incorrect.
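The improved placeholder search mentioned in the description can be sketched like this (a hedged illustration, not vLLM's actual search code): each multi-modal item has its own expected placeholder token sequence, and the search locates each sequence exactly, in order, within the processed prompt's token IDs.

```python
def find_placeholder_spans(token_ids, placeholders):
    """Return the (start, end) span occupied by each item's exact
    placeholder token sequence, scanning left to right."""
    spans = []
    start = 0
    for ph in placeholders:  # one expected token sequence per input item
        n = len(ph)
        for i in range(start, len(token_ids) - n + 1):
            if token_ids[i:i + n] == ph:
                spans.append((i, i + n))
                start = i + n  # subsequent items must appear after this span
                break
        else:
            raise ValueError(f"placeholder {ph} not found")
    return spans


find_placeholder_spans([1, 9, 9, 2, 9, 9, 9, 3], [[9, 9], [9, 9, 9]])
# -> [(1, 3), (4, 7)]
```

Matching the exact token run per item is what allows two inputs with different placeholder lengths (here 2 and 3 tokens) to be located unambiguously in the same prompt.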