Basic pixtral support, paving the way for vision models 🖼️ #153
## Conversion

Conversion currently targets the `mistral-community` models, and has for now only been tested with `mistral-community/pixtral-12b`. The official `mistralai` models lack quite a bit of information needed in our context. (HF checkpoints lack some information as well, but it's more manageable.)
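At its core, this kind of checkpoint conversion is mostly a state-dict key remapping. A minimal sketch, with purely hypothetical key names (not the actual eole/pixtral mapping):

```python
# Hypothetical illustration of a checkpoint conversion step: rename the
# keys we know about, pass everything else through unchanged. The key
# names below are made up for the example.
KEY_MAP = {
    "model.embed_tokens.weight": "decoder.embeddings.weight",
    "model.norm.weight": "decoder.layer_norm.weight",
}

def remap_keys(state_dict: dict) -> dict:
    """Rename known checkpoint keys; keep unknown keys as-is."""
    return {KEY_MAP.get(k, k): v for k, v in state_dict.items()}

hf_like = {"model.norm.weight": [1.0], "other.weight": [2.0]}
print(remap_keys(hf_like))
```

The missing metadata mentioned above (tokenizer/config details in the official `mistralai` release) is exactly what such a mapping cannot reconstruct on its own.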
## What works
The provided `test_inference.py` script in the `pixtral` recipe runs inference on a few examples (taken from the Pixtral blog post). The configuration uses bitsandbytes quantization by default, to allow running on a 24 GB VRAM GPU (tested on a 3090).
Differences in a few methods (notably RoPE) lead to slight numerical differences with the HF implementation (as with most of our models anyway).
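Such differences are usually at floating-point precision level, so comparisons against HF outputs should use a tolerance rather than exact equality. A minimal, self-contained sketch (not the eole or HF implementation) showing a basic rotary embedding and a tolerance-based check:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    # Apply a basic rotary position embedding over the last (even-sized)
    # dimension of x, rotating interleaved pairs (x[0::2], x[1::2]).
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    angles = pos * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
ref = rope(x.astype(np.float64), pos=7)
approx = rope(x.astype(np.float32), pos=7).astype(np.float64)
# The two results differ slightly (float32 vs float64 rounding), so an
# exact-equality check would fail, while a tolerance-based one passes:
print(np.max(np.abs(ref - approx)))
assert np.allclose(ref, approx, atol=1e-5)
```

The same reasoning applies when comparing eole outputs to HF: small per-layer deviations accumulate but stay within a reasonable tolerance.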
## What does not work (yet)
## What needs to be improved
- `eole.models.model`? (allow for different `adapter` classes?)