Support More Pre-Converted VLM Models #5
Comments
RKLLM only supports the LLM portion of it, but you can use happyme531's MiniCPM-V-2_6 implementation here: https://huggingface.co/happyme531/MiniCPM-V-2_6-rkllm The MiniCPM-V-2_6 LLM (instead of just Qwen, if you want to swap that out) is here: https://huggingface.co/c01zaut/MiniCPM-V-2_6-rk3588-1.1.2
With the release of 1.1.4, there is now a full VLM pipeline for this platform. I am going to be doing some testing this week to get things working.
That’s fantastic! Looking forward to your new version~
@ZackPu - multimodal conversion completed. I have some tweaks to make to the pipeline, and then I will push it to my toolkit repo for converting on x86_64 for use on the RK3588. The recent update to 1.1.4 now supports Qwen2VL, in addition to MiniCPM-V-2_6. The pipeline is a lot more complex, though, since the embedder + mmproj need to be run as separate models in RKNN (basically modified ONNX) format. It's similar to happyme531's implementation.
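To make the "embedder + mmproj as separate models" flow above concrete, here is a minimal, hedged sketch of how a two-model VLM pipeline splices the vision encoder's projected features into the LLM's input embedding sequence at image-placeholder positions. All names (IMAGE_TOKEN, the toy embedding table) are illustrative assumptions, not the RKLLM or RKNN API.

```python
# Illustrative sketch only: a two-model VLM pipeline runs the image encoder
# + mmproj as one model, then feeds its output embeddings into the LLM by
# replacing image-placeholder tokens in the prompt's embedding sequence.
# IMAGE_TOKEN and the embedding table are hypothetical, not a real API.

IMAGE_TOKEN = -1  # hypothetical placeholder id marking where image features go

def splice_vision_embeddings(token_ids, text_embed, vision_embeds):
    """Replace each IMAGE_TOKEN position with the next vision embedding.

    token_ids:     prompt token ids, with IMAGE_TOKEN placeholders
    text_embed:    toy embedding table for text tokens (id -> vector)
    vision_embeds: vectors produced by the image encoder + mmproj model
    """
    out, vi = [], 0
    for tid in token_ids:
        if tid == IMAGE_TOKEN:
            out.append(vision_embeds[vi])  # vision feature takes this slot
            vi += 1
        else:
            out.append(text_embed[tid])    # ordinary text embedding lookup
    assert vi == len(vision_embeds), "placeholder/feature count mismatch"
    return out

# Toy usage: two text tokens surrounding one image slot
table = {1: [0.1, 0.1], 2: [0.2, 0.2]}
seq = splice_vision_embeddings([1, IMAGE_TOKEN, 2], table, [[9.0, 9.0]])
```

On real hardware, `vision_embeds` would come from the RKNN image-encoder model and `seq` would be handed to the RKLLM runtime as the fused input sequence; this sketch only shows the splicing step.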
Thanks! Maybe I can use your pipeline to build my deploy env? My VLM model: SigLIP + projector + fine-tuned Llama2 7B
@ZackPu - https://github.com/c0zaut/rkllm-mm-export This script converts Qwen2VL into an image encoder in RKNN format and an LLM in RKLLM format. The SigLIP encoder + mmproj is configured with fixed shapes and exported to ONNX, then to RKNN. You'll definitely need to tweak it for your purposes, and it doesn't have a deployment component yet, but it will at least let you experiment with converting a custom VLM. Since your LLM is a fine-tuned Llama2 7B, you should have no issue converting it.
Also, you may want to check out this new model, which has a bunch of tool-calling and web-search features: https://huggingface.co/Infinigence/Megrez-3B-Omni/ For it to work properly with RKLLM, you will need to set the eos token to <|turn_end|> on line 214 of tokenizer_config.json, and change the eos token array to a single value, id 120005, in both generation_config.json and config.json. That model is also Llama-based, uses Whisper as the audio encoder and a SigLIP image encoder for vision, and is only 3B for the LLM component. I just tested the LLM component and it runs at about 7 tok/s.
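The config edits described above can be scripted. This is a hedged sketch using only the Python standard library; the field names (`eos_token`, `eos_token_id`) follow common Hugging Face config conventions, but check them against the actual Megrez-3B-Omni files before relying on this.

```python
# Sketch of the Megrez-3B-Omni edits described above: set the eos token
# string in tokenizer_config.json, and collapse the eos token array to the
# single id 120005 in generation_config.json and config.json.
# Field names are assumed from typical HF configs; verify before use.
import json

def patch_eos(tokenizer_cfg_path, gen_cfg_path, model_cfg_path):
    # tokenizer_config.json: replace the eos token string
    with open(tokenizer_cfg_path) as f:
        tok = json.load(f)
    tok["eos_token"] = "<|turn_end|>"
    with open(tokenizer_cfg_path, "w") as f:
        json.dump(tok, f, indent=2)

    # generation_config.json and config.json: array -> single id 120005
    for path in (gen_cfg_path, model_cfg_path):
        with open(path) as f:
            cfg = json.load(f)
        cfg["eos_token_id"] = 120005  # was an array of ids
        with open(path, "w") as f:
            json.dump(cfg, f, indent=2)
```

Run it once against the three files in the downloaded model directory before invoking the RKLLM conversion.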
Would it be possible to provide more pre-converted Vision-Language Models (VLMs) on Hugging Face? I can't use this model on the RK3588 board: https://huggingface.co/openvla/openvla-7b