Fine-tuning Script #6
Hi, thank you for your interest. We are currently busy iterating on DeepSeek-VL. The community has already started supporting DeepSeek-VL fine-tuning (#10). Have fun!
@RERV It seems that swift does not support fine-tuning the vision encoder (at least from my quick glance over the source code; I hope I'm wrong). Given that you train DeepSeek-VL internally somehow, could you provide training code snippets so that the community can work on a fine-tuning script for both the LLM and the vision encoder?
Internally we train DeepSeek-VL with hai-llm (as mentioned in the paper), which is a closed-source training framework. We do hope to open source hai-llm someday, but that is a really big project, involving our training cluster configuration/management and other internal libraries. I'm afraid we don't have any bandwidth to clean up and open source the hai-llm core code right now.
@soloice Hi, I see, thanks. Would it be possible to release just the backprop code for the vision encoder (no framework around it, no clustering), simply as a starting point for the community to build upon?
Well, I can describe how to do this briefly. Basically, you don't need to write any backprop code, because torch will take care of everything. Just build the model, then set the `requires_grad` attribute on the vision encoder's parameters.
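A minimal sketch of what that comment describes, using a toy stand-in model (the attribute names `vision_encoder` and `language_model` are assumptions; adapt them to the actual DeepSeek-VL model definition):

```python
import torch
import torch.nn as nn

class ToyVLM(nn.Module):
    """Toy stand-in for a VLM with a vision encoder and a language model."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(16, 8)   # assumed attribute name
        self.language_model = nn.Linear(8, 4)    # assumed attribute name

    def forward(self, pixels):
        return self.language_model(self.vision_encoder(pixels))

model = ToyVLM()

# Unfreeze the vision encoder; set False here instead to freeze it.
for p in model.vision_encoder.parameters():
    p.requires_grad = True

# Autograd handles backprop through the unfrozen parameters.
loss = model(torch.randn(2, 16)).sum()
loss.backward()
```

After `backward()`, every parameter with `requires_grad=True` has its `.grad` populated, and any standard optimizer will update it.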
What you really need to care about is the distributed strategy. If you are using DDP, or 3D parallelism with TP=1, the above is all you need. If you are using 3D parallelism with TP>1, you will need to average the vision encoder's gradients across all TP ranks with an NCCL all-reduce.
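A sketch of that gradient-averaging step with `torch.distributed` (the helper name and the single-process gloo demo below are illustrative assumptions; in a real TP setup you would run under NCCL and pass the tensor-parallel process group):

```python
import os
import torch
import torch.distributed as dist

def average_grads_across_tp_ranks(module, tp_group=None):
    """Sum-reduce each parameter's gradient over the TP group, then divide
    by the group size to get the average on every rank."""
    world = dist.get_world_size(group=tp_group)
    for p in module.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, group=tp_group)
            p.grad.div_(world)

# Single-process demo using the gloo backend (NCCL requires GPUs).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

layer = torch.nn.Linear(4, 4)            # stand-in for the vision encoder
layer(torch.randn(2, 4)).sum().backward()
average_grads_across_tp_ranks(layer)     # no-op average with world_size=1

dist.destroy_process_group()
```

With TP>1, each rank sees a different shard of the activations, so without this all-reduce the replicated vision encoder weights would drift apart across ranks.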
@soloice Thank you very much for the information! Given the PyTorch gradients, how would you go about training? In our use case we need to add a bit of grounding by implementing a cursor as an output.
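Putting the pieces together, a hypothetical fine-tuning step might look like the following (the stand-in modules, loss, and hyperparameters are all assumptions for illustration, not DeepSeek-VL's actual training setup):

```python
import torch
import torch.nn as nn

encoder = nn.Linear(16, 8)   # stand-in for the vision encoder
llm = nn.Linear(8, 4)        # stand-in for the language model head

# Optimize both the encoder and the LLM parameters jointly.
params = list(encoder.parameters()) + list(llm.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-5)

pixels = torch.randn(2, 16)           # fake image features
targets = torch.randint(0, 4, (2,))   # fake token labels

logits = llm(encoder(pixels))
loss = nn.functional.cross_entropy(logits, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

For a cursor-style grounded output, one common approach is to add a small regression head on the LLM's hidden state and train it with an L1 or L2 loss on the cursor coordinates alongside the language-modeling loss; that head is an addition you would design yourself, not something shown in this thread.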
Jintao-Huang implemented it!
Congratulations to DeepSeek for the wonderful work. I wonder if there is a script for fine-tuning DeepSeek-VL? Thanks!