Support qwen2 #1820
Conversation
Could we get a CTranslate2 4.5.1 release, please? I can confirm that conversion (int8 at least, which is all I had time to test) works for the Qwen 2.5 0.5B, 1.5B, 3B, and 7B instruct models.

Platform: Microsoft Windows 10
Dependencies: nvidia-cublas-cu12==12.4.2.65

Preliminary benchmarks (int8, RTX 4090, CUDA):
- Qwen--Qwen2.5-0.5B-Instruct-ct2-int8
- Qwen--Qwen2.5-1.5B-Instruct-ct2-int8
- Qwen--Qwen2.5-3B-Instruct-ct2-int8
- Qwen--Qwen2.5-7B-Instruct-ct2-int8
- Qwen--Qwen2.5-Coder-3B-Instruct-ct2-int8
- Qwen--Qwen2.5-Coder-14B-Instruct-ct2-int8

The 32B model running at int8 does not fit in 24 GB of VRAM. However, for reasons I don't fully understand, it's necessary to add this line to the benchmarking script:
Here is the benchmarking script. Note that it relies on pip-installing the necessary CUDA libraries, which I find far more convenient than having to install/reinstall CUDA system-wide just to test different versions. BENCHMARKING SCRIPT IS HERE
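For anyone wanting to reproduce the conversions above, a minimal sketch using CTranslate2's stock converter CLI follows; the output directory name mirrors the naming used in the benchmark list and is my own choice, not something the tool requires:

```shell
# Requires: pip install ctranslate2 transformers
# Convert a Qwen 2.5 instruct checkpoint from the Hugging Face Hub
# to an int8-quantized CTranslate2 model directory.
ct2-transformers-converter \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --quantization int8 \
  --output_dir Qwen--Qwen2.5-0.5B-Instruct-ct2-int8
```

The same invocation with a different `--model` repo id covers the other sizes; only VRAM limits what fits (as noted, 32B at int8 exceeds 24 GB).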
I will release CTranslate2 when some features are done. As for the other point: setting KMP_DUPLICATE_LIB_OK can prevent OpenMP errors about runtime library duplication. Ensure you use the correct supported version.
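A minimal sketch of that workaround, assuming a Python benchmarking script: set the variable before importing any library that bundles its own OpenMP runtime.

```python
import os

# Tell the Intel OpenMP runtime to tolerate a duplicated copy of itself
# (e.g. when two pip-installed packages each ship libiomp) instead of
# aborting. This must run before importing packages such as ctranslate2
# or torch that load OpenMP.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```

Note this only suppresses the duplicate-runtime check; it does not resolve the underlying duplication, which is why aligning on a single supported dependency version is the cleaner fix.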