fix coreml ANE optimized encoder #1716
Conversation
Thanks for looking into this - I will recheck the results now. For reference, here is the discussion back then: #548 (reply in thread)
Indeed, the ANE-optimized Core ML models work correctly and are faster than the original models. Here are the results that I get on an M2 Ultra with 76 GPU cores and 32 ANE cores (only the "Enc." column is relevant for this change):

master + Core ML ANE

PR + Core ML ANE
Notice however that the ANE-optimized Core ML models are not suitable for running on the GPU:

master + Core ML GPU

PR + Core ML GPU
For reference, here are the results for running the entire computation on the GPU with Metal (i.e. no Core ML):

Full Metal (no Core ML)
2024-01-05 09:24:22.023112+0800 [4575:1504107] Error: Transpose unit is not supported.

With this update, I generated a new Core ML encoder and ran it on an iPhone XR with iOS 16; it outputs the error above.
Hm, interesting. I actually didn't test whether the ANE models work on iOS. Maybe this is the problem that we observed in the past, and now there is an error actually being reported.
I just tested it on an iPhone 13 Mini (A15) with iOS 17.2.1 and it works without errors. Could it be related to the iOS version?
I think it could be the chip's problem; the iPhone XR is quite an old device, it uses the A12.
Not sure if it is due to changes in this PR or something else, but the second time running was quick. Overall performance on a 10-minute podcast was 157 seconds… on my little M1 Mac Mini, about the same as previously. The GPU was going all out according to Activity Monitor, but at least according to asitop, the ANE doesn't appear to be doing much? I see the
What happens if you switch to
Changed. Still not seeing much ANE usage in
On asitop it's expected to be mostly GPU and CPU, because the decoder is running on the GPU and is much more expensive than the encoder, and the pre/post-processing is all CPU. You should see a small amount of ANE usage though; in my testing the ANE is 10x more power-efficient than the GPU, so the usage is very minimal.
Is it good enough for realtime on iPhone now?
The ANE optimized encoder generated by
Maybe we need to update the files shared on Hugging Face?
As I reported in the old-device crash issue above, I think it needs a patch to avoid this.
I see. But this PR makes |
Transpose the result back to the format that's accepted by the decoder.
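The transpose step mentioned above can be sketched as follows. This is a minimal illustration, not code from the PR: the `(batch, channels, 1, seq_len)` layout is the ANE-friendly convention from Apple's ane_transformers work, and the concrete dimensions are assumed base-model-like values.

```python
import numpy as np

# Assumed dimensions for illustration (roughly base-model-sized).
batch, channels, seq_len = 1, 512, 1500

# ANE-optimized encoder output in the channels-first layout: (B, C, 1, S).
ane_out = np.zeros((batch, channels, 1, seq_len), dtype=np.float32)

# Drop the singleton axis and swap channels/sequence so the decoder
# receives the usual (B, S, C) layout.
decoder_in = ane_out.squeeze(2).transpose(0, 2, 1)

print(decoder_in.shape)  # (1, 1500, 512)
```

The reshape is cheap relative to the encoder itself, so paying for it once at the encoder/decoder boundary keeps the ANE-friendly layout inside the Core ML graph without changing the decoder's expectations.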
I tested with the tiny, small, and base models, and ran

./tests/run-tests.sh

and the results all look good. @ggerganov I am not sure why your previous attempt didn't work, can you double-check?

Performance-wise, this is my result on an M3 Pro with a 30-minute audio file and the base model (I used a longer audio file to get a better average encode time per segment, so you can ignore the initial Core ML model load overhead). The encode time is ~2x faster than Metal.
With ANE optimized model:
With the vanilla OpenAI Whisper model:
Metal: