
Utilise MLX framework on Apple Silicon #1598

Open
randomeizer opened this issue Dec 7, 2023 · 5 comments
Labels
question Further information is requested

Comments

@randomeizer

Apple released this framework:

https://t.co/uA2ZbYC13I

It has a C++ core with Python bindings, and a Python example of a Whisper port. Unfortunately, there is no C++ example.

It would be great if it could be integrated here.
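For anyone who wants to try the MLX Whisper port directly, a rough setup sketch follows. The repository URL is real; the `whisper` directory, `requirements.txt`, and the `whisper.transcribe()` call are assumptions based on the mlx-examples layout at the time, so check the repo's README for the current instructions.

```shell
# Hypothetical quick trial of the MLX Whisper example (script and
# module names assumed from the mlx-examples repo circa late 2023).
pip install mlx
git clone https://github.com/ml-explore/mlx-examples.git
cd mlx-examples/whisper
pip install -r requirements.txt
python -c "import whisper; print(whisper.transcribe('audio.mp3')['text'])"
```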

@astrowonk

astrowonk commented Dec 9, 2023

FWIW, I played around with the Whisper implementation in the mlx-examples repo (using the Python bindings), and while the MLX version definitely uses my M1 GPU, it was about half as fast on the same models as the current CoreML-enabled whisper.cpp build.

But there could be overhead from how they implemented it in Python.

@oliverwehrens

I did some simple benchmarking with a 10-minute audio file, the large model, an M1 Pro, and MLX. Someone else contributed M2 Ultra and M3 Max numbers. See https://owehrens.com/whisper-nvidia-rtx-4090-vs-m1pro-with-mlx/ for the full post. Your mileage may vary.

@astrowonk

astrowonk commented Dec 12, 2023

> I did some simple benchmarking with a 10-minute audio file, the large model, an M1 Pro, and MLX. Someone else contributed M2 Ultra and M3 Max numbers. See owehrens.com/whisper-nvidia-rtx-4090-vs-m1pro-with-mlx for the full post. Your mileage may vary.

This is interesting. Was the 4090 whisper.cpp built with CUDA (it feels like it wasn't)? Were the whisper.cpp Mac builds built with CoreML?

So far the best Apple Silicon performance I've seen has been from the CoreML builds of whisper.cpp; I'd be curious whether others also find that the current MLX example code isn't quite as good as whisper.cpp + CoreML.

My overall best performance has been on Ubuntu with the CUDA-enabled build of whisper.cpp (with a 3070), better than Python + PyTorch + CUDA (though I should test that again).
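For reference, the build variants being compared can be produced roughly as follows. The flag names are taken from the whisper.cpp README as of late 2023 (newer releases moved to CMake options), so treat this as a sketch rather than current instructions.

```shell
# CoreML-enabled build (macOS): generate the CoreML encoder model
# first, then build with CoreML support enabled.
./models/generate-coreml-model.sh medium.en
WHISPER_COREML=1 make -j

# CUDA-enabled build (Linux, with the CUDA toolkit installed):
WHISPER_CUBLAS=1 make -j

# Plain build (uses Metal automatically on Apple Silicon):
make -j
```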

Here are times on medium.en for a ~10 minute episode of Robot or Not.

| Computer and code | Time |
| --- | --- |
| Ubuntu i5 12600K with RTX 3070, whisper.cpp CUDA build | 29.67 s |
| M1 16 GB RAM, mlx-examples Python code | 267.07 s |
| M1 16 GB RAM, whisper.cpp with CoreML enabled (Sonoma) | 153.62 s |
| M1 16 GB RAM, standard whisper.cpp build | 188.12 s |

So, MLX so far doesn't seem as good as the CoreML enabled whisper.cpp builds on my lowly M1, but that may not be true of other Apple Silicon chips.

EDITED to add the regular make build of whisper.cpp on my M1. I hadn't appreciated how well it uses Metal and the GPU; it's only slightly slower than the CoreML version and faster than Apple's mlx-examples code.
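Expressed as speedups over the MLX run, the table's numbers work out as follows (a small calculation from the times reported above; the labels are mine):

```python
# Speedup of each build relative to the mlx-examples Python run,
# using the times from the table above (medium.en, ~10 min audio, M1
# except where noted).
times = {
    "mlx-examples (Python)": 267.07,
    "whisper.cpp + CoreML": 153.62,
    "whisper.cpp (Metal)": 188.12,
    "whisper.cpp + CUDA (RTX 3070)": 29.67,
}
baseline = times["mlx-examples (Python)"]
speedups = {name: round(baseline / t, 2) for name, t in times.items()}
for name, s in speedups.items():
    print(f"{name}: {s}x")
```

So the CoreML build is about 1.7x faster than MLX on this M1, and the CUDA build on the 3070 about 9x faster.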

@bobqianic
Collaborator

> I did some simple benchmarking with a 10-minute audio file, the large model, an M1 Pro, and MLX. Someone else contributed M2 Ultra and M3 Max numbers. See https://owehrens.com/whisper-nvidia-rtx-4090-vs-m1pro-with-mlx/ for the full post. Your mileage may vary.

Your result might not be accurate. I ran a test on my 2080 Ti with the 10-minute audio file you mentioned, and it only took 45 seconds to process in total. I used the large-v2 model for this. Considering that the 4090 is significantly more powerful than my older 2080 Ti, the processing time should be even shorter. Audio file: https://www.zeit.de/politik/2023-12/streik-gdl-deutsche-bahn-claus-weselsky-nachrichtenpodcast
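As a sanity check, the reported 2080 Ti run works out to roughly 13x real time; the arithmetic from the numbers in the comment:

```python
# Real-time factor for the reported 2080 Ti run: a 10-minute (600 s)
# audio file transcribed in ~45 s with the large-v2 model.
audio_seconds = 10 * 60
processing_seconds = 45
rtf = audio_seconds / processing_seconds
print(f"~{rtf:.1f}x faster than real time")
```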


@bobqianic bobqianic added the question Further information is requested label Dec 18, 2023
@astrowonk astrowonk mentioned this issue Feb 15, 2024
@astrowonk

I retested with mlx==0.6.0 and the latest mlx-examples repo, and the same Whisper test I ran above now finishes in ~178 s. Still not as good as whisper.cpp with CoreML, but a lot better than the test I ran in December.
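Compared against the December figure of 267.07 s, that retest is roughly a third faster (using the ~178 s number, which is approximate):

```python
# Relative improvement of the mlx==0.6.0 retest over the December run.
december_s = 267.07
retest_s = 178.0  # approximate ("~178 s")
improvement = (december_s - retest_s) / december_s
print(f"{improvement:.0%} faster")
```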
