
Utilise MLX framework on Apple Silicon #1598

Open
randomeizer opened this issue Dec 7, 2023 · 5 comments
Labels
question Further information is requested

Comments

@randomeizer

Apple released this framework:

https://t.co/uA2ZbYC13I

It has a C++ core with Python bindings, and a Python example of a Whisper port. Unfortunately, there is no C++ example.

It would be great if it could be integrated here.
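For anyone who wants to try the MLX Whisper port directly, a rough setup sketch follows. The repository URL is real; the `whisper` directory, `requirements.txt`, and the `whisper.transcribe()` call are assumptions based on the mlx-examples layout at the time, so check the repo's README for the current instructions.

```shell
# Hypothetical quick trial of the MLX Whisper example (script and
# module names assumed from the mlx-examples repo circa late 2023).
pip install mlx
git clone https://github.com/ml-explore/mlx-examples.git
cd mlx-examples/whisper
pip install -r requirements.txt
python -c "import whisper; print(whisper.transcribe('audio.mp3')['text'])"
```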

@astrowonk

astrowonk commented Dec 9, 2023

FWIW, I played around with the Whisper implementation in the mlx-examples repo (using the Python bindings), and while the MLX version definitely uses my M1 GPU, it was about half as fast on the same models as the current CoreML-enabled whisper.cpp build.

But there could be overhead from how they implemented it in Python.

@oliverwehrens

I did some simple benchmarking with a 10-minute audio file, the large model, an M1 Pro, and MLX. Someone else contributed M2 Ultra and M3 Max numbers. See https://owehrens.com/whisper-nvidia-rtx-4090-vs-m1pro-with-mlx/ for the full post. Your mileage may vary.

@astrowonk

astrowonk commented Dec 12, 2023

> I did some simple benchmarking with a 10-minute audio file, the large model, an M1 Pro, and MLX. Someone else contributed M2 Ultra and M3 Max numbers. See owehrens.com/whisper-nvidia-rtx-4090-vs-m1pro-with-mlx for the full post. Your mileage may vary.

This is interesting. Was the 4090 whisper.cpp built with CUDA (it feels like it wasn't)? Were the whisper.cpp Mac builds built with CoreML?

So far the best Apple Silicon performance I've seen has been from the CoreML builds of whisper.cpp; I'd be curious whether others also find that the current MLX example code isn't quite as good as whisper.cpp + CoreML.

My overall best performance has been on Ubuntu with the CUDA-enabled build of whisper.cpp (with a 3070), better than Python + PyTorch + CUDA (though I should test that again).
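For reference, the build variants being compared can be produced roughly as follows. The flag names are taken from the whisper.cpp README as of late 2023 (newer releases moved to CMake options), so treat this as a sketch rather than current instructions.

```shell
# CoreML-enabled build (macOS): generate the CoreML encoder model
# first, then build with CoreML support enabled.
./models/generate-coreml-model.sh medium.en
WHISPER_COREML=1 make -j

# CUDA-enabled build (Linux, with the CUDA toolkit installed):
WHISPER_CUBLAS=1 make -j

# Plain build (uses Metal automatically on Apple Silicon):
make -j
```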

Here are times on medium.en for a ~10 minute episode of Robot or Not.

| Computer and code | Time |
| --- | --- |
| Ubuntu i5 12600K with RTX 3070, whisper.cpp CUDA build | 29.67 s |
| M1 16 GB RAM, mlx-examples Python code | 267.07 s |
| M1 16 GB RAM, whisper.cpp with CoreML enabled (Sonoma) | 153.62 s |
| M1 16 GB RAM, standard whisper.cpp build | 188.12 s |

So, MLX so far doesn't seem as good as the CoreML enabled whisper.cpp builds on my lowly M1, but that may not be true of other Apple Silicon chips.

EDITED to add the regular make build of whisper.cpp on my M1. I hadn't appreciated how well it uses Metal and the GPU; it's only slightly slower than the CoreML version and faster than Apple's mlx-examples code.
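Expressed as speedups over the MLX run, the table's numbers work out as follows (a small calculation from the times reported above; the labels are mine):

```python
# Speedup of each build relative to the mlx-examples Python run,
# using the times from the table above (medium.en, ~10 min audio, M1
# except where noted).
times = {
    "mlx-examples (Python)": 267.07,
    "whisper.cpp + CoreML": 153.62,
    "whisper.cpp (Metal)": 188.12,
    "whisper.cpp + CUDA (RTX 3070)": 29.67,
}
baseline = times["mlx-examples (Python)"]
speedups = {name: round(baseline / t, 2) for name, t in times.items()}
for name, s in speedups.items():
    print(f"{name}: {s}x")
```

So the CoreML build is about 1.7x faster than MLX on this M1, and the CUDA build on the 3070 about 9x faster.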

@bobqianic
Collaborator

> I did some simple benchmarking with a 10-minute audio file, the large model, an M1 Pro, and MLX. Someone else contributed M2 Ultra and M3 Max numbers. See https://owehrens.com/whisper-nvidia-rtx-4090-vs-m1pro-with-mlx/ for the full post. Your mileage may vary.

Your result might not be accurate. I ran a test on my 2080 Ti with the 10-minute audio file you mentioned, and it only took 45 seconds to process in total. I used the large-v2 model for this. Considering that the 4090 is significantly more powerful than my older 2080 Ti, the processing time should be even shorter. Audio file: https://www.zeit.de/politik/2023-12/streik-gdl-deutsche-bahn-claus-weselsky-nachrichtenpodcast
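As a sanity check, the reported 2080 Ti run works out to roughly 13x real time; the arithmetic from the numbers in the comment:

```python
# Real-time factor for the reported 2080 Ti run: a 10-minute (600 s)
# audio file transcribed in ~45 s with the large-v2 model.
audio_seconds = 10 * 60
processing_seconds = 45
rtf = audio_seconds / processing_seconds
print(f"~{rtf:.1f}x faster than real time")
```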


@bobqianic bobqianic added the question Further information is requested label Dec 18, 2023
@astrowonk astrowonk mentioned this issue Feb 15, 2024
@astrowonk

I retested with mlx==0.6.0 and the latest mlx-examples repo, and the same Whisper test I ran above now finishes in ~178 s. Still not as good as whisper.cpp with CoreML, but a lot better than the test I ran in December.
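Compared against the December figure of 267.07 s, that retest is roughly a third faster (using the ~178 s number, which is approximate):

```python
# Relative improvement of the mlx==0.6.0 retest over the December run.
december_s = 267.07
retest_s = 178.0  # approximate ("~178 s")
improvement = (december_s - retest_s) / december_s
print(f"{improvement:.0%} faster")
```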
