Performance of llama.cpp on Apple Silicon M-series #4167
Replies: 63 comments 106 replies
-
M2 Mac Mini, 4+4 CPU, 10 GPU, 24 GB Memory (@QueryType) ✅
build: 8e672ef (1550)
-
M2 Max Studio, 8+4 CPU, 38 GPU ✅
build: 8e672ef (1550)
-
M2 Ultra, 16+8 CPU, 60 GPU (@crasm) ✅
build: 8e672ef (1550)
-
M3 Max (MBP 16), 12+4 CPU, 40 GPU (@ymcui) ✅
build: 55978ce (1555)
Short note: mostly similar to the one reported by @slaren. But for Q4_0
-
In the graph, why is PP t/s plotted against bandwidth and TG t/s plotted against GPU cores? Seems like GPU cores have more effect on PP t/s.
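A common rule of thumb may help frame the question (it is not stated in the post itself): text generation with bs = 1 tends to be limited by how fast the weights can be streamed from memory, while prompt processing with bs = 512 tends to be compute-bound and so scales more with GPU cores. Below is a minimal sketch of the bandwidth ceiling, assuming every generated token reads all model weights once. The function name and the 4 GB model size are hypothetical; 400 GB/s is the M1 Max figure quoted elsewhere in this thread.

```python
def tg_ceiling_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on TG t/s if each token requires reading all weights once.

    Assumption: generation is purely memory-bandwidth bound. Real results
    will be lower because of compute, KV-cache reads and other overhead.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical example: a 4 GB quantized model on a 400 GB/s chip.
ceiling = tg_ceiling_tokens_per_s(400.0, 4.0)
print(f"TG ceiling: {ceiling:.1f} t/s")
```

Under this assumption, doubling the bandwidth doubles the TG ceiling, whereas adding GPU cores at the same bandwidth does not move it.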
-
How about also sharing the largest model sizes and context lengths people can run with their amount of RAM? It's important to get the amount of RAM right when buying Apple computers because you can't upgrade later.
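As a starting point, here is a rough sketch rather than a definitive sizing rule: the weights take about (parameters × bits per weight / 8) bytes. The bits-per-weight values below (16 for F16, 8.5 for Q8_0, 4.5 for Q4_0) are assumed from the ggml block layouts. The function names and the 8 GB headroom for the OS, KV cache and context are hypothetical and should be adjusted to the actual setup.

```python
# Assumed effective bits per weight for each format (including block scales).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def weights_gb(n_params: float, quant: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


def fits_in_ram(n_params: float, quant: str, ram_gb: float,
                headroom_gb: float = 8.0) -> bool:
    """True if the weights plus a hypothetical headroom fit in ram_gb."""
    return weights_gb(n_params, quant) + headroom_gb <= ram_gb


# Example: a 7B-parameter model on a 16 GB machine.
for quant in BITS_PER_WEIGHT:
    size = weights_gb(7e9, quant)
    print(f"7B {quant}: {size:.1f} GB, fits in 16 GB: {fits_in_ram(7e9, quant, 16.0)}")
```

Longer contexts need a larger KV cache, so the headroom is the part of this estimate most likely to be wrong.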
-
M2 Pro, 6+4 CPU, 16 GPU (@minosvasilias) ✅
build: e9c13ff (1560)
-
Would love to see how M1 Max and M1 Ultra fare given their high memory bandwidth.
-
M2 MAX (MBP 16) 8+4 CPU, 38 GPU, 96 GB RAM (@MrSparc) ✅
build: e9c13ff (1560)
-
M1 Max (MBP 16) 8+2 CPU, 32 GPU, 64GB RAM (@CedricYauLBD) ✅
build: e9c13ff (1560)
Note: M1 Max RAM bandwidth is 400 GB/s
-
Look at what I started
-
M3 Pro (MBP 14), 5+6 CPU, 14 GPU (@paramaggarwal) ✅
build: e9c13ff (1560)
-
### M2 MAX (MBP 16) 38 Core 32GB ✅
build: 795cd5a (1493)
-
I'm looking at the summary plot of "PP performance vs GPU cores" and the evidence that the original unquantized fp16 model always delivers more performance than the quantized models.
-
What data/prompts are used for this?
-
I have just tested the latest Apple M4 chip in the iPad Pro 2024 11-inch (256GB). The main difference between the two versions of the M4 is the number of performance cores. Also, the 1TB/2TB iPad Pro has double the memory (16GB).
The following is a quick test for benchmarking the M4.
M4 (iPad Pro 2024 256GB), 3+6 CPU, 10 GPU
tinyllama 1.1b
phi-2 2.7B: TBA
mistral 7b
-
Can you explain how to do that?
-
Well, the HuggingFace repo says that Gemma-7B-it.gguf is 34.7GB, so I
haven't tried because it looks obviously too big to run, but if you can buy
a 64GB machine, I'd recommend that you do.
On Sat, Jun 15, 2024 at 2:53 PM Alptekin wrote:
> @pudepiedj So you cannot run Gemma-7B-it.gguf on your M2 Max 12-core CPU 38-core GPU? I am considering buying a similar config (with 64GB RAM) on a Mac Studio, so I am curious. Thanks.
-
Where can I get the models listed in the command line?
-
There won't be an M3 Ultra, you should remove that from the list
-
M4 Pro, 10+4 CPU, 20 GPU, 24 GB Memory (@miccou) ✅
build: 8e672ef (1550)
-
M4 (Mac Mini 2024), 4+6 CPU, 10 GPU, 32 GB Memory ✅
build: 8e672ef (1550)
-
M4 Max (MacBook Pro 16" 2024), 12+4 CPU, 40 GPU, 128 GB Memory ✅
build: 8e672ef (1550)
-
Has anyone tried an M4 Pro with 64 GB? Is it possible to run a 70B model at a usable speed?
-
If I'm reading this correctly, the M3 Pro is slower than the M2 Pro?
-
M4 Pro, 8+4 CPU, 16 GPU, 24 GB Memory (MBP 14) ✅
build: 8e672ef (1550)
-
M4 Max (MacBook Pro 14" 2024), 12+4 CPU, 40 GPU, 128 GB Memory
build: 8e672ef (1550)
-
Which models can my M3 16GB MacBook Air support?
-
Why specifically is the M2 so cracked compared to the M3 and M4?
-
Summary
LLaMA 7B
[Summary table not preserved. Columns: bandwidth [GB/s], cores, and six throughput columns in [t/s].]
plot.py
Description
This is a collection of short `llama.cpp` benchmarks on various Apple Silicon hardware. It can be useful to compare the performance that `llama.cpp` achieves across the M-series chips and hopefully answer questions of people wondering if they should upgrade or not. Collecting info here just for Apple Silicon for simplicity. A similar collection for A-series chips is available here: #4508
If you are a collaborator to the project and have an Apple Silicon device, please add your device, results and optionally username for the following command directly into this post (requires LLaMA 7B v2):
`PP` means "prompt processing" (`bs = 512`), `TG` means "text-generation" (`bs = 1`), `t/s` means "tokens per second"
Note that in this benchmark we are evaluating the performance against the same build 8e672ef (2023 Nov 13) in order to keep all performance factors even. Since then, there have been multiple improvements resulting in better absolute performance. As an example, here is how the same test compares against the build 86ed72d (2024 Nov 21) on M2 Ultra:
[Comparison table not preserved. Columns: bandwidth [GB/s], cores, and six throughput columns in [t/s].]
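To make the units concrete: t/s is the number of tokens divided by wall-clock seconds, so PP or TG results from two builds can be compared as a simple ratio. The sketch below uses made-up timings. The function names and all numbers are hypothetical, not measurements from this thread.

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput in tokens per second for one timed run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s


def speedup(new_tps: float, old_tps: float) -> float:
    """How many times faster the new build is than the old one."""
    return new_tps / old_tps


# Hypothetical PP runs over a 512-token prompt on two builds.
pp_old = tokens_per_second(512, 0.50)
pp_new = tokens_per_second(512, 0.25)
print(f"old: {pp_old:.1f} t/s, new: {pp_new:.1f} t/s, speedup: {speedup(pp_new, pp_old):.2f}x")
```

Comparing ratios rather than absolute numbers is one reason to keep the reference build fixed across devices, as the post does.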
M1 Pro, 8+2 CPU, 16 GPU (@ggerganov) ✅
build: 8e672ef (1550)
M2 Ultra, 16+8 CPU, 76 GPU (@ggerganov) ✅
build: 8e672ef (1550)
M3 Max (MBP 14), 12+4 CPU, 40 GPU (@slaren) ✅
build: d103d93 (1553)
Footnotes
https://en.wikipedia.org/wiki/Apple_M1#Variants
https://en.wikipedia.org/wiki/Apple_M2#Variants
https://en.wikipedia.org/wiki/Apple_M3#Variants
https://en.wikipedia.org/wiki/Apple_M4#Variants