Performance degradation with P40 on larger models #6814

samr7 · 2024-04-21T23:06:07Z

I have a machine with a lot of old parts in it, including 8 P40s and 2 Xeon E5-2667v2 CPUs.

I build llama.cpp using:
cmake -DLLAMA_AVX2=off -DLLAMA_F16C=off -DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on

Using a llama2-70b-Q8_0 model, I see good results with release b1842 and earlier. Starting with b1843 (January 12, the build that includes #4766), I see a ~62% drop in tokens per second:

bin/main -m ../text-generation-webui/models/Synthia-70b-v1.2.Q8_0.gguf -ngl 99 -p "Why is the sky blue?" -n 128

b1691: 10.76 t/s
b1767: 9.75 t/s
b1808: 9.76 t/s
b1832: 9.77 t/s
b1842: 9.76 t/s
b1843: 3.73 t/s
b2400: 3.83 t/s
b2709: 3.84 t/s
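
The ~62% figure can be checked from the b1842 and b1843 numbers above. A minimal sketch, assuming a POSIX shell with awk available; pct_change is a hypothetical helper name used only for illustration and is not part of llama.cpp:

```shell
# Hypothetical helper: relative change from an old to a new t/s value, in percent.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f%%\n", (new - old) / old * 100 }'
}

# b1842 (9.76 t/s) -> b1843 (3.73 t/s)
pct_change 9.76 3.73   # prints -61.8%
```

A negative result means the newer build is slower. The same helper can be applied to any pair of measurements in this report.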

Running the same test with other models, the gap shrinks as the model gets smaller, to the point that the 8B model is considerably faster with the latest release:

Model	b1842	b1843	b2709
Synthia-70b-v1.2.Q8_0	9.76 t/s	3.73 t/s	3.84 t/s
phind-codellama-34b-v2.Q8_0	16.99 t/s	7.54 t/s	7.78 t/s
llama-2-13b-Q8_0	21.10 t/s	17.67 t/s	18.63 t/s
Meta-Llama-3-8B-Instruct.Q8_0	25.66 t/s	33.27 t/s	31.83 t/s

Using fewer GPUs for this test (with the 70b model) makes b1842 a bit slower, but otherwise doesn't seem to change the result much:

GPUs	b1842	b1843	b2709
8	9.76 t/s	3.73 t/s	3.84 t/s
4	9.61 t/s	3.77 t/s	3.89 t/s
3	8.32 t/s	3.77 t/s	3.91 t/s

Setting the CPU thread count explicitly (with the 70b model) gives each build a slight improvement, but does not close the gap between builds:

Threads	b1842	b2709
-t 1	10.05 t/s	3.90 t/s
-t 4	10.06 t/s	3.90 t/s
-t 8	10.09 t/s	3.90 t/s

The system is similar in topology to a Supermicro SYS-4028GR-TR2. The GPUs are all attached via PCIe 3.0 x16 to PLX switches and have relatively good CPU and P2P bandwidth over PCIe: 11-13 GB/s between any pair.

Any ideas?

slaren · 2024-04-21T23:07:10Z

Try -sm row.

samr7 · 2024-04-21T23:58:53Z

-sm row seems to improve things a lot! Thanks.
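
For anyone comparing the two modes: -sm is the short form of --split-mode, which accepts none, layer, or row. The sketch below wraps the benchmark command from the original report so the split mode can be switched. bench_split and the LLAMA_MAIN override are hypothetical names introduced here for illustration; the default binary and model paths are taken from the report and are assumptions about the local setup.

```shell
# Hypothetical helper: run the benchmark from the report with a given split mode.
# Usage: bench_split MODE [MODEL_PATH]
bench_split() {
  mode="$1"   # one of: none, layer, row
  # Model path defaults to the one used in the report; adjust as needed.
  model="${2:-../text-generation-webui/models/Synthia-70b-v1.2.Q8_0.gguf}"
  # LLAMA_MAIN is an assumed override for the binary location (defaults to bin/main).
  "${LLAMA_MAIN:-bin/main}" -m "$model" -ngl 99 -sm "$mode" -p "Why is the sky blue?" -n 128
}
```

Running bench_split layer and then bench_split row on the same build should show whether the split mode accounts for the difference on a given machine.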

samr7 added the bug-unconfirmed label Apr 21, 2024

samr7 closed this as completed Apr 21, 2024
