CUDA: use min compute capability of GPUs actually used #2506

cebtenzzre · 2023-08-03T20:49:02Z

These are my currently installed GPUs:

ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla P40, compute capability 6.1
  Device 1: NVIDIA GeForce GTX 970, compute capability 5.2

I found that even with my GTX 970 disabled via --tensor-split 1,0, I still couldn't use the full compute capability of my Tesla P40. With this change, I can benefit from the --mul-mat-q option without physically removing the GTX 970, which my displays are connected to, from my PC.
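To illustrate the idea in the title, here is a minimal sketch rather than the actual diff. It assumes compute capabilities are stored as integers (100 * major + 10 * minor, so 6.1 becomes 610) and that a device counts as "used" when its tensor-split share is non-zero. The function name and parameters are hypothetical and are not taken from ggml-cuda.cu.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT the code from this PR.
// Returns the lowest compute capability among devices that receive a
// non-zero share of the tensor split. The integer encoding
// (100 * major + 10 * minor) is an assumption made for this sketch.
int min_cc_of_used_devices(const std::vector<int> & compute_caps,
                           const std::vector<float> & tensor_split) {
    int min_cc = INT_MAX;
    for (std::size_t id = 0; id < compute_caps.size(); ++id) {
        // A device with a zero (or missing) share does no work, so it is skipped.
        const float share = id < tensor_split.size() ? tensor_split[id] : 0.0f;
        if (share <= 0.0f) {
            continue;
        }
        min_cc = std::min(min_cc, compute_caps[id]);
    }
    return min_cc; // stays INT_MAX if no device is used
}
```

With the two devices listed above and --tensor-split 1,0, this sketch would consider only the P40 (610) and ignore the GTX 970 (520). With an even split it would fall back to 520.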

JohannesGaessler

Alternatively, you could have used the environment variable CUDA_VISIBLE_DEVICES to hide the 970.

CUDA: use min compute capability of GPUs actually used

c1320fd

JohannesGaessler approved these changes Aug 4, 2023

JohannesGaessler merged commit 4329d1a into ggml-org:master Aug 4, 2023

This was referenced Nov 1, 2023

CUDA: refactor ggml_cuda_op + lower GPU latency via quantization on main GPU and tiling #3110

Merged

cuda : fix disabling device with --tensor-split 1,0 #3951

Merged

cebtenzzre mentioned this pull request Nov 27, 2023

Assertion failure in ggml_mul_mat_q4_0_q8_1_cuda (g_compute_capabilities[id] >= MIN_CC_DP4A) #4229

Closed
