Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
After starting
python3 -m llama_cpp.server --model /ai/models/functionary-7b-v1.Q5_K.gguf --n_gpu_layers 99 --main_gpu 1 --tensor_split 0.45 0.55 --n_ctx 4096 --host 192.168.0.55 --port 5000 --api_key toofoo
I expect llama.cpp to be compiled with these parameters:
GGML_CUDA_FORCE_MMQ: yes
CUDA_USE_TENSOR_CORES: no
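One way to verify this, sketched under the assumption that the ggml_init_cublas banner is printed to the console at server startup just as it is for the llama.cpp binaries:

$ # filter the two flag lines from the startup log (the server keeps running afterwards)
$ python3 -m llama_cpp.server --model /ai/models/functionary-7b-v1.Q5_K.gguf --n_gpu_layers 99 2>&1 | grep -E 'GGML_CUDA_FORCE_MMQ|CUDA_USE_TENSOR_CORES'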
Current Behavior
When I start llama.cpp or LocalAI directly, everything is OK:
(cpppython) root@oc:/ai/llama-cpp-python/vendor/llama.cpp/build/bin# ./benchmark
main: build = 2074 (098f6d73)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Starting Test
Allocating Memory of size 800194560 bytes, 763 MB
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 2 CUDA devices:
Device 0: Tesla P40, compute capability 6.1, VMM: yes
Device 1: Tesla P40, compute capability 6.1, VMM: yes
Creating new tensors
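For context, a direct llama.cpp build with MMQ forced on would be configured roughly like this (a minimal sketch, assuming the LLAMA_CUBLAS/LLAMA_CUDA_FORCE_MMQ CMake options of this build generation; the exact options used for the vendored build above are not shown in this issue):

$ cd /ai/llama-cpp-python/vendor/llama.cpp
$ mkdir -p build && cd build
$ # force the quantized mat-mul kernels; tensor cores are then reported as "no"
$ cmake .. -DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on
$ cmake --build . --config Release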
Environment and Context
Using the latest llama-cpp-python, built from source.
llama.cpp is compiled and symlinked to /ai/llama-cpp-python/vendor/llama.cpp.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
CPU family: 6
Model: 45
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
Stepping: 7
BogoMIPS: 5799.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid tsc_adjust xsaveopt arat umip md_clear arch_capabilities
Virtualization features:
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
Operating System, e.g. for Linux:
Linux oc 5.15.0-92-generic #102-Ubuntu SMP Wed Jan 10 09:33:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
SDK version, e.g. for Linux:
$ python3 --version
Python 3.10.13
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
$ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Failure Information (for bugs)
This doesn't block my work; it only affects performance. Is there any way to toggle these settings (see the sketch after the list below)?
GGML_CUDA_FORCE_MMQ to YES
CUDA_USE_TENSOR_CORES to NO
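A possible route, sketched under the assumption that llama-cpp-python forwards CMAKE_ARGS to the vendored llama.cpp build (as its README describes for other CUDA options), is to reinstall with MMQ forced on; CUDA_USE_TENSOR_CORES then reports "no" on its own, since defining GGML_CUDA_FORCE_MMQ disables the tensor-core path:

$ # hypothetical rebuild; LLAMA_CUDA_FORCE_MMQ is the llama.cpp CMake option name of this build generation
$ CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir

Forcing MMQ is commonly suggested for Pascal cards like the P40, which have no tensor cores to use in any case.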