After installing Docker by following the guide, running cli_demo.py reports "Killed" #558
Labels: bug
root@b3d1bd08095c:/chatGLM# python3 cli_demo.py
INFO 2023-06-06 21:17:40,876-1d:
loading model config
llm device: cuda
embedding device: cuda
dir: /chatGLM
flagging username: 27903ec3559f49dd9f8fdb1a1c0830e8
INFO 2023-06-06 21:17:42,295-1d: Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO 2023-06-06 21:17:42,295-1d: NumExpr defaulting to 8 threads.
Loading chatglm-6b-int4...
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Kernels compiled : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 6
Using quantization cache
Applying quantization to glm layers
Killed
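No explanation accompanies the log, but a process that prints "Killed" right at "Applying quantization to glm layers" is typically terminated by the kernel OOM killer because the container runs out of RAM while the int4 weights are being loaded and quantized. A minimal sketch for checking available memory from inside the container before re-running the demo (this reads the standard /proc/meminfo interface and is only a diagnostic aid, not part of the project's code):

```python
# Print total and currently available memory as reported by the kernel.
# MemAvailable is the kernel's estimate of memory usable without swapping.
with open("/proc/meminfo") as f:
    meminfo = dict(
        (line.split(":")[0], line.split(":")[1].strip())
        for line in f
    )

print("MemTotal:    ", meminfo["MemTotal"])
print("MemAvailable:", meminfo["MemAvailable"])
```

If the host's `dmesg` output (or `docker stats` watched while the model loads) confirms an out-of-memory kill, the usual remedies are raising the container's memory limit (`docker run -m`), adding swap, or using a machine with more RAM.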
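The earlier NumExpr notice ("detected 12 cores but NUMEXPR_MAX_THREADS not set") is informational and unrelated to the crash. If desired, the cap can be raised by setting the NUMEXPR_MAX_THREADS environment variable before NumExpr is first imported; a minimal sketch, where the value 12 simply mirrors the core count reported in the log:

```python
import os

# Must run before numexpr is imported, since it reads the variable at import time.
os.environ.setdefault("NUMEXPR_MAX_THREADS", "12")
```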