Follow guide: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html
command:
main.exe -m "C:\Users\admin\Desktop\ipex\llama-cpp\llama-cpp\libs\models\qwen1_5-1_8b-chat-q5_k_m.gguf" -n 512 --prompt "我的问题是:什么是零售?请把回答限制在五 十个汉字以内。" -t 8 -e -ngl 33 --color --no-mmap --temp 0
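For reference, these are standard llama.cpp main options (the descriptions below are my annotations, not taken from the guide; the model path is local to the reporter's machine). The prompt roughly translates to: "My question is: what is retail? Please limit the answer to fifty Chinese characters."

rem -m <path>   GGUF model file to load
rem -n 512      generate at most 512 tokens
rem --prompt    the (Chinese) prompt text
rem -t 8        use 8 CPU threads
rem -e          process escape sequences (\n, \t, ...) in the prompt
rem -ngl 33     offload up to 33 layers to the GPU (more than enough for this model's 25 offloadable layers)
rem --color     colorize the output
rem --no-mmap   read the whole model into memory instead of memory-mapping it
rem --temp 0    temperature 0, i.e. greedy (deterministic) decoding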
log:
(cpp) C:\Users\admin\Desktop\cpp>main.exe -m "C:\Users\admin\Desktop\ipex\llama-cpp\llama-cpp\libs\models\qwen1_5-1_8b-chat-q5_k_m.gguf" -n 512 --prompt "请基于```内的 内容回答。```Context:问题是:商品 攀升现象的盛行说明了什么?答案是:商品 攀升现象的盛行说明不同类型零售商之间的竞争加剧。Context:问题是:零售竞争如何体现?答案是:垄断竞争。Context:问题是:什么 是零售业态?答案是:零售业态是指零售企业为满足不同的消费者需求而形成的不同的经营形态,例如百货商店、折扣商店、仓库商店等 。```我的问题是:什么是多渠道零售?请把回答限制在五 十个汉字以内。" -t 8 -e -ngl 999 --color
Log start
main: build = 1 (c26dd9e)
main: built with IntelLLVM 2024.0.0 for
main: seed = 1715406086
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from C:\Users\admin\Desktop\ipex\llama-cpp\llama-cpp\libs\models\qwen1_5-1_8b-chat-q5_k_m.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture                    str = qwen2
llama_model_loader: - kv   1: general.name                            str = Qwen1.5-1.8B-Chat-AWQ-fp16
llama_model_loader: - kv   2: qwen2.block_count                       u32 = 24
llama_model_loader: - kv   3: qwen2.context_length                    u32 = 32768
llama_model_loader: - kv   4: qwen2.embedding_length                  u32 = 2048
llama_model_loader: - kv   5: qwen2.feed_forward_length               u32 = 5504
llama_model_loader: - kv   6: qwen2.attention.head_count              u32 = 16
llama_model_loader: - kv   7: qwen2.attention.head_count_kv           u32 = 16
llama_model_loader: - kv   8: qwen2.attention.layer_norm_rms_epsilon  f32 = 0.000001
llama_model_loader: - kv   9: qwen2.rope.freq_base                    f32 = 1000000.000000
llama_model_loader: - kv  10: qwen2.use_parallel_residual             bool = true
llama_model_loader: - kv  11: tokenizer.ggml.model                    str = gpt2
llama_model_loader: - kv  12: tokenizer.ggml.tokens                   arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  13: tokenizer.ggml.token_type               arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  14: tokenizer.ggml.merges                   arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  15: tokenizer.ggml.eos_token_id             u32 = 151645
llama_model_loader: - kv  16: tokenizer.ggml.padding_token_id         u32 = 151643
llama_model_loader: - kv  17: tokenizer.ggml.bos_token_id             u32 = 151643
llama_model_loader: - kv  18: tokenizer.chat_template                 str = {% for message in messages %}{{'<|im_...
llama_model_loader: - kv  19: general.quantization_version            u32 = 2
llama_model_loader: - kv  20: general.file_type                       u32 = 17
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q5_1:   12 tensors
llama_model_loader: - type q8_0:   12 tensors
llama_model_loader: - type q5_K:  133 tensors
llama_model_loader: - type q6_K:   13 tensors
llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************
llm_load_vocab:
llm_load_vocab: special tokens definition check successful ( 293/151936 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 151936
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5504
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q5_K - Medium
llm_load_print_meta: model params     = 1.84 B
llm_load_print_meta: model size       = 1.28 GiB (5.97 BPW)
llm_load_print_meta: general.name     = Qwen1.5-1.8B-Chat-AWQ-fp16
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
[SYCL] call ggml_init_sycl
ggml_init_sycl: GGML_SYCL_DEBUG: 0
ggml_init_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                     Intel Arc Graphics|    1.3|    128|    1024|   32|  7446M|            1.3.28328|
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:128
llm_load_tensors: ggml ctx size = 0.28 MiB
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors: SYCL0 buffer size = 1103.31 MiB
llm_load_tensors: CPU buffer size = 204.02 MiB
....................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: SYCL0 KV buffer size = 96.00 MiB
llama_new_context_with_model: KV self size = 96.00 MiB, K (f16): 48.00 MiB, V (f16): 48.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 0.58 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 300.75 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 5.01 MiB
llama_new_context_with_model: graph nodes = 870
llama_new_context_with_model: graph splits = 2
system_info: n_threads = 8 / 22 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
CPU: MTL Ultra 7 165H
GPU driver version: 5333 or 5448
version:
ipex-llm: 2.1.0b20240510
Could you please take a look at this issue? Thanks.
After an offline sync with @violet17, we found that this issue only happens with Chinese prompts. Applying the setting below solves this issue:
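(The exact setting is not preserved in this thread. As a hedged sketch only, and not necessarily the fix the maintainers applied: a common workaround for garbled Chinese text in a Windows console is to switch the console code page to UTF-8 before running main.exe.)

rem Assumption: cmd.exe defaults to a non-UTF-8 code page, which can mangle Chinese prompt/output bytes.
rem chcp 65001 switches the console to UTF-8; it affects only console encoding, not the model itself.
chcp 65001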
@rnwang04 Thanks for the quick reply.
Maybe also update the quickstart?
Ok, I will do it.