Misc. bug: Problems with official jinja templates (Gemma 2, Llama 3.2, Qwen 2.5) #11866

Open
MoonRide303 opened this issue Feb 14, 2025 · 11 comments
Labels: bug

@MoonRide303
Contributor

Name and Version

llama-cli --version
version: 4713 (a4f011e)
built with MSVC 19.42.34436.0 for x64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

1. llama-server -ngl 99 -m gemma-2-2b-it-Q8_0.gguf --jinja --chat-template-file gemma2.jinja -c 8192
2. llama-server -ngl 99 -m Llama-3.2-3B-Instruct-Q8_0.gguf --jinja --chat-template-file llama3.2.jinja -c 8192
3. llama-server -ngl 99 -m Qwen2.5-1.5B-Instruct-Q8_0.gguf --jinja --chat-template-file qwen2.5.jinja -c 8192

Problem description & steps to reproduce

Extracting the official chat templates from the chat_template field in tokenizer_config.json (Gemma 2, Llama 3.2, Qwen 2.5), storing them in files, and then trying to use them with llama-server results in errors.

  1. Gemma 2: "parse: error parsing grammar: expecting name at" is logged after each message.
  2. Llama 3.2: the server doesn't start.
  3. Qwen 2.5: "parse: error parsing grammar: expecting name at" is logged after each message.

@ochafik Could you look into this? It would be nice to have the jinja implementation working fully with the official templates, at least for major models.

First Bad Commit

No response

Relevant log output

@henryclw

Thanks for pointing this out. I'm seeing the same error. I didn't use jinja templates until llama.cpp supported tool calling, so I didn't notice until I switched to tool calling.

Right now I'm trying to locate which commit introduced the bug.

@ochafik
Collaborator

ochafik commented Feb 14, 2025

Hey @MoonRide303 , @henryclw , thanks for reporting this! Are you both experiencing this on Windows?

Could you try fetching the template with ./scripts/get_chat_template.py google/gemma-2-2b-it > gemma2.jinja? (Or probably with something like py scripts\get_chat_template.py google/gemma-2-2b-it > gemma2.jinja if not running inside a WSL shell.)

(These templates seem to work on my Mac; maybe it's a line-ending issue, or bad unescaping of the JSON string if they were edited manually?)
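
One way to rule out both suspected causes (manual unescaping of the JSON string and platform line endings) when extracting a template by hand is to let a JSON parser do the unescaping and to write the file with explicit "\n" newlines. The sketch below is only an illustration, not the repository's get_chat_template.py: the function name extract_chat_template is hypothetical, and it assumes tokenizer_config.json has already been downloaded locally and that its chat_template field is a single string.

```python
import json
from pathlib import Path


def extract_chat_template(config_path: str, out_path: str) -> str:
    """Copy the chat_template string from a tokenizer_config.json into out_path.

    Hypothetical helper for illustration; assumes chat_template is one string.
    """
    # json.loads performs the unescaping (\n, \", \uXXXX) that is easy to get
    # wrong when copying the value out of the file by hand.
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    template = config["chat_template"]
    if not isinstance(template, str):
        # Assumption: configs that store several named templates are out of scope.
        raise TypeError("chat_template is not a plain string in this config")
    # newline="\n" stops Python from translating "\n" into "\r\n" on Windows.
    with open(out_path, "w", encoding="utf-8", newline="\n") as f:
        f.write(template)
    return template
```

For example, extract_chat_template("tokenizer_config.json", "gemma2.jinja") would produce a file that can be passed to --chat-template-file, as in the commands from the report.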

@henryclw

@ochafik

My finding:
Since 4a2b196d, if you run llama.cpp with --jinja but don't provide tools in the API call, it produces these error logs:

2025-02-14T22:43:34.104015028Z parse: error parsing grammar: expecting name at �
2025-02-14T22:43:34.104022587Z 
2025-02-14T22:43:34.104025971Z �
2025-02-14T22:43:34.104161292Z slot launch_slot_: id  0 | task 2 | processing task
2025-02-14T22:43:34.104171597Z que    start_loop: update slots
2025-02-14T22:43:34.104174354Z srv  update_slots: posting NEXT_RESPONSE

But if you run llama.cpp with --jinja and provide tools in the API call, even an empty tools array, there are no error logs:

curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "tools": [  ],
  "messages": [
    {
      "role": "user",
      "content": "Print a hello world message with python."
    }
  ]
}'

I'm not sure whether the jinja template option must come with the tools option. What is the expected behavior and usage?

I hope this finding is helpful. If you need any help, please feel free to reply.
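
The two request shapes compared in this comment (no "tools" key versus an empty "tools" array) can be reproduced side by side with a short script. This is a minimal sketch using only the Python standard library; the helper names build_chat_request and post_chat are hypothetical, and the URL and model name are simply copied from the curl command above.

```python
import json
import urllib.request


def build_chat_request(prompt, tools=None, model="gpt-3.5-turbo"):
    """Build a chat completion payload; "tools" is omitted when tools is None."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if tools is not None:
        # An empty list is still sent, matching the '"tools": [ ]' case above.
        payload["tools"] = tools
    return payload


def post_chat(payload, base_url="http://localhost:8080"):
    """POST the payload to a running llama-server and return the parsed JSON reply."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server started with --jinja, calling post_chat(build_chat_request("Where is Vancouver?")) and then post_chat(build_chat_request("Where is Vancouver?", tools=[])) while watching the server log should show whether the grammar parse error appears only for the first request.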

@henryclw

More detailed logs:

Without tools:

curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Where is Vancouver?"
    }
  ]
}'
Logs for API call without tools
2025-02-14T22:56:16.236296080Z }
2025-02-14T22:56:16.236298714Z [common_chat_params_init] has_tools=false
2025-02-14T22:56:16.236301194Z Prompt: <|im_start|>system
2025-02-14T22:56:16.236304013Z You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
2025-02-14T22:56:16.236306627Z <|im_start|>user
2025-02-14T22:56:16.236309096Z Where is Vancouver?<|im_end|>
2025-02-14T22:56:16.236311576Z <|im_start|>assistant
2025-02-14T22:56:16.236314004Z 
2025-02-14T22:56:16.238224159Z Grammar: �
2025-02-14T22:56:16.238263195Z Grammar lazy: false
2025-02-14T22:56:16.238270758Z Chat format: Content-only
2025-02-14T22:56:16.238274102Z srv  add_waiting_: add task 51 to waiting list. current waiting = 0 (before add)
2025-02-14T22:56:16.238276653Z que          post: new task, id = 51/1, front = 0
2025-02-14T22:56:16.238279030Z que    start_loop: processing new tasks
2025-02-14T22:56:16.238281376Z que    start_loop: processing task, id = 51
2025-02-14T22:56:16.238283763Z slot get_availabl: id  0 | task 6 | selected slot by lru, t_last = 238580703706
2025-02-14T22:56:16.238286253Z slot        reset: id  0 | task 6 | 
2025-02-14T22:56:16.238433085Z slot launch_slot_: id  0 | task 51 | launching slot : {"id":0,"id_task":51,"n_ctx":4096,"speculative":false,"is_processing":false,"non_causal":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"\u0001","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhere is Vancouver?<|im_end|>\n<|im_start|>assistant\n","next_token":{"has_next_token":false,"has_new_line":false,"n_remain":-1,"n_decoded":44,"stopping_word":""}}
2025-02-14T22:56:16.238463211Z parse: error parsing grammar: expecting name at �
2025-02-14T22:56:16.238466751Z 
2025-02-14T22:56:16.238469210Z �
2025-02-14T22:56:16.238471587Z slot launch_slot_: id  0 | task 51 | processing task
2025-02-14T22:56:16.238474262Z que    start_loop: update slots
2025-02-14T22:56:16.238476628Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.238478943Z que          post: new task, id = 52, front = 0
2025-02-14T22:56:16.238481320Z slot update_slots: id  0 | task 51 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 33
2025-02-14T22:56:16.238483707Z slot update_slots: id  0 | task 51 | prompt token   0: 151644 '<|im_start|>'
2025-02-14T22:56:16.238486145Z slot update_slots: id  0 | task 51 | prompt token   1:   8948 'system'
2025-02-14T22:56:16.238488471Z slot update_slots: id  0 | task 51 | prompt token   2:    198 '
2025-02-14T22:56:16.238490817Z '
2025-02-14T22:56:16.238493101Z slot update_slots: id  0 | task 51 | prompt token   3:   2610 'You'
2025-02-14T22:56:16.238496424Z slot update_slots: id  0 | task 51 | prompt token   4:    525 ' are'
2025-02-14T22:56:16.238499048Z slot update_slots: id  0 | task 51 | prompt token   5:   1207 ' Q'
2025-02-14T22:56:16.238501394Z slot update_slots: id  0 | task 51 | prompt token   6:  16948 'wen'
2025-02-14T22:56:16.238503956Z slot update_slots: id  0 | task 51 | prompt token   7:     11 ','
2025-02-14T22:56:16.238519502Z slot update_slots: id  0 | task 51 | prompt token   8:   3465 ' created'
2025-02-14T22:56:16.238526653Z slot update_slots: id  0 | task 51 | prompt token   9:    553 ' by'
2025-02-14T22:56:16.238529565Z slot update_slots: id  0 | task 51 | prompt token  10:  54364 ' Alibaba'
2025-02-14T22:56:16.238532024Z slot update_slots: id  0 | task 51 | prompt token  11:  14817 ' Cloud'
2025-02-14T22:56:16.238534534Z slot update_slots: id  0 | task 51 | prompt token  12:     13 '.'
2025-02-14T22:56:16.238536891Z slot update_slots: id  0 | task 51 | prompt token  13:   1446 ' You'
2025-02-14T22:56:16.238539216Z slot update_slots: id  0 | task 51 | prompt token  14:    525 ' are'
2025-02-14T22:56:16.238541551Z slot update_slots: id  0 | task 51 | prompt token  15:    264 ' a'
2025-02-14T22:56:16.238543897Z slot update_slots: id  0 | task 51 | need to evaluate at least 1 token to generate logits, n_past = 33, n_prompt_tokens = 33
2025-02-14T22:56:16.238546295Z slot update_slots: id  0 | task 51 | kv cache rm [32, end)
2025-02-14T22:56:16.238557376Z slot update_slots: id  0 | task 51 | prompt processing progress, n_past = 33, n_tokens = 1, progress = 0.030303
2025-02-14T22:56:16.238559856Z slot update_slots: id  0 | task 51 | prompt done, n_past = 33, n_tokens = 1
2025-02-14T22:56:16.238562387Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.252891531Z slot process_toke: id  0 | task 51 | n_decoded = 1, n_remaining = -1, next token:    53 'V'
2025-02-14T22:56:16.252931504Z srv  update_slots: run slots completed
2025-02-14T22:56:16.252940383Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.252944540Z que    start_loop: processing new tasks
2025-02-14T22:56:16.252947709Z que    start_loop: processing task, id = 52
2025-02-14T22:56:16.252950785Z que    start_loop: update slots
2025-02-14T22:56:16.252954047Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.252990346Z que          post: new task, id = 53, front = 0
2025-02-14T22:56:16.252993978Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 34, n_cache_tokens = 34, truncated = 0
2025-02-14T22:56:16.252997230Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.264186466Z slot process_toke: id  0 | task 51 | n_decoded = 2, n_remaining = -1, next token: 20471 'ancouver'
2025-02-14T22:56:16.264237211Z srv  update_slots: run slots completed
2025-02-14T22:56:16.264245411Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.264248735Z que    start_loop: processing new tasks
2025-02-14T22:56:16.264251328Z que    start_loop: processing task, id = 53
2025-02-14T22:56:16.264253838Z que    start_loop: update slots
2025-02-14T22:56:16.264256225Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.264258643Z que          post: new task, id = 54, front = 0
2025-02-14T22:56:16.264261061Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 35, n_cache_tokens = 35, truncated = 0
2025-02-14T22:56:16.264263633Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.275229823Z slot process_toke: id  0 | task 51 | n_decoded = 3, n_remaining = -1, next token:   374 ' is'
2025-02-14T22:56:16.275266184Z srv  update_slots: run slots completed
2025-02-14T22:56:16.275273428Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.275276597Z que    start_loop: processing new tasks
2025-02-14T22:56:16.275279190Z que    start_loop: processing task, id = 54
2025-02-14T22:56:16.275281803Z que    start_loop: update slots
2025-02-14T22:56:16.275284169Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.275286515Z que          post: new task, id = 55, front = 0
2025-02-14T22:56:16.275289046Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 36, n_cache_tokens = 36, truncated = 0
2025-02-14T22:56:16.275301547Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.286878187Z slot process_toke: id  0 | task 51 | n_decoded = 4, n_remaining = -1, next token:   264 ' a'
2025-02-14T22:56:16.286928315Z srv  update_slots: run slots completed
2025-02-14T22:56:16.286936319Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.286940260Z que    start_loop: processing new tasks
2025-02-14T22:56:16.286943758Z que    start_loop: processing task, id = 55
2025-02-14T22:56:16.286947627Z que    start_loop: update slots
2025-02-14T22:56:16.286951012Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.286954243Z que          post: new task, id = 56, front = 0
2025-02-14T22:56:16.286957648Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 37, n_cache_tokens = 37, truncated = 0
2025-02-14T22:56:16.286961034Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.297429910Z slot process_toke: id  0 | task 51 | n_decoded = 5, n_remaining = -1, next token:  3598 ' major'
2025-02-14T22:56:16.297465963Z srv  update_slots: run slots completed
2025-02-14T22:56:16.297473278Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.297476581Z que    start_loop: processing new tasks
2025-02-14T22:56:16.297479102Z que    start_loop: processing task, id = 56
2025-02-14T22:56:16.297481530Z que    start_loop: update slots
2025-02-14T22:56:16.297483948Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.297486345Z que          post: new task, id = 57, front = 0
2025-02-14T22:56:16.297489226Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 38, n_cache_tokens = 38, truncated = 0
2025-02-14T22:56:16.297491798Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.308420791Z slot process_toke: id  0 | task 51 | n_decoded = 6, n_remaining = -1, next token:  3283 ' city'
2025-02-14T22:56:16.308463532Z srv  update_slots: run slots completed
2025-02-14T22:56:16.308484573Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.308490540Z que    start_loop: processing new tasks
2025-02-14T22:56:16.308493576Z que    start_loop: processing task, id = 57
2025-02-14T22:56:16.308496127Z que    start_loop: update slots
2025-02-14T22:56:16.308498545Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.308501220Z que          post: new task, id = 58, front = 0
2025-02-14T22:56:16.308503679Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 39, n_cache_tokens = 39, truncated = 0
2025-02-14T22:56:16.308506365Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.319715997Z slot process_toke: id  0 | task 51 | n_decoded = 7, n_remaining = -1, next token:  7407 ' located'
2025-02-14T22:56:16.319800933Z srv  update_slots: run slots completed
2025-02-14T22:56:16.319811047Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.319815039Z que    start_loop: processing new tasks
2025-02-14T22:56:16.319818455Z que    start_loop: processing task, id = 58
2025-02-14T22:56:16.319821490Z que    start_loop: update slots
2025-02-14T22:56:16.319824536Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.319827571Z que          post: new task, id = 59, front = 0
2025-02-14T22:56:16.319830637Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 40, n_cache_tokens = 40, truncated = 0
2025-02-14T22:56:16.319833837Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.331440241Z slot process_toke: id  0 | task 51 | n_decoded = 8, n_remaining = -1, next token:   389 ' on'
2025-02-14T22:56:16.331479792Z srv  update_slots: run slots completed
2025-02-14T22:56:16.331487406Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.331490668Z que    start_loop: processing new tasks
2025-02-14T22:56:16.331493281Z que    start_loop: processing task, id = 59
2025-02-14T22:56:16.331495678Z que    start_loop: update slots
2025-02-14T22:56:16.331498055Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.331500463Z que          post: new task, id = 60, front = 0
2025-02-14T22:56:16.331502881Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 41, n_cache_tokens = 41, truncated = 0
2025-02-14T22:56:16.331505473Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.342951902Z slot process_toke: id  0 | task 51 | n_decoded = 9, n_remaining = -1, next token:   279 ' the'
2025-02-14T22:56:16.342994869Z srv  update_slots: run slots completed
2025-02-14T22:56:16.343002256Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.343005487Z que    start_loop: processing new tasks
2025-02-14T22:56:16.343008018Z que    start_loop: processing task, id = 60
2025-02-14T22:56:16.343010436Z que    start_loop: update slots
2025-02-14T22:56:16.343012977Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.343015375Z que          post: new task, id = 61, front = 0
2025-02-14T22:56:16.343017741Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 42, n_cache_tokens = 42, truncated = 0
2025-02-14T22:56:16.343020241Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.354398013Z slot process_toke: id  0 | task 51 | n_decoded = 10, n_remaining = -1, next token:  9710 ' west'
2025-02-14T22:56:16.354446978Z srv  update_slots: run slots completed
2025-02-14T22:56:16.354455178Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.354473163Z que    start_loop: processing new tasks
2025-02-14T22:56:16.354475962Z que    start_loop: processing task, id = 61
2025-02-14T22:56:16.354478349Z que    start_loop: update slots
2025-02-14T22:56:16.354481075Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.354483534Z que          post: new task, id = 62, front = 0
2025-02-14T22:56:16.354485993Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 43, n_cache_tokens = 43, truncated = 0
2025-02-14T22:56:16.354488514Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.365622027Z slot process_toke: id  0 | task 51 | n_decoded = 11, n_remaining = -1, next token: 13648 ' coast'
2025-02-14T22:56:16.365663780Z srv  update_slots: run slots completed
2025-02-14T22:56:16.365671064Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.365674717Z que    start_loop: processing new tasks
2025-02-14T22:56:16.365677207Z que    start_loop: processing task, id = 62
2025-02-14T22:56:16.365679522Z que    start_loop: update slots
2025-02-14T22:56:16.365681868Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.365684203Z que          post: new task, id = 63, front = 0
2025-02-14T22:56:16.365686673Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 44, n_cache_tokens = 44, truncated = 0
2025-02-14T22:56:16.365689152Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.376510832Z slot process_toke: id  0 | task 51 | n_decoded = 12, n_remaining = -1, next token:   315 ' of'
2025-02-14T22:56:16.376565446Z srv  update_slots: run slots completed
2025-02-14T22:56:16.376574696Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.376578873Z que    start_loop: processing new tasks
2025-02-14T22:56:16.376582351Z que    start_loop: processing task, id = 63
2025-02-14T22:56:16.376585695Z que    start_loop: update slots
2025-02-14T22:56:16.376588658Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.376591642Z que          post: new task, id = 64, front = 0
2025-02-14T22:56:16.376594677Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 45, n_cache_tokens = 45, truncated = 0
2025-02-14T22:56:16.376597826Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.387883679Z slot process_toke: id  0 | task 51 | n_decoded = 13, n_remaining = -1, next token:  6864 ' Canada'
2025-02-14T22:56:16.387925719Z srv  update_slots: run slots completed
2025-02-14T22:56:16.387933292Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.387936461Z que    start_loop: processing new tasks
2025-02-14T22:56:16.387938920Z que    start_loop: processing task, id = 64
2025-02-14T22:56:16.387941235Z que    start_loop: update slots
2025-02-14T22:56:16.387953911Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.387956350Z que          post: new task, id = 65, front = 0
2025-02-14T22:56:16.387958706Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 46, n_cache_tokens = 46, truncated = 0
2025-02-14T22:56:16.387961083Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.398513621Z slot process_toke: id  0 | task 51 | n_decoded = 14, n_remaining = -1, next token:    13 '.'
2025-02-14T22:56:16.398559479Z srv  update_slots: run slots completed
2025-02-14T22:56:16.398566774Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.398569912Z que    start_loop: processing new tasks
2025-02-14T22:56:16.398572392Z que    start_loop: processing task, id = 65
2025-02-14T22:56:16.398574892Z que    start_loop: update slots
2025-02-14T22:56:16.398577228Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.398579841Z que          post: new task, id = 66, front = 0
2025-02-14T22:56:16.398582280Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 47, n_cache_tokens = 47, truncated = 0
2025-02-14T22:56:16.398584934Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.409952873Z slot process_toke: id  0 | task 51 | n_decoded = 15, n_remaining = -1, next token:  1084 ' It'
2025-02-14T22:56:16.409996262Z srv  update_slots: run slots completed
2025-02-14T22:56:16.410003670Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.410006757Z que    start_loop: processing new tasks
2025-02-14T22:56:16.410009319Z que    start_loop: processing task, id = 66
2025-02-14T22:56:16.410011603Z que    start_loop: update slots
2025-02-14T22:56:16.410013918Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.410016233Z que          post: new task, id = 67, front = 0
2025-02-14T22:56:16.410018558Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 48, n_cache_tokens = 48, truncated = 0
2025-02-14T22:56:16.410021141Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.421285060Z slot process_toke: id  0 | task 51 | n_decoded = 16, n_remaining = -1, next token:   374 ' is'
2025-02-14T22:56:16.421335620Z srv  update_slots: run slots completed
2025-02-14T22:56:16.421344489Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.421348615Z que    start_loop: processing new tasks
2025-02-14T22:56:16.421351969Z que    start_loop: processing task, id = 67
2025-02-14T22:56:16.421355036Z que    start_loop: update slots
2025-02-14T22:56:16.421358122Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.421361085Z que          post: new task, id = 68, front = 0
2025-02-14T22:56:16.421375747Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 49, n_cache_tokens = 49, truncated = 0
2025-02-14T22:56:16.421379348Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.432129889Z slot process_toke: id  0 | task 51 | n_decoded = 17, n_remaining = -1, next token:   279 ' the'
2025-02-14T22:56:16.432172876Z srv  update_slots: run slots completed
2025-02-14T22:56:16.432183042Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.432186468Z que    start_loop: processing new tasks
2025-02-14T22:56:16.432189133Z que    start_loop: processing task, id = 68
2025-02-14T22:56:16.432191643Z que    start_loop: update slots
2025-02-14T22:56:16.432194339Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.432196860Z que          post: new task, id = 69, front = 0
2025-02-14T22:56:16.432199401Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 50, n_cache_tokens = 50, truncated = 0
2025-02-14T22:56:16.432202035Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.443518146Z slot process_toke: id  0 | task 51 | n_decoded = 18, n_remaining = -1, next token:  7772 ' largest'
2025-02-14T22:56:16.443570692Z srv  update_slots: run slots completed
2025-02-14T22:56:16.443578697Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.443582463Z que    start_loop: processing new tasks
2025-02-14T22:56:16.443585241Z que    start_loop: processing task, id = 69
2025-02-14T22:56:16.443587905Z que    start_loop: update slots
2025-02-14T22:56:16.443590323Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.443592659Z que          post: new task, id = 70, front = 0
2025-02-14T22:56:16.443595128Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 51, n_cache_tokens = 51, truncated = 0
2025-02-14T22:56:16.443597618Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.454617224Z slot process_toke: id  0 | task 51 | n_decoded = 19, n_remaining = -1, next token:  3283 ' city'
2025-02-14T22:56:16.454640477Z srv  update_slots: run slots completed
2025-02-14T22:56:16.454643759Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.454646280Z que    start_loop: processing new tasks
2025-02-14T22:56:16.454648800Z que    start_loop: processing task, id = 70
2025-02-14T22:56:16.454651301Z que    start_loop: update slots
2025-02-14T22:56:16.454653688Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.454656106Z que          post: new task, id = 71, front = 0
2025-02-14T22:56:16.454658616Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 52, n_cache_tokens = 52, truncated = 0
2025-02-14T22:56:16.454661055Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.466666853Z slot process_toke: id  0 | task 51 | n_decoded = 20, n_remaining = -1, next token:   304 ' in'
2025-02-14T22:56:16.466707340Z srv  update_slots: run slots completed
2025-02-14T22:56:16.466714882Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.466718061Z que    start_loop: processing new tasks
2025-02-14T22:56:16.466720489Z que    start_loop: processing task, id = 71
2025-02-14T22:56:16.466723041Z que    start_loop: update slots
2025-02-14T22:56:16.466725325Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.466727578Z que          post: new task, id = 72, front = 0
2025-02-14T22:56:16.466729883Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 53, n_cache_tokens = 53, truncated = 0
2025-02-14T22:56:16.466732250Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.477458318Z slot process_toke: id  0 | task 51 | n_decoded = 21, n_remaining = -1, next token:   279 ' the'
2025-02-14T22:56:16.477500821Z srv  update_slots: run slots completed
2025-02-14T22:56:16.477509742Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.477513785Z que    start_loop: processing new tasks
2025-02-14T22:56:16.477516563Z que    start_loop: processing task, id = 72
2025-02-14T22:56:16.477519084Z que    start_loop: update slots
2025-02-14T22:56:16.477521667Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.477524095Z que          post: new task, id = 73, front = 0
2025-02-14T22:56:16.477526472Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 54, n_cache_tokens = 54, truncated = 0
2025-02-14T22:56:16.477528972Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.488350701Z slot process_toke: id  0 | task 51 | n_decoded = 22, n_remaining = -1, next token: 16847 ' province'
2025-02-14T22:56:16.488391126Z srv  update_slots: run slots completed
2025-02-14T22:56:16.488398473Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.488401395Z que    start_loop: processing new tasks
2025-02-14T22:56:16.488403813Z que    start_loop: processing task, id = 73
2025-02-14T22:56:16.488406189Z que    start_loop: update slots
2025-02-14T22:56:16.488408597Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.488411457Z que          post: new task, id = 74, front = 0
2025-02-14T22:56:16.488414060Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 55, n_cache_tokens = 55, truncated = 0
2025-02-14T22:56:16.488416591Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.500276104Z slot process_toke: id  0 | task 51 | n_decoded = 23, n_remaining = -1, next token:   315 ' of'
2025-02-14T22:56:16.500311343Z srv  update_slots: run slots completed
2025-02-14T22:56:16.500338794Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.500342395Z que    start_loop: processing new tasks
2025-02-14T22:56:16.500345019Z que    start_loop: processing task, id = 74
2025-02-14T22:56:16.500347313Z que    start_loop: update slots
2025-02-14T22:56:16.500349649Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.500351882Z que          post: new task, id = 75, front = 0
2025-02-14T22:56:16.500354228Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 56, n_cache_tokens = 56, truncated = 0
2025-02-14T22:56:16.500357757Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.511841560Z slot process_toke: id  0 | task 51 | n_decoded = 24, n_remaining = -1, next token:  7855 ' British'
2025-02-14T22:56:16.511885731Z srv  update_slots: run slots completed
2025-02-14T22:56:16.511894250Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.511898047Z que    start_loop: processing new tasks
2025-02-14T22:56:16.511901030Z que    start_loop: processing task, id = 75
2025-02-14T22:56:16.511903870Z que    start_loop: update slots
2025-02-14T22:56:16.511906658Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.511909416Z que          post: new task, id = 76, front = 0
2025-02-14T22:56:16.511912266Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 57, n_cache_tokens = 57, truncated = 0
2025-02-14T22:56:16.511915281Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.522323024Z slot process_toke: id  0 | task 51 | n_decoded = 25, n_remaining = -1, next token: 18796 ' Columbia'
2025-02-14T22:56:16.522365065Z srv  update_slots: run slots completed
2025-02-14T22:56:16.522373162Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.522376208Z que    start_loop: processing new tasks
2025-02-14T22:56:16.522378677Z que    start_loop: processing task, id = 76
2025-02-14T22:56:16.522380972Z que    start_loop: update slots
2025-02-14T22:56:16.522383420Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.522385725Z que          post: new task, id = 77, front = 0
2025-02-14T22:56:16.522388308Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 58, n_cache_tokens = 58, truncated = 0
2025-02-14T22:56:16.522390746Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.534017895Z slot process_toke: id  0 | task 51 | n_decoded = 26, n_remaining = -1, next token:   323 ' and'
2025-02-14T22:56:16.534042023Z srv  update_slots: run slots completed
2025-02-14T22:56:16.534045336Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.534047960Z que    start_loop: processing new tasks
2025-02-14T22:56:16.534050419Z que    start_loop: processing task, id = 77
2025-02-14T22:56:16.534064720Z que    start_loop: update slots
2025-02-14T22:56:16.534067190Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.534069618Z que          post: new task, id = 78, front = 0
2025-02-14T22:56:16.534072025Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 59, n_cache_tokens = 59, truncated = 0
2025-02-14T22:56:16.534074567Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.545669409Z slot process_toke: id  0 | task 51 | n_decoded = 27, n_remaining = -1, next token:   374 ' is'
2025-02-14T22:56:16.545714536Z srv  update_slots: run slots completed
2025-02-14T22:56:16.545721955Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.545724969Z que    start_loop: processing new tasks
2025-02-14T22:56:16.545727439Z que    start_loop: processing task, id = 78
2025-02-14T22:56:16.545729846Z que    start_loop: update slots
2025-02-14T22:56:16.545732254Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.545734713Z que          post: new task, id = 79, front = 0
2025-02-14T22:56:16.545737172Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 60, n_cache_tokens = 60, truncated = 0
2025-02-14T22:56:16.545739590Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.557683053Z slot process_toke: id  0 | task 51 | n_decoded = 28, n_remaining = -1, next token:  3881 ' known'
2025-02-14T22:56:16.557726884Z srv  update_slots: run slots completed
2025-02-14T22:56:16.557734498Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.557737862Z que    start_loop: processing new tasks
2025-02-14T22:56:16.557740486Z que    start_loop: processing task, id = 79
2025-02-14T22:56:16.557743120Z que    start_loop: update slots
2025-02-14T22:56:16.557745610Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.557748069Z que          post: new task, id = 80, front = 0
2025-02-14T22:56:16.557750754Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 61, n_cache_tokens = 61, truncated = 0
2025-02-14T22:56:16.557753347Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.568554322Z slot process_toke: id  0 | task 51 | n_decoded = 29, n_remaining = -1, next token:   369 ' for'
2025-02-14T22:56:16.568593132Z srv  update_slots: run slots completed
2025-02-14T22:56:16.568600602Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.568603822Z que    start_loop: processing new tasks
2025-02-14T22:56:16.568606322Z que    start_loop: processing task, id = 80
2025-02-14T22:56:16.568608678Z que    start_loop: update slots
2025-02-14T22:56:16.568611065Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.568613380Z que          post: new task, id = 81, front = 0
2025-02-14T22:56:16.568625048Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 62, n_cache_tokens = 62, truncated = 0
2025-02-14T22:56:16.568627785Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.581090778Z slot process_toke: id  0 | task 51 | n_decoded = 30, n_remaining = -1, next token:  1181 ' its'
2025-02-14T22:56:16.581140535Z srv  update_slots: run slots completed
2025-02-14T22:56:16.581149847Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.581153170Z que    start_loop: processing new tasks
2025-02-14T22:56:16.581156133Z que    start_loop: processing task, id = 81
2025-02-14T22:56:16.581158788Z que    start_loop: update slots
2025-02-14T22:56:16.581161319Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.581164118Z que          post: new task, id = 82, front = 0
2025-02-14T22:56:16.581166597Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 63, n_cache_tokens = 63, truncated = 0
2025-02-14T22:56:16.581169180Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.592508134Z slot process_toke: id  0 | task 51 | n_decoded = 31, n_remaining = -1, next token:  5810 ' natural'
2025-02-14T22:56:16.592554599Z srv  update_slots: run slots completed
2025-02-14T22:56:16.592562089Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.592565783Z que    start_loop: processing new tasks
2025-02-14T22:56:16.592568417Z que    start_loop: processing task, id = 82
2025-02-14T22:56:16.592570948Z que    start_loop: update slots
2025-02-14T22:56:16.592573325Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.592575660Z que          post: new task, id = 83, front = 0
2025-02-14T22:56:16.592578027Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 64, n_cache_tokens = 64, truncated = 0
2025-02-14T22:56:16.592580537Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.603589752Z slot process_toke: id  0 | task 51 | n_decoded = 32, n_remaining = -1, next token: 13143 ' beauty'
2025-02-14T22:56:16.603621412Z srv  update_slots: run slots completed
2025-02-14T22:56:16.603624900Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.603627503Z que    start_loop: processing new tasks
2025-02-14T22:56:16.603629951Z que    start_loop: processing task, id = 83
2025-02-14T22:56:16.603632482Z que    start_loop: update slots
2025-02-14T22:56:16.603634931Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.603637370Z que          post: new task, id = 84, front = 0
2025-02-14T22:56:16.603640065Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 65, n_cache_tokens = 65, truncated = 0
2025-02-14T22:56:16.603652217Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.616362996Z slot process_toke: id  0 | task 51 | n_decoded = 33, n_remaining = -1, next token:    11 ','
2025-02-14T22:56:16.616416334Z srv  update_slots: run slots completed
2025-02-14T22:56:16.616424205Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.616427487Z que    start_loop: processing new tasks
2025-02-14T22:56:16.616430234Z que    start_loop: processing task, id = 84
2025-02-14T22:56:16.616432806Z que    start_loop: update slots
2025-02-14T22:56:16.616435399Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.616437766Z que          post: new task, id = 85, front = 0
2025-02-14T22:56:16.616440173Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 66, n_cache_tokens = 66, truncated = 0
2025-02-14T22:56:16.616442704Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.628510006Z slot process_toke: id  0 | task 51 | n_decoded = 34, n_remaining = -1, next token: 23034 ' mild'
2025-02-14T22:56:16.628539422Z srv  update_slots: run slots completed
2025-02-14T22:56:16.628542714Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.628545492Z que    start_loop: processing new tasks
2025-02-14T22:56:16.628548198Z que    start_loop: processing task, id = 85
2025-02-14T22:56:16.628550863Z que    start_loop: update slots
2025-02-14T22:56:16.628553291Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.628555658Z que          post: new task, id = 86, front = 0
2025-02-14T22:56:16.628558065Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 67, n_cache_tokens = 67, truncated = 0
2025-02-14T22:56:16.628560545Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.640030686Z slot process_toke: id  0 | task 51 | n_decoded = 35, n_remaining = -1, next token:  9977 ' climate'
2025-02-14T22:56:16.640081667Z srv  update_slots: run slots completed
2025-02-14T22:56:16.640089333Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.640092646Z que    start_loop: processing new tasks
2025-02-14T22:56:16.640095475Z que    start_loop: processing task, id = 86
2025-02-14T22:56:16.640097924Z que    start_loop: update slots
2025-02-14T22:56:16.640100537Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.640103099Z que          post: new task, id = 87, front = 0
2025-02-14T22:56:16.640105774Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 68, n_cache_tokens = 68, truncated = 0
2025-02-14T22:56:16.640108306Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.651740288Z slot process_toke: id  0 | task 51 | n_decoded = 36, n_remaining = -1, next token:    11 ','
2025-02-14T22:56:16.651783142Z srv  update_slots: run slots completed
2025-02-14T22:56:16.651802073Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.651805695Z que    start_loop: processing new tasks
2025-02-14T22:56:16.651808555Z que    start_loop: processing task, id = 87
2025-02-14T22:56:16.651811550Z que    start_loop: update slots
2025-02-14T22:56:16.651814132Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.651817301Z que          post: new task, id = 88, front = 0
2025-02-14T22:56:16.651819945Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 69, n_cache_tokens = 69, truncated = 0
2025-02-14T22:56:16.651822692Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.662228893Z slot process_toke: id  0 | task 51 | n_decoded = 37, n_remaining = -1, next token:   323 ' and'
2025-02-14T22:56:16.662269884Z srv  update_slots: run slots completed
2025-02-14T22:56:16.662277621Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.662280934Z que    start_loop: processing new tasks
2025-02-14T22:56:16.662283507Z que    start_loop: processing task, id = 88
2025-02-14T22:56:16.662286285Z que    start_loop: update slots
2025-02-14T22:56:16.662288682Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.662291059Z que          post: new task, id = 89, front = 0
2025-02-14T22:56:16.662293487Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 70, n_cache_tokens = 70, truncated = 0
2025-02-14T22:56:16.662296028Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.673867911Z slot process_toke: id  0 | task 51 | n_decoded = 38, n_remaining = -1, next token: 16807 ' diverse'
2025-02-14T22:56:16.673915538Z srv  update_slots: run slots completed
2025-02-14T22:56:16.673924531Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.673928626Z que    start_loop: processing new tasks
2025-02-14T22:56:16.673931826Z que    start_loop: processing task, id = 89
2025-02-14T22:56:16.673935108Z que    start_loop: update slots
2025-02-14T22:56:16.673938400Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.673941899Z que          post: new task, id = 90, front = 0
2025-02-14T22:56:16.673947959Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 71, n_cache_tokens = 71, truncated = 0
2025-02-14T22:56:16.673951282Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.684979679Z slot process_toke: id  0 | task 51 | n_decoded = 39, n_remaining = -1, next token:  7042 ' population'
2025-02-14T22:56:16.685023181Z srv  update_slots: run slots completed
2025-02-14T22:56:16.685031176Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.685034663Z que    start_loop: processing new tasks
2025-02-14T22:56:16.685046578Z que    start_loop: processing task, id = 90
2025-02-14T22:56:16.685064172Z que    start_loop: update slots
2025-02-14T22:56:16.685069533Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.685072331Z que          post: new task, id = 91, front = 0
2025-02-14T22:56:16.685074996Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 72, n_cache_tokens = 72, truncated = 0
2025-02-14T22:56:16.685077764Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.698334257Z slot process_toke: id  0 | task 51 | n_decoded = 40, n_remaining = -1, next token:    13 '.'
2025-02-14T22:56:16.698371832Z srv  update_slots: run slots completed
2025-02-14T22:56:16.698376308Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.698379497Z que    start_loop: processing new tasks
2025-02-14T22:56:16.698382450Z que    start_loop: processing task, id = 91
2025-02-14T22:56:16.698385444Z que    start_loop: update slots
2025-02-14T22:56:16.698388325Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T22:56:16.698391247Z que          post: new task, id = 92, front = 0
2025-02-14T22:56:16.698394221Z slot update_slots: id  0 | task 51 | slot decode token, n_ctx = 4096, n_past = 73, n_cache_tokens = 73, truncated = 0
2025-02-14T22:56:16.698397359Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T22:56:16.709826386Z slot process_toke: id  0 | task 51 | stopped by EOS
2025-02-14T22:56:16.709870372Z slot process_toke: id  0 | task 51 | n_decoded = 41, n_remaining = -1, next token: 151645 ''
2025-02-14T22:56:16.709878829Z slot      release: id  0 | task 51 | stop processing: n_past = 73, truncated = 0
2025-02-14T22:56:16.709882369Z slot print_timing: id  0 | task 51 | 
2025-02-14T22:56:16.709884982Z prompt eval time =      14.21 ms /     1 tokens (   14.21 ms per token,    70.40 tokens per second)
2025-02-14T22:56:16.709887688Z        eval time =     456.99 ms /    41 tokens (   11.15 ms per token,    89.72 tokens per second)
2025-02-14T22:56:16.709891022Z       total time =     471.19 ms /    42 tokens
2025-02-14T22:56:16.709893481Z srv          send: sending result for task id = 51
2025-02-14T22:56:16.709896135Z srv          send: task id = 51 pushed to result queue
2025-02-14T22:56:16.709898708Z srv  update_slots: run slots completed
2025-02-14T22:56:16.709901115Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.709903502Z que    start_loop: processing new tasks
2025-02-14T22:56:16.709905858Z que    start_loop: processing task, id = 92
2025-02-14T22:56:16.709908215Z que    start_loop: update slots
2025-02-14T22:56:16.709910715Z srv  update_slots: all slots are idle
2025-02-14T22:56:16.709913092Z que    start_loop: waiting for new tasks
2025-02-14T22:56:16.709927249Z srv  remove_waiti: remove task 51 from waiting list. current waiting = 1 (before remove)
2025-02-14T22:56:16.710111310Z request: POST /v1/chat/completions 172.17.0.1 200

2025-02-14T22:56:16.710144502Z }
2025-02-14T22:56:16.710150789Z response: {"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Vancouver is a major city located on the west coast of Canada. It is the largest city in the province of British Columbia and is known for its natural beauty, mild climate, and diverse population.","tool_calls":null,"role":"assistant"}}],"created":1739573776,"model":"gpt-3.5-turbo","system_fingerprint":"b4603-4a2b196d","object":"chat.completion","usage":{"completion_tokens":41,"prompt_tokens":33,"total_tokens":74},"id":"chatcmpl-sr0c0yuDP6Ha8Q91sVp8YgLdu0iKHZjq","__verbose":{"index":0,"content":"Vancouver is a major city located on the west coast of Canada. It is the largest city in the province of British Columbia and is known for its natural beauty, mild climate, and diverse population.","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":41,"tokens_evaluated":33,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"\u0001","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhere is Vancouver?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":73,"timings":{"prompt_n":1,"prompt_ms":14.205,"prompt_per_token_ms":14.205,"prompt_per_second":70.39774727208729,"predicted_n":41,"predicted_ms":456.989,"predicted_per_token_ms":11.146073170731707,"predicted_per_second":89.7176956119294}},"timings":{"prompt_n":1,"prompt_ms":14.205,"prompt_per_token_ms":14.205,"prompt_per_second":70.39774727208729,"predicted_n":41,"predicted_ms":456.989,"predicted_per_token_ms":11.146073170731707,"predicted_per_second":89.7176956119294}}
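
For anyone reading along, here is a minimal sketch (not part of llama.cpp) for stitching the per-token `next token:` lines of a verbose log like the one above back into the generated text. The regex and the helper name `stitch_tokens` are assumptions based only on the line format visible in this log. Tokens that contain a raw newline are printed across two log lines and are simply skipped here.

```python
import re

# Matches the per-token debug lines seen in the log, e.g.
#   ... n_decoded = 26, n_remaining = -1, next token:   323 ' and'
# Group 1 is the token id, group 2 is the quoted token text.
TOKEN_LINE = re.compile(r"next token:\s*(\d+) '(.*)'$")


def stitch_tokens(log_lines):
    """Concatenate the quoted token pieces from the per-token log lines."""
    pieces = []
    for line in log_lines:
        match = TOKEN_LINE.search(line)
        if match:
            pieces.append(match.group(2))
    return "".join(pieces)


# Three token lines and one unrelated line copied from the log above.
sample = [
    "2025-02-14T22:56:16.534017895Z slot process_toke: id  0 | task 51 | n_decoded = 26, n_remaining = -1, next token:   323 ' and'",
    "2025-02-14T22:56:16.534042023Z srv  update_slots: run slots completed",
    "2025-02-14T22:56:16.545669409Z slot process_toke: id  0 | task 51 | n_decoded = 27, n_remaining = -1, next token:   374 ' is'",
    "2025-02-14T22:56:16.557683053Z slot process_toke: id  0 | task 51 | n_decoded = 28, n_remaining = -1, next token:  3881 ' known'",
]

print(repr(stitch_tokens(sample)))
```

Running the same helper over the full log for each request makes it easier to compare what the model produced with and without `tools`.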

With tools:

curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Where is Vancouver?"
    }
  ],
  "tools": []
}'
Logs for API call with tools


2025-02-14T23:00:02.681558344Z }
2025-02-14T23:00:02.681560773Z [common_chat_params_init] has_tools=true
2025-02-14T23:00:02.681563551Z Prompt: <|im_start|>system
2025-02-14T23:00:02.681566330Z You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
2025-02-14T23:00:02.681569047Z <|im_start|>user
2025-02-14T23:00:02.681571579Z Where is Vancouver?<|im_end|>
2025-02-14T23:00:02.681574388Z <|im_start|>assistant
2025-02-14T23:00:02.681576940Z 
2025-02-14T23:00:02.683414441Z Grammar: root ::= "<tool_call>" space tool-call "</tool_call>" space
2025-02-14T23:00:02.683444646Z space ::= | " " | "\n" [ \t]{0,20}
2025-02-14T23:00:02.683452066Z tool-call ::= 
2025-02-14T23:00:02.683455411Z 
2025-02-14T23:00:02.683457912Z Grammar lazy: true
2025-02-14T23:00:02.683460412Z Chat format: Hermes 2 Pro
2025-02-14T23:00:02.683462841Z Grammar trigger token: 151657 (`<tool_call>`)
2025-02-14T23:00:02.683465620Z Grammar trigger token: 151658 (`</tool_call>`)
2025-02-14T23:00:02.683468223Z srv  add_waiting_: add task 136 to waiting list. current waiting = 0 (before add)
2025-02-14T23:00:02.683470724Z que          post: new task, id = 136/1, front = 0
2025-02-14T23:00:02.683473091Z que    start_loop: processing new tasks
2025-02-14T23:00:02.683475520Z que    start_loop: processing task, id = 136
2025-02-14T23:00:02.683477897Z slot get_availabl: id  0 | task 93 | selected slot by lru, t_last = 238812907679
2025-02-14T23:00:02.683480326Z slot        reset: id  0 | task 93 | 
2025-02-14T23:00:02.683579400Z slot launch_slot_: id  0 | task 136 | launching slot : {"id":0,"id_task":136,"n_ctx":4096,"speculative":false,"is_processing":false,"non_causal":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"root ::= \"<tool_call>\" space tool-call \"</tool_call>\" space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\ntool-call ::= \n","grammar_trigger_tokens":[151657,151658],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhere is Vancouver?<|im_end|>\n<|im_start|>assistant\n","next_token":{"has_next_token":false,"has_new_line":false,"n_remain":-1,"n_decoded":42,"stopping_word":""}}
2025-02-14T23:00:02.683619896Z slot launch_slot_: id  0 | task 136 | processing task
2025-02-14T23:00:02.683656677Z que    start_loop: update slots
2025-02-14T23:00:02.683660269Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.683662986Z que          post: new task, id = 137, front = 0
2025-02-14T23:00:02.683665590Z slot update_slots: id  0 | task 136 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 33
2025-02-14T23:00:02.683668440Z slot update_slots: id  0 | task 136 | prompt token   0: 151644 '<|im_start|>'
2025-02-14T23:00:02.683671188Z slot update_slots: id  0 | task 136 | prompt token   1:   8948 'system'
2025-02-14T23:00:02.683674111Z slot update_slots: id  0 | task 136 | prompt token   2:    198 '
2025-02-14T23:00:02.683677003Z '
2025-02-14T23:00:02.683680121Z slot update_slots: id  0 | task 136 | prompt token   3:   2610 'You'
2025-02-14T23:00:02.683682714Z slot update_slots: id  0 | task 136 | prompt token   4:    525 ' are'
2025-02-14T23:00:02.683685297Z slot update_slots: id  0 | task 136 | prompt token   5:   1207 ' Q'
2025-02-14T23:00:02.683687911Z slot update_slots: id  0 | task 136 | prompt token   6:  16948 'wen'
2025-02-14T23:00:02.683690433Z slot update_slots: id  0 | task 136 | prompt token   7:     11 ','
2025-02-14T23:00:02.683692923Z slot update_slots: id  0 | task 136 | prompt token   8:   3465 ' created'
2025-02-14T23:00:02.683695455Z slot update_slots: id  0 | task 136 | prompt token   9:    553 ' by'
2025-02-14T23:00:02.683697945Z slot update_slots: id  0 | task 136 | prompt token  10:  54364 ' Alibaba'
2025-02-14T23:00:02.683700457Z slot update_slots: id  0 | task 136 | prompt token  11:  14817 ' Cloud'
2025-02-14T23:00:02.683706693Z slot update_slots: id  0 | task 136 | prompt token  12:     13 '.'
2025-02-14T23:00:02.683709348Z slot update_slots: id  0 | task 136 | prompt token  13:   1446 ' You'
2025-02-14T23:00:02.683712003Z slot update_slots: id  0 | task 136 | prompt token  14:    525 ' are'
2025-02-14T23:00:02.683714494Z slot update_slots: id  0 | task 136 | prompt token  15:    264 ' a'
2025-02-14T23:00:02.683717190Z slot update_slots: id  0 | task 136 | need to evaluate at least 1 token to generate logits, n_past = 33, n_prompt_tokens = 33
2025-02-14T23:00:02.683719804Z slot update_slots: id  0 | task 136 | kv cache rm [32, end)
2025-02-14T23:00:02.683730857Z slot update_slots: id  0 | task 136 | prompt processing progress, n_past = 33, n_tokens = 1, progress = 0.030303
2025-02-14T23:00:02.683735591Z slot update_slots: id  0 | task 136 | prompt done, n_past = 33, n_tokens = 1
2025-02-14T23:00:02.683740047Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.696865908Z Grammar still awaiting trigger after token 53 (`V`) (buffer: `V`)
2025-02-14T23:00:02.696907650Z slot process_toke: id  0 | task 136 | n_decoded = 1, n_remaining = -1, next token:    53 'V'
2025-02-14T23:00:02.696915265Z srv  update_slots: run slots completed
2025-02-14T23:00:02.696918497Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.696921069Z que    start_loop: processing new tasks
2025-02-14T23:00:02.696923745Z que    start_loop: processing task, id = 137
2025-02-14T23:00:02.696926267Z que    start_loop: update slots
2025-02-14T23:00:02.696928736Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.696931628Z que          post: new task, id = 138, front = 0
2025-02-14T23:00:02.696934170Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 34, n_cache_tokens = 34, truncated = 0
2025-02-14T23:00:02.696936702Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.707383282Z Grammar still awaiting trigger after token 20471 (`ancouver`) (buffer: `Vancouver`)
2025-02-14T23:00:02.707428553Z slot process_toke: id  0 | task 136 | n_decoded = 2, n_remaining = -1, next token: 20471 'ancouver'
2025-02-14T23:00:02.707436488Z srv  update_slots: run slots completed
2025-02-14T23:00:02.707439894Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.707442807Z que    start_loop: processing new tasks
2025-02-14T23:00:02.707445215Z que    start_loop: processing task, id = 138
2025-02-14T23:00:02.707447726Z que    start_loop: update slots
2025-02-14T23:00:02.707450124Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.707452532Z que          post: new task, id = 139, front = 0
2025-02-14T23:00:02.707454971Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 35, n_cache_tokens = 35, truncated = 0
2025-02-14T23:00:02.707457441Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.718343464Z Grammar still awaiting trigger after token 374 (` is`) (buffer: `Vancouver is`)
2025-02-14T23:00:02.718391792Z slot process_toke: id  0 | task 136 | n_decoded = 3, n_remaining = -1, next token:   374 ' is'
2025-02-14T23:00:02.718399500Z srv  update_slots: run slots completed
2025-02-14T23:00:02.718402762Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.718405284Z que    start_loop: processing new tasks
2025-02-14T23:00:02.718407795Z que    start_loop: processing task, id = 139
2025-02-14T23:00:02.718410265Z que    start_loop: update slots
2025-02-14T23:00:02.718426895Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.718429509Z que          post: new task, id = 140, front = 0
2025-02-14T23:00:02.718432185Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 36, n_cache_tokens = 36, truncated = 0
2025-02-14T23:00:02.718434727Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.729332159Z Grammar still awaiting trigger after token 264 (` a`) (buffer: `Vancouver is a`)
2025-02-14T23:00:02.729363352Z slot process_toke: id  0 | task 136 | n_decoded = 4, n_remaining = -1, next token:   264 ' a'
2025-02-14T23:00:02.729366748Z srv  update_slots: run slots completed
2025-02-14T23:00:02.729369660Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.729372161Z que    start_loop: processing new tasks
2025-02-14T23:00:02.729374611Z que    start_loop: processing task, id = 140
2025-02-14T23:00:02.729377080Z que    start_loop: update slots
2025-02-14T23:00:02.729379499Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.729381897Z que          post: new task, id = 141, front = 0
2025-02-14T23:00:02.729384315Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 37, n_cache_tokens = 37, truncated = 0
2025-02-14T23:00:02.729386826Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.740639890Z Grammar still awaiting trigger after token 3598 (` major`) (buffer: `Vancouver is a major`)
2025-02-14T23:00:02.740691192Z slot process_toke: id  0 | task 136 | n_decoded = 5, n_remaining = -1, next token:  3598 ' major'
2025-02-14T23:00:02.740700588Z srv  update_slots: run slots completed
2025-02-14T23:00:02.740704828Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.740708060Z que    start_loop: processing new tasks
2025-02-14T23:00:02.740711198Z que    start_loop: processing task, id = 141
2025-02-14T23:00:02.740714173Z que    start_loop: update slots
2025-02-14T23:00:02.740717384Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.740720347Z que          post: new task, id = 142, front = 0
2025-02-14T23:00:02.740723404Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 38, n_cache_tokens = 38, truncated = 0
2025-02-14T23:00:02.740729424Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.753889945Z Grammar still awaiting trigger after token 3283 (` city`) (buffer: `Vancouver is a major city`)
2025-02-14T23:00:02.753926345Z slot process_toke: id  0 | task 136 | n_decoded = 6, n_remaining = -1, next token:  3283 ' city'
2025-02-14T23:00:02.753932674Z srv  update_slots: run slots completed
2025-02-14T23:00:02.753936564Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.753939930Z que    start_loop: processing new tasks
2025-02-14T23:00:02.753964804Z que    start_loop: processing task, id = 142
2025-02-14T23:00:02.753970608Z que    start_loop: update slots
2025-02-14T23:00:02.753973881Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.753977143Z que          post: new task, id = 143, front = 0
2025-02-14T23:00:02.753980426Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 39, n_cache_tokens = 39, truncated = 0
2025-02-14T23:00:02.753984265Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.765268589Z Grammar still awaiting trigger after token 7407 (` located`) (buffer: `Vancouver is a major city located`)
2025-02-14T23:00:02.765293998Z slot process_toke: id  0 | task 136 | n_decoded = 7, n_remaining = -1, next token:  7407 ' located'
2025-02-14T23:00:02.765297343Z srv  update_slots: run slots completed
2025-02-14T23:00:02.765299915Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.765302375Z que    start_loop: processing new tasks
2025-02-14T23:00:02.765304896Z que    start_loop: processing task, id = 143
2025-02-14T23:00:02.765307366Z que    start_loop: update slots
2025-02-14T23:00:02.765311956Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.765314570Z que          post: new task, id = 144, front = 0
2025-02-14T23:00:02.765317050Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 40, n_cache_tokens = 40, truncated = 0
2025-02-14T23:00:02.765319788Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.777740005Z Grammar still awaiting trigger after token 304 (` in`) (buffer: `Vancouver is a major city located in`)
2025-02-14T23:00:02.777783352Z slot process_toke: id  0 | task 136 | n_decoded = 8, n_remaining = -1, next token:   304 ' in'
2025-02-14T23:00:02.777791965Z srv  update_slots: run slots completed
2025-02-14T23:00:02.777795259Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.777797832Z que    start_loop: processing new tasks
2025-02-14T23:00:02.777800219Z que    start_loop: processing task, id = 144
2025-02-14T23:00:02.777802751Z que    start_loop: update slots
2025-02-14T23:00:02.777805138Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.777821327Z que          post: new task, id = 145, front = 0
2025-02-14T23:00:02.777826966Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 41, n_cache_tokens = 41, truncated = 0
2025-02-14T23:00:02.777829570Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.790010862Z Grammar still awaiting trigger after token 279 (` the`) (buffer: `Vancouver is a major city located in the`)
2025-02-14T23:00:02.790050452Z slot process_toke: id  0 | task 136 | n_decoded = 9, n_remaining = -1, next token:   279 ' the'
2025-02-14T23:00:02.790058860Z srv  update_slots: run slots completed
2025-02-14T23:00:02.790072033Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.790074740Z que    start_loop: processing new tasks
2025-02-14T23:00:02.790077138Z que    start_loop: processing task, id = 145
2025-02-14T23:00:02.790079587Z que    start_loop: update slots
2025-02-14T23:00:02.790082057Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.790084475Z que          post: new task, id = 146, front = 0
2025-02-14T23:00:02.790086904Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 42, n_cache_tokens = 42, truncated = 0
2025-02-14T23:00:02.790089384Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.800634097Z Grammar still awaiting trigger after token 16847 (` province`) (buffer: `Vancouver is a major city located in the province`)
2025-02-14T23:00:02.800673430Z slot process_toke: id  0 | task 136 | n_decoded = 10, n_remaining = -1, next token: 16847 ' province'
2025-02-14T23:00:02.800684051Z srv  update_slots: run slots completed
2025-02-14T23:00:02.800687622Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.800690287Z que    start_loop: processing new tasks
2025-02-14T23:00:02.800692788Z que    start_loop: processing task, id = 146
2025-02-14T23:00:02.800695402Z que    start_loop: update slots
2025-02-14T23:00:02.800697954Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.800700414Z que          post: new task, id = 147, front = 0
2025-02-14T23:00:02.800702915Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 43, n_cache_tokens = 43, truncated = 0
2025-02-14T23:00:02.800705467Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.811484797Z Grammar still awaiting trigger after token 315 (` of`) (buffer: `Vancouver is a major city located in the province of`)
2025-02-14T23:00:02.811530254Z slot process_toke: id  0 | task 136 | n_decoded = 11, n_remaining = -1, next token:   315 ' of'
2025-02-14T23:00:02.811539506Z srv  update_slots: run slots completed
2025-02-14T23:00:02.811543520Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.811546751Z que    start_loop: processing new tasks
2025-02-14T23:00:02.811550199Z que    start_loop: processing task, id = 147
2025-02-14T23:00:02.811553214Z que    start_loop: update slots
2025-02-14T23:00:02.811556507Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.811559502Z que          post: new task, id = 148, front = 0
2025-02-14T23:00:02.811562558Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 44, n_cache_tokens = 44, truncated = 0
2025-02-14T23:00:02.811565636Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.821846980Z Grammar still awaiting trigger after token 7855 (` British`) (buffer: `Vancouver is a major city located in the province of British`)
2025-02-14T23:00:02.821900011Z slot process_toke: id  0 | task 136 | n_decoded = 12, n_remaining = -1, next token:  7855 ' British'
2025-02-14T23:00:02.821908646Z srv  update_slots: run slots completed
2025-02-14T23:00:02.821911939Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.821914563Z que    start_loop: processing new tasks
2025-02-14T23:00:02.821917023Z que    start_loop: processing task, id = 148
2025-02-14T23:00:02.821919472Z que    start_loop: update slots
2025-02-14T23:00:02.821921921Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.821924329Z que          post: new task, id = 149, front = 0
2025-02-14T23:00:02.821926799Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 45, n_cache_tokens = 45, truncated = 0
2025-02-14T23:00:02.821929547Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.832880250Z Grammar still awaiting trigger after token 18796 (` Columbia`) (buffer: `Vancouver is a major city located in the province of British Columbia`)
2025-02-14T23:00:02.832923000Z slot process_toke: id  0 | task 136 | n_decoded = 13, n_remaining = -1, next token: 18796 ' Columbia'
2025-02-14T23:00:02.832930966Z srv  update_slots: run slots completed
2025-02-14T23:00:02.832934269Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.832938097Z que    start_loop: processing new tasks
2025-02-14T23:00:02.832942451Z que    start_loop: processing task, id = 149
2025-02-14T23:00:02.832946382Z que    start_loop: update slots
2025-02-14T23:00:02.832950468Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.832954131Z que          post: new task, id = 150, front = 0
2025-02-14T23:00:02.832957990Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 46, n_cache_tokens = 46, truncated = 0
2025-02-14T23:00:02.832963043Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.843609184Z Grammar still awaiting trigger after token 11 (`,`) (buffer: `Vancouver is a major city located in the province of British Columbia,`)
2025-02-14T23:00:02.843651265Z slot process_toke: id  0 | task 136 | n_decoded = 14, n_remaining = -1, next token:    11 ','
2025-02-14T23:00:02.843658932Z srv  update_slots: run slots completed
2025-02-14T23:00:02.843662513Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.843665086Z que    start_loop: processing new tasks
2025-02-14T23:00:02.843667422Z que    start_loop: processing task, id = 150
2025-02-14T23:00:02.843681367Z que    start_loop: update slots
2025-02-14T23:00:02.843687233Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.843689713Z que          post: new task, id = 151, front = 0
2025-02-14T23:00:02.843692317Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 47, n_cache_tokens = 47, truncated = 0
2025-02-14T23:00:02.843704388Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.854534949Z Grammar still awaiting trigger after token 389 (` on`) (buffer: `Vancouver is a major city located in the province of British Columbia, on`)
2025-02-14T23:00:02.854574231Z slot process_toke: id  0 | task 136 | n_decoded = 15, n_remaining = -1, next token:   389 ' on'
2025-02-14T23:00:02.854582773Z srv  update_slots: run slots completed
2025-02-14T23:00:02.854586282Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.854588886Z que    start_loop: processing new tasks
2025-02-14T23:00:02.854591417Z que    start_loop: processing task, id = 151
2025-02-14T23:00:02.854593928Z que    start_loop: update slots
2025-02-14T23:00:02.854596563Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.854599084Z que          post: new task, id = 152, front = 0
2025-02-14T23:00:02.854601626Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 48, n_cache_tokens = 48, truncated = 0
2025-02-14T23:00:02.854604178Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.865522453Z Grammar still awaiting trigger after token 279 (` the`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the`)
2025-02-14T23:00:02.865561519Z slot process_toke: id  0 | task 136 | n_decoded = 16, n_remaining = -1, next token:   279 ' the'
2025-02-14T23:00:02.865569000Z srv  update_slots: run slots completed
2025-02-14T23:00:02.865572376Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.865574908Z que    start_loop: processing new tasks
2025-02-14T23:00:02.865577305Z que    start_loop: processing task, id = 152
2025-02-14T23:00:02.865580166Z que    start_loop: update slots
2025-02-14T23:00:02.865582780Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.865585178Z que          post: new task, id = 153, front = 0
2025-02-14T23:00:02.865587607Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 49, n_cache_tokens = 49, truncated = 0
2025-02-14T23:00:02.865590077Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.876130230Z Grammar still awaiting trigger after token 9710 (` west`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west`)
2025-02-14T23:00:02.876167752Z slot process_toke: id  0 | task 136 | n_decoded = 17, n_remaining = -1, next token:  9710 ' west'
2025-02-14T23:00:02.876175501Z srv  update_slots: run slots completed
2025-02-14T23:00:02.876178877Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.876181552Z que    start_loop: processing new tasks
2025-02-14T23:00:02.876184125Z que    start_loop: processing task, id = 153
2025-02-14T23:00:02.876186842Z que    start_loop: update slots
2025-02-14T23:00:02.876201476Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.876204255Z que          post: new task, id = 154, front = 0
2025-02-14T23:00:02.876208968Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 50, n_cache_tokens = 50, truncated = 0
2025-02-14T23:00:02.876211654Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.886711308Z Grammar still awaiting trigger after token 13648 (` coast`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast`)
2025-02-14T23:00:02.886750467Z slot process_toke: id  0 | task 136 | n_decoded = 18, n_remaining = -1, next token: 13648 ' coast'
2025-02-14T23:00:02.886758216Z srv  update_slots: run slots completed
2025-02-14T23:00:02.886761756Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.886764391Z que    start_loop: processing new tasks
2025-02-14T23:00:02.886766820Z que    start_loop: processing task, id = 154
2025-02-14T23:00:02.886769197Z que    start_loop: update slots
2025-02-14T23:00:02.886771636Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.886773972Z que          post: new task, id = 155, front = 0
2025-02-14T23:00:02.886776812Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 51, n_cache_tokens = 51, truncated = 0
2025-02-14T23:00:02.886779313Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.897015634Z Grammar still awaiting trigger after token 315 (` of`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast of`)
2025-02-14T23:00:02.897051047Z slot process_toke: id  0 | task 136 | n_decoded = 19, n_remaining = -1, next token:   315 ' of'
2025-02-14T23:00:02.897058189Z srv  update_slots: run slots completed
2025-02-14T23:00:02.897061534Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.897063952Z que    start_loop: processing new tasks
2025-02-14T23:00:02.897066206Z que    start_loop: processing task, id = 155
2025-02-14T23:00:02.897068511Z que    start_loop: update slots
2025-02-14T23:00:02.897071660Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.897073976Z que          post: new task, id = 156, front = 0
2025-02-14T23:00:02.897076322Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 52, n_cache_tokens = 52, truncated = 0
2025-02-14T23:00:02.897078720Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.907501791Z Grammar still awaiting trigger after token 6864 (` Canada`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast of Canada`)
2025-02-14T23:00:02.907573552Z slot process_toke: id  0 | task 136 | n_decoded = 20, n_remaining = -1, next token:  6864 ' Canada'
2025-02-14T23:00:02.907582495Z srv  update_slots: run slots completed
2025-02-14T23:00:02.907601194Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.907605084Z que    start_loop: processing new tasks
2025-02-14T23:00:02.907608480Z que    start_loop: processing task, id = 156
2025-02-14T23:00:02.907611465Z que    start_loop: update slots
2025-02-14T23:00:02.907614357Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.907617362Z que          post: new task, id = 157, front = 0
2025-02-14T23:00:02.907620326Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 53, n_cache_tokens = 53, truncated = 0
2025-02-14T23:00:02.907623331Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.919535304Z Grammar still awaiting trigger after token 13 (`.`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast of Canada.`)
2025-02-14T23:00:02.919576665Z slot process_toke: id  0 | task 136 | n_decoded = 21, n_remaining = -1, next token:    13 '.'
2025-02-14T23:00:02.919584034Z srv  update_slots: run slots completed
2025-02-14T23:00:02.919587193Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.919589550Z que    start_loop: processing new tasks
2025-02-14T23:00:02.919591937Z que    start_loop: processing task, id = 157
2025-02-14T23:00:02.919594335Z que    start_loop: update slots
2025-02-14T23:00:02.919596795Z srv  update_slots: posting NEXT_RESPONSE
2025-02-14T23:00:02.919599028Z que          post: new task, id = 158, front = 0
2025-02-14T23:00:02.919601302Z slot update_slots: id  0 | task 136 | slot decode token, n_ctx = 4096, n_past = 54, n_cache_tokens = 54, truncated = 0
2025-02-14T23:00:02.919603659Z srv  update_slots: decoding batch, n_tokens = 1
2025-02-14T23:00:02.931837424Z Grammar still awaiting trigger after token 151645 (`<|im_end|>`) (buffer: `Vancouver is a major city located in the province of British Columbia, on the west coast of Canada.<|im_end|>`)
2025-02-14T23:00:02.931880092Z slot process_toke: id  0 | task 136 | stopped by EOS
2025-02-14T23:00:02.931888181Z slot process_toke: id  0 | task 136 | n_decoded = 22, n_remaining = -1, next token: 151645 ''
2025-02-14T23:00:02.931891649Z slot      release: id  0 | task 136 | stop processing: n_past = 54, truncated = 0
2025-02-14T23:00:02.931894325Z slot print_timing: id  0 | task 136 | 
2025-02-14T23:00:02.931896733Z prompt eval time =      13.15 ms /     1 tokens (   13.15 ms per token,    76.06 tokens per second)
2025-02-14T23:00:02.931899254Z        eval time =     234.89 ms /    22 tokens (   10.68 ms per token,    93.66 tokens per second)
2025-02-14T23:00:02.931901714Z       total time =     248.04 ms /    23 tokens
2025-02-14T23:00:02.931904174Z srv          send: sending result for task id = 136
2025-02-14T23:00:02.931906561Z srv          send: task id = 136 pushed to result queue
2025-02-14T23:00:02.931918468Z srv  update_slots: run slots completed
2025-02-14T23:00:02.931921566Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.931924180Z que    start_loop: processing new tasks
2025-02-14T23:00:02.931926609Z que    start_loop: processing task, id = 158
2025-02-14T23:00:02.931929038Z que    start_loop: update slots
2025-02-14T23:00:02.931931425Z srv  update_slots: all slots are idle
2025-02-14T23:00:02.931933772Z que    start_loop: waiting for new tasks
2025-02-14T23:00:02.931960014Z srv  remove_waiti: remove task 136 from waiting list. current waiting = 1 (before remove)
2025-02-14T23:00:02.932242069Z request: POST /v1/chat/completions 172.17.0.1 200









2025-02-14T23:00:02.932315961Z }
2025-02-14T23:00:02.932320973Z response: {"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Vancouver is a major city located in the province of British Columbia, on the west coast of Canada.","tool_calls":null,"role":"assistant"}}],"created":1739574002,"model":"gpt-3.5-turbo","system_fingerprint":"b4603-4a2b196d","object":"chat.completion","usage":{"completion_tokens":22,"prompt_tokens":33,"total_tokens":55},"id":"chatcmpl-c2GN65ce1SggQmg8P20BxtaWGPQqwVDu","__verbose":{"index":0,"content":"Vancouver is a major city located in the province of British Columbia, on the west coast of Canada.","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":22,"tokens_evaluated":33,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"root ::= \"<tool_call>\" space tool-call \"</tool_call>\" space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\ntool-call ::= \n","grammar_trigger_tokens":[151657,151658],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. 
You are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhere is Vancouver?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":54,"timings":{"prompt_n":1,"prompt_ms":13.148,"prompt_per_token_ms":13.148,"prompt_per_second":76.05719501064802,"predicted_n":22,"predicted_ms":234.893,"predicted_per_token_ms":10.676954545454546,"predicted_per_second":93.65966631615245}},"timings":{"prompt_n":1,"prompt_ms":13.148,"prompt_per_token_ms":13.148,"prompt_per_second":76.05719501064802,"predicted_n":22,"predicted_ms":234.893,"predicted_per_token_ms":10.676954545454546,"predicted_per_second":93.65966631615245}}


I think the problem might be that when the jinja template is used without tools and without a grammar, the grammar is not set correctly?
I'm running the llama.cpp server using Docker. How should I fetch the template inside the Docker container using the command you mentioned? ./scripts/get_chat_template.py google/gemma-2-2b-it > gemma2.jinja

@ochafik
Collaborator

ochafik commented Feb 15, 2025

@henryclw Thanks for the extra repro details! I was able to reproduce this when building with -DLLAMA_LLGUIDANCE=1.

Looks like I left a typo in chat.cpp / common_chat_params_init_without_tools; I will send a fix (edit: #11880)

@henryclw

Thank you for the quick reply. I just compiled your fix branch locally and it solved the problem.

@MoonRide303
Contributor Author

Hey @MoonRide303, @henryclw, thanks for reporting this! Are you both experiencing this on Windows?

Could you try fetching the template with ./scripts/get_chat_template.py google/gemma-2-2b-it > gemma2.jinja ? (or probably with something like py scripts\get_chat_template.py google/gemma-2-2b-it > gemma2.jinja if not running inside a WSL shell)

(these templates seem to work on my mac, maybe some line ending issue or bad unescaping of the JSON string if editing them manually?)

@ochafik This script doesn't work for me:

python scripts\get_chat_template.py google/gemma-2-2b-it > gemma2.jinja
Traceback (most recent call last):
  File "D:\repos-git\llama.cpp\scripts\get_chat_template.py", line 76, in <module>
    main(sys.argv[1:])
  File "D:\repos-git\llama.cpp\scripts\get_chat_template.py", line 71, in main
    template = get_chat_template(model_id, variant)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\repos-git\llama.cpp\scripts\get_chat_template.py", line 25, in get_chat_template
    config_str = f.read()
                 ^^^^^^^^
  File "D:\anaconda3\Lib\encodings\cp1250.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 24973: character maps to <undefined>

I just directly copied the content of the chat_template field from the tokenizer_config.json files I've linked in the first post. Attaching it here (as .txt, since GitHub blocks .jinja).

gemma2.jinja.txt
llama3.2.jinja.txt
qwen2.5.jinja.txt

@ochafik
Collaborator

ochafik commented Feb 16, 2025

I just directly copied the content of the chat_template field from the tokenizer_config.json files I've linked in the first post. Attaching it here (as .txt, since GitHub blocks .jinja).

@henryclw That content is JSON-escaped and therefore not valid Jinja. To use it, you can paste the chat_template string into a JavaScript console and wrap it in a console.log call (then copy the result to your jinja file), or try to fix the get_chat_template.py script (it looks like an explicit encoding might be required on Windows). I wasn't able to test on Windows today; could you confirm whether the following edit works for you?

        with open(hf_hub_download(repo_id=model_id, filename="tokenizer_config.json"), "r", encoding="utf-8") as f:
            config_str = f.read()
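
The effect of the explicit encoding argument can be reproduced without Windows. The following is a minimal sketch, not the actual script: it assumes a hypothetical stand-in file for tokenizer_config.json that contains the UTF-8 byte 0x81 (the byte named in the traceback), and it passes encoding="cp1250" by hand to simulate a locale whose default code page is cp1250.

```python
import json
import os
import tempfile

# Hypothetical stand-in for tokenizer_config.json. Any UTF-8 file whose bytes
# include 0x81 will do: "Ł" (U+0141) encodes to b"\xc5\x81" in UTF-8.
config = {"chat_template": "{{ messages }}", "note": "\u0141"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tokenizer_config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False)

    # Simulate a Windows locale whose default code page is cp1250:
    # byte 0x81 has no mapping there, so decoding fails.
    cp1250_error = None
    try:
        with open(path, "r", encoding="cp1250") as f:
            f.read()
    except UnicodeDecodeError as exc:
        cp1250_error = exc

    # Passing the encoding explicitly makes the read locale-independent.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print("cp1250 read failed:", cp1250_error is not None)
print("utf-8 read matches:", loaded == config)
```

Without the encoding argument, open() falls back to the locale's preferred encoding, which would explain why the same script can work on macOS and fail on a Windows machine.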

@MoonRide303
Contributor Author

MoonRide303 commented Feb 16, 2025

@ochafik It was me who attached those files. And... you're absolutely right, it was JSON escaping causing all the trouble here. I've made a simpler, working version of the script for acquiring chat templates (as an alternative to the broken scripts/get_chat_template.py; maybe it should be added to the repo scripts?), and with proper JSON decoding the official templates now seem to work (attaching the correct versions of those).

get_hf_template.py.txt
gemma-2-2b-it.jinja.txt
Llama-3.2-3B-Instruct.jinja.txt
Qwen2.5-1.5B-Instruct.jinja.txt
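
For readers who cannot open the attachments, the JSON decoding step can be sketched as follows. This is not the attached get_hf_template.py; the function name extract_chat_template and the tiny inline config are hypothetical, and the sketch only handles the case where chat_template is a single string (some models store a list of named variants instead).

```python
import json


def extract_chat_template(config_text: str) -> str:
    """Return the decoded chat_template from the text of a tokenizer_config.json."""
    config = json.loads(config_text)
    template = config["chat_template"]
    if not isinstance(template, str):
        # Assumption: only the plain-string form is supported in this sketch.
        raise ValueError("chat_template is not a single string")
    return template


# Hypothetical miniature config. The raw string shows the text exactly as it
# would appear inside the JSON file, escapes included.
raw_config = r'{"chat_template": "{%- if tools %}\n    {{- \"hi\\n\" }}\n{%- endif %}"}'

template = extract_chat_template(raw_config)
print(template)
```

json.loads turns the JSON escapes into the characters Jinja expects: `\n` becomes a real line break, `\"` becomes a plain double quote, and `\\n` becomes the two-character `\n` escape inside the Jinja string literal. Copying the field by hand skips this step, which is what produced the broken files.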

Could you add some kind of error when the template is not valid Jinja? That would make it easier to avoid this kind of mistake in the future.

@ochafik
Collaborator

ochafik commented Feb 17, 2025

Could you add some kind of error when the template is not valid Jinja? That would make it easier to avoid this kind of mistake in the future.

@MoonRide303 it should already print quite a lengthy error message (if you scroll right you'll see the ^ points at the first offending character, an \ escape), doesn't it show for you?

common_chat_templates_from_model: failed to parse chat template: Expected value expression at row 1, column 269:
{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. 
You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n
                                                                                                                                                                                                                                                                            ^
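
A template file can also be checked for leftover JSON escapes before it is passed to the server. The sketch below is only a heuristic and is not part of llama.cpp: the function name looks_json_escaped is hypothetical, and a template that genuinely needs a literal backslash followed by n would be flagged by mistake.

```python
def looks_json_escaped(template_text: str) -> bool:
    """Heuristic: True if the text still seems to carry JSON string escapes."""
    # Inside JSON, a Jinja literal such as '\n' is stored as '\\n', so a
    # doubled backslash before "n" hints at JSON that was never decoded.
    has_doubled_backslash = "\\\\n" in template_text
    # Line breaks between statements are stored as backslash + n in JSON, so
    # they show up as those two characters right after a closing tag.
    has_escaped_line_break = "%}\\n" in template_text or "}}\\n" in template_text
    return has_doubled_backslash or has_escaped_line_break


# Text as copied by hand from the JSON file (raw string, escapes untouched).
escaped = r"{%- if tools %}\n    {{- '<|im_start|>system\\n' }}"
# The same fragment after proper JSON decoding.
decoded = "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}"

print(looks_json_escaped(escaped))
print(looks_json_escaped(decoded))
```

The first call reports the hand-copied text as suspicious, while the decoded fragment passes, because its line break is a real newline and its string literal contains a single backslash.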

@MoonRide303
Contributor Author

@ochafik When I try to launch it with that earlier (broken) llama3.2.jinja it just silently quits after printing device info:

PS E:\ML-models\Llama-3.2-3B-Instruct-GGUF> E:\llama.cpp-b4734\llama-server.exe -v -ngl 99 -m Llama-3.2-3B-Instruct-Q8_0.gguf --jinja --chat-template-file llama3.2.jinja -c 8192
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9, VMM: yes
PS E:\ML-models\Llama-3.2-3B-Instruct-GGUF>

Same output from both my local build, and the official binaries (llama-b4734-bin-win-cuda-cu12.4-x64.zip).

ngxson pushed a commit that referenced this issue Feb 18, 2025