roadmap: Cortex can support Reasoning Models with chat_templates #1758

Open
3 tasks
dan-menlo opened this issue Dec 2, 2024 · 2 comments
dan-menlo commented Dec 2, 2024

Goal

  • We change model.yaml to align with a generic chat_template that can support reasoning models
  • Reasoning models require System Prompts (e.g. qwq)
  • We need to adapt Cortex's model.yaml to accommodate the System Prompt
    • Bad: prompt_template (our own creation?) is not a standard field
    • Bad: prompt_template currently takes in a system_message variable
    • We need to define system_message, as well as things like %tools%, etc.
  • We should align with industry standards
    • We need to parse the existing chat_template (from HF Transformers) into GGUF's in-prefix and in-suffix (see the example below)
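As a rough illustration of that mapping (assuming the ChatML format qwq uses; the field names follow llama.cpp's --in-prefix/--in-suffix convention, and this decomposition is an assumption, since it drops the template's tool-calling branches): the per-message wrapper of the chat_template would collapse into prefix/suffix strings like:

# hypothetical prefix/suffix derived from a ChatML chat_template
in_prefix: "<|im_start|>user\n"
in_suffix: "<|im_end|>\n<|im_start|>assistant\n"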

Tasklist

  • How do we transition if we change the field to chat_template (to align with HF Transformers)?
    • We support both chat_template and prompt_template
    • chat_template overrides prompt_template
    • Mark prompt_template as deprecated (we won't support it for future models)
    • Update Model CI to support chat_template
  • Is there a scalable way to do this by leveraging existing standards?
    • Option 1: Adding a 2nd field to all model.yaml that carries chat_template (see the sketch after this list)
    • Option 2: Copying over tokenizer_config.json
    • Option 3: Model.yaml
  • Models
    • Marco-o1
    • qwq
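A minimal sketch of what Option 1 could look like. The chat_template field name and its precedence over prompt_template are assumptions for illustration, not a finalized schema:

# Hypothetical transitional model.yaml carrying both fields.
# If chat_template (Jinja, as in HF Transformers) is present, it overrides
# the legacy prompt_template; otherwise we fall back to prompt_template.
prompt_template: <|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n
chat_template: "{%- for message in messages %}{{ '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}{%- endfor %}{%- if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{%- endif %}"
ctx_len: 4096
ngl: 34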

Resources

GGUF

HF Transformers

  • Bigger question: should we just have tokenizer_config.json define chat_template, and include it in model repos?
  • For everything else, we depend on GGUF
  • We have a lightweight model.yaml that can override this if needed
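If tokenizer_config.json ships in the model repo, pulling the template out is nearly a one-liner. A minimal Python sketch (tokenizer_config.json is the standard HF Transformers location for the template; the helper name is ours):

import json

# Read the Jinja chat template from a repo's tokenizer_config.json.
# Returns None if the model doesn't define one.
def load_chat_template(config_path: str):
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f).get("chat_template")

template = load_chat_template("tokenizer_config.json")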

Example

qwq chat template (jinja2), as it appears in tokenizer_config.json

# qwq chat template
# excerpt from tokenizer_config.json

{
"chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 32768,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null,
  "add_bos_token": false
}


Current model.yaml format

# Our current qwq `model.yaml`

prompt_template: <|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n
ctx_len: 4096
ngl: 34
# END REQUIRED
# END MODEL LOAD PARAMETERS

@dan-menlo dan-menlo added this to Menlo Dec 2, 2024
@dan-menlo dan-menlo converted this from a draft issue Dec 2, 2024
@dan-menlo dan-menlo changed the title planning: Cortex supports System Prompt planning: Cortex supports chat_templates in model.yaml Dec 2, 2024
@dan-menlo dan-menlo assigned namchuai and unassigned vansangpfiev Dec 2, 2024
@louis-jan louis-jan added the category: model management Model pull, yaml, model state label Dec 4, 2024
@dan-menlo dan-menlo changed the title planning: Cortex supports chat_templates in model.yaml roadmap: Cortex supports chat_templates in model.yaml Dec 6, 2024
@dan-menlo dan-menlo changed the title roadmap: Cortex supports chat_templates in model.yaml roadmap: model.yaml can support Reasoning Models' System Prompt Dec 16, 2024
@namchuai namchuai moved this from Investigating to Planning in Menlo Dec 17, 2024
@namchuai namchuai moved this from Planning to Scheduled in Menlo Dec 17, 2024
@namchuai namchuai moved this from Scheduled to In Progress in Menlo Dec 17, 2024

namchuai commented Dec 17, 2024

Summary of my thoughts on this ticket. Feel free to comment and let me know your ideas; I might be wrong.

  • For GGUF models, we can extract the chat_template from the GGUF file's metadata.
  • For Python models (safetensors), there is a JSON config which contains the chat_template.
  • Both of these chat_templates are Jinja templates.

So I think we can get the chat_template, render it with actual data, then send the result to the engine (see the extraction sketch below).
We don't introduce a new field, and we don't remove any field.
We deprecate prompt_template and will eventually get rid of it.

This change will affect cortex.cpp and cortex.llamacpp. After we release a new stable version of cortex.llamacpp, we will prompt users to update the engine, and the chat_template will be used.
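A minimal sketch of the GGUF extraction step, using the gguf Python package that ships with llama.cpp. The tokenizer.chat_template metadata key is standard GGUF, but the field-accessor details below may vary across gguf package versions, so treat this as an assumption-laden illustration:

from gguf import GGUFReader  # pip install gguf

# Pull the Jinja chat template out of a GGUF file's metadata.
# GGUF stores it under the standard key "tokenizer.chat_template".
def read_chat_template(gguf_path: str):
    reader = GGUFReader(gguf_path)
    field = reader.fields.get("tokenizer.chat_template")
    if field is None:
        return None  # no template; fall back to prompt_template
    # String fields are stored as raw bytes; decode the referenced part.
    return bytes(field.parts[field.data[0]]).decode("utf-8")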

About the chat_template

Sample chat_template of the qwq model. You can grab it from here.

{%- if tools %} {{- '<\|im_start\|>system\n' }} {%- if messages[0]['role'] == 'system' %} {{- messages[0]['content'] }} {%- else %} {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }} {%- endif %} {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }} {%- for tool in tools %} {{- "\n" }} {{- tool \| tojson }} {%- endfor %} {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><\|im_end\|>\n" }} {%- else %} {%- if messages[0]['role'] == 'system' %} {{- '<\|im_start\|>system\n' + messages[0]['content'] + '<\|im_end\|>\n' }} {%- else %} {{- '<\|im_start\|>system\nYou are a helpful assistant.<\|im_end\|>\n' }} {%- endif %} {%- endif %} {%- for message in messages %} {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %} {{- '<\|im_start\|>' + message.role + '\n' + message.content + '<\|im_end\|>' + '\n' }} {%- elif message.role == "assistant" %} {{- '<\|im_start\|>' + message.role }} {%- if message.content %} {{- '\n' + message.content }} {%- endif %} {%- for tool_call in message.tool_calls %} {%- if tool_call.function is defined %} {%- set tool_call = tool_call.function %} {%- endif %} {{- '\n<tool_call>\n{"name": "' }} {{- tool_call.name }} {{- '", "arguments": ' }} {{- tool_call.arguments \| tojson }} {{- '}\n</tool_call>' }} {%- endfor %} {{- '<\|im_end\|>\n' }} {%- elif message.role == "tool" %} {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %} {{- '<\|im_start\|>user' }} {%- endif %} {{- '\n<tool_response>\n' }} {{- message.content }} {{- '\n</tool_response>' }} {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %} {{- '<\|im_end\|>\n' }} {%- endif %} {%- endif %} {%- endfor %} {%- if add_generation_prompt %} {{- '<\|im_start\|>assistant\n' }} {%- endif %}

Removing the escape characters (from here) gives:

{%- if tools %} {{- '<|im_start|>system\n' }} {%- if messages[0]['role'] == 'system' %} {{- messages[0]['content'] }} {%- else %} {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }} {%- endif %} {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }} {%- for tool in tools %} {{- "\n" }} {{- tool | tojson }} {%- endfor %} {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }} {%- else %} {%- if messages[0]['role'] == 'system' %} {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }} {%- else %} {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }} {%- endif %} {%- endif %} {%- for message in messages %} {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %} {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }} {%- elif message.role == "assistant" %} {{- '<|im_start|>' + message.role }} {%- if message.content %} {{- '\n' + message.content }} {%- endif %} {%- for tool_call in message.tool_calls %} {%- if tool_call.function is defined %} {%- set tool_call = tool_call.function %} {%- endif %} {{- '\n<tool_call>\n{"name": "' }} {{- tool_call.name }} {{- '", "arguments": ' }} {{- tool_call.arguments | tojson }} {{- '}\n</tool_call>' }} {%- endfor %} {{- '<|im_end|>\n' }} {%- elif message.role == "tool" %} {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %} {{- '<|im_start|>user' }} {%- endif %} {{- '\n<tool_response>\n' }} {{- message.content }} {{- '\n</tool_response>' }} {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %} {{- '<|im_end|>\n' }} {%- endif %} {%- endif %} {%- endfor %} {%- if add_generation_prompt %} {{- '<|im_start|>assistant\n' }} {%- endif %}

Then we can render this template using the data below (e.g. with an online Jinja tool):

{
    "tools": [
        {
            "name": "calculator",
            "description": "Basic calculator function",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression to evaluate"
                    }
                },
                "required": ["expression"]
            }
        }
    ],
    "messages": [
        {
            "role": "system",
            "content": "You are a math assistant specialized in calculations."
        },
        {
            "role": "user",
            "content": "What is 25 * 4?"
        },
        {
            "role": "assistant",
            "content": "Let me calculate that for you.",
            "tool_calls": [
                {
                    "function": {
                        "name": "calculator",
                        "arguments": "{\"expression\": \"25 * 4\"}"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "content": "100"
        },
        {
            "role": "assistant",
            "content": "The result of 25 * 4 is 100."
        }
    ],
    "add_generation_prompt": false
}

Final prompt passed to the engine:

<|im_start|>system
You are a math assistant specialized in calculations.

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"description": "Basic calculator function", "name": "calculator", "parameters": {"properties": {"expression": {"description": "Math expression to evaluate", "type": "string"}}, "required": ["expression"], "type": "object"}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
What is 25 * 4?<|im_end|>
<|im_start|>assistant
Let me calculate that for you.
<tool_call>
{"name": "calculator", "arguments": "{\"expression\": \"25 * 4\"}"}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
100
</tool_response><|im_end|>
<|im_start|>assistant
The result of 25 * 4 is 100.<|im_end|>
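The same render can be reproduced locally with Python's jinja2, which provides the tojson filter the template relies on (HF Transformers itself renders chat templates with a sandboxed jinja2 environment). A minimal sketch, assuming the cleaned template and the JSON data above are saved next to the script:

import json
from jinja2 import Environment

# Load the cleaned qwq chat template and the sample request data
# (the two snippets shown above, saved to files for this sketch).
with open("qwq_chat_template.jinja", encoding="utf-8") as f:
    template_src = f.read()
with open("sample_request.json", encoding="utf-8") as f:
    data = json.load(f)

env = Environment()  # jinja2 >= 2.9 ships the built-in tojson filter
prompt = env.from_string(template_src).render(
    tools=data.get("tools"),
    messages=data["messages"],
    add_generation_prompt=data.get("add_generation_prompt", False),
)
print(prompt)  # the final prompt string sent to the engine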


dan-menlo commented Dec 18, 2024

18 Dec

Problem

  • qwq and other reasoning models need a specific System Message to work
  • Our current model.yaml has a prompt_template, but it is outdated
    • llama.cpp now has a full chat_template
    • It does not cover system_message
  • Jan has a user-defined system message with Assistants, which overrides system_message
    • This will be prefilled in the right panel
  • Architecture
    • chat_template is for the llama.cpp server, not llama.cpp itself

Decision

  • No change to model.yaml
  • James has a PR to use the gguf chat_template

@dan-menlo dan-menlo changed the title roadmap: model.yaml can support Reasoning Models' System Prompt roadmap: Jan can support Reasoning Models with chat_templates Dec 18, 2024
@dan-menlo dan-menlo changed the title roadmap: Jan can support Reasoning Models with chat_templates roadmap: Cortex can support Reasoning Models with chat_templates Dec 18, 2024
@namchuai namchuai moved this from In Progress to Eng Review in Menlo Dec 20, 2024
@namchuai namchuai moved this from Eng Review to QA in Menlo Dec 25, 2024
@TC117 TC117 added this to the v1.0.6 milestone Dec 26, 2024
@TC117 TC117 modified the milestones: v1.0.6, v1.0.7 Dec 30, 2024
@TC117 TC117 modified the milestones: v1.0.7, v1.0.9 Jan 7, 2025
@TC117 TC117 self-assigned this Jan 7, 2025
@TC117 TC117 modified the milestones: v1.0.9, 1.0.10 Jan 13, 2025