roadmap: Cortex can support Reasoning Models with chat_templates #1758

Open
3 tasks
dan-menlo opened this issue Dec 2, 2024 · 2 comments
dan-menlo commented Dec 2, 2024

Goal

  • We change model.yaml to align with a generic chat_template that can support reasoning models
  • Reasoning models require System Prompts (e.g. qwq)
  • We need to adapt Cortex's model.yaml to accommodate the System Prompt
    • Bad: prompt_template (our own creation?) is not a standard field
    • Bad: prompt_template currently takes in a system_message variable
    • We need to define system_message, as well as things like %tools%, etc.
  • We should align with industry standards
    • We need to parse the existing chat_template (from HF Transformers) into GGUF's in-prefix and in-suffix (see the example below)
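As a rough illustration of that mapping (assuming the ChatML format qwq uses; the field names follow llama.cpp's --in-prefix/--in-suffix convention, and this decomposition is an assumption, since it drops the template's tool-calling branches): the per-message wrapper of the chat_template would collapse into prefix/suffix strings like:

# hypothetical prefix/suffix derived from a ChatML chat_template
in_prefix: "<|im_start|>user\n"
in_suffix: "<|im_end|>\n<|im_start|>assistant\n"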

Tasklist

  • How do we transition if we change the field to chat_template (to align with HF Transformers)?
    • We support both chat_template and prompt_template
    • chat_template overrides prompt_template
    • Mark prompt_template as deprecated (we won't support it for future models)
    • Update Model CI to support chat_template
  • Is there a scalable way to do this by leveraging existing standards?
    • Option 1: Adding a 2nd field to all model.yaml that carries chat_template (see the sketch after this list)
    • Option 2: Copying over tokenizer_config.json
    • Option 3: Model.yaml
  • Models
    • Marco-o1
    • qwq
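A minimal sketch of what Option 1 could look like. The chat_template field name and its precedence over prompt_template are assumptions for illustration, not a finalized schema:

# Hypothetical transitional model.yaml carrying both fields.
# If chat_template (Jinja, as in HF Transformers) is present, it overrides
# the legacy prompt_template; otherwise we fall back to prompt_template.
prompt_template: <|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n
chat_template: "{%- for message in messages %}{{ '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}{%- endfor %}{%- if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{%- endif %}"
ctx_len: 4096
ngl: 34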

Resources

GGUF

HF Transformers

  • Bigger question: should we just have tokenizer_config.json define chat_template, and include it in model repos?
  • For everything else, we depend on GGUF
  • We have a lightweight model.yaml that can override this if needed
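If tokenizer_config.json ships in the model repo, pulling the template out is nearly a one-liner. A minimal Python sketch (tokenizer_config.json is the standard HF Transformers location for the template; the helper name is ours):

import json

# Read the Jinja chat template from a repo's tokenizer_config.json.
# Returns None if the model doesn't define one.
def load_chat_template(config_path: str):
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f).get("chat_template")

template = load_chat_template("tokenizer_config.json")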

Example

qwq chat template (jinja2), as it appears in tokenizer_config.json

# qwq chat template
# excerpt from tokenizer_config.json

{
"chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 32768,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null,
  "add_bos_token": false
}


Current model.yaml format

# Our current qwq `model.yaml`

prompt_template: <|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n
ctx_len: 4096
ngl: 34
# END REQUIRED
# END MODEL LOAD PARAMETERS

@dan-menlo dan-menlo added this to Menlo Dec 2, 2024
@dan-menlo dan-menlo converted this from a draft issue Dec 2, 2024
@dan-menlo dan-menlo changed the title planning: Cortex supports System Prompt planning: Cortex supports chat_templates in model.yaml Dec 2, 2024
@dan-menlo dan-menlo assigned namchuai and unassigned vansangpfiev Dec 2, 2024
@louis-jan louis-jan added the category: model management Model pull, yaml, model state label Dec 4, 2024
@dan-menlo dan-menlo changed the title planning: Cortex supports chat_templates in model.yaml roadmap: Cortex supports chat_templates in model.yaml Dec 6, 2024
@dan-menlo dan-menlo changed the title roadmap: Cortex supports chat_templates in model.yaml roadmap: model.yaml can support Reasoning Models' System Prompt Dec 16, 2024
@namchuai namchuai moved this from Investigating to Planning in Menlo Dec 17, 2024
@namchuai namchuai moved this from Planning to Scheduled in Menlo Dec 17, 2024
@namchuai namchuai moved this from Scheduled to In Progress in Menlo Dec 17, 2024

namchuai commented Dec 17, 2024

Summary of my thoughts on this ticket. Feel free to comment and let me know your ideas; I might be wrong.

  • For GGUF models, we can extract the chat_template from the GGUF file's metadata.
  • For Python models (safetensors), there is a JSON config which contains the chat_template.
  • Both of these chat_templates are Jinja templates.

So I think we can get the chat_template, render it with actual data, then send the result to the engine (see the extraction sketch below).
We don't introduce a new field, and we don't remove any field.
We deprecate prompt_template and will eventually get rid of it.

This change will affect cortex.cpp and cortex.llamacpp. After we release a new stable version of cortex.llamacpp, we will prompt users to update the engine, and the chat_template will be used.
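A minimal sketch of the GGUF extraction step, using the gguf Python package that ships with llama.cpp. The tokenizer.chat_template metadata key is standard GGUF, but the field-accessor details below may vary across gguf package versions, so treat this as an assumption-laden illustration:

from gguf import GGUFReader  # pip install gguf

# Pull the Jinja chat template out of a GGUF file's metadata.
# GGUF stores it under the standard key "tokenizer.chat_template".
def read_chat_template(gguf_path: str):
    reader = GGUFReader(gguf_path)
    field = reader.fields.get("tokenizer.chat_template")
    if field is None:
        return None  # no template; fall back to prompt_template
    # String fields are stored as raw bytes; decode the referenced part.
    return bytes(field.parts[field.data[0]]).decode("utf-8")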

About the chat_template

Sample chat_template of the qwq model. You can grab it from here.

{%- if tools %} {{- '<\|im_start\|>system\n' }} {%- if messages[0]['role'] == 'system' %} {{- messages[0]['content'] }} {%- else %} {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }} {%- endif %} {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }} {%- for tool in tools %} {{- "\n" }} {{- tool \| tojson }} {%- endfor %} {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><\|im_end\|>\n" }} {%- else %} {%- if messages[0]['role'] == 'system' %} {{- '<\|im_start\|>system\n' + messages[0]['content'] + '<\|im_end\|>\n' }} {%- else %} {{- '<\|im_start\|>system\nYou are a helpful assistant.<\|im_end\|>\n' }} {%- endif %} {%- endif %} {%- for message in messages %} {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %} {{- '<\|im_start\|>' + message.role + '\n' + message.content + '<\|im_end\|>' + '\n' }} {%- elif message.role == "assistant" %} {{- '<\|im_start\|>' + message.role }} {%- if message.content %} {{- '\n' + message.content }} {%- endif %} {%- for tool_call in message.tool_calls %} {%- if tool_call.function is defined %} {%- set tool_call = tool_call.function %} {%- endif %} {{- '\n<tool_call>\n{"name": "' }} {{- tool_call.name }} {{- '", "arguments": ' }} {{- tool_call.arguments \| tojson }} {{- '}\n</tool_call>' }} {%- endfor %} {{- '<\|im_end\|>\n' }} {%- elif message.role == "tool" %} {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %} {{- '<\|im_start\|>user' }} {%- endif %} {{- '\n<tool_response>\n' }} {{- message.content }} {{- '\n</tool_response>' }} {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %} {{- '<\|im_end\|>\n' }} {%- endif %} {%- endif %} {%- endfor %} {%- if add_generation_prompt %} {{- '<\|im_start\|>assistant\n' }} {%- endif %}

Removing the escape characters (from here) gives:

{%- if tools %} {{- '<|im_start|>system\n' }} {%- if messages[0]['role'] == 'system' %} {{- messages[0]['content'] }} {%- else %} {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }} {%- endif %} {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }} {%- for tool in tools %} {{- "\n" }} {{- tool | tojson }} {%- endfor %} {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }} {%- else %} {%- if messages[0]['role'] == 'system' %} {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }} {%- else %} {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }} {%- endif %} {%- endif %} {%- for message in messages %} {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %} {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }} {%- elif message.role == "assistant" %} {{- '<|im_start|>' + message.role }} {%- if message.content %} {{- '\n' + message.content }} {%- endif %} {%- for tool_call in message.tool_calls %} {%- if tool_call.function is defined %} {%- set tool_call = tool_call.function %} {%- endif %} {{- '\n<tool_call>\n{"name": "' }} {{- tool_call.name }} {{- '", "arguments": ' }} {{- tool_call.arguments | tojson }} {{- '}\n</tool_call>' }} {%- endfor %} {{- '<|im_end|>\n' }} {%- elif message.role == "tool" %} {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %} {{- '<|im_start|>user' }} {%- endif %} {{- '\n<tool_response>\n' }} {{- message.content }} {{- '\n</tool_response>' }} {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %} {{- '<|im_end|>\n' }} {%- endif %} {%- endif %} {%- endfor %} {%- if add_generation_prompt %} {{- '<|im_start|>assistant\n' }} {%- endif %}

Then we can render this template using the data below (e.g. with an online Jinja tool):

{
    "tools": [
        {
            "name": "calculator",
            "description": "Basic calculator function",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression to evaluate"
                    }
                },
                "required": ["expression"]
            }
        }
    ],
    "messages": [
        {
            "role": "system",
            "content": "You are a math assistant specialized in calculations."
        },
        {
            "role": "user",
            "content": "What is 25 * 4?"
        },
        {
            "role": "assistant",
            "content": "Let me calculate that for you.",
            "tool_calls": [
                {
                    "function": {
                        "name": "calculator",
                        "arguments": "{\"expression\": \"25 * 4\"}"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "content": "100"
        },
        {
            "role": "assistant",
            "content": "The result of 25 * 4 is 100."
        }
    ],
    "add_generation_prompt": false
}

Final prompt passed to the engine:

<|im_start|>system
You are a math assistant specialized in calculations.

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"description": "Basic calculator function", "name": "calculator", "parameters": {"properties": {"expression": {"description": "Math expression to evaluate", "type": "string"}}, "required": ["expression"], "type": "object"}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
What is 25 * 4?<|im_end|>
<|im_start|>assistant
Let me calculate that for you.
<tool_call>
{"name": "calculator", "arguments": "{\"expression\": \"25 * 4\"}"}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
100
</tool_response><|im_end|>
<|im_start|>assistant
The result of 25 * 4 is 100.<|im_end|>
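The same render can be reproduced locally with Python's jinja2, which provides the tojson filter the template relies on (HF Transformers itself renders chat templates with a sandboxed jinja2 environment). A minimal sketch, assuming the cleaned template and the JSON data above are saved next to the script:

import json
from jinja2 import Environment

# Load the cleaned qwq chat template and the sample request data
# (the two snippets shown above, saved to files for this sketch).
with open("qwq_chat_template.jinja", encoding="utf-8") as f:
    template_src = f.read()
with open("sample_request.json", encoding="utf-8") as f:
    data = json.load(f)

env = Environment()  # jinja2 >= 2.9 ships the built-in tojson filter
prompt = env.from_string(template_src).render(
    tools=data.get("tools"),
    messages=data["messages"],
    add_generation_prompt=data.get("add_generation_prompt", False),
)
print(prompt)  # the final prompt string sent to the engine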


dan-menlo commented Dec 18, 2024

18 Dec

Problem

  • qwq and other reasoning models need a specific System Message to work
  • Our current model.yaml has a prompt_template, but it is outdated
    • llama.cpp now has a full chat_template
    • It does not cover system_message
  • Jan has a user-defined system message with Assistants, which overrides system_message
    • This will be prefilled in the right panel
  • Architecture
    • chat_template is for the llama.cpp server, not llama.cpp itself

Decision

  • No change to model.yaml
  • James has a PR to use the gguf chat_template

@dan-menlo dan-menlo changed the title roadmap: model.yaml can support Reasoning Models' System Prompt roadmap: Jan can support Reasoning Models with chat_templates Dec 18, 2024
@dan-menlo dan-menlo changed the title roadmap: Jan can support Reasoning Models with chat_templates roadmap: Cortex can support Reasoning Models with chat_templates Dec 18, 2024
@namchuai namchuai moved this from In Progress to Eng Review in Menlo Dec 20, 2024
@namchuai namchuai moved this from Eng Review to QA in Menlo Dec 25, 2024
@TC117 TC117 added this to the v1.0.6 milestone Dec 26, 2024
@TC117 TC117 modified the milestones: v1.0.6, v1.0.7 Dec 30, 2024
@TC117 TC117 modified the milestones: v1.0.7, v1.0.9 Jan 7, 2025
@TC117 TC117 self-assigned this Jan 7, 2025
@TC117 TC117 modified the milestones: v1.0.9, 1.0.10 Jan 13, 2025