Problem Statement
I am building an app to visualise logprobs. One feature is to restart generation at a chosen token: if the model responds with text, you can pick a token and restart generation from that point with a different suggestion (basically forcing a different logprob path).
For this I need a way to complete a partial LLM response (the continuation point may be in the middle of a message).
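A rough sketch of the workflow I have in mind, using an OpenAI-compatible chat API with logprobs (the model name and the chosen position are placeholders); the last step is the part that is currently missing:

```python
from openai import OpenAI

client = OpenAI()

# 1) Get a response together with per-token logprobs and alternatives.
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    logprobs=True,
    top_logprobs=5,
)
token_logprobs = resp.choices[0].logprobs.content  # one entry per generated token

# 2) The user picks a position and one of the alternative tokens in the UI.
i = 7  # placeholder position chosen in the visualiser
alternative = token_logprobs[i].top_logprobs[1].token  # e.g. second-most-likely token

# 3) Rebuild the partial assistant message up to that point with the swap applied.
prefix = "".join(t.token for t in token_logprobs[:i]) + alternative

# 4) Continue generation from `prefix` as a *partial assistant message* --
#    this is exactly the capability this issue asks for (raw / unformatted
#    completion of a partial response).
```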
Feature Idea
I need a way to disable prompt formatting (I can take care of prompt formatting and preparation myself), or a way to "restart" response generation from a partial message.
Do other engines do this?
Yes: Ollama has a "raw" flag (though not in OpenAI-compatible mode).
vLLM accepts plain completion text, so you can use its Python API to pass in a specially crafted prompt (see the sketch below).
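For reference, a minimal sketch of how this looks in those two engines; the model names, the template tokens, and the partial text are placeholders. Ollama's `/api/generate` with `raw: true` skips its own prompt templating, and vLLM's offline `LLM.generate` takes the prompt string verbatim:

```python
# Sketch: sending an already-formatted / partial prompt to other engines.
import requests
from vllm import LLM, SamplingParams

# Hand-crafted prompt: chat template applied by us, ending mid-response so the
# model continues the assistant message instead of starting a new one.
# (Template tokens here are placeholders, not a specific model's template.)
partial_prompt = (
    "<|user|>\nWhy is the sky blue?\n<|assistant|>\nThe sky appears blue because"
)

# 1) Ollama: raw=True bypasses Ollama's prompt templating.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": partial_prompt, "raw": True, "stream": False},
)
print(resp.json()["response"])

# 2) vLLM (offline Python API): generate() takes the prompt as-is,
#    no chat template is applied.
llm = LLM(model="meta-llama/Meta-Llama-3-8B")
out = llm.generate([partial_prompt], SamplingParams(max_tokens=64, logprobs=5))
print(out[0].outputs[0].text)
```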
Why I want this: I came across a paper (https://arxiv.org/pdf/2402.10200) suggesting that most LLMs will produce chain-of-thought reasoning on their own if you branch on the top-k alternative tokens at the first decoding step (or at least it makes CoT likely).
A nice quality-of-life addition would also be a formatting endpoint that just returns the formatted prompt: OpenAI-compatible request in, formatted string out.
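To make that idea concrete, here is a purely hypothetical sketch of what such an endpoint could look like; the path `/v1/format`, the `prompt` response field, and the template tokens are all invented for illustration, not an existing API:

```python
# Hypothetical formatting endpoint: takes an OpenAI-compatible chat request and
# returns the prompt string the server would actually feed to the model.
# The "/v1/format" path and the {"prompt": ...} response shape are invented here
# purely to illustrate the idea.
import requests

resp = requests.post(
    "http://localhost:8080/v1/format",  # hypothetical endpoint
    json={
        "model": "llama3",
        "messages": [
            {"role": "user", "content": "Why is the sky blue?"},
        ],
    },
)
# Expected (hypothetical) response, roughly:
# {"prompt": "<|user|>\nWhy is the sky blue?\n<|assistant|>\n"}
print(resp.json()["prompt"])
```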