Problem Statement
I am building an app to visualise logprobs. One feature is to restart generation at a chosen token: if the model responds with text, you can pick a token and restart generation from that point with a different suggestion (basically forcing a different logprob path).
For this I need a way to complete a partial LLM response (the continuation point may be in the middle of a message).
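A rough sketch of the workflow I have in mind, using an OpenAI-compatible chat API with logprobs (the model name and the chosen position are placeholders); the last step is the part that is currently missing:

```python
from openai import OpenAI

client = OpenAI()

# 1) Get a response together with per-token logprobs and alternatives.
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    logprobs=True,
    top_logprobs=5,
)
token_logprobs = resp.choices[0].logprobs.content  # one entry per generated token

# 2) The user picks a position and one of the alternative tokens in the UI.
i = 7  # placeholder position chosen in the visualiser
alternative = token_logprobs[i].top_logprobs[1].token  # e.g. second-most-likely token

# 3) Rebuild the partial assistant message up to that point with the swap applied.
prefix = "".join(t.token for t in token_logprobs[:i]) + alternative

# 4) Continue generation from `prefix` as a *partial assistant message* --
#    this is exactly the capability this issue asks for (raw / unformatted
#    completion of a partial response).
```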
Feature Idea
I need a way to disable prompt formatting (I can take care of prompt formatting and preparation myself), or a way to "restart" response generation from a partial message.
Do other engines do this?
Yes: Ollama has a "raw" flag (though not in OpenAI-compatible mode).
vLLM accepts plain completion text, so you can use its Python API to pass in a specially crafted prompt (see the sketch below).
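For reference, a minimal sketch of how this looks in those two engines; the model names, the template tokens, and the partial text are placeholders. Ollama's `/api/generate` with `raw: true` skips its own prompt templating, and vLLM's offline `LLM.generate` takes the prompt string verbatim:

```python
# Sketch: sending an already-formatted / partial prompt to other engines.
import requests
from vllm import LLM, SamplingParams

# Hand-crafted prompt: chat template applied by us, ending mid-response so the
# model continues the assistant message instead of starting a new one.
# (Template tokens here are placeholders, not a specific model's template.)
partial_prompt = (
    "<|user|>\nWhy is the sky blue?\n<|assistant|>\nThe sky appears blue because"
)

# 1) Ollama: raw=True bypasses Ollama's prompt templating.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": partial_prompt, "raw": True, "stream": False},
)
print(resp.json()["response"])

# 2) vLLM (offline Python API): generate() takes the prompt as-is,
#    no chat template is applied.
llm = LLM(model="meta-llama/Meta-Llama-3-8B")
out = llm.generate([partial_prompt], SamplingParams(max_tokens=64, logprobs=5))
print(out[0].outputs[0].text)
```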
Why I want this: I came across a paper (https://arxiv.org/pdf/2402.10200) suggesting that most LLMs will produce chain-of-thought reasoning on their own if you branch on the top-k alternative tokens at the first decoding step (or at least it makes CoT likely).
A nice quality-of-life addition would also be a formatting endpoint that just returns the formatted prompt: OpenAI-compatible request in, formatted string out.
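To make that idea concrete, here is a purely hypothetical sketch of what such an endpoint could look like; the path `/v1/format`, the `prompt` response field, and the template tokens are all invented for illustration, not an existing API:

```python
# Hypothetical formatting endpoint: takes an OpenAI-compatible chat request and
# returns the prompt string the server would actually feed to the model.
# The "/v1/format" path and the {"prompt": ...} response shape are invented here
# purely to illustrate the idea.
import requests

resp = requests.post(
    "http://localhost:8080/v1/format",  # hypothetical endpoint
    json={
        "model": "llama3",
        "messages": [
            {"role": "user", "content": "Why is the sky blue?"},
        ],
    },
)
# Expected (hypothetical) response, roughly:
# {"prompt": "<|user|>\nWhy is the sky blue?\n<|assistant|>\n"}
print(resp.json()["prompt"])
```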