[Doc] Add documentation for Structured Outputs #9943
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Nice! Thank you for the PR.
@simon-mo sure, I added the link to the OpenAI server page and changed the models to the instruct variants (and re-tested the examples). But now a couple of automatic tests are failing; any idea why?
@simon-mo all the tests are passing now and the requested changes have been applied.
Hey, I just found your PR as I am struggling with structured output on the vLLM OpenAI server; do you have any insights on this? Asking for JSON in the prompt works, but for some reason, when using a Pydantic model as the guided JSON schema, I get a bad output. My code is this:

```python
import asyncio

import openai
from pydantic import BaseModel, Field

MODEL_NAME = "internlm/internlm2_5-7b-chat"

vllm_client = openai.AsyncOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy")

class PersonInfo(BaseModel):
    name: str = Field(description="The name of the person")
    age: int = Field(description="The age of the person")

json_schema = PersonInfo.model_json_schema()

async def instruct_call(prompt: str = "My friend Pedro is 25 years old", **kwargs):
    response = await vllm_client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are a helpful assistant that generates JSON objects."},
            {"role": "user", "content": "Get the name and age from the following text: " + prompt}],
        **kwargs
    )
    return response.choices[0].message.content

print("=" * 80)
print("Structured output asking for JSON")
print(asyncio.run(instruct_call()))
print("=" * 80)
print("Structured output with JSON schema")
# guided_json is vLLM's extra_body extension for schema-constrained decoding
print(asyncio.run(instruct_call(extra_body={"guided_json": json_schema})))

# ================================================================================
# Structured output asking for JSON
#
# {
#   "name": "Pedro",
#   "age": 25
# }
#
# ================================================================================
# Structured output with JSON schema
# {"name":"Weird","age":25}
```
Not sure tbh. It seems that it's generating valid JSON, but the values are not properly filled. What I've seen is that some models perform better than others at filling out the fields of structured outputs; I've had the best results with Qwen models. Also, iterating a little over the prompt may help, like giving some examples of the desired output (see the sketch below). Are you using the latest vLLM version? If none of these things work, maybe the best option is to open an issue to discuss further.
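For illustration, a minimal sketch of the few-shot idea, combining an in-prompt example with `guided_json` against the same local server and model as your snippet (the helper name and prompt wording are my own assumptions, and it's untested against that model, so treat it as a starting point rather than a fix):

```python
import asyncio

import openai
from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    name: str = Field(description="The name of the person")
    age: int = Field(description="The age of the person")

client = openai.AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Few-shot system prompt: show the model one worked example of the exact
# output shape before constraining decoding with guided_json.
SYSTEM = (
    "You extract a person's name and age as JSON.\n"
    "Example input: My friend Ana is 30 years old\n"
    'Example output: {"name": "Ana", "age": 30}'
)

async def extract(text: str) -> str:
    # `extract` is a hypothetical helper; guided_json in extra_body is the
    # same vLLM extension used in the snippet above.
    response = await client.chat.completions.create(
        model="internlm/internlm2_5-7b-chat",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Get the name and age from the following text: " + text},
        ],
        extra_body={"guided_json": PersonInfo.model_json_schema()},
    )
    return response.choices[0].message.content

print(asyncio.run(extract("My friend Pedro is 25 years old")))
```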
I feel we are missing the
Adding some documentation on the structured outputs (guided decoding) options that are available, together with two example files. Covers both online inference (OpenAI-compatible API) and offline inference.
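For a taste of the offline path the new docs cover, here is a minimal sketch assuming a vLLM version that exposes `GuidedDecodingParams` in `vllm.sampling_params` (the model name is just a placeholder; double-check field names against the version you run):

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Constrain generation to one of a fixed set of choices (guided decoding).
guided = GuidedDecodingParams(choice=["Positive", "Negative"])
sampling_params = SamplingParams(guided_decoding=guided)

# Placeholder model: any instruct-tuned chat model should work here.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
outputs = llm.generate(
    prompts="Classify this sentiment: vLLM is wonderful!",
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```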