OpenAI structured outputs support #1307

Open
antoniomdk opened this issue Oct 1, 2024 · 11 comments
Labels: enhancement, good first issue, help wanted, question

Comments

@antoniomdk
Contributor

Feature Request

I've been working with typia.llm.schema for a while and it has been extremely helpful in generating JSON schemas to call LLMs from TS types. However, the new structured outputs API of OpenAI has some limitations in the type of schemas it can take.

In particular, nullable is not taken into account. It would be great if we could map types X | null to anyOf, perhaps by introducing a new flag on the typia.llm.schema function.

Also, for types that don't extend from Record, we should set [additionalProperties to false](https://platform.openai.com/docs/guides/structured-outputs/additionalproperties-false-must-always-be-set-in-objects).
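
The two requested changes could be sketched as a post-processing pass over an already generated schema. This is only a minimal illustration, not typia's implementation: `LooseSchema` and `toStructuredOutput` are hypothetical names, and the input is assumed to mark nullability with the OpenAPI v3.0 style `nullable: true` flag.

```typescript
// Hypothetical, simplified schema shape; this is NOT typia's ILlmSchema.
interface LooseSchema {
  type?: string;
  nullable?: boolean;
  properties?: Record<string, LooseSchema>;
  required?: string[];
  items?: LooseSchema;
  anyOf?: LooseSchema[];
  additionalProperties?: boolean | LooseSchema;
  [key: string]: unknown;
}

// Rewrites `nullable: true` into `anyOf: [T, { type: "null" }]` and closes
// every object schema that does not declare additionalProperties itself.
function toStructuredOutput(schema: LooseSchema): LooseSchema {
  const { nullable, ...rest } = schema;
  const next: LooseSchema = { ...rest };

  // Recurse into nested schemas first.
  if (next.properties !== undefined) {
    const props: Record<string, LooseSchema> = {};
    for (const [key, value] of Object.entries(next.properties))
      props[key] = toStructuredOutput(value);
    next.properties = props;
  }
  if (next.items !== undefined) next.items = toStructuredOutput(next.items);
  if (next.anyOf !== undefined) next.anyOf = next.anyOf.map(toStructuredOutput);

  // Record-like types already carry additionalProperties and are left alone.
  if (next.type === "object" && next.additionalProperties === undefined)
    next.additionalProperties = false;

  return nullable === true ? { anyOf: [next, { type: "null" }] } : next;
}
```

Keeping this as a separate, opt-in pass is one way to leave the default output aligned with the OpenAPI and JSON schema conventions while still producing the shape OpenAI asks for.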

I can contribute to this feature, but I may need some pointers for code references to start.

samchon added the enhancement, good first issue, and help wanted labels on Oct 1, 2024
@samchon
Owner

samchon commented Oct 1, 2024

A T | null type cannot be written as a oneOf type under the JSON schema specification (of OpenAPI v3.0) that OpenAI has adopted. Writing T | null as a oneOf type has been allowed only since the JSON schema 2020-12 draft (of OpenAPI v3.1).

By the way, does OpenAI understand only the anyOf type? Currently, @samchon/openapi and typia use the oneOf type for TypeScript union types, because oneOf has a clearer meaning than anyOf.

samchon removed the enhancement label on Oct 1, 2024
@samchon
Owner

samchon commented Oct 1, 2024

Also, setting additionalProperties to false needs some care.

additionalProperties := false means that no superfluous properties of any type are allowed. Under that validation rule, any extra property that is not defined in properties must be considered invalid.
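
To make the rule concrete, here is a minimal sketch of the check a validator would apply to a closed object. `ClosedObjectSchema` and `findSuperfluousKeys` are hypothetical names for illustration and are not part of typia or @samchon/openapi.

```typescript
// Hypothetical minimal object schema, just enough to show the rule.
interface ClosedObjectSchema {
  type: "object";
  properties: Record<string, unknown>;
  additionalProperties: false;
}

// Lists the keys of `value` that the schema does not declare.
// A non-empty result means the value is invalid under the rule above.
function findSuperfluousKeys(
  schema: ClosedObjectSchema,
  value: Record<string, unknown>,
): string[] {
  const declared = new Set(Object.keys(schema.properties));
  return Object.keys(value).filter((key) => !declared.has(key));
}
```

For a schema declaring only `id`, a value such as `{ id: "a", extra: 1 }` would be rejected because `extra` is not declared.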

Therefore, if you want to contribute to the typia.llm.application<App>() and typia.llm.schema<T>() functions, you have to be careful about this rule.

Here is the code that fills the ILlmSchema.IObject.additionalProperties property. You can accomplish what you want just by changing the return type of the join() function from ILlmSchema | undefined to ILlmSchema | false.

/**
 * @internal
 */
const join = (extra: ISuperfluous): ILlmSchema | undefined => {
  // LIST UP METADATA
  const elements: [Metadata, ILlmSchema][] = Object.values(
    extra.patternProperties || {},
  );
  if (extra.additionalProperties) elements.push(extra.additionalProperties);
  // SHORT RETURN
  if (elements.length === 0) return undefined;
  else if (elements.length === 1) return elements[0]![1]!;
  // MERGE METADATA AND GENERATE VULNERABLE SCHEMA
  const meta: Metadata = elements
    .map((tuple) => tuple[0])
    .reduce((x, y) => Metadata.merge(x, y));
  return llm_schema_station({
    blockNever: true,
    attribute: {},
    metadata: meta,
  });
};
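
The suggested change to the return type might look roughly like the sketch below. It is self-contained and does not use typia's internals: `SchemaStub`, `SuperfluousStub`, and `joinClosed` are stand-in names, and the metadata merge of the original is replaced by a plain anyOf union purely so the example runs on its own.

```typescript
// Simplified stand-ins for typia's internal types (assumptions for illustration).
interface SchemaStub {
  type?: string;
  anyOf?: SchemaStub[];
}
interface SuperfluousStub {
  patternProperties?: Record<string, SchemaStub>;
  additionalProperties?: SchemaStub;
}

// Same control flow as join(), but "no extra properties" becomes `false`
// instead of `undefined`, so the emitted object schema is closed.
const joinClosed = (extra: SuperfluousStub): SchemaStub | false => {
  const elements: SchemaStub[] = Object.values(extra.patternProperties ?? {});
  if (extra.additionalProperties) elements.push(extra.additionalProperties);
  if (elements.length === 0) return false;
  if (elements.length === 1) return elements[0]!;
  // The original merges metadata and regenerates a schema here;
  // a plain union stands in for that step.
  return { anyOf: elements };
};
```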

samchon added the question label on Oct 1, 2024
@antoniomdk
Contributor Author

I haven't found any information on whether OpenAI supports oneOf. They do mention that they support anyOf, but I agree that oneOf should be the right type (it doesn't make sense for a value to be null and not null at the same time). That's why I suggested putting this behavior change under a flag, or making the user explicitly ask for it, because it deviates from the OpenAPI and JSON schema standards.

@antoniomdk
Contributor Author

antoniomdk commented Oct 1, 2024

@samchon
Owner

samchon commented Oct 1, 2024

How about the other models?

Google Gemini uses the JSON schema specified by OpenAPI v3.0.3, but does not support oneOf.

OpenAI sometimes looks like it uses OpenAPI v3.1, and sometimes v3.0. It supports mixed-in types expressed as type: ["string", "null"], but not tuple types expressed as { type: "array", prefixItems: [A, B, C] }. I need to study and test OpenAI in depth next weekend.
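
For reference, the nullable spellings discussed in this thread can be put side by side. The sketch below is illustrative only: `NullableCandidate` and `acceptsNull` are hypothetical names, and it makes no claim about which spelling any given provider accepts.

```typescript
// A loose shape covering the three nullable spellings (illustrative only).
interface NullableCandidate {
  type?: string | string[];
  nullable?: boolean;
  anyOf?: NullableCandidate[];
}

// OpenAPI v3.0: a dedicated flag next to the type.
const nullableV30: NullableCandidate = { type: "string", nullable: true };
// OpenAPI v3.1 (JSON schema 2020-12): an array of type names.
const nullableV31: NullableCandidate = { type: ["string", "null"] };
// Union spelling: anyOf with an explicit null branch.
const nullableAnyOf: NullableCandidate = {
  anyOf: [{ type: "string" }, { type: "null" }],
};

// Returns true when the schema admits null under any of the spellings above.
function acceptsNull(schema: NullableCandidate): boolean {
  if (schema.nullable === true) return true;
  if (schema.type === "null") return true;
  if (Array.isArray(schema.type) && schema.type.some((t) => t === "null"))
    return true;
  return (schema.anyOf ?? []).some((branch) => acceptsNull(branch));
}
```

All three constants describe the same set of values (a string or null), which is why a provider-specific converter only needs to pick the spelling the target model understands.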

@samchon
Owner

samchon commented Oct 1, 2024

To support the LLM function calling feature exactly, I should separate the providers in one of the ways below.

  • Top level namespaces
    • typia.openai.application<App>(): ILlmApplication<IOpenAiSchema>
    • typia.gemini.application<App>(): ILlmApplication<IGeminiSchema>
    • typia.llama.application<App>(): ILlmApplication<ILlamaSchema>
  • Nested namespaces
    • typia.llm.openai.application<App>()
    • typia.llm.gemini.application<App>()
    • typia.llm.llama.application<App>()
  • Generic Argument
    • typia.llm.application<App, "openai">()
    • typia.llm.application<App, "gemini">()
    • typia.llm.application<App, "llama">()

@samchon
Owner

samchon commented Oct 2, 2024

@antoniomdk If you send a PR for additionalProperties, I'll accept it.

Also, for handling each specific LLM provider's schema, I'll prepare a major update.

It would be @samchon/[email protected] and [email protected].

@antoniomdk
Contributor Author

@samchon That sounds great! I think the LLM-specific separation makes a lot of sense. I'll send a PR for additionalProperties by EOW (probably during the weekend).

@bradleat

Related to LLM structured outputs, I find that when prompting I often want to use the jsdoc comment for a type in the prompt. Could typia add a misc method that returns the jsdoc string of a particular type?

Using typia.reflect.metadata can get you this information, but it'd be nice to just get the jsdoc comment.

@samchon
Owner

samchon commented Nov 15, 2024

@antoniomdk, @bradleat https://github.com/samchon/openapi/blob/v2.0/src/structures/IChatGptSchema.ts

I'm preparing an OpenAI-dedicated schema type, IChatGptSchema, for the next version of @samchon/openapi and typia.

Here is the type. I'll test it using the ChatGPT API while considering the things below.

  • Whether to apply the $ref type to every named schema, or only to recursive types
  • Whether to just use the oneOf type and its discriminator property for clear union type predication
  • Whether to use the const type or the enum property
    • OpenAI's documentation states support for the JSON schema v7 specification (OpenApi.IJsonSchema)
    • However, OpenAI's examples only use anyOf
    • Also, const is clearer than enum, but the examples just use enum
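
If the enum spelling turns out to be the safer choice, a const schema could be rewritten mechanically, since a single-element enum admits exactly the same value. The sketch below assumes a simplified schema shape; `LiteralSchema` and `constToEnum` are hypothetical names and not part of IChatGptSchema.

```typescript
type Literal = string | number | boolean | null;

// Hypothetical, simplified schema shape for literal values.
interface LiteralSchema {
  const?: Literal;
  enum?: Literal[];
  [key: string]: unknown;
}

// Replaces `const: X` with the equivalent `enum: [X]`;
// schemas without `const` are returned unchanged.
function constToEnum(schema: LiteralSchema): LiteralSchema {
  const value = schema.const;
  if (value === undefined) return schema;
  const next: LiteralSchema = { ...schema, enum: [value] };
  delete next.const;
  return next;
}
```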

If you want to try it early, install the typia@next version and call typia.llm.application<App, "chatgpt">().

npm install typia@next

samchon added the enhancement label on Nov 15, 2024