Zod => JSONSchema conversion creates references to unknown definitions #978

Closed
1 task done
airhorns opened this issue Aug 9, 2024 · 7 comments · Fixed by #979
Labels
bug Something isn't working

Comments

@airhorns

airhorns commented Aug 9, 2024

Confirm this is a Node library issue and not an underlying OpenAI API issue

  • This is an issue with the Node library

Describe the bug

When compiling a zod schema with multiple references to the same nullable object, the compiled JSONSchema refers to definitions that don't exist. This is using the latest version of the openai client with structured output support. I believe the issue comes from an extracted definition trying to reference an inner extracted definition again -- see the example below.

To Reproduce

Here's an example zod schema and function call which triggers the issue:

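import OpenAI from "openai";
import { zodFunction } from "openai/helpers/zod";
import { z } from "zod";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// optional object that can be on each field, mark it as nullable to comply with structured output restrictions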
const metadata = z.nullable(
  z.object({
    foo: z.string(),
  })
);

// union element a
const fieldA = z.object({
  type: z.literal("string"),
  name: z.string(),
  metadata,
});

// union element b, both referring to above nullable object
const fieldB = z.object({
  type: z.literal("number"),
  metadata,
});

// top level input object with array of union element
const model = z.object({
  name: z.string(),
  fields: z.array(z.union([fieldA, fieldB])),
});

const completion = await openai.beta.chat.completions.parse({
  model: "gpt-4o-2024-08-06",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant. Generate a data model according to the user's instructions.",
    },
    { role: "user", content: "create a todo app data model" },
  ],
  tools: [zodFunction({ name: "query", parameters: model })],
});
expect(completion.choices[0].message.tool_calls[0].function.parsed_arguments).toMatchInlineSnapshot();

When run, I get this error:

400 Invalid schema for function 'query': In context=('anyOf', '0'), reference to component 'query_properties_fields_items_anyOf_0_properties_metadata_anyOf_0' which was not found in the schema.

I ninja'd into the source and console.log'd the generated JSON schema, here's what comes out:

    {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "fields": {
          "type": "array",
          "items": {
            "anyOf": [
              {
                "type": "object",
                "properties": {
                  "type": {
                    "type": "string",
                    "const": "string"
                  },
                  "name": {
                    "type": "string"
                  },
                  "metadata": {
                    "anyOf": [
                      {
                        "type": "object",
                        "properties": {
                          "foo": {
                            "type": "string"
                          }
                        },
                        "required": [
                          "foo"
                        ],
                        "additionalProperties": false
                      },
                      {
                        "type": "null"
                      }
                    ]
                  }
                },
                "required": [
                  "type",
                  "name",
                  "metadata"
                ],
                "additionalProperties": false
              },
              {
                "type": "object",
                "properties": {
                  "type": {
                    "type": "string",
                    "const": "number"
                  },
                  "metadata": {
                    "$ref": "#/definitions/query_properties_fields_items_anyOf_0_properties_metadata"
                  }
                },
                "required": [
                  "type",
                  "metadata"
                ],
                "additionalProperties": false
              }
            ]
          }
        }
      },
      "required": [
        "name",
        "fields"
      ],
      "additionalProperties": false,
      "definitions": {
        "query_properties_fields_items_anyOf_0_properties_metadata": {
          "anyOf": [
            {
              "$ref": "#/definitions/query_properties_fields_items_anyOf_0_properties_metadata_anyOf_0"
            },
            {
              "type": "null"
            }
          ]
        },
        "query": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "fields": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "type": {
                        "type": "string",
                        "const": "string"
                      },
                      "name": {
                        "type": "string"
                      },
                      "metadata": {
                        "anyOf": [
                          {
                            "type": "object",
                            "properties": {
                              "foo": {
                                "type": "string"
                              }
                            },
                            "required": [
                              "foo"
                            ],
                            "additionalProperties": false
                          },
                          {
                            "type": "null"
                          }
                        ]
                      }
                    },
                    "required": [
                      "type",
                      "name",
                      "metadata"
                    ],
                    "additionalProperties": false
                  },
                  {
                    "type": "object",
                    "properties": {
                      "type": {
                        "type": "string",
                        "const": "number"
                      },
                      "metadata": {
                        "$ref": "#/definitions/query_properties_fields_items_anyOf_0_properties_metadata"
                      }
                    },
                    "required": [
                      "type",
                      "metadata"
                    ],
                    "additionalProperties": false
                  }
                ]
              }
            }
          },
          "required": [
            "name",
            "fields"
          ],
          "additionalProperties": false
        }
      },
      "$schema": "http://json-schema.org/draft-07/schema#"
    }
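
The failure above comes down to a `$ref` whose target is missing from `definitions`. Below is a minimal sketch of a checker that lists such references before a schema is sent to the API. `findDanglingRefs` and `resolvePointer` are hypothetical helper names, not part of the openai library, and the sketch only handles same-document references (those starting with `#`) without percent-decoding.

```typescript
// Hypothetical helpers (not part of the openai library): list every local
// "$ref" in a JSON Schema whose JSON Pointer target does not exist.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Resolve a local reference such as "#/definitions/foo" against the root schema.
function resolvePointer(root: Json, ref: string): Json | undefined {
  if (!ref.startsWith("#")) return undefined; // only same-document refs are handled
  let node: Json | undefined = root;
  for (const rawToken of ref.slice(1).split("/").slice(1)) {
    // Undo JSON Pointer escaping ("~1" is "/", "~0" is "~").
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");
    if (node === null || typeof node !== "object") return undefined;
    node = Array.isArray(node) ? node[Number(token)] : node[token];
    if (node === undefined) return undefined;
  }
  return node;
}

// Walk the whole document and collect refs that do not resolve.
function findDanglingRefs(root: Json): string[] {
  const missing = new Set<string>();
  const visit = (node: Json): void => {
    if (node === null || typeof node !== "object") return;
    if (Array.isArray(node)) {
      node.forEach((item) => visit(item));
      return;
    }
    const ref = node["$ref"];
    if (typeof ref === "string" && resolvePointer(root, ref) === undefined) {
      missing.add(ref);
    }
    Object.values(node).forEach((child) => visit(child));
  };
  visit(root);
  return Array.from(missing);
}

// Trimmed-down version of the generated schema from this report.
const trimmed: Json = {
  type: "object",
  properties: {
    metadata: {
      $ref: "#/definitions/query_properties_fields_items_anyOf_0_properties_metadata",
    },
  },
  definitions: {
    query_properties_fields_items_anyOf_0_properties_metadata: {
      anyOf: [
        {
          $ref: "#/definitions/query_properties_fields_items_anyOf_0_properties_metadata_anyOf_0",
        },
        { type: "null" },
      ],
    },
  },
};

// Reports only the "..._metadata_anyOf_0" reference, matching the 400 error.
console.log(findDanglingRefs(trimmed));
```

Running a check like this in a unit test against the generated schema would catch the problem without a round trip to the API.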

Code snippets

No response

OS

macOS

Node version

v22.2.0

Library version

4.55.3

@airhorns added the bug (Something isn't working) label Aug 9, 2024
@airhorns
Author

airhorns commented Aug 9, 2024

This might be the same issue as the second issue discussed in #970, but that issue is marked as closed and this is still happening to me. Hopefully the above example serves as a good test case!

@RobertCraigie
Collaborator

Thanks for the bug report, I'm investigating.

@RobertCraigie
Collaborator

@airhorns this should be fixed in v4.55.4!

@airhorns
Author

airhorns commented Aug 9, 2024

Spectacular, thanks so much!

@airhorns
Author

Hm, I'm not sure why but I'm still seeing an error on v4.55.4. This is my exact test case:

  test("test", async () => {
    const metadata = z.nullable(
      z.object({
        foo: z.string(),
      })
    );

    const fieldA = z.object({
      type: z.literal("string"),
      name: z.string(),
      metadata,
    });

    const fieldB = z.object({
      type: z.literal("number"),
      metadata,
    });

    const model = z.object({
      name: z.string(),
      fields: z.array(z.union([fieldA, fieldB])),
    });

    const completion = await openai.beta.chat.completions.parse({
      model: "gpt-4o-2024-08-06",
      messages: [
        {
          role: "system",
          content: "You are a helpful assistant. Generate a data model according to the user's instructions.",
        },
        { role: "user", content: "create a todo app data model" },
      ],
      tools: [zodFunction({ name: "query", parameters: model })],
    });
    expect(completion.choices[0].message.tool_calls[0].function.parsed_arguments).toMatchInlineSnapshot();
  });

And I checked and double checked that I am using openai v4.55.4:

pnpm why -r openai

[email protected] /Users/airhorns/Code/gadget/packages/data-science

dependencies:
@langchain/core 0.2.23
└─┬ langsmith 0.1.41
  ├─┬ langchain 0.2.15 peer
  │ └─┬ @langchain/openai 0.2.6
  │   └── openai 4.55.4
  └── openai 4.55.4 peer
@langchain/langgraph 0.0.33
└─┬ @langchain/core 0.2.23
  └─┬ langsmith 0.1.41
    ├─┬ langchain 0.2.15 peer
    │ └─┬ @langchain/openai 0.2.6
    │   └── openai 4.55.4
    └── openai 4.55.4 peer
@langchain/openai 0.2.6
├─┬ @langchain/core 0.2.23
│ └─┬ langsmith 0.1.41
│   └── openai 4.55.4 peer
└── openai 4.55.4
api link:../api
└─┬ data-science link:
  ├─┬ @langchain/core 0.2.23
  │ └─┬ langsmith 0.1.41
  │   ├─┬ langchain 0.2.15 peer
  │   │ └─┬ @langchain/openai 0.2.6
  │   │   └── openai 4.55.4
  │   └── openai 4.55.4 peer
  └─┬ @langchain/langgraph 0.0.33
    └─┬ @langchain/core 0.2.23
      └─┬ langsmith 0.1.41
        ├─┬ langchain 0.2.15 peer
        │ └─┬ @langchain/openai 0.2.6
        │   └── openai 4.55.4
        └── openai 4.55.4 peer

Could it be that I'm using zod for function parsing and the newly added test uses it as a response format?

@RobertCraigie
Collaborator

hmm sorry about that, I'll look into it.

> Could it be that I'm using zod for function parsing and the newly added test uses it as a response format?

I think that'd be unlikely, they use the same function for generating the JSON schema 🤔

@ZijiaZhang
Contributor

ZijiaZhang commented Aug 12, 2024

Is there a reason $ref is used in the schema? I tested the schema without refs ($refStrategy: 'none') and it passes the check. In the OpenAI playground, the schema with refs and the one without seem to cost the same number of input tokens.
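
For comparison, here is a rough sketch of what a ref-free schema looks like when references are expanded after generation. This is not the `$refStrategy: 'none'` option mentioned above (that one is applied at generation time); `inlineLocalRefs` is a hypothetical post-processing helper. It assumes all references point at `#/definitions/<name>`, and it throws on recursive or dangling references, so it would not rescue the broken schema from this report.

```typescript
// Hypothetical helper (not part of any library): expand every
// "#/definitions/<name>" reference in place and drop the top-level
// "definitions" map, producing a schema without "$ref".
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const PREFIX = "#/definitions/";

function inlineLocalRefs(root: Json): Json {
  // Copy the top-level definitions, if any, into a lookup table.
  const defs: { [key: string]: Json } = {};
  if (root !== null && typeof root === "object" && !Array.isArray(root)) {
    const d = root["definitions"];
    if (d !== null && typeof d === "object" && !Array.isArray(d)) {
      Object.assign(defs, d);
    }
  }

  // "active" holds the definition names currently being expanded,
  // which is how recursion is detected.
  const expand = (node: Json, active: string[]): Json => {
    if (node === null || typeof node !== "object") return node;
    if (Array.isArray(node)) return node.map((item) => expand(item, active));

    const ref = node["$ref"];
    if (typeof ref === "string" && ref.startsWith(PREFIX)) {
      const name = ref.slice(PREFIX.length);
      if (!Object.prototype.hasOwnProperty.call(defs, name)) {
        throw new Error("dangling $ref: " + ref);
      }
      if (active.includes(name)) {
        throw new Error("recursive $ref cannot be inlined: " + ref);
      }
      // In draft-07, siblings of "$ref" are ignored, so the whole node is replaced.
      return expand(defs[name] as Json, [...active, name]);
    }

    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(node)) {
      // Skip the top-level "definitions" map; its contents are inlined on demand.
      if (node === root && key === "definitions") continue;
      out[key] = expand(value, active);
    }
    return out;
  };

  return expand(root, []);
}

// Two properties sharing one definition, similar in shape to the nullable
// metadata object in this issue.
const shared: Json = {
  type: "object",
  properties: {
    a: { $ref: "#/definitions/metadata" },
    b: { $ref: "#/definitions/metadata" },
  },
  definitions: {
    metadata: { anyOf: [{ type: "object" }, { type: "null" }] },
  },
};

console.log(JSON.stringify(inlineLocalRefs(shared), null, 2));
```

The trade-off is that recursive schemas cannot be expressed this way, which is the case `$ref` remains necessary for.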

Also, when I test with 'z.literal' ('const' in JSON Schema), the output does not always match it. But when I change it to an enum, it always matches.
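
If the `const` observation above holds for others too, one possible workaround is to rewrite `const` keywords as single-value `enum`s before sending the schema. The sketch below is only an illustration under that assumption: `constToEnum` is a hypothetical helper, and the model-side behaviour it works around is the anecdotal report above, not something verified here. On the Zod side, the equivalent change would presumably be `z.enum(["number"])` in place of `z.literal("number")`.

```typescript
// Hypothetical helper (not part of any library): rewrite every JSON Schema
// "const" keyword as an equivalent single-value "enum".
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Keywords whose values map arbitrary names to subschemas. The names are
// user data, so a property that happens to be called "const" is left alone.
const NAME_MAPS = new Set(["properties", "patternProperties", "definitions", "$defs"]);

// Keywords whose values are plain data rather than subschemas.
const DATA_KEYS = new Set(["enum", "default", "examples", "required"]);

function constToEnum(schema: Json): Json {
  if (schema === null || typeof schema !== "object") return schema;
  if (Array.isArray(schema)) return schema.map((item) => constToEnum(item));

  const out: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(schema)) {
    if (key === "const") {
      out["enum"] = [value]; // { const: x } and { enum: [x] } accept the same values
    } else if (DATA_KEYS.has(key)) {
      out[key] = value; // copy data verbatim, never treat it as a schema
    } else if (
      NAME_MAPS.has(key) &&
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value)
    ) {
      const mapped: { [key: string]: Json } = {};
      for (const [name, sub] of Object.entries(value)) {
        mapped[name] = constToEnum(sub);
      }
      out[key] = mapped;
    } else {
      out[key] = constToEnum(value);
    }
  }
  return out;
}

// The "type" discriminator of fieldB from the schema in this issue.
const fieldBType: Json = {
  type: "object",
  properties: { type: { type: "string", const: "number" } },
  required: ["type"],
};

// prints {"type":"object","properties":{"type":{"type":"string","enum":["number"]}},"required":["type"]}
console.log(JSON.stringify(constToEnum(fieldBType)));
```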

Edit: I checked the docs, which show an example of $ref used with recursion here. Sorry, I missed it previously.

3 participants