🧹 Clarification: Optional properties and null vs. undefined when used in languages like Python that only have a single None type #1586

Open
Ark-kun opened this issue Feb 21, 2025 · 6 comments

Comments

@Ark-kun

Ark-kun commented Feb 21, 2025

Specification section

?

What is unclear?

Please help us.

Pydantic v2 started converting Python's Optional[str] type to the JSON Schema {"anyOf":[{"type":"string"}, {"type":"null"}]} instead of an optional string property. This breaks many existing tools that consume JSON Schemas, but the maintainer claims that JSON Schema is designed this way. pydantic/pydantic#7161
Please help us get clarity whether this is really what Json Schema spec design intends.

I want to ask whether this is indeed the intention of JsonSchema design and if it's not the case, then hopefully the maintainers can be persuaded to restore the previous behavior.

Problem background:
JavaScript has both null and undefined.
Python has only the None singleton, which is also used implicitly in some cases. For example, when a function does not return anything, the actual returned value is None.

Let's look at this simple JsonSchema that has an optional field:

{
  "title": "Something",
  "type": "object",
  "properties": {
    "requiredProp": {"type": "string"},
    "optionalProp": {"type": "string"}
  },
  "required": ["requiredProp"]
}

Now let's try to represent such schema using Python:

from typing import Optional
from pydantic import BaseModel

class Something(BaseModel):
  requiredProp: str
  optionalProp: Optional[str]

For this type, Pydantic v2 produces the following JsonSchema:

{
  "title": "Something",
  "type": "object",
  "properties": {
    "requiredProp": {
      "title": "Requiredprop",
      "type": "string"
    },
    "optionalProp": {
      "title": "Optionalprop",
      "anyOf": [
        {"type": "string"},
        {"type": "null"}
      ]
    }
  },
  "required": ["requiredProp", "optionalProp"]
}

Notice that "optionalProp" is required and its type declaration is {"anyOf":[{"type":"string"}, {"type":"null"}]}.

And if we slightly change the class to add the default value:

from typing import Optional
from pydantic import BaseModel

class Something(BaseModel):
  requiredProp: str
  optionalProp: Optional[str] = None

some_obj = Something(requiredProp="foo")

The generated schema becomes

{
  "title": "Something",
  "type": "object",
  "properties": {
    "requiredProp": {
      "title": "Requiredprop",
      "type": "string"
    },
    "optionalProp": {
      "title": "Optionalprop",
      "anyOf": [
        {"type": "string"},
        {"type": "null"}
      ]
    }
  },
  "required": ["requiredProp"]
}

The optionalProp type declaration still remains {"anyOf":[{"type":"string"}, {"type":"null"}]}.

So it's not possible to generate a normal optional string property.
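To make the difference between the two schemas concrete, here is a minimal sketch using only the standard library. The `is_valid` function is a hypothetical toy validator that understands just `type`, `anyOf`, `properties`, and `required`; it is not a real JSON Schema implementation and only illustrates which instances each schema would accept.

```python
# Hypothetical toy validator: supports only "type", "anyOf", "properties", "required".
JSON_TYPES = {"string": str, "null": type(None), "object": dict}

def is_valid(instance, schema):
    if "anyOf" in schema and not any(is_valid(instance, s) for s in schema["anyOf"]):
        return False
    if "type" in schema and not isinstance(instance, JSON_TYPES[schema["type"]]):
        return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, subschema in schema.get("properties", {}).items():
            # A property's schema applies only when the key is actually present.
            if key in instance and not is_valid(instance[key], subschema):
                return False
    return True

plain_optional = {
    "type": "object",
    "properties": {
        "requiredProp": {"type": "string"},
        "optionalProp": {"type": "string"},
    },
    "required": ["requiredProp"],
}
nullable_optional = {
    "type": "object",
    "properties": {
        "requiredProp": {"type": "string"},
        "optionalProp": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    },
    "required": ["requiredProp"],
}

absent = {"requiredProp": "foo"}
explicit_null = {"requiredProp": "foo", "optionalProp": None}

print(is_valid(absent, plain_optional))            # True
print(is_valid(explicit_null, plain_optional))     # False
print(is_valid(absent, nullable_optional))         # True
print(is_valid(explicit_null, nullable_optional))  # True
```

Both schemas accept an object that omits optionalProp; they differ only in whether an explicit null is allowed.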

Is it the intention of JsonSchema that programming languages that do not have the undefined/null duality of Javascript cannot adhere to simple JSON schemas with simple optional properties?

Would it be OK to treat Python's None as Javascript's undefined in cases of optional function/constructor parameters or are these types considered to be fundamentally different?

Proposal

I propose to clarify that in non-JS languages optional properties with the default None/NULL/nil value can be treated as Javascript's undefined and can be described using JsonSchema's optional property mechanism.

Do you think this work might require an [Architectural Decision Record (ADR)]? (significant or noteworthy)

No

@gregsdennis
Member

gregsdennis commented Feb 21, 2025

I think @Julian is probably the best person to comment on Python-specific things.


I do have a question: would you consider this to be valid data?

{
  "requiredProp": "",
  "optionalProp": null
}

Specifically, is a null value interpreted by your code the same as the property just being absent?

I'd guess that the Pydantic folks might think it is valid if the property is optional (null and absence are the same), whereas maybe you don't.


As far as JSON Schema is concerned, a property with a null value is distinct from the absence of that property. This is the design intent of JSON Schema.
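The distinction is visible from Python as well, even though both cases can surface as None. A small sketch with the standard `json` module (the variable names are only illustrative):

```python
import json

with_null = json.loads('{"requiredProp": "", "optionalProp": null}')
without_key = json.loads('{"requiredProp": ""}')

# Key membership distinguishes "present with null" from "absent".
print("optionalProp" in with_null)    # True
print("optionalProp" in without_key)  # False

# dict.get() hides the difference: both lookups yield None.
print(with_null.get("optionalProp") is without_key.get("optionalProp"))  # True
```

Code that reads the data with `.get()` treats the two documents the same; code that checks key membership does not.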

@Julian
Member

Julian commented Feb 21, 2025

(The JSON Schema spec doesn't cover schema generation from a language's types, so a bit of this discussion will always be groundless. But nevertheless, yes, opinions below.)

What you're asking is mostly about "shortcomings" in the typing annotation system in Python really more than anything else I think. And I put "shortcomings" in quotes here because the case where this matters -- at least when it comes to classes -- is one I would call a bad idea in Python, so I don't personally cry too hard about it not being possible.

Optional[str] in Python, as it seems has been pointed out in the ticket there, is simply shorthand for str | None.
There is no way to express the concept of "might not exist" as part of a normal class. E.g. for your example:

class Something:
  requiredProp: str
  optionalProp: Optional[str] = None

I disagree that even this expresses the JSON / JSON Schema notion of "optionalProp may not be present". That notion is expressed by typing.NotRequired in the case of dicts, and in the case of classes it.. does not exist (and above I called it a bad idea, I think it is for any use case other than using class syntax to generate schemas).

Specifically, for classes it really would look like:

class _S1:
    requiredProp: str
    optionalProp: str

class _S2:
    requiredProp: str

Something: _S1 | _S2

but there's no shorthand for that, and clearly it's untenable for multiple such properties -- and again I think for normal Python classes it's ridiculous to design one which sometimes doesn't have an attribute (but this is the direct parallel to dicts not having a key).

> I propose to clarify that in non-JS languages optional properties with the default None/NULL/nil value can be treated as Javascript's undefined and can be described using JsonSchema's optional property mechanism.

In short, I'd disagree with that both from a JSON Schema perspective and from a Python developer's perspective, though as one not really familiar with Pydantic's norms.

I'm not saying this solves the upstream problem, just that "treat None specially" seems very wrong. My first guess is that an annotation a la NotRequired for non-TypedDicts is the right shape of solution.

@jdesrosiers
Member

As Julian said, the spec doesn't cover how schemas map to a language's type system, but I can share my opinion.

First, a couple of things to keep in mind. Remember that JSON Schema describes JSON, not JavaScript, and undefined is not a feature of JSON. The absence of a value is effectively the same concept, but you can't assign something to be undefined ({ "foo": undefined }) like you can in JavaScript. Also, as Greg pointed out, null isn't the same as undefined. In JSON, null is a value, not an indicator of the absence of a value as it is in most languages. In the same way that boolean is a type with two possible values (true and false), null is a type with one possible value (null). When a JSON Schema says a property is null, it means the property must be present with the value null.

IMO, that makes JSON's null a JSON-specific concept that should be avoided unless you're specifically trying to model JSON that has nulls, and you definitely shouldn't equate it to common concepts of null/nil/None/etc. Since there's no concept in Python that translates to JSON's concept of null, I wouldn't expect it to ever generate schemas that use null. I think it makes the most sense to equate Python's None with the absence of a value in JSON.

So, Something(requiredProp="foo", optionalProp=None) should be considered equivalent to { "requiredProp": "foo" }.

I don't think using a None default value should make any difference to the generated schema. The instances created by instance1 = Something(requiredProp="foo", optionalProp=None) and instance2 = Something(requiredProp="foo") are indistinguishable. Both instance1.optionalProp and instance2.optionalProp are None. Therefore the JSON representation should be the same as well.

This approach also has the benefit of resulting in simpler and more idiomatic JSON Schemas.

Again, there's no official correct or incorrect way to do this. This is just my recommendation.
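The claim that the two instances are indistinguishable can be checked with a plain dataclass. This is only a sketch: the dataclass stands in for the model class, and a library that records which fields were explicitly passed could behave differently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Something:
    requiredProp: str
    optionalProp: Optional[str] = None

instance1 = Something(requiredProp="foo", optionalProp=None)
instance2 = Something(requiredProp="foo")

# Passing None explicitly and relying on the default give equal objects.
print(instance1 == instance2)                             # True
print(instance1.optionalProp is instance2.optionalProp)  # True
```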

@Julian
Member

Julian commented Feb 21, 2025

Responding again just in case you didn't know the below, Jason (if you did and still think your way, it's obviously all fine to disagree):

> Since there's no concept in Python that translates to JSON's concept of null, I wouldn't expect it to ever generate schemas that use null. I think it makes the most sense to equate Python's None with the absence of a value in JSON.

Python's None serializes as null, and it's very common to have None wherever you'd like in Python as a real value, so the equivalence is there already / I think that ship has long sailed, which is why I disagreed (strongly) with:

> So, Something(requiredProp="foo", optionalProp=None) should be considered equivalent to { "requiredProp": "foo" }.
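The existing equivalence between None and null can be seen with the standard library's `json` module:

```python
import json

payload = {"requiredProp": "foo", "optionalProp": None}

encoded = json.dumps(payload)
print(encoded)  # {"requiredProp": "foo", "optionalProp": null}

# Decoding maps null back to None, so the round trip preserves the value.
print(json.loads(encoded) == payload)  # True
```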

@jdesrosiers
Member

jdesrosiers commented Feb 21, 2025

Thanks for the correction Julian! It's been a while since I've written Python and didn't remember that correctly. That means that Python's None is equivalent to JSON's null. In that case, generating schemas from Python using JSON null is logically sound.

However, JSON that uses null to represent absent values is not idiomatic JSON; it makes schemas unnecessarily complex and awkward, and it renders some JSON Schema keywords unusable. That's the problem that originally motivated this question. So, I still think it would be best to equate None with not-present even though null isn't technically wrong. I recognize that this could cause some friction in the Python ecosystem, which serializes nulls by default. If it's not too hard to get around that, you could make the lives of the users of your JSON and JSON Schemas much easier.
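One possible way to get around the default serialization is to drop None-valued fields before encoding. The sketch below uses a plain dataclass and a hypothetical helper named `to_json_without_none`; it is not an API of any library mentioned in this thread.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class Something:
    requiredProp: str
    optionalProp: Optional[str] = None

def to_json_without_none(obj) -> str:
    """Hypothetical helper: omit top-level fields whose value is None."""
    return json.dumps({k: v for k, v in asdict(obj).items() if v is not None})

print(to_json_without_none(Something(requiredProp="foo")))
# {"requiredProp": "foo"}
print(to_json_without_none(Something(requiredProp="foo", optionalProp="bar")))
# {"requiredProp": "foo", "optionalProp": "bar"}
```

The trade-off is that output produced this way can no longer express an explicit null, so it only fits data where null and absence are meant to be the same.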

@gregsdennis
Member

@Ark-kun does the above answer your questions?
