Support nullable formats for Openapi #209

Merged
merged 7 commits into from
Jan 28, 2025

moberegger
Contributor

@moberegger moberegger commented Jan 21, 2025

I noticed some inconsistency in behaviour with how nullability works with the format keyword. Figured I'd put up a PR instead of just throwing it over the fence with an issue; you do enough hard work so thought I'd at least try to save you from some! 😆

My assessment here may not be correct; I'm rather new-ish to Openapi and JSON Schema, so my understanding of the specification and expected behaviour might be wrong. Also not sure if my proposed solution here is the best path forward. My goal is more so to surface the problem I'm seeing - or at least what I think is a problem. I won't be offended if you don't accept it 😆.

Anyways!

Let's say I have a schema to validate that a value is either a date-time string or null. I can do so like this:

date_schema =
  JSONSchemer.schema(
    { type: %w[string null], format: 'date-time' },
    meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
  )

date_schema.valid_schema? # true (just to confirm it's well formed)
date_schema.valid?('2025-01-20T21:34:48-05:00') # true
date_schema.valid?(nil) # true

Cool! But if I try the same thing with int32:

int32_schema =
  JSONSchemer.schema(
    { type: %w[integer null], format: 'int32' },
    meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
  )

int32_schema.valid_schema? # true (just to confirm it's well formed)
int32_schema.valid?(1) # true
int32_schema.valid?(nil) # false: "value at root does not match format: int32"

That same behaviour happens for int32, int64, double, and float (i.e. the formats that Openapi adds).

I compared the implementation of the Openapi formatters to the ones in the JSON schema 2020 draft and noticed that the JSON schema ones would simply return true if the input wasn't a String. This in effect meant that it would be true for a nil input. I thought that it might be appropriate to take a similar approach to the Openapi formatters and have them return true when the input is nil as well, which gets things working for the above example.

int32_schema =
  JSONSchemer.schema(
    { type: %w[integer null], format: 'int32' },
    meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
  )

int32_schema.valid_schema? # true (just to confirm it's well formed)
int32_schema.valid?(1) # true
int32_schema.valid?(nil) # is now true!
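
For illustration, the difference described above can be sketched with plain Ruby methods. This is not the gem's code: the method names are hypothetical, the shape of the original int32 check is an assumption, and `Time.iso8601` is only a crude stand-in for real date-time validation.

```ruby
require 'time'

# Crude stand-in for date-time validation (assumption: ISO 8601 parsing is close enough here).
def iso8601?(value)
  Time.iso8601(value)
  true
rescue ArgumentError
  false
end

# String-style format: anything that is not a String passes untouched.
def string_style_date_time?(instance)
  !instance.is_a?(String) || iso8601?(instance)
end

# Assumed shape of the int32 check before this change: nil is not an Integer, so it fails.
def original_int32?(instance)
  instance.is_a?(Integer) && instance.bit_length <= 32
end

# The nil-tolerant variant first proposed in this PR.
def nil_tolerant_int32?(instance)
  instance.nil? || original_int32?(instance)
end

p string_style_date_time?(nil) # true
p original_int32?(nil)         # false
p nil_tolerant_int32?(nil)     # true
```

The nil-tolerant variant only special-cases nil, so any other non-integer value (a boolean, say) would still be rejected by it.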

Now, the tradeoff here is what happens when you don't provide a type and only provide a format.

int32_schema =
  JSONSchemer.schema(
    { format: 'int32' },
    meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
  )

int32_schema.valid_schema? # true (just to confirm it's well formed)
int32_schema.valid?(1) # true
int32_schema.valid?(nil) # also true

This does seem awkward, but it is actually consistent with how the JSON schema formatters work.

date_schema =
  JSONSchemer.schema(
    { format: 'date-time' },
    meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
  )
date_schema.valid_schema? # true (just to confirm it's well formed)
date_schema.valid?('2025-01-20T21:34:48-05:00') # true
date_schema.valid?(nil) # also true

To be honest, I don't know what the expected behaviour should be here. I promise I did my best to look through both Openapi and JSON schema specifications to get a better understanding of how nullable formats should work, but I wasn't able to find anything that cleared that up for me.

I admit that I may be overstepping my bounds here by making the formatters more lenient with nil. It doesn't feel right, but since I observed that the same behaviour was happening with the JSON Schema formatters, I thought that maybe this was an appropriate solution.

It's worth noting that nullable formats can be achieved by using oneOf

int32_union_schema =
  JSONSchemer.schema(
    { oneOf: [{ type: 'integer', format: 'int32' }, { type: 'null' }] },
    meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
  )

int32_union_schema.valid_schema? # true (just to confirm it's well formed)
int32_union_schema.valid?(1) # true
int32_union_schema.valid?(nil) # true

so perhaps this is the correct way to specify a nullable format?

@moberegger moberegger marked this pull request as ready for review January 21, 2025 18:40

@fatemeh-affinity fatemeh-affinity left a comment

it makes sense!

@moberegger
Contributor Author

moberegger commented Jan 22, 2025

Thinking about this some more, I'm wondering if the proposal should be the inverse of this. The schema's type should not relax the constraint provided by format, which would mean that the way int32 works is correct (ie. something that is not an Integer inherently cannot be of the int32 format) and it's the string-based formats that should be more strict.

Consider a slightly more complex schema like

schema = JSONSchemer.schema({ type: %w[string boolean null], format: 'email' })
schema.valid_schema? # true
schema.valid?('[email protected]') # true
schema.valid?('test') # false, with "value at root does not match format: email"
schema.valid?(true) # true, but should be false since input is not an email

true is not in the format of email, and thus should not be valid.

We could instead make the format there more strict by returning false for any input that is not a String (ie. it cannot actually run the logic to validate whether or not it's an email). So perhaps those string-based formats can be adjusted to be something like

EMAIL = proc do |instance, _format|
  instance.is_a?(String) && instance.ascii_only? && valid_email?(instance)
end

which now gives us

schema = JSONSchemer.schema({ type: %w[string boolean null], format: 'email' })
schema.valid_schema? # true
schema.valid?('[email protected]') # true
schema.valid?('test') # false, with "value at root does not match format: email"
schema.valid?(true) # false, with "value at root does not match format: email"

Again, not sure what the specification(s) say about this. A colleague of mine recalls reading something in the specification that suggests that keywords shouldn't relax a constraint from another keyword... although we haven't been able to find it again 😞.

@davishmcclurg
Owner

Hi @moberegger, this is great—thanks for opening it!

I believe your initial thought is correct. format should only apply to the relevant instance types. From the JSON Schema spec:

A format attribute can generally only validate a given set of instance types. If the type of the instance to validate is not in this set, validation for this format attribute and instance SHOULD succeed.

For the OpenAPI formats, I don't think checking nil? is quite right, though. Your boolean type addition is a good example (using the changes from this PR):

?> JSONSchemer.schema(
?>   { type: %w[string boolean null], format: 'date-time' },
?>   meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
>> ).valid?(true)
=> true
?> JSONSchemer.schema(
?>   { type: %w[integer boolean null], format: 'int32' },
?>   meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
>> ).valid?(true)
=> false

I think what we want is something like:

'int32' => proc { |instance, _format| !instance.is_a?(Numeric) || (instance.is_a?(Integer) && instance.bit_length <= 32) }

Which gives:

?> JSONSchemer.schema(
?>   { type: %w[integer boolean null], format: 'int32' },
?>   meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
>> ).valid?(true)
=> true

There's an additional question of how strict we should be. Since int32 and int64 are only meant to apply to integer types, we could use !instance.is_a?(Integer) for the check, but that could be surprising for floats:

?> JSONSchemer.schema(
?>   { type: %w[integer boolean null], format: 'int32' },
?>   meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
>> ).valid?(2147483648)
=> false
?> JSONSchemer.schema(
?>   { type: %w[integer boolean null], format: 'int32' },
?>   meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
>> ).valid?(2147483648.0)
=> true

I think if we go down the more strict path, we'd need to copy the logic from valid_type? (which changes after draft 4 for integer):

I'm leaning toward the more strict approach, but let me know what you think.
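
A rough sketch of what the stricter approach might look like, written as standalone methods. The method names are hypothetical and the two integer rules are assumptions about how draft 4 and later drafts differ. None of this is copied from the gem's valid_type? logic.

```ruby
# Draft 4 rule (assumption): only Ruby Integers count as "integer".
def draft4_integer?(instance)
  instance.is_a?(Integer)
end

# Later drafts (assumption): a finite float with a zero fractional part also counts.
def integer_like?(instance)
  return true if instance.is_a?(Integer)

  instance.is_a?(Float) && instance.finite? && instance.floor == instance
end

# Stricter int32 sketch: values that are not integers pass the format,
# while integer-valued numbers (including floats such as 1.0) are range checked.
def strict_int32?(instance)
  !integer_like?(instance) || instance.floor.bit_length < 32
end

p strict_int32?(true)         # true  (not a number, format does not apply)
p strict_int32?(2147483647)   # true
p strict_int32?(2147483648)   # false
p strict_int32?(2147483648.0) # false (integer-valued float is range checked too)
```

With this shape, a fractional value such as 1.5 still passes the format and is left for the type keyword to reject.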


I also noticed that the int32 and int64 bit_length checks are wrong! I believe it should be instance.bit_length < 32 not instance.bit_length <= 32. It currently allows ints over the max value (I fixed it for some of the examples above):

?> JSONSchemer.schema(
?>   { type: 'integer', format: 'int32' },
?>   meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
>> ).valid?(2147483648)
=> true
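
The off-by-one can be checked without the gem, since Integer#bit_length does not count the sign bit. A minimal sketch, with hypothetical helper names:

```ruby
# Integer#bit_length excludes the sign bit, so a signed 32-bit value has bit_length <= 31.
def fits_int32_lte?(value)
  value.bit_length <= 32 # the check identified as too lenient
end

def fits_int32_lt?(value)
  value.bit_length < 32 # the check proposed as the fix
end

int32_max = 2**31 - 1 # 2147483647
int32_min = -(2**31)  # -2147483648

puts int32_max.bit_length         # 31
puts((int32_max + 1).bit_length)  # 32
puts int32_min.bit_length         # 31
puts((int32_min - 1).bit_length)  # 32

puts fits_int32_lte?(int32_max + 1) # true  (wrongly accepted)
puts fits_int32_lt?(int32_max + 1)  # false
```

The same reasoning applies to int64 with 64 in place of 32.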

@moberegger
Contributor Author

I believe your initial thought is correct. format should only apply to the relevant instance types.

Nice! That is reassuring. It means I don't have to fix all of my schemas 😆. Also appreciate the link to the relevant part of the specification!

For the OpenAPI formats, I don't think checking nil? is quite right, though.

Yeah, I only thought it was an issue specifically with nulls. Like... I thought that was the only thing that needed an exception to bypass the format validation. It was only after the fact that I realized I wasn't quite on the right track. I'll kick the tires on this and try to get something working that doesn't bias towards that specific nil check.

@moberegger
Contributor Author

There's an additional question of how strict we should be. Since int32 and int64 are only meant to apply to integer types, we could use !instance.is_a?(Integer) for the check, but that could be surprising for floats:

?> JSONSchemer.schema(
?>   { type: %w[integer boolean null], format: 'int32' },
?>   meta_schema: 'https://spec.openapis.org/oas/3.1/dialect/base',
>> ).valid?(2147483648.0)
=> true

According to https://json-schema.org/understanding-json-schema/reference/numeric#integer

Numbers with a zero fractional part are considered integers (ex: 1.0)
...
Floating point numbers are rejected (ex: 3.1415926)

It reads like your example of 2147483648.0 should in fact be true, but something like 2147483648.123 should be false. So perhaps what you observed there isn't surprising after all. So for int32 and int64, a non-Integer like 2147483648.123 could be truthy on the format validation (ie. !instance.is_a?(Integer)) but would still result in false from the type: integer check.
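
A small sketch of how the two keywords would combine under that reading. The helper names are hypothetical, and the integer type rule is an assumption based on the quoted documentation rather than the gem's implementation.

```ruby
# "integer" type rule after draft 4 (assumption): a zero fractional part counts as an integer.
def integer_type?(instance)
  instance.is_a?(Integer) ||
    (instance.is_a?(Float) && instance.finite? && instance.floor == instance)
end

# Format guard under discussion: only Ruby Integers are range checked.
def lenient_int32_format?(instance)
  !instance.is_a?(Integer) || instance.bit_length < 32
end

# Both keywords must pass for the instance to be valid.
def valid_int32_instance?(instance)
  integer_type?(instance) && lenient_int32_format?(instance)
end

p valid_int32_instance?(1)              # true
p valid_int32_instance?(2147483648.123) # false (rejected by the type check)
p valid_int32_instance?(2147483648)     # false (rejected by the format check)
p valid_int32_instance?(2147483648.0)   # true  (passes both checks)
```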

@moberegger
Contributor Author

I also noticed that the int32 and int64 bit_length checks are wrong! I believe it should be instance.bit_length < 32 not instance.bit_length <= 32. It currently allows ints over the max value (I fixed it for some of the examples above)

My apologies for the shotgun of comments...

Would you like me to slip that change into this PR? Or would you prefer to handle it separately?

@moberegger
Contributor Author

moberegger commented Jan 22, 2025

Think there may be a similar problem with Openapi 3.0's byte and binary formats.

'byte' => proc { |instance, _value| ContentEncoding::BASE64.call(instance).first },
'binary' => proc { |instance, _value| instance.is_a?(String) && instance.encoding == Encoding::BINARY },

According to https://spec.openapis.org/oas/v3.0.4.html#data-type-format those should only apply to strings. So perhaps those should also become

'byte' => proc { |instance, _value| !instance.is_a?(String) || ContentEncoding::BASE64.call(instance).first },
'binary' => proc { |instance, _value| !instance.is_a?(String) || instance.encoding == Encoding::BINARY },
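
The effect of that guard can be sketched without the gem. Here `base64?` is a hypothetical stand-in for ContentEncoding::BASE64, assuming strict RFC 4648 decoding, which may not match the gem's actual behaviour.

```ruby
# Stand-in for the gem's Base64 check (assumption: strict RFC 4648 decoding).
def base64?(value)
  value.unpack1('m0') # raises ArgumentError on invalid input
  true
rescue ArgumentError
  false
end

# Non-strings pass, strings must decode as Base64.
def byte_format?(instance)
  !instance.is_a?(String) || base64?(instance)
end

# Non-strings pass, strings must be binary encoded.
def binary_format?(instance)
  !instance.is_a?(String) || instance.encoding == Encoding::BINARY
end

p byte_format?(nil)          # true
p byte_format?('aGVsbG8=')   # true
p binary_format?(nil)        # true
p binary_format?('abc'.b)    # true
```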

@davishmcclurg
Owner

It reads like your example of 2147483648.0 should in fact be true, but something like 2147483648.123 should be false. So perhaps what you observed there isn't surprising after all. So for int32 and int64, a non-Integer like 2147483648.123 could be truthy on the format validation (ie. !instance.is_a?(Integer)) but would still result in false from the type: integer check.

I think the issue is the float version won't go through the bit_length check even though it's considered an "integer":

>> integer_type_only_schemer = JSONSchemer.schema({ 'type' => 'integer' }, :meta_schema => JSONSchemer.openapi31)
>> integer_type_only_schemer.valid?(4294967296)
=> true
>> integer_type_only_schemer.valid?(4294967296.0)
=> true
>> integer_type_and_int32_schemer = JSONSchemer.schema({ 'type' => 'integer', 'format' => 'int32' }, :meta_schema => JSONSchemer.openapi31)
>> integer_type_and_int32_schemer.valid?(4294967296)
=> false
>> integer_type_and_int32_schemer.valid?(4294967296.0)
=> true

That's pretty picky and it's unclear to me what the behavior should be exactly. Let me know what you think. The spec has a relevant note (though it didn't do much to clear things up for me):

Note that the "type" keyword in this specification defines an "integer" type which is not part of the data model. Therefore a format attribute can be limited to numbers, but not specifically to integers. However, a numeric format can be used alongside the "type" keyword with a value of "integer", or could be explicitly defined to always pass if the number is not an integer, which produces essentially the same behavior as only applying to integers.


Would you like me to slip that change into this PR? Or would you prefer to handle it separately?

No that's alright. I'll handle it separately once this is merged.

Think there may be a similar problem with Openapi 3.0's byte and binary formats.

Good catch! Thanks for fixing that.

@moberegger
Contributor Author

moberegger commented Jan 27, 2025

Oooooh, OK. I see what you mean now. I wasn't cluing in that 4294967296 was over the int32 limit. I was like "interesting choice to use for an example" 😆.

I agree; we should be strict here. As a consumer, I would expect that anything truthy for type: integer should have the format applied.

I've made a change to the Openapi 3.1 int32 and int64 formats to accommodate that. Don't like that I copy+pasted some code, but didn't know how else to do it.

The Openapi 3.0 int32 and int64 formats only check if the input is an Integer as per draft 4. Openapi 3.0 uses draft 5 of JSON Schema, but my understanding is that draft 5 is just a tidied up version of draft 4:

These specifications were intended as modernized and tidied versions of the specifications referenced by the “Draft-04” meta-schemas, so those draft-04 meta-schemas should continue to be used.

I've also extended the tests to include the following:

  • Openapi 3.1 schemas just using format
  • Openapi 3.1 schemas using format with appropriate type
  • Openapi 3.1 schemas using format with a multi type including null
  • Openapi 3.0 schemas just using format
  • Openapi 3.0 schemas using format with appropriate type
  • Openapi 3.0 schemas using format with nullable keyword

My hope is to capture the subtle differences between the two whilst also having some coverage over representative use cases.

@davishmcclurg
Owner

Oooooh, OK. I see what you mean now. I wasn't cluing in that 4294967296 was over the int32 limit. I was like "interesting choice to use for an example" 😆.

Haha that's my bad—should've been more clear about what I meant.

I agree; we should be strict here. As a consumer, I would expect that anything truthy for type: integer should have the format applied.

I've made a change to the Openapi 3.1 int32 and int64 formats to accommodate that.

Excellent! Thanks so much for the help. And I really appreciate the thorough tests 👍

Don't like that I copy+pasted some code, but didn't know how else to do it.

I might follow up by moving the valid_type logic into keyword class methods, but that's just organizational. Merging—thanks again!

@davishmcclurg davishmcclurg merged commit 09841e7 into davishmcclurg:main Jan 28, 2025
29 checks passed
@moberegger
Contributor Author

moberegger commented Jan 28, 2025

My pleasure, honestly. Your feedback is always insightful and helpful, and really made this quite enjoyable for me. I learned a lot.

I'm impressed with your attention to detail with the specs; I mean, I always presumed coding to these specs must have been a lot of work... but even just going through with this comparatively small change really opened my eyes to how much there is to think about. My goodness.

@davishmcclurg
Owner

My pleasure, honestly. Your feedback is always insightful and helpful, and really made this quite enjoyable for me. I learned a lot.

Glad to hear it! And thank you for the kind words. I just released a new version of the gem with your changes.

I ran into an issue changing the bit_length checks that I thought you might find interesting: 386c2a6

>> (2.pow(63) - 1)
=> 9223372036854775807
>> (2.pow(63) - 1).to_f
=> 9.223372036854776e+18
>> (2.pow(63) - 1).to_f.to_i > (2.pow(63) - 1)
=> true
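
For context on why this matters for a bit_length check, here is a small sketch of the round trip. The helper name is hypothetical and this does not describe what the linked commit actually changed.

```ruby
# Convert an integer to a double and back, as happens when a float is range checked.
def float_round_trip(value)
  value.to_f.to_i
end

int64_max = 2**63 - 1
round_trip = float_round_trip(int64_max)

puts int64_max.bit_length  # 63
puts round_trip == 2**63   # true (the nearest double to 2**63 - 1 is exactly 2**63)
puts round_trip.bit_length # 64
```

A double has a 53-bit significand, so integers near the int64 limit cannot all be represented exactly, and the float form of the maximum lands one past the range.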
