spec: note that some (meta?)keywords can only be allowed in the top level of the root schema #85

Closed
epoberezkin opened this issue Oct 11, 2016 · 31 comments

@epoberezkin
Member

https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L161

At the moment it is only $schema, but given the discussion around id it may change ...

@epoberezkin epoberezkin changed the title spec: note that some meta-keywords can only be allowed in the top level of the root schema spec: note that some (meta?)keywords can only be allowed in the top level of the root schema Oct 11, 2016
@awwright
Member

I don't believe "$schema" is currently allowed only at the root level.

draft-04 mandates it at the root level; in practice nobody seems to do this, so I reduced the language to merely SHOULD.

A JSON Schema validator might not even know the difference between a root schema and a sub-schema. The only reason "$schema" is suggested for the root schema is so that every schema (root schema, sub-schema) is guaranteed to have a deterministic "$schema" and URI associated with it.

And in fact, I provided an example a couple weeks ago where a schema can speak multiple vocabularies if so desired, by using "$schema" multiple times: #67 (comment)

@epoberezkin
Member Author

I see.

At the same time the standard clearly defines root schema as a standalone JSON instance, so it can be given some special treatment...

I think not allowing multiple vocabularies in a single JSON Schema instance could be a good way to simplify things. Same for IDs - it seems a good middle ground between supporting the current "id" at any level approach and obliterating it completely. The current resolution scope change is rarely supported properly. $schema change - does any validator support it? Ajv doesn't, it only checks the root value of the instance and ignores it everywhere else.

But ok, we can leave this ambiguity in this version and hopefully resolve it later :)

@awwright just give it some thought please.

@epoberezkin
Member Author

I would suggest replacing

The root schema of a JSON Schema document SHOULD use this keyword.

with something like

The root schema of a JSON Schema document SHOULD use this keyword at the top level.

https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L270

If we want to support $schema change within a single JSON schema instance, I would explicitly say it as well. Although few people would thank us I am afraid :)...

@awwright
Member

Hmm, idk how that would impact "$ref".

Right now you can generally replace a "$ref" with the schema itself (that's what JSON Reference was intended to be, a late-bound transparent inclusion of a JSON document into another document).

The only time there would be a difference is if the base URI changes. Which isn't a problem if your root schemas always have an absolute-URI "id" like JSON Schema recommends.


The two alternatives you list are saying the same thing, but perhaps in clearer or more terse language.

There's only one root schema in a file, and it consists only of the root level (the definition excludes subschemas).

@epoberezkin
Member Author

epoberezkin commented Oct 11, 2016

Ok about the language, you are right.

I don't think it is right to see $ref, for validation purposes, as the equivalent of file inclusion, because that is not true for recursive and mutually recursive $refs. You can't replace {"$ref": "#"} with the schema. You also cannot replace the references when a refers to b and vice versa, which is absolutely needed for trees, graphs, etc.

The idea that $ref is equivalent to inclusion is also not consistent with https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L297, although I think the wording in that paragraph needs clarification because it allows multiple interpretations (was about to create a separate issue about it).

I don't think we have any choice but to acknowledge that $ref is essentially a validation keyword that instructs to perform validation of the current part of JSON instance against a referenced (sub)schema (that can be the current or another root schema or a subschema in the current or another root schema). The alternative is to not support recursion in schemas, but there are too many use cases for it (including meta-schema) for it to be a viable option.

EDIT: And if we do acknowledge that in some cases $ref is not the same as inclusion we may as well say that it is never the same as inclusion and then we don't have to worry about top-level only keywords appearing anywhere else.
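
A minimal sketch of the "$ref as a validation keyword" reading described above, in Python. The names `resolve_pointer`, `is_valid` and `TREE`, and the tiny keyword subset (`type`, `properties`, `items`, same-document `$ref`), are assumptions made for illustration; none of them come from the spec or from an existing validator. The point it tries to show is that delegating to the referenced schema handles `{"$ref": "#"}` without ever expanding the schema, because the recursion follows the finite instance rather than the schema:

```python
def resolve_pointer(root, ref):
    """Resolve a same-document reference such as '#' or '#/properties/name'."""
    if not ref.startswith("#"):
        raise ValueError("this sketch only handles same-document references")
    node = root
    # "#" yields no tokens; "#/a/b" yields ["a", "b"].
    for token in ref[1:].split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


PYTHON_TYPES = {"object": dict, "array": list, "string": str}


def is_valid(instance, schema, root):
    """Validate `instance` against `schema`; `root` is the enclosing root schema."""
    if "$ref" in schema:
        # Treat $ref as "validate this value against that schema", not as inclusion.
        return is_valid(instance, resolve_pointer(root, schema["$ref"]), root)
    expected = PYTHON_TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(instance, expected):
        return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub, root):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(is_valid(item, schema["items"], root) for item in instance)
    return True


# A recursive tree schema: every child is validated against the root schema again.
TREE = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
}

good = {"name": "a", "children": [{"name": "b", "children": []}]}
bad = {"name": "a", "children": [{"name": 1}]}
print(is_valid(good, TREE, TREE), is_valid(bad, TREE, TREE))
```

An inlining approach would have to paste `TREE` into its own `items` forever, while this version only recurses as deep as the data goes.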

@awwright
Member

because it is not true for recursive and mutually recursive $refs

Which is why JSON Reference is late bound (maybe it chooses a different term, but it acknowledges recursion is possible and you might not be able to fully expand every document right away)

The idea that $ref is equivalent to inclusion is also not consistent with

That section could probably be worded better. But consider that the validator doesn't keep any state. So if I have two schemas

{ "id": "alice", "allOf": [ { "$ref": "bob" } ] }
{ "id": "bob", "allOf": [ { "$ref": "alice" } ] }

and I validate an instance against one of them... the validator should return an error (or success?), but definitely not get thrown into an infinite loop.

Or put another way, since the validation function is functional, if you're executing validate(instance, schema) multiple times in the course of a validation operation, you're doing something wrong because the function is always going to return the same result (it's getting passed the same input).

@epoberezkin
Member Author

epoberezkin commented Oct 11, 2016

Another option is to say that validators SHOULD (rather than MUST) detect such endless recursion in schema definitions and return validation result as undefined (depending on the language it can be exception, Nothing, null etc., rather than "pass" or "fail").

I honestly think it is better than either pass or fail, it would force schema writers to avoid this ambiguity. I actually think that even infinite loop is better than pass or fail :)...
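
A rough Python sketch of the "detect the loop and report an indeterminate result" option, using the alice/bob schemas from the previous comment. The `Indeterminate` exception, the `validate` signature and the `SCHEMAS` registry keyed by "id" are all hypothetical names chosen for this example, and only `"type": "string"` and `allOf` lists of `$ref`s are handled. Because this sketch never descends into the instance, it tracks schema ids alone; a fuller validator would presumably track (instance location, schema) pairs instead:

```python
class Indeterminate(Exception):
    """Validation can neither pass nor fail (hypothetical name)."""


def validate(instance, schema_id, schemas, in_progress=None):
    """Validate `instance` against schemas[schema_id], refusing to loop forever."""
    in_progress = set() if in_progress is None else in_progress
    if schema_id in in_progress:
        # We came back to a schema we are still evaluating for the same instance.
        raise Indeterminate(f"reference cycle through {schema_id!r}")
    in_progress.add(schema_id)
    try:
        schema = schemas[schema_id]
        if schema.get("type") == "string" and not isinstance(instance, str):
            return False
        return all(
            validate(instance, sub["$ref"], schemas, in_progress)
            for sub in schema.get("allOf", [])
        )
    finally:
        # Leaving the schema: reaching it again later by another path is fine.
        in_progress.discard(schema_id)


SCHEMAS = {
    "alice": {"id": "alice", "allOf": [{"$ref": "bob"}]},
    "bob": {"id": "bob", "allOf": [{"$ref": "alice"}]},
    "carol": {"id": "carol", "allOf": [{"$ref": "dave"}]},
    "dave": {"id": "dave", "type": "string"},
}

try:
    validate("x", "alice", SCHEMAS)
except Indeterminate as exc:
    print("indeterminate:", exc)
```

Raising an exception is only one way to express "neither pass nor fail"; a language with option types could return `Nothing` or `null` instead, as suggested above.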

@awwright awwright added this to the draft-6 milestone Oct 11, 2016
@handrews
Contributor

@epoberezkin By "return validation result as undefined" do you mean that the validator should error out?

@awwright
Member

@epoberezkin I believe that's correct.

The language I use is "raise an error condition" or "return indeterminate".

@epoberezkin
Member Author

That's right.

@epoberezkin
Member Author

@awwright an additional argument against allowing $schema anywhere but the top level is that a schema is a JSON instance that should be valid according to the meta-schema with the URI in $schema keyword. If we want to support $schema in subschemas we also have to support $schema in all data instances in sub-objects. I am not really suggesting that - it's absurd.

@epoberezkin
Member Author

I don't think we can have the validation process of a schema against the meta-schema be different from the validation of data against a schema - it should be the same. So I also think that $ref should be in the meta-schema - it should be treated as a special validation keyword, not as inlining of the schema in place.

@epoberezkin
Member Author

The only time there would be a difference is if the base URI changes. Which isn't a problem if your root schemas always have an absolute-URI "id" like JSON Schema recommends.

Only while we continue to support resolution scope change, which there seems to be some desire to drop. Without resolution scope change $ref cannot ever be treated as schema inclusion in place.

@awwright
Member

awwright commented Oct 12, 2016

You may have to walk me through this argument a bit more.

a schema is a JSON instance that should be valid according to the meta-schema with the URI in $schema keyword

This sounds exactly right, "$schema" refers to a (meta-)schema URI that describes the current schema. To be pedantic, a schema is an application/schema+json document (not just anything that looks like a schema will work, it has to be specifically used/declared as one somehow).

If we want to support $schema in subschemas we also have to support $schema in all data instances in sub-objects. I am not really suggesting that - it's absurd.

So if I understand what you're saying, the JSON meta-schema defines sub-schemas as being instances of itself. The JSON meta-schema does not provide any way for a different sub-schema to be used.

So far, this hasn't been a problem, because there's only two vocabularies: JSON Validation and JSON Hyper-Schema. A valid JSON Hyper-Schema is always a valid JSON Validation schema.

This is sort of a problem we have to tackle anyways, because a naive implementation would start by processing a schema-instance against the Hyper-schema meta-schema, see a property like "allOf", and descend into it processing against the regular validation schema, when we intend to parse it as a hyper-schema.

And strictly speaking, the normative behavior is specified by the I-D, not any meta-schema. The meta-schema is non-normative.

Am I getting you right? Does that make sense?

@epoberezkin
Member Author

What I am saying is that a validation process where you change the meta-schema on the fly is simply incorrect. During validation against the meta-schema the schema should be treated simply as data, where no properties have any special meaning (that follows from the definition of a JSON instance being valid according to a schema - nowhere does this definition mention the possibility of changing the schema on the fly). So the validator should take the schema referenced in the $schema of the root and then validate the WHOLE instance according to this schema. Why would a validator switch the meta-schema on the fly? During validation the schema is just JSON data.

So if there are any $schema keywords in any of the subschemas, the only thing the validator is supposed to do is to validate that they are strings and valid URIs (according to the meta-schema); it is not supposed to switch the meta-schema. I think expressly prohibiting it is better than ignoring it.
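
To make the "root `$schema` only" position concrete, here is a small Python sketch. The helper names `metaschema_uri`, `subschemas` and `nested_schema_keywords`, the choice of the draft-04 URI as the default, and the limited set of structural keywords walked are all assumptions for illustration. The walk is schema-aware so that a property that merely happens to be named "$schema" is not mistaken for the keyword:

```python
DRAFT_04 = "http://json-schema.org/draft-04/schema#"


def metaschema_uri(root_schema, default=DRAFT_04):
    """Pick the meta-schema from the root only; nested values are never consulted."""
    return root_schema.get("$schema", default)


def subschemas(schema):
    """Yield (pointer_suffix, subschema) for a handful of structural keywords."""
    for kw in ("properties", "definitions"):
        for name, sub in schema.get(kw, {}).items():
            yield f"/{kw}/{name}", sub
    for kw in ("allOf", "anyOf", "oneOf"):
        for i, sub in enumerate(schema.get(kw, [])):
            yield f"/{kw}/{i}", sub
    for kw in ("not", "items"):
        if isinstance(schema.get(kw), dict):
            yield f"/{kw}", schema[kw]


def nested_schema_keywords(schema, path=""):
    """Return pointers to every "$schema" keyword that sits below the root."""
    found = []
    for suffix, sub in subschemas(schema):
        if isinstance(sub, dict):
            if "$schema" in sub:
                found.append(path + suffix + "/$schema")
            found.extend(nested_schema_keywords(sub, path + suffix))
    return found


example = {
    "$schema": DRAFT_04,
    "properties": {
        "$schema": {"type": "string"},  # a property *named* $schema: allowed
        "a": {"$schema": "http://example.com/other#", "type": "object"},
    },
}
print(metaschema_uri(example))
print(nested_schema_keywords(example))
```

A validator taking the "expressly prohibit" route could reject the schema whenever the second call returns a non-empty list; one taking the "ignore" route would simply never look.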

@epoberezkin
Member Author

This is sort of a problem we have to tackle anyways, because a naive implementation would start by processing a schema-instance against the Hyper-schema meta-schema, see a property like "allOf", and descend into it processing against the regular validation schema.

The hyper-schema meta-schema copy/pastes all structural keywords, so it would be processed correctly. It's just wasteful, both in schema size and in the validation process: all structural keywords that have subschemas will be validated twice, first as part of draft 4 validation, then as part of hyper-schema.

@epoberezkin
Member Author

So the validator should take the schema referenced in the $schema of the root and then validate the WHOLE instance according to this schema.

From this it also follows that you cannot inline $refs, because they can point to root schemas that should be validated against a different meta-schema. $ref should be validated as a property against some partial URI format (it isn't specified at the moment, but the uri format cannot be used for $refs).

@awwright
Member

awwright commented Oct 12, 2016

This is still all stemming from the fact JSON Schema defines special, non-structural behavior for what makes a valid schema.

Because also note that $ref doesn't actually exist anywhere in the meta-schema. It's behavior that can be defined only in prose.

We could very well just define a special (mostly) blank meta-schema, define it as "the set of all JSON meta-schemas". Anywhere a schema is expected, we refer to this. And so any time we see it, we know there can be a $ref, or a $schema, or maybe neither.

But this takes a lot of fun out of validating schemas against meta-schemas.

What we do know for sure: $ref is not (cannot be) a simple late-bound transclusion.

@epoberezkin
Member Author

epoberezkin commented Oct 12, 2016

This is still all stemming from the fact JSON Schema defines special, non-structural behavior for what makes a valid schema.

I am trying to say it should not be special. There may be additional requirements, beyond "JSON-schema validation", but the process of "JSON-schema validating" schema against meta-schema should not be different from validating any data against any schema. And allowing $schema inside it makes it different, unless we say $schema should be ignored.

Because also note that $ref doesn't actually exist anywhere in the meta-schema. It's behavior that can be defined only in prose.

I see it as an omission - it should be there. Behaviour is not supposed to be defined by the meta-schema, and that applies to all keywords. The meta-schema determines validity.

We could very well just define a special (mostly) blank meta-schema, define it as "the set of all JSON meta-schemas". Anywhere a schema is expected, we refer to this. And so any time we see it, we know there can be a $ref, or a $schema, or maybe neither.

I don't understand it.

What we do know for sure: $ref is not (cannot be) a simple late-bound transclusion.

What does that mean? I am saying that it's not a transclusion at all. $ref is an instruction to use another schema for the validation of the current part of the data instance. No other definition covers all cases because of recursion and ref resolution being relative to source schema.

@awwright
Member

awwright commented Oct 12, 2016

I was hoping draft-5 would be able to solve all of these problems. This is exactly the sort of problem I was hoping to clean up this iteration (which was either updating references, aligning vocabulary with existing documents, or gently adjusting behavior that nobody really cares about, this is an instance of the latter).

But I think I can justify putting it off because it's not the root of other major problems, and, again, it doesn't seem to be hurting anyone.


So to backtrack a little, just thinking aloud...

One of the major principles of design is that schemas have no context. But if that's the case, then why do we bother defining a "root schema" and "subschema"? It seems there's two reasons.

Part of it is because of context: in the context of a particular schema, that particular schema is the "root schema" and schemas found inside it are "subschemas". So if we've got a collection of schemas and we pick one of them, or one of their subschemas, at random, the root schema might have a parent that it is found in, but the subschemas (if any) definitely do (by definition).

So when the spec says "a root schema SHOULD have an id and a $schema", that's so the id and $schema aren't left potentially undefined at the mercy of the validator.

The other use is slightly different: a root schema is any schema that we're treating as an application/schema+json document (that is, a string of octets with that media type). So in this definition, a root schema is any schema where the outermost base URI (before being adjusted by "id", if any) is set not by a schema but by the application, for example set to the URL it was downloaded from.

How do we handle the case where $ref links into a sub-schema inside a document, instead of the top of a downloaded document?

@epoberezkin
Member Author

epoberezkin commented Oct 12, 2016

schemas have no context

I think it refers to the independence of data. Base for ref resolution is a context.

The root schema might have a parent that it is found in

That's not the case: according to the spec, the root schema is a JSON document. A document does not have a parent.

How do we handle the case where $ref links into a sub-schema inside a document, instead of the top of a downloaded document?

Can you clarify what is the problem here? Isn't it the common case?

@awwright
Member

Base for ref resolution is a context.

We do need a better term for this. Because yes, the environment provides a base URI, but it does so whether there's a "parent" schema or not. A base URI it gets from the download URL and a base URI it gets from a parent "id" are indistinguishable.

Can you clarify what is the problem here?

I was wondering how a validator is supposed to know the base URI and current $schema version to be using when it descends into a schema. But I know the answer to that... I think

I was just busy going through the archives: $schema was introduced in draft-03. It seems to be going under the assumption it'll only be used once in a document. But all it really has to say on the matter is a modification added in draft-04, that it MUST be used at the root. Whether an "only" was omitted there, I can't tell. (This seems like a reasonable thing to add for the second version, since now you have a need to distinguish the old from the new: if the keyword is missing, then it must be from draft-03, back when it was optional.)

In any event, I've always gone on the assumption you can use it anywhere. I think I saw an early example that showed how you are able to mix versions without breakage.

I know it's a great marketing push to be able to say "our schema can validate itself" and to say "use this schema to ensure your own schema is valid", but again, the document specifies what a valid schema is, not the meta-schema. This wouldn't be a problem... if we were able to align the two.

Here's some old issues about $ref missing from the meta-schema:

json-schema/json-schema#216
... I could have sworn I filed one myself

@handrews
Contributor

Because also note that $ref doesn't actually exist anywhere in the meta-schema. It's behavior that can be defined only in prose.

This is a big part of why JSON Reference was proposed as a separate standard. That removed the need for integrating $ref in to the meta-schema. It worked independently from JSON Schema.

I am still not convinced that dropping JSON Reference as a separate standard is a good idea at all. I think bringing it into JSON Schema and producing this meta-schema conundrum is a real problem.

@handrews
Contributor

I don't think we can have the validation process of a schema against the meta-schema be different from the validation of data against a schema - it should be the same. So I also think that $ref should be in the meta-schema - it should be treated as a special validation keyword, not as inlining of the schema in place.

$ref is not validation, it is structural manipulation. There are currently three structural keywords which involve referencing other schemas/documents or changing how such things are referenced. I have two proposals for re-use (nearly finished, but on hold during the big hyper-schema discussions), involving additionalProperties and annotation overriding, that will add three more:

  • $schema references the meta-schema inline.
  • id (really should be $id) changes how references are resolved.
  • $ref (if we consider it a JSON Schema keyword rather than a separate spec)
  • $use (this is the "override annotation keywords" proposal from the email thread)
  • $combine (also from the email thread; sometimes I've called it $include or $expand)
  • $combinable (or $includable or $expandable)

I don't want to get into $use/$combine/$combinable here, as I'll post them as soon as I can and we might not accept any of them anyway. I mention them to point out that there's a distinct class of keywords here that we can and should think about in different ways.

If we are specifying $ref as part of JSON Schema (still not sold on this) then it would need to be in the meta-schema, but that doesn't mean that we stop thinking about it as inlining (or otherwise pulling in) schemas.

When using a schema, I shouldn't care if $ref was involved or not; it needs to appear as if it's all one schema structure, even if we can't look at the whole thing at once because of recursion.

@awwright
Member

@handrews We've already established that we can't make $ref work the way JSON Reference describes it... The JSON parser has to be aware of when it's a reference, and when it's just a literal; simply using $ref sets a new base URI compared to if it were simply transcluded; and it impacts how $schema is handled.
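
The base-URI point can be illustrated with Python's standard `urllib.parse.urljoin`. The two URIs and the helper name `resolve_both_ways` are made up for this sketch; the only claim is that a relative reference inside a referenced document resolves differently depending on whether the `$ref` is followed or the referenced schema is pasted into the referring document:

```python
from urllib.parse import urljoin

# Hypothetical locations of the referring and the referenced document.
ROOT_URI = "http://example.com/root.json"
REMOTE_URI = "http://example.com/remote/subSchemas.json"


def resolve_both_ways(inner_ref):
    """Return (target when $ref is followed, target if the schema were transcluded)."""
    return urljoin(REMOTE_URI, inner_ref), urljoin(ROOT_URI, inner_ref)


# A reference found *inside* the remote document.
followed, inlined = resolve_both_ways("#/integer")
print(followed)
print(inlined)
```

When the reference is followed, `#/integer` points into the remote document; after transclusion it would point into the root document, where that location may not exist at all.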

@handrews
Contributor

handrews commented Oct 12, 2016

@awwright I think we should consider if this is really an either-or. Having JSON Reference specified separately doesn't mean we need to keep the exact existing JSON Reference schema. For instance, escaping with a double-$ref.

Assuming $ref is available everywhere because it is separately defined, this would be a regular reference that pulls in a set of property names and schemas...

{
    "properties": {
        "$ref": "#/definitions/fooDef"
    }
}

...while this would be a literal property named "$ref" of type "number":

{
    "properties": {
        "$ref": {"$ref": {"type": "number"}}
    }
}

I can't imagine a situation where you would need a literal nested "$ref", but basically any number of nested $refs should just strip off the outermost $ref, which would allow for arbitrarily deeply nested literals. $ref-ing another $ref would not trigger this. That would just be double-indirection. Only a literal nested "$ref" would be 'escaped'.

The nesting for literal "$ref" would be part of the separate JSON Reference specification.
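
One possible reading of the double-$ref escaping rule, sketched in Python. This is only an interpretation of the proposal in this comment, not part of JSON Reference or any published draft, and the function name `classify_ref` and its return shape are invented for the example. It assumes a string value means a real reference, and an object value whose only key is "$ref" means an escaped literal from which the outermost layer is stripped:

```python
def classify_ref(node):
    """Classify a JSON value as a reference, an escaped literal, or plain data.

    Returns ("reference", uri), ("literal", unescaped_object) or ("plain", node).
    """
    if not isinstance(node, dict) or "$ref" not in node:
        return ("plain", node)
    value = node["$ref"]
    if isinstance(value, str):
        # A normal reference: the value is a URI reference.
        return ("reference", value)
    if isinstance(value, dict) and set(value) == {"$ref"}:
        # Escaped: drop the outermost "$ref" and keep the rest as literal data.
        return ("literal", value)
    raise ValueError("'$ref' is neither a URI string nor an escaped literal")


# The two "properties" objects from the examples above.
print(classify_ref({"$ref": "#/definitions/fooDef"}))
print(classify_ref({"$ref": {"$ref": {"type": "number"}}}))
```

Under this reading the second call yields a properties map with one literal property named "$ref" whose schema is `{"type": "number"}`, and each extra level of nesting survives as literal data because only one layer is removed.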

@epoberezkin
Member Author

@awwright:

The only time there would be a difference is if the base URI changes. Which isn't a problem if your root schemas always have an absolute-URI "id" like JSON Schema recommends.

@handrews:

$ref is not validation, it is structural manipulation.

Please see this comment, it applies here as well: #98 (comment)
I don't know how to address recursion and $ref resolution in subschemas (test case "ref within remote ref") and at the same time treat $ref as structural manipulation.
Please explain how it can be addressed if you know.

@handrews
Contributor

I don't know how to address recursion and $ref resolution in subschemas (test case "ref within remote ref") and at the same time treat $ref as structural manipulation.
Please explain how it can be addressed if you know.

Can you talk about where in an algorithmic sense you find that this breaks down? I don't understand how these things are in conflict so if I could see exactly how this breaks for you perhaps I can understand the problem.

@epoberezkin
Member Author

epoberezkin commented Oct 16, 2016

@handrews Please have a look at the comment I linked to above and at the test case it points to (in JSON-Schema-Test-Suite). If $ref means inclusion, then this test simply fails (because the reference would not resolve). This use case is very common.

Recursive references are used for any recursive data structures. When $refs are recursive, the inclusion simply becomes impossible.

@handrews
Contributor

@awwright @epoberezkin what needs to be done to resolve this? The $ref discussion in #66 is resolved, did that have any impact here? As far as I can tell, there is nothing left to do here.

@awwright
Member

I'm going to close this out for inactivity, also this seems to be covered by other active issues. If this can be rephrased to refer to master branch or the latest I-D then please file a new issue, thanks!

handrews added a commit to handrews/json-schema-spec that referenced this issue Aug 20, 2017
Addresses issue json-schema-org#85.  URI fragment-encoded JSON Pointers are
already handled by the "uri-reference" format, plus "pattern"
if the media type supports multiple fragment types that would
need to be disambiguated.

Guidance on using "uri-reference" plus "pattern" belongs on
the web site, so I did not add it to the spec.