spec: note that some (meta?)keywords can only be allowed in the top level of the root schema #85

epoberezkin · 2016-10-11T17:18:29Z

https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L161

At the moment it is only $schema, but given the discussion around id it may change ...

awwright · 2016-10-11T17:28:58Z

I don't believe "$schema" is currently allowed only at the root level.

draft-04 mandates it at the root level; in practice nobody seems to do this, so I reduced the language to merely SHOULD.

A JSON Schema validator might not even know the difference between a root schema and a sub-schema; the only reason "$schema" is suggested for the root schema is so that every schema (root schema, sub-schema) is guaranteed to have a deterministic "$schema" and URI associated with it.

And in fact, I provided an example a couple weeks ago where a schema can speak multiple vocabularies if so desired, by using "$schema" multiple times: #67 (comment)

epoberezkin · 2016-10-11T17:40:45Z

I see.

At the same time the standard clearly defines root schema as a standalone JSON instance, so it can be given some special treatment...

I think it could be a good way to simplify things by not allowing multiple vocabularies in a single JSON Schema instance. Same for IDs - seems a good middle ground between supporting current "id" at any level approach and obliterating it completely. The current resolution scope change is rarely supported properly. $schema change - does any validator support it? Ajv doesn't, it only checks the root value of the instance and ignores it everywhere else.

But ok, we can leave this ambiguity in this version and hopefully resolve it later :)

@awwright just give it some thought please.

epoberezkin · 2016-10-11T17:47:39Z

I would suggest replacing

The root schema of a JSON Schema document SHOULD use this keyword.

with something like

The root schema of a JSON Schema document SHOULD use this keyword at the top level.

https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L270

If we want to support $schema change within a single JSON schema instance, I would explicitly say it as well. Although few people would thank us I am afraid :)...

awwright · 2016-10-11T17:51:37Z

Hmm, idk how that would impact "$ref".

Right now you can generally replace a "$ref" with the schema itself (that's what JSON Reference was intended to be, a late-bound transparent inclusion of a JSON document into another document).

The only time there would be a difference is if the base URI changes. Which isn't a problem if your root schemas always have an absolute-URI "id" like JSON Schema recommends.

The two alternatives you list are saying the same thing, but perhaps in clearer or more terse language.

There's only one root schema in a file, and it only consists of a root level (the definition excludes subschemas).

epoberezkin · 2016-10-11T18:01:24Z

Ok about the language, you are right.

I don't think it is right to treat $ref, for validation purposes, as equivalent to file inclusion, because that is not true for recursive and mutually recursive $refs. You can't replace {$ref: '#'} with the schema. You also cannot do the replacement if a refers to b and vice versa, which is absolutely needed for trees, graphs, etc.

The idea that $ref is equivalent to inclusion is also not consistent with https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L297, although I think the wording in that paragraph needs clarification because it allows multiple interpretations (was about to create a separate issue about it).

I don't think we have any choice but to acknowledge that $ref is essentially a validation keyword that instructs to perform validation of the current part of JSON instance against a referenced (sub)schema (that can be the current or another root schema or a subschema in the current or another root schema). The alternative is to not support recursion in schemas, but there are too many use cases for it (including meta-schema) for it to be a viable option.
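A minimal sketch of the "$ref as a validation keyword" reading, under stated assumptions: only same-document references and a handful of keywords are handled, and the names resolve_pointer, validate and TREE are hypothetical. The point it illustrates is that {"$ref": "#"} never needs to be expanded in place; it is followed only when there is data to validate, so recursion ends when the instance does.

```python
from typing import Any


def resolve_pointer(root: dict, ref: str) -> Any:
    """Resolve a same-document reference such as '#' or '#/definitions/node'."""
    if not ref.startswith("#"):
        raise ValueError("this sketch only handles same-document references")
    pointer = ref[1:]
    tokens = pointer.split("/")[1:] if pointer else []
    target: Any = root
    for token in tokens:
        # Undo JSON Pointer escaping of '/' and '~'.
        target = target[token.replace("~1", "/").replace("~0", "~")]
    return target


def validate(instance: Any, schema: dict, root: dict) -> bool:
    """Validate instance against schema; $ref delegates to the referenced schema."""
    if "$ref" in schema:
        # Delegation, not inclusion: look the target up and validate against it.
        return validate(instance, resolve_pointer(root, schema["$ref"]), root)
    expected = {"object": dict, "array": list}.get(schema.get("type"))
    if expected is not None and not isinstance(instance, expected):
        return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not validate(instance[name], subschema, root):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(validate(item, schema["items"], root) for item in instance)
    return True


# A recursive tree schema: every child is validated against the root schema.
TREE = {
    "type": "object",
    "properties": {
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
}

print(validate({"children": [{"children": []}]}, TREE, TREE))
print(validate({"children": [42]}, TREE, TREE))
```

Trying to inline the "#" reference here would produce an infinitely nested schema, while the delegating version only recurses as deep as the instance itself.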

EDIT: And if we do acknowledge that in some cases $ref is not the same as inclusion we may as well say that it is never the same as inclusion and then we don't have to worry about top-level only keywords appearing anywhere else.

awwright · 2016-10-11T18:10:56Z

because it is not true for recursive and mutually recursive $refs

Which is why JSON Reference is late bound (maybe it chooses a different term, but it acknowledges recursion is possible and you might not be able to fully expand every document right away)

The idea that $ref is equivalent to inclusion is also not consistent with

That section could probably be worded better. But consider that the validator doesn't keep any state. So if I have two schemas

{ id: "alice", allOf: [ {$ref: "bob"} ] }
{ id: "bob", allOf: [ {$ref: "alice"} ] }

and I validate an instance against one of them... the validator should return an error (or success?), but definitely not get thrown into an infinite loop.

Or put another way, since the validation function is functional, if you're executing validate(instance, schema) multiple times in the course of a validation operation, you're doing something wrong because the function is always going to return the same result (it's getting passed the same input).
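One way to act on that observation, as a sketch only: remember which (instance, referenced schema) pairs are currently being evaluated and raise an error when the same pair is entered again. The registry SCHEMAS, the exception ReferenceCycleError, and the extra "carol"/"dave" schemas are hypothetical, and using Python's id() to identify the instance is an assumption made for brevity.

```python
from typing import Any, FrozenSet, Tuple

# Hypothetical schema registry keyed by "id"; "alice" and "bob" are the
# mutually recursive pair from the comment above.
SCHEMAS = {
    "alice": {"id": "alice", "allOf": [{"$ref": "bob"}]},
    "bob": {"id": "bob", "allOf": [{"$ref": "alice"}]},
    "carol": {"id": "carol", "allOf": [{"$ref": "dave"}]},
    "dave": {"id": "dave"},
}


class ReferenceCycleError(Exception):
    """Raised when following $ref returns to a pair already being evaluated."""


def validate(
    instance: Any,
    schema: dict,
    in_progress: FrozenSet[Tuple[int, str]] = frozenset(),
) -> bool:
    if "$ref" in schema:
        target = schema["$ref"]
        key = (id(instance), target)
        if key in in_progress:
            # Same instance, same schema, no data consumed in between:
            # the result can never be determined, so report an error.
            raise ReferenceCycleError(f"cyclic reference to {target!r}")
        return validate(instance, SCHEMAS[target], in_progress | {key})
    return all(validate(instance, sub, in_progress) for sub in schema.get("allOf", []))


try:
    validate(1, SCHEMAS["alice"])
except ReferenceCycleError as error:
    print("error:", error)

print(validate(1, SCHEMAS["carol"]))
```

The in_progress set is passed down rather than stored globally, so the function stays free of shared state between calls.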

epoberezkin · 2016-10-11T18:16:01Z

Another option is to say that validators SHOULD (rather than MUST) detect such endless recursion in schema definitions and return validation result as undefined (depending on the language it can be exception, Nothing, null etc., rather than "pass" or "fail").

I honestly think it is better than either pass or fail, it would force schema writers to avoid this ambiguity. I actually think that even infinite loop is better than pass or fail :)...

handrews · 2016-10-11T18:44:41Z

@epoberezkin By "return validation result as undefined" do you mean that the validator should error out?

awwright · 2016-10-11T19:22:22Z

@epoberezkin I believe that's correct.

The language I use is "raise an error condition" or "return indeterminate".

epoberezkin · 2016-10-11T19:43:21Z

That's right.

epoberezkin · 2016-10-12T09:10:45Z

@awwright an additional argument against allowing $schema anywhere but the top level is that a schema is a JSON instance that should be valid according to the meta-schema with the URI in $schema keyword. If we want to support $schema in subschemas we also have to support $schema in all data instances in sub-objects. I am not really suggesting that - it's absurd.

epoberezkin · 2016-10-12T09:46:11Z

I don't think we can have validation process of schema against meta-schema different from validation of data against the schema - it should be the same. So I also think that $ref should be in the meta-schema - it should be treated as a special validation keyword, not as inlining of schema in place.

epoberezkin · 2016-10-12T09:48:24Z

The only time there would be a difference is if the base URI changes. Which isn't a problem if your root schemas always have an absolute-URI "id" like JSON Schema recommends.

Only while we continue to support resolution scope change, which there seems to be some desire to drop. Without resolution scope change $ref cannot ever be treated as schema inclusion in place.

awwright · 2016-10-12T10:23:48Z

You may have to walk me through this argument a bit more.

a schema is a JSON instance that should be valid according to the meta-schema with the URI in $schema keyword

This sounds exactly right, "$schema" refers to a (meta-)schema URI that describes the current schema. To be pedantic, a schema is an application/schema+json document (not just anything that looks like a schema will work, it has to be specifically used/declared as one somehow).

If we want to support $schema in subschemas we also have to support $schema in all data instances in sub-objects. I am not really suggesting that - it's absurd.

So if I understand what you're saying, the JSON meta-schema defines sub-schemas as being instances of itself. The JSON meta-schema does not provide any way for a different sub-schema to be used.

So far, this hasn't been a problem, because there's only two vocabularies: JSON Validation and JSON Hyper-Schema. A valid JSON Hyper-Schema is always a valid JSON Validation schema.

This is sort of a problem we have to tackle anyways, because a naive implementation would start by processing a schema-instance against the Hyper-schema meta-schema, see a property like "allOf", and descend into it processing against the regular validation schema, when we intend to parse it as a hyper-schema.

And strictly speaking, the normative behavior is specified by the I-D, not any meta-schema. The meta-schema is non-normative.

Am I getting you right? Does that make sense?

epoberezkin · 2016-10-12T10:32:03Z

What I am saying is that the validation process where you change meta-schema on the fly is simply incorrect. During validation against meta-schema the schema should be treated simply as data, where no properties have any special meaning (that follows from the definition of the JSON instance being valid according to the schema - nowhere does this definition mention the possibility to change the schema on the fly). So the validator should take the schema referenced in the $schema of the root and then validate the WHOLE instance according to this schema. Why would validator be switching meta-schema on the fly? During validation the schema is just JSON data.

So if there are any $schema keywords in any of the subschemas, the only thing the validator is supposed to do is to validate that they are strings and valid URIs (according to the meta-schema); it is not supposed to switch the meta-schema. I think expressly prohibiting is better than ignoring.
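A rough sketch of the "root only" rule, assuming a tool that picks the meta-schema from the root "$schema" and merely reports any nested occurrence. The function names and the example URIs other than the draft-04 one are hypothetical. The walk treats the schema as plain JSON data, so it also flags string-valued "$schema" keys inside "default" or "enum" values; a real tool would need to know which positions hold subschemas.

```python
from typing import Any, List

DRAFT_04 = "http://json-schema.org/draft-04/schema#"


def meta_schema_uri(schema: dict, default: str = DRAFT_04) -> str:
    """Pick the meta-schema from the root level only."""
    return schema.get("$schema", default)


def nested_schema_keywords(node: Any, path: str = "") -> List[str]:
    """Return JSON Pointers of every string-valued "$schema" below the root."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            # Escape '~' and '/' as JSON Pointer requires.
            child = f"{path}/{key.replace('~', '~0').replace('/', '~1')}"
            if key == "$schema" and path != "" and isinstance(value, str):
                found.append(child)
            found.extend(nested_schema_keywords(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(nested_schema_keywords(value, f"{path}/{index}"))
    return found


schema = {
    "$schema": DRAFT_04,
    "properties": {
        "a": {"$schema": "http://example.com/other#", "type": "string"},
    },
    "items": [{"$schema": "http://example.com/third#"}],
}

print(meta_schema_uri(schema))
print(nested_schema_keywords(schema))
```

Whether the nested occurrences are rejected or only ignored is exactly the choice discussed above; the sketch just makes them visible.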

epoberezkin · 2016-10-12T10:39:00Z

This is sort of a problem we have to tackle anyways, because a naive implementation would start by processing a schema-instance against the Hyper-schema meta-schema, see a property like "allOf", and descend into it processing against the regular validation schema.

hyper-meta schema copy/pastes all structural keywords, so it would be processed correctly, it's just wasteful, both in the schema size and validation process - all structural keywords that have subschemas will be validated twice, firstly as the part of draft 4 validation, then as part of hyperschema.

epoberezkin · 2016-10-12T10:43:58Z

So the validator should take the schema referenced in the $schema of the root and then validate the WHOLE instance according to this schema.

From this also follows that you cannot inline $refs, because they can point to the root schemas that should be validated against different meta-schema. $ref should be validated as a property against some partial URI format (isn't specified at the moment, but uri format cannot be used for $refs).

awwright · 2016-10-12T11:05:22Z

This is still all stemming from the fact JSON Schema defines special, non-structural behavior for what makes a valid schema.

Because also note that $ref doesn't actually exist anywhere in the meta-schema. It's behavior that can be defined only in prose.

We could very well just define a special (mostly) blank meta-schema, define it as "the set of all JSON meta-schemas". Anywhere a schema is expected, we refer to this. And so any time we see it, we know there can be a $ref, or a $schema, or maybe neither.

But this takes a lot of fun out of validating schemas against meta-schemas.

What we do know for sure: $ref is not (cannot be) a simple late-bound transclusion.

epoberezkin · 2016-10-12T13:11:14Z

This is still all stemming from the fact JSON Schema defines special, non-structural behavior for what makes a valid schema.

I am trying to say it should not be special. There may be additional requirements, beyond "JSON-schema validation", but the process of "JSON-schema validating" schema against meta-schema should not be different from validating any data against any schema. And allowing $schema inside it makes it different, unless we say $schema should be ignored.

Because also note that $ref doesn't actually exist anywhere in the meta-schema. It's behavior that can be defined only in prose.

I see it as an omission - it should be there. Behaviour is not supposed to be defined by meta-schema, and that applies to all keywords. Meta-schema determines validity.

We could very well just define a special (mostly) blank meta-schema, define it as "the set of all JSON meta-schemas". Anywhere a schema is expected, we refer to this. And so any time we see it, we know there can be a $ref, or a $schema, or maybe neither.

I don't understand it.

What we do know for sure: $ref is not (cannot be) a simple late-bound transclusion.

What does that mean? I am saying that it's not a transclusion at all. $ref is an instruction to use another schema for the validation of the current part of the data instance. No other definition covers all cases because of recursion and ref resolution being relative to source schema.

awwright · 2016-10-12T15:27:49Z

I was hoping draft-5 would be able to solve all of these problems. This is exactly the sort of problem I was hoping to clean up this iteration (which was either updating references, aligning vocabulary with existing documents, or gently adjusting behavior that nobody really cares about, this is an instance of the latter).

But I think I can justify putting it off because it's not the root of other major problems, and, again, it doesn't seem to be hurting anyone.

So to backtrack a little, just thinking aloud...

One of the major principles of design is that schemas have no context. But if that's the case, then why do we bother defining a "root schema" and "subschema"? It seems there are two reasons.

Part of it is because of context: in the context of a particular schema, that particular schema is the "root schema" and schemas found inside it are "subschemas". So suppose we've got a collection of schemas, and we pick one of them or one of their subschemas at random. The root schema might have a parent that it is found in, but the subschemas (if any) definitely do (by definition).

So when the spec says "a root schema SHOULD have an id and a $schema", that's so the id and schema isn't left potentially undefined at the mercy of the validator.

The other use is slightly different, a root schema is any schema that we're treating as an application/schema+json document (that is, a string of octets with that media type). So in this definition, a root schema is any schema where the outermost base URI (before adjusted by "id" if any) is set not by a schema, but by the application, for example set to the URL it was downloaded from.

How do we handle the case where $ref links into a sub-schema inside a document, instead of the top of a downloaded document?

epoberezkin · 2016-10-12T15:36:34Z

schemas have no context

I think it refers to the independence of data. Base for ref resolution is a context.

The root schema might have a parent that it is found in

That's not the case - according to the spec, root schema is a JSON document. A document does not have a parent.

How do we handle the case where $ref links into a sub-schema inside a document, instead of the top of a downloaded document?

Can you clarify what is the problem here? Isn't it the common case?

awwright · 2016-10-12T16:10:28Z

Base for ref resolution is a context.

We do need a better term for this. Because yes, the environment provides a base URI, but it does so whether there's a "parent" schema or not. A base URI it gets from the download URL and a base URI it gets from a parent "id" are indistinguishable.
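To illustrate that last point, here is a small sketch that resolves every "$ref" against the base URI in effect, letting a string-valued "id" change that base. The function name absolute_refs and all example URLs are hypothetical; urljoin is the standard-library resolver. The sketch does not distinguish a keyword "id" from a string-valued "id" inside instance-like data such as "default".

```python
from typing import Any, List
from urllib.parse import urljoin


def absolute_refs(node: Any, base_uri: str) -> List[str]:
    """Collect every $ref in node, resolved against the base URI in effect."""
    refs: List[str] = []
    if isinstance(node, dict):
        if isinstance(node.get("id"), str):
            # "id" changes the resolution scope for this schema and below.
            base_uri = urljoin(base_uri, node["id"])
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.append(urljoin(base_uri, value))
            else:
                refs.extend(absolute_refs(value, base_uri))
    elif isinstance(node, list):
        for value in node:
            refs.extend(absolute_refs(value, base_uri))
    return refs


body = {"properties": {"x": {"$ref": "item.json"}}}

# Case 1: base URI supplied by the application (the download URL).
downloaded = absolute_refs(body, "http://example.com/schemas/root.json")

# Case 2: the same schema embedded elsewhere, base URI supplied by "id".
embedded = {"id": "http://example.com/schemas/root.json", **body}
from_id = absolute_refs(embedded, "http://other.example/app/")

print(downloaded)
print(from_id)
```

Both calls resolve the reference to the same absolute URI, which is why a validator descending into the schema cannot tell where its base came from.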

Can you clarify what is the problem here?

I was wondering how a validator is supposed to know the base URI and current $schema version to be using when it descends into a schema. But I know the answer to that... I think

I was just busy going through the archives: $schema was introduced in draft-03. It seems to be going under the assumption it'll only be used once in a document. But all it really has to say on the matter is a modification added in draft-04, that it MUST be used at the root. If an "only" was omitted there, I can't tell. (This seems like a reasonable thing to add for the second version, since now you have a need to distinguish the old from the new: If the keyword is missing, then it must be from draft-03 back when it was optional.)

In any event, I've always gone on the assumption you can use it anywhere. I think I saw an early example that showed how you are able to mix versions without breakage.

I know it's a great marketing push to be able to say "our schema can validate itself" and to say "use this schema to ensure your own schema is valid", but again, the document specifies what a valid schema is, not the meta-schema. This wouldn't be a problem... if we were able to align the two.

Here's some old issues about $ref missing from the meta-schema:

json-schema/json-schema#216
... I could have sworn I filed one myself

handrews · 2016-10-12T16:59:14Z

Because also note that $ref doesn't actually exist anywhere in the meta-schema. It's behavior that can be defined only in prose.

This is a big part of why JSON Reference was proposed as a separate standard. That removed the need for integrating $ref into the meta-schema. It worked independently from JSON Schema.

I am still not convinced that dropping JSON Reference as a separate standard is a good idea at all. I think bringing it into JSON Schema and producing this meta-schema conundrum is a real problem.

handrews · 2016-10-12T17:13:31Z

I don't think we can have validation process of schema against meta-schema different from validation of data against the schema - it should be the same. So I also think that $ref should be in the meta-schema - it should be treated as a special validation keyword, not as inlining of schema in place.

$ref is not validation, it is structural manipulation. There are three current structural keywords which involve referencing other schemas/documents or changing how such things are referenced. I have two proposals (nearly finished, but on hold during the big hyper-schema discussions) for re-use involving additionalProperties and annotation overriding that will add three more:

$schema references the meta-schema inline.
id (really should be $id) changes how references are resolved
$ref (if we consider it a JSON Schema keyword rather than a separate spec)
$use (this is the "override annotation keywords" proposal from the email thread)
$combine (also from email thread - sometimes I've called it $include or $expand)
$combinable (or $includable or $expandable)

I don't want to get into $use/$combine/$combinable here, as I'll post them as soon as I can and we might not accept any of them anyway. I mention them to point out that there's a distinct class of keywords here that we can and should think about in different ways.

If we are specifying $ref as part of JSON Schema (still not sold on this) then it would need to be in the meta-schema, but that doesn't mean that we stop thinking about it as inlining (or otherwise pulling in) schemas.

When using a schema, I shouldn't care if $ref was involved or not - it needs to appear as if it's all one schema structure, even if we can't look at the whole thing at once because of recursion.

awwright · 2016-10-12T17:33:18Z

@handrews We've already established that we can't make $ref work the way JSON Reference describes it... The JSON parser has to be aware of when it's a reference, and when it's just a literal; simply using $ref sets a new base URI compared to if it were simply transcluded; and it impacts how $schema is handled.

handrews · 2016-10-12T17:53:57Z

@awwright I think we should consider if this is really an either-or. Having JSON Reference specified separately doesn't mean we need to keep the exact existing JSON Reference schema. For instance, escaping with a double-$ref.

Assuming $ref is available everywhere because it is separately defined, this would be a regular reference that pulls in a set of property names and schemas...

{
    "properties": {
        "$ref": "#/definitions/fooDef"
    }
}

...while this would be a literal property named "$ref" of type "number":

{
    "properties": {
        "$ref": {"$ref": {"type": "number"}}
    }
}

I can't imagine a situation where you would need a literal nested "$ref", but basically any number of nested $refs should just strip off the outermost $ref, which would allow for arbitrarily deeply nested literals. $ref-ing another $ref would not trigger this. That would just be double-indirection. Only a literal nested "$ref" would be 'escaped'.

The nesting for literal "$ref" would be part of the separate JSON Reference specification.
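The escaping rule was only outlined in this comment, so the following is one possible reading rather than anything specified: a string under "$ref" is a reference, and an object whose only key is "$ref" is an escaped literal from which one layer is stripped. The function name interpret_ref and the single-key requirement are assumptions made for this sketch.

```python
from typing import Any, Tuple


def interpret_ref(value: Any) -> Tuple[str, Any]:
    """Classify the value found under a "$ref" key (hypothetical rule)."""
    if isinstance(value, str):
        # Ordinary reference: the string is a URI reference to resolve.
        return ("reference", value)
    if isinstance(value, dict) and set(value) == {"$ref"}:
        # Escaped literal: strip the outermost "$ref" and keep what is inside
        # as the value of a literal member named "$ref".
        return ("literal", value["$ref"])
    raise ValueError("unsupported $ref value in this sketch")


referencing = {"properties": {"$ref": "#/definitions/fooDef"}}
escaped = {"properties": {"$ref": {"$ref": {"type": "number"}}}}

print(interpret_ref(referencing["properties"]["$ref"]))
print(interpret_ref(escaped["properties"]["$ref"]))
```

Applied to the two examples above, the first is classified as a reference to "#/definitions/fooDef" and the second as a literal property named "$ref" whose schema is {"type": "number"}.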

epoberezkin · 2016-10-16T18:42:16Z

@awwright:

The only time there would be a difference is if the base URI changes. Which isn't a problem if your root schemas always have an absolute-URI "id" like JSON Schema recommends.

@handrews:

$ref is not validation, it is structural manipulation.

Please see this comment, it applies here as well: #98 (comment)
I don't know how to address recursion and $ref resolution in subschemas (test case "ref within remote ref") and at the same time treat $ref as structural manipulation.
Please explain how it can be addressed if you know.

handrews · 2016-10-16T18:48:16Z

I don't know how to address recursion and $ref resolution in subschemas (test case "ref within remote ref") and at the same time treat $ref as structural manipulation.
Please explain how it can be addressed if you know.

Can you talk about where in an algorithmic sense you find that this breaks down? I don't understand how these things are in conflict so if I could see exactly how this breaks for you perhaps I can understand the problem.

epoberezkin · 2016-10-16T18:51:43Z

@handrews Please have a look at the comment I linked to above and at the test case it points to (in JSON-Schema-Test-Suite). If $ref means inclusion, then this test simply fails (because the reference would not resolve). This use case is very common.

Recursive references are used for any recursive data structures. When $refs are recursive, the inclusion simply becomes impossible.

handrews · 2016-10-31T18:29:22Z

@awwright @epoberezkin what needs to be done to resolve this? The $ref discussion in #66 is resolved, did that have any impact here? As far as I can tell, there is nothing left to do here.

awwright · 2016-11-30T13:36:59Z

I'm going to close this out for inactivity, also this seems to be covered by other active issues. If this can be rephrased to refer to master branch or the latest I-D then please file a new issue, thanks!

Addresses issue json-schema-org#85. URI fragment-encoded JSON Pointers are already handled by the "uri-reference" format, plus "pattern" if the media type supports multiple fragment types that would need to be disambiguated. Guidance on using "uri-reference" plus "pattern" belongs on the web site, so I did not add it to the spec.

epoberezkin changed the title ~~spec: note that some meta-keywords can only be allowed in the top level of the root schema~~ spec: note that some (meta?)keywords can only be allowed in the top level of the root schema Oct 11, 2016

awwright added question labels Oct 11, 2016

awwright added this to the draft-6 milestone Oct 11, 2016

epoberezkin mentioned this issue Oct 11, 2016

spec: remove/change extending meta-schemas recommendation #86

Closed

awwright removed the Feedback period label Oct 13, 2016

epoberezkin mentioned this issue Oct 16, 2016

Determine behavior of $ref #66

Closed

awwright closed this as completed Nov 30, 2016
