spec: note that some (meta?)keywords can only be allowed in the top level of the root schema #85
Comments
I don't believe "$schema" is currently allowed only at the root level. draft-04 mandates it at the root level; in practice nobody seems to do this, so I reduced the language to merely SHOULD. A JSON Schema validator might not even know the difference between a root schema and a sub-schema. The only reason "$schema" is suggested for the root schema is so that every schema (root schema, sub-schema) is guaranteed to have a deterministic "$schema" and URI associated with it. And in fact, I provided an example a couple of weeks ago where a schema can speak multiple vocabularies if so desired, by using "$schema" multiple times: #67 (comment) |
I see. At the same time the standard clearly defines root schema as a standalone JSON instance, so it can be given some special treatment... I think it could be a good way to simplify things by not allowing multiple vocabularies in a single JSON Schema instance. Same for IDs - seems a good middle ground between supporting current "id" at any level approach and obliterating it completely. The current resolution scope change is rarely supported properly. $schema change - does any validator support it? Ajv doesn't, it only checks the root value of the instance and ignores it everywhere else. But ok, we can leave this ambiguity in this version and hopefully resolve it later :) @awwright just give it some thought please. |
I would suggest replacing
with something like
https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L270 If we want to support $schema change within a single JSON schema instance, I would explicitly say it as well. Although few people would thank us I am afraid :)... |
Hmm, idk how that would impact "$ref". Right now you can generally replace a "$ref" with the schema itself (that's what JSON Reference was intended to be, a late-bound transparent inclusion of a JSON document into another document). The only time there would be a difference is if the base URI changes, which isn't a problem if your root schemas always have an absolute-URI "id" like JSON Schema recommends. The two alternatives you list are saying the same thing, but perhaps in clearer or more terse language. There's only one root schema in a file, and it consists only of a root level (the definition excludes subschemas). |
Ok about the language, you are right. I don't think it is right to see $ref for the validation purposes as the equivalent to the file inclusion because it is not true for recursive and mutually recursive $refs. You can't replace The idea that $ref is equivalent to inclusion is also not consistent with https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L297, although I think the wording in that paragraph needs clarification because it allows multiple interpretations (was about to create a separate issue about it). I don't think we have any choice but to acknowledge that $ref is essentially a validation keyword that instructs to perform validation of the current part of JSON instance against a referenced (sub)schema (that can be the current or another root schema or a subschema in the current or another root schema). The alternative is to not support recursion in schemas, but there are too many use cases for it (including meta-schema) for it to be a viable option. EDIT: And if we do acknowledge that in some cases $ref is not the same as inclusion we may as well say that it is never the same as inclusion and then we don't have to worry about top-level only keywords appearing anywhere else. |
Which is why JSON Reference is late bound (maybe it chooses a different term, but it acknowledges recursion is possible and you might not be able to fully expand every document right away)
That section could probably be worded better. But consider that the validator doesn't keep any state. So if I have two schemas { id: "alice", allOf: [ {$ref: "bob"} ] }
{ id: "bob", allOf: [ {$ref: "alice"} ] } and I validate an instance against one of them... the validator should return an error (or success?), but definitely not get thrown into an infinite loop. Or put another way, since the validation function is functional, if you're executing validate(instance, schema) multiple times in the course of a validation operation, you're doing something wrong because the function is always going to return the same result (it's getting passed the same input). |
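The "don't loop forever" requirement can be sketched in a few lines. This is only an illustration, not any real validator's behaviour: `validate_with_loop_check`, `Indeterminate` and the `registry` lookup by "id" are hypothetical names, and the validator understands nothing beyond "$ref", "allOf" and `"type": "string"`. It tracks which (instance, referenced schema) pairs are currently being evaluated and raises an error condition when one repeats.

```python
class Indeterminate(Exception):
    """Signals a reference loop: the result is neither pass nor fail."""


def validate_with_loop_check(instance, schema, registry, active=frozenset()):
    """Tiny validator covering only "$ref", "allOf" and "type": "string".

    `registry` (hypothetical) maps a schema "id" to its schema object.
    `active` holds the (instance identity, target id) pairs in progress.
    """
    if "$ref" in schema:
        target = schema["$ref"]
        key = (id(instance), target)
        if key in active:
            # Same instance against the same schema again: validation is a
            # pure function of its inputs, so recursing makes no progress.
            raise Indeterminate(f"reference loop through {target!r}")
        return validate_with_loop_check(
            instance, registry[target], registry, active | {key}
        )
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    return all(
        validate_with_loop_check(instance, sub, registry, active)
        for sub in schema.get("allOf", [])
    )


# The two mutually referencing schemas from the comment above.
registry = {
    "alice": {"id": "alice", "allOf": [{"$ref": "bob"}]},
    "bob": {"id": "bob", "allOf": [{"$ref": "alice"}]},
}

try:
    validate_with_loop_check(42, registry["alice"], registry)
except Indeterminate as err:
    print("indeterminate:", err)
```

Raising instead of returning pass or fail is one possible choice here; the thread leaves open which outcome the spec should require.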
Another option is to say that validators SHOULD (rather than MUST) detect such endless recursion in schema definitions and return validation result as undefined (depending on the language it can be exception, Nothing, null etc., rather than "pass" or "fail"). I honestly think it is better than either pass or fail, it would force schema writers to avoid this ambiguity. I actually think that even infinite loop is better than pass or fail :)... |
@epoberezkin By "return validation result as undefined" do you mean that the validator should error out? |
@epoberezkin I believe that's correct. The language I use is "raise an error condition" or "return indeterminate". |
That's right. |
@awwright an additional argument against allowing $schema anywhere but the top level is that a schema is a JSON instance that should be valid according to the meta-schema with the URI in $schema keyword. If we want to support $schema in subschemas we also have to support $schema in all data instances in sub-objects. I am not really suggesting that - it's absurd. |
I don't think the process of validating a schema against a meta-schema can differ from validating data against a schema; it should be the same. So I also think that $ref should be in the meta-schema: it should be treated as a special validation keyword, not as inlining of a schema in place. |
Only while we continue to support resolution scope change, which there seems to be some desire to drop. Without resolution scope change $ref cannot ever be treated as schema inclusion in place. |
You may have to walk me through this argument a bit more.
This sounds exactly right, "$schema" refers to a (meta-)schema URI that describes the current schema. To be pedantic, a schema is an
So if I understand what you're saying, the JSON meta-schema defines sub-schemas as being instances of itself. The JSON meta-schema does not provide any way for a different sub-schema to be used. So far, this hasn't been a problem, because there's only two vocabularies: JSON Validation and JSON Hyper-Schema. A valid JSON Hyper-Schema is always a valid JSON Validation schema. This is sort of a problem we have to tackle anyways, because a naive implementation would start by processing a schema-instance against the Hyper-schema meta-schema, see a property like "allOf", and descend into it processing against the regular validation schema, when we intend to parse it as a hyper-schema. And strictly speaking, the normative behavior is specified by the I-D, not any meta-schema. The meta-schema is non-normative. Am I getting you right? Does that make sense? |
What I am saying is that a validation process where you change the meta-schema on the fly is simply incorrect. During validation against the meta-schema, the schema should be treated simply as data, where no properties have any special meaning (that follows from the definition of a JSON instance being valid according to a schema: nowhere does this definition mention the possibility of changing the schema on the fly). So the validator should take the schema referenced in the $schema of the root and then validate the WHOLE instance according to this schema. Why would a validator be switching meta-schema on the fly? During validation the schema is just JSON data. So if there are any $schema keywords in any of the subschemas, the only thing the validator is supposed to do is to validate that they are strings and valid URIs (according to the meta-schema); it is not supposed to switch the meta-schema. I think expressly prohibiting is better than ignoring. |
The hyper-meta schema copy/pastes all structural keywords, so it would be processed correctly. It's just wasteful, both in schema size and in the validation process: all structural keywords that have subschemas will be validated twice, first as part of draft 4 validation, then as part of hyperschema. |
From this it also follows that you cannot inline $refs, because they can point to root schemas that should be validated against a different meta-schema. $ref should be validated as a property against some partial URI format (which isn't specified at the moment, but the uri format cannot be used for $refs). |
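To make the "root-only $schema" position concrete, here is a rough sketch under stated assumptions. The helper names `find_nested_schema_keywords` and `metaschema_uri` are made up for illustration, and the `strict` flag stands in for the "expressly prohibit" versus "ignore" choice. The meta-schema is picked from the root alone, and any "$schema" found deeper is treated as plain data.

```python
def find_nested_schema_keywords(node, path=""):
    """Yield pointer-like paths of "$schema" members found below the root.

    Caveat: this is a plain data walk, so a property that is literally named
    "$schema" under "properties" is reported too. Paths are not escaped the
    way real JSON Pointers are.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "$schema" and path != "":
                yield child
            yield from find_nested_schema_keywords(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_nested_schema_keywords(value, f"{path}/{index}")


def metaschema_uri(schema_doc, strict=True):
    """Return the root "$schema"; never switch meta-schema part-way through."""
    nested = list(find_nested_schema_keywords(schema_doc))
    if strict and nested:
        raise ValueError(f'"$schema" below the root: {nested}')
    return schema_doc.get("$schema")


doc = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "properties": {
        "a": {"$schema": "http://example.com/other#", "type": "string"}
    },
}
print(metaschema_uri(doc, strict=False))
print(list(find_nested_schema_keywords(doc)))
```

With `strict=False` the nested keyword is ignored and the whole document would be checked against the root's meta-schema; with `strict=True` the document is rejected outright.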
This is still all stemming from the fact JSON Schema defines special, non-structural behavior for what makes a valid schema. Because also note that $ref doesn't actually exist anywhere in the meta-schema. It's behavior that can be defined only in prose. We could very well just define a special (mostly) blank meta-schema, define it as "the set of all JSON meta-schemas". Anywhere a schema is expected, we refer to this. And so any time we see it, we know there can be a $ref, or a $schema, or maybe neither. But this takes a lot of fun out of validating schemas against meta-schemas. What we do know for sure: $ref is not (cannot be) a simple late-bound transclusion. |
I am trying to say it should not be special. There may be additional requirements, beyond "JSON-schema validation", but the process of "JSON-schema validating" schema against meta-schema should not be different from validating any data against any schema. And allowing $schema inside it makes it different, unless we say $schema should be ignored.
I see it as an omission; it should be there. Behaviour is not supposed to be defined by the meta-schema, and that applies to all keywords. The meta-schema determines validity.
I don't understand it.
What does that mean? I am saying that it's not a transclusion at all. $ref is an instruction to use another schema for the validation of the current part of the data instance. No other definition covers all cases because of recursion and ref resolution being relative to source schema. |
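A small sketch of "$ref as an instruction to validate against another schema", as opposed to inclusion. Everything here is illustrative: `validate_lazy` is a hypothetical function that handles only "type", "properties", "items" and the single reference "#" (the root). The schema refers to itself, so it could never be fully expanded in place, yet validation terminates because each reference is followed only when there is instance data to check.

```python
def validate_lazy(instance, schema, root):
    """Follow "$ref" at validation time instead of expanding it up front."""
    if "$ref" in schema:
        if schema["$ref"] != "#":
            raise NotImplementedError("this sketch only resolves '#'")
        # Validate the current part of the instance against the root schema.
        return validate_lazy(instance, root, root)

    expected = {"object": dict, "array": list, "string": str}.get(
        schema.get("type")
    )
    if expected is not None and not isinstance(instance, expected):
        return False

    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not validate_lazy(
                instance[name], subschema, root
            ):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(
            validate_lazy(item, schema["items"], root) for item in instance
        )
    return True


# A recursive tree: every child must itself be a valid tree node.
tree_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
}

good = {"name": "root", "children": [{"name": "leaf", "children": []}]}
bad = {"name": "root", "children": [{"name": 5}]}
print(validate_lazy(good, tree_schema, tree_schema))
print(validate_lazy(bad, tree_schema, tree_schema))
```

The recursion depth is bounded by the depth of the instance, not of the schema, which is why no expansion step is needed.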
I was hoping draft-5 would be able to solve all of these problems. This is exactly the sort of problem I was hoping to clean up this iteration (which was either updating references, aligning vocabulary with existing documents, or gently adjusting behavior that nobody really cares about; this is an instance of the latter). But I think I can justify putting it off because it's not the root of other major problems, and, again, it doesn't seem to be hurting anyone. So to backtrack a little, just thinking aloud... One of the major principles of design is that schemas have no context. But if that's the case, then why do we bother defining a "root schema" and "subschema"? It seems there are two reasons. Part of it is because of context: in the context of a particular schema, that particular schema is the "root schema" and schemas found inside it are "subschemas". So if we've got a collection of schemas, and we pick one of them or one of their subschemas at random, the root schema might have a parent that it is found in, but the subschemas (if any) definitely do (by definition). So when the spec says "a root schema SHOULD have an id and a $schema", that's so the id and schema aren't left potentially undefined at the mercy of the validator. The other use is slightly different: a root schema is any schema that we're treating as an application/schema+json document (that is, a string of octets with that media type). So in this definition, a root schema is any schema where the outermost base URI (before being adjusted by "id", if any) is set not by a schema, but by the application, for example set to the URL it was downloaded from. How do we handle the case where $ref links into a sub-schema inside a document, instead of the top of a downloaded document? |
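The second definition above (a base URI supplied by the application, then adjusted by "id") can be sketched with ordinary URI resolution. This is an illustration only: `schema_base` and `resolve_ref` are hypothetical helpers, the URLs are made up, and the draft-04 style "id" keyword is assumed.

```python
from urllib.parse import urljoin


def schema_base(inherited_base, schema):
    """Base URI in effect inside `schema`: inherited base adjusted by "id"."""
    return urljoin(inherited_base, schema.get("id", ""))


def resolve_ref(base, ref):
    """Resolve a "$ref" value against the base URI currently in effect."""
    return urljoin(base, ref)


# Hypothetical download URL supplied by the application, not by any schema.
download_url = "http://example.com/schemas/root.json"

item = {"id": "item.json", "properties": {"next": {"$ref": "#/definitions/x"}}}
root = {"definitions": {"item": item}}

root_base = schema_base(download_url, root)  # no "id", so unchanged
item_base = schema_base(root_base, item)     # adjusted by the subschema's "id"

print(root_base)
print(item_base)
print(resolve_ref(item_base, "#/definitions/x"))
```

Note that `schema_base` only ever sees a base URI string; it has no way to tell whether that string came from a download URL or from an enclosing schema's "id".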
I think it refers to the independence of data. Base for ref resolution is a context.
That's not the case: according to the spec, a root schema is a JSON document. A document does not have a parent.
Can you clarify what is the problem here? Isn't it the common case? |
We do need a better term for this. Because yes, the environment provides a base URI, but it does so whether there's a "parent" schema or not. A base URI it gets from the download URL and the base URI it gets from a parent "id" is indistinguishable.
I was wondering how a validator is supposed to know the base URI and current $schema version to be using when it descends into a schema. But I know the answer to that... I think I was just busy going through the archives. $schema was introduced in draft-03. It seems to be going under the assumption it'll only be used once in a document. But all it really has to say on the matter is a modification added in draft-04, that it MUST be used at the root. If an "only" was omitted there, I can't tell. (This seems like a reasonable thing to add for the second version, since now you have a need to distinguish the old from the new: if the keyword is missing, then it must be from draft-03, back when it was optional.) In any event, I've always gone on the assumption you can use it anywhere. I think I saw an early example that showed how you are able to mix versions without breakage. I know it's a great marketing push to be able to say "our schema can validate itself" and to say "use this schema to ensure your own schema is valid", but again, the document specifies what a valid schema is, not the meta-schema. This wouldn't be a problem... if we were able to align the two. Here are some old issues about $ref missing from the meta-schema: json-schema/json-schema#216 |
This is a big part of why JSON Reference was proposed as a separate standard. That removed the need for integrating $ref in to the meta-schema. It worked independently from JSON Schema. I am still not convinced that dropping JSON Reference as a separate standard is a good idea at all. I think bringing it into JSON Schema and producing this meta-schema conundrum is a real problem. |
$ref is not validation, it is structural manipulation. There are three current structural keywords which involve referencing other schemas/documents or changing how such things are referenced. I have two proposals (nearly finished, but on hold during the big hyper-schema discussions) for re-use involving additionalProperties and annotation overriding that will add three more:
I don't want to get into If we are specifying When using a schema, I shouldn't care if $ref was involved or not- it needs to appear if it's all one schema structure, even if we can't look at the whole thing at once because of recursion. |
@handrews We've already established that we can't make $ref work the way JSON Reference describes it... The JSON parser has to be aware of when it's a reference, and when it's just a literal; simply using $ref sets a new base URI compared to if it were simply transcluded; and it impacts how $schema is handled. |
@awwright I think we should consider if this is really an either-or. Having JSON Reference specified separately doesn't mean we need to keep the exact existing JSON Reference schema. For instance, escaping with a double-$ref. Assuming {
"properties": {
"$ref": "#/definitions/fooDef"
}
} ...while this would be a literal property named "$ref" of type "number": {
"properties": {
"$ref": {"$ref": {"type": "number"}}
}
} I can't imagine a situation where you would need a literal nested "$ref", but basically any number of nested $refs should just strip off the outermost $ref, which would allow for arbitrarily deeply nested literals. $ref-ing another $ref would not trigger this. That would just be double-indirection. Only a literal nested "$ref" would be 'escaped'. The nesting for literal "$ref" would be part of the separate JSON Reference specification. |
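One possible reading of the double-$ref escaping idea, sketched as code. This is an interpretation of the proposal in the comment above, not anything JSON Reference specifies: `interpret_ref` is a hypothetical helper, and how a single non-string "$ref" value should be treated is an assumption (it is left untouched here).

```python
def interpret_ref(node):
    """Classify an object as a real reference or as escaped literal data.

    Returns ("reference", uri) when "$ref" holds a string, and
    ("literal", value) otherwise. For a nested "$ref", only the outermost
    level is stripped, so deeper nesting survives as a literal.
    """
    if isinstance(node, dict) and set(node) == {"$ref"}:
        inner = node["$ref"]
        if isinstance(inner, str):
            return ("reference", inner)
        if isinstance(inner, dict) and "$ref" in inner:
            # Escaped: drop one wrapper and keep the rest as plain data.
            return ("literal", inner)
    # Assumption: anything else is ordinary data and is returned unchanged.
    return ("literal", node)


# The two "properties" values from the examples above.
print(interpret_ref({"$ref": "#/definitions/fooDef"}))
print(interpret_ref({"$ref": {"$ref": {"type": "number"}}}))
```

Under this reading the second call yields a "properties" map containing one literal property named "$ref" whose schema is `{"type": "number"}`, and a triple-nested "$ref" would come back as a double-nested literal.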
Please see this comment, it applies here as well: #98 (comment) |
Can you talk about where in an algorithmic sense you find that this breaks down? I don't understand how these things are in conflict so if I could see exactly how this breaks for you perhaps I can understand the problem. |
@handrews Please have a look at the comment I linked to above and at the test case it points to (in JSON-Schema-Test-Suite). If $ref means inclusion, then this test simply fails (because the reference would not resolve). This use case is very common. Recursive references are used for any recursive data structures. When $refs are recursive, the inclusion simply becomes impossible. |
@awwright @epoberezkin what needs to be done to resolve this? The |
I'm going to close this out for inactivity, also this seems to be covered by other active issues. If this can be rephrased to refer to master branch or the latest I-D then please file a new issue, thanks! |
Addresses issue json-schema-org#85. URI fragment-encoded JSON Pointers are already handled by the "uri-reference" format, plus "pattern" if the media type supports multiple fragment types that would need to be disambiguated. Guidance on using "uri-reference" plus "pattern" belongs on the web site, so I did not add it to the spec.
https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L161
At the moment it is only $schema, but given the discussion around id it may change ...