fge edited this page Aug 17, 2012 · 83 revisions

Note

This page describes the status of the latest version, i.e. 0.5.x.

IMPORTANT NOTE: the "end user" API is still a work in progress. Suggestions welcome.

What is supported

All of section 5 of the draft is supported, apart from the limitations mentioned below. Supported features include:

  • union types (in type as well as in disallow),
  • full dependencies (i.e. property dependencies as well as schema dependencies),
  • "multiple extends" (i.e. an array of schemas),
  • tuple/non-tuple validation for arrays,
  • $ref with loop detection,
  • formats (but limited, see below),
  • enums,
  • etc.

All in all, quite a complete implementation.

Limitations

Currently no support at all for...

The default keyword is likely never to be supported since this API is about validation only. Similarly, description is happily ignored.

Strict JSON input required

Even though Jackson has the ability to parse many a malformed JSON document, this project chooses to ask Jackson to obey the specification to the letter (which is the default anyway -- use Jackson, I tell you).

This means, for instance, that:

  • no comments are allowed,
  • strings must be surrounded by double quotes and escaped correctly,
  • numeric instances must not be surrounded by double quotes (these are strings, not numbers),
  • etc etc

URI support and $ref

Only HTTP is supported natively as a protocol for absolute URIs. While the internal API is there to register processors for other schemes than "http", the public API is not there yet.

Other than that, the current implementation supports absolute and relative ref lookups, and can handle ref loops and dead-end URIs.
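To give an idea of what handling ref loops and dead-end URIs involves, here is a minimal sketch. It is not the project's actual resolver: the class `RefChainSketch`, the method `follow` and the map-based "registry" are all hypothetical names made up for the illustration. The idea is simply to remember every URI already visited while following a chain of $ref values.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the library's resolver.
final class RefChainSketch {
    // refs maps a schema URI to the URI its "$ref" points to.
    // terminals holds the URIs of schemas which exist and contain no "$ref".
    static String follow(String start, Map<String, String> refs, Set<String> terminals) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (refs.containsKey(current)) {
            // add() returns false if the URI was already visited: that is a loop
            if (!seen.add(current)) {
                throw new IllegalStateException("ref loop detected at " + current);
            }
            current = refs.get(current);
        }
        // The chain ended on a URI which resolves to nothing: a dead end
        if (!terminals.contains(current)) {
            throw new IllegalStateException("dead-end URI: " + current);
        }
        return current;
    }
}
```

With a chain a -> b -> c where c is a plain schema, `follow("a", ...)` returns "c"; if c in turn refers back to a, an exception is thrown instead of looping forever.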

color and style format specifications

These format specifications aim to validate a CSS 2.1 color and a CSS 2.1 style, respectively. They were supported to a certain extent in the past (0.4.x), but support for them has been dropped. Nobody uses them.

Limits on m{in,ax}Length and m{in,ax}Items

In a schema, these enforce resp. the minimum/maximum length of a string instance, and the minimum/maximum number of items of an array instance. The implementation won't accept any values for these which are greater than Integer.MAX_VALUE, that is... 2^31 - 1. You don't have JSON documents that big, do you? Well, OK, some modern NoSQL databases may have JSON data as large, if not even larger. (note: this will also turn out to be an "issue" with m{in,ax}Properties when draft v4 is out there -- but draft v4 has other problems, see below)

Numeric instances (integers and numbers) are another story, see below.
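The limit on m{in,ax}Length and m{in,ax}Items described above boils down to a range check on the keyword value. A minimal sketch, assuming the value has been read as a BigInteger (the class and method names are hypothetical, and this is not the project's actual code):

```java
import java.math.BigInteger;

// Hypothetical sketch of the upper bound check on minLength, maxLength,
// minItems and maxItems values.
final class LengthLimitSketch {
    private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    // true if the keyword value is not greater than Integer.MAX_VALUE (2^31 - 1)
    static boolean fitsLimit(BigInteger keywordValue) {
        return keywordValue.compareTo(INT_MAX) <= 0;
    }
}
```

Only the upper bound mentioned above is checked here; any other constraint on these keywords is left out of the sketch.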

What the draft doesn't say explicitly, but which is implicit, and is implemented

(for some definition of "implicit")

Unknown keywords in schemas

Unknown keywords in schemas are purely and simply ignored. Beware of spelling mistakes!

properties and patternProperties

If a property of an object instance being validated matches exactly a field defined in properties, then this property will be validated against the corresponding schema. So far, so good.

However, nothing says that this property should match only this schema. In fact, in this case, the implementation also goes through patternProperties to see if the property happens to match a regex in there too (and see below about regexes). If and only if the property matches neither of them is additionalProperties considered (provided that it is not false, of course).

As an example, consider this schema:

{
    "type": "object",
    "properties": {
        "p1": { "type": "string" }
    },
    "patternProperties": {
        "p": { "minLength": 10 },
        "1": { "format": "host-name" }
    }
}

Now, if the instance to validate contains a property named p1:

  • it will of course have to match the schema defined by the corresponding entry in properties;
  • but it also matches regexes p and 1 (again, see below), so it will have in fact to match all three schemas: the one defined in properties and the two schemas in patternProperties.
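The selection rule above can be sketched as follows. This is an illustration only, not the project's code: schemas are represented by plain string labels, all names are hypothetical, and java.util.regex stands in for the ECMA 262 engine the project really uses (the two dialects agree on the trivial regexes of the example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: which schemas must a given instance property obey?
final class PropertySchemaSketch {
    static List<String> schemasFor(String name,
                                   Map<String, String> properties,
                                   Map<String, String> patternProperties,
                                   String additionalProperties) {
        List<String> result = new ArrayList<>();
        // Exact match in "properties"
        if (properties.containsKey(name)) {
            result.add(properties.get(name));
        }
        // Every regex in "patternProperties" which matches somewhere in the name
        for (Map.Entry<String, String> entry : patternProperties.entrySet()) {
            if (Pattern.compile(entry.getKey()).matcher(name).find()) {
                result.add(entry.getValue());
            }
        }
        // "additionalProperties" is considered only if nothing else matched
        if (result.isEmpty()) {
            result.add(additionalProperties);
        }
        return result;
    }
}
```

With the schema above, a property named p1 ends up with three schemas to obey, a property named p2 with only the one attached to regex p, and a property named zz with whatever additionalProperties says.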

divisibleBy, exclusiveM{in,ax}imum and m{in,ax}imum

Curiously, the draft doesn't say that, for instance, if exclusiveMinimum is present, then minimum MUST also be present. Neither does it say that the number in divisibleBy must not be 0. However, if you have a look at the schema, you see this:

// divisibleBy definition:
"divisibleBy" : {
	"type" : "number",
	"minimum" : 0,
	"exclusiveMinimum" : true,
	"default" : 1
},
// dependencies:
"dependencies" : {
	"exclusiveMinimum" : "minimum",
	"exclusiveMaximum" : "maximum"
},

Which means what it means. Those are therefore enforced at the syntax checking level.
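As an illustration, such syntax checks could look like the sketch below. It assumes the schema has been loaded into a Map with numbers as BigDecimal; the class name, the method name and the error messages are hypothetical and not taken from the project.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the syntax checks implied by the excerpt above.
final class NumericKeywordSyntaxSketch {
    static List<String> check(Map<String, Object> schema) {
        List<String> errors = new ArrayList<>();
        // "dependencies": exclusiveMinimum requires minimum, same for maximum
        if (schema.containsKey("exclusiveMinimum") && !schema.containsKey("minimum")) {
            errors.add("exclusiveMinimum requires minimum");
        }
        if (schema.containsKey("exclusiveMaximum") && !schema.containsKey("maximum")) {
            errors.add("exclusiveMaximum requires maximum");
        }
        // "minimum": 0 with "exclusiveMinimum": true means strictly positive
        Object divisor = schema.get("divisibleBy");
        if (divisor instanceof BigDecimal && ((BigDecimal) divisor).signum() <= 0) {
            errors.add("divisibleBy must be strictly greater than 0");
        }
        return errors;
    }
}
```

A schema passes if the returned list is empty.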

Discussions about some fine points of the draft

Numeric instance validation

This applies to integer and number JSON nodes, and therefore to the minimum, maximum and divisibleBy keywords. And especially to the latter.

What happens here is that the JSON spec itself doesn't specify a range limit for numeric instances, and neither does the JSON Schema draft. These three keywords therefore theoretically apply to arbitrarily large numbers and/or numbers with arbitrarily large precision. Although JavaScript limits itself to 64-bit IEEE 754 floating point numbers, and although JSON has JavaScript in its name (recall: JSON means JavaScript Object Notation), this doesn't mean JSON is used only with JavaScript. Consider MongoDB, for example.

Therefore, the implementation chooses to use Java's BigDecimal for numeric instance validation, and falls back to long if both the schema keyword value and the instance value fit into this type. For decimal validation however, rounding has to be taken into account... And rounding means rounding errors, which means inaccuracies, which means wreaking havoc on the divisibleBy check in particular. I don't like inaccuracy, so, for decimal numbers, BigDecimal it is and it will likely remain so for the foreseeable future.
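To see why rounding matters for divisibleBy, compare a naive check on double values with an exact one on BigDecimal. The sketch below is only an illustration with hypothetical names, not the project's validator:

```java
import java.math.BigDecimal;

// Hypothetical sketch: three ways of checking "instance is divisible by divisor".
final class DivisibleBySketch {
    // Naive version: subject to binary floating point rounding errors
    static boolean naiveDouble(double instance, double divisor) {
        return instance % divisor == 0.0;
    }

    // Exact version for decimal numbers
    static boolean exact(BigDecimal instance, BigDecimal divisor) {
        // compareTo() rather than equals(), so that 0.0 and 0 are both "zero"
        return instance.remainder(divisor).compareTo(BigDecimal.ZERO) == 0;
    }

    // Fast path when both values fit into a long
    static boolean exactLong(long instance, long divisor) {
        return instance % divisor == 0;
    }
}
```

For instance, `naiveDouble(0.3, 0.1)` returns false because neither 0.3 nor 0.1 can be represented exactly as a double, whereas `exact(new BigDecimal("0.3"), new BigDecimal("0.1"))` returns true.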

Regex support: ECMA 262, and the real definition of "matching"

The draft is quite clear that regexes should conform to ECMA 262. This rules out java.util.regex entirely (for instance, possessive quantifiers, like in a++, are legal in Java, but are not supported by ECMA 262). The only Java library (that I know of) in existence which is able to process ECMA 262 regexes is Rhino and its JavaScript engine. This project uses it for that very reason (and, again, I don't like inaccuracy).

Also, even though the draft only implies it (and as the Javadoc points out in several places), please note that the definition of matching is the real one, not the "Java one": a regex can match anywhere in the input! So, remember this when writing your schemas -- if you want your regex to match the whole input, you must anchor it. This is valid for the pattern keyword, but also for keys in patternProperties. A JSON Schema implementation which doesn't act this way simply does not obey the draft!
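The difference between the two definitions of matching can be shown with java.util.regex itself. Note that java.util.regex is used here only to illustrate the semantics (the project relies on Rhino for the actual regex dialect), and that the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical sketch contrasting the two meanings of "matching".
final class RegexMatchSketch {
    // The real definition: the regex may match anywhere in the input
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // The "Java one": String.matches() requires the whole input to match
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }
}
```

Here `matchesAnywhere("p", "p1")` is true while `matchesWhole("p", "p1")` is false. To restrict a match to the whole input under the real definition, anchor the regex: `matchesAnywhere("^p$", "p1")` is false.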

Hostname and email validation

These are two of the format specifications defined by the draft (resp. host-name and email). While, stricto sensu, hostnames and emails MAY have no domain part at all, this implementation chooses to require that they have one. This is in contradiction with the relevant RFCs, but is more in line with user expectations.

utc-millisec validation

This format specification is said to be the number of milliseconds since epoch (that is, Jan 1st 1970 at 00:00 GMT). This is, in essence, a signed 32-bit integer times 1000. The implementation makes the choice to consider a numeric instance bound to this format specification invalid if:

  • it is negative, or
  • its value divided by 1000 is greater than 2^31 - 1.

This may, or may not be, a problem for you, YMMV. But if you actually plan to use such a formatted value in one of your programs, I think it is useful to enforce these.
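The two conditions above can be sketched as follows. The sketch assumes an integer instance and an integer division by 1000; both are assumptions made for the illustration, as are the class and method names.

```java
import java.math.BigInteger;

// Hypothetical sketch of the utc-millisec range check.
final class UtcMillisecSketch {
    private static final BigInteger THOUSAND = BigInteger.valueOf(1000);
    private static final BigInteger MAX_SECONDS = BigInteger.valueOf(Integer.MAX_VALUE);

    static boolean isValid(BigInteger millis) {
        // Negative values are rejected
        if (millis.signum() < 0) {
            return false;
        }
        // Seconds since epoch (integer division, an assumption) must fit
        // into a signed 32-bit integer
        return millis.divide(THOUSAND).compareTo(MAX_SECONDS) <= 0;
    }
}
```

BigInteger is used so that arbitrarily large instances are handled without overflow.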

JSON Pointer

JSON Pointer aims at describing a unique way to address specific "paths" (for lack of a better word) within a JSON document. 0.4.x supports an old version of the draft (in which the / was "URI-escaped" within reference tokens), but the master branch is switching to the latest draft, which only requires that ^ and / be escaped (by ^) within reference tokens. This makes JSON Pointers unambiguous, and this is a huge win for the specification.
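The escaping rule described above (prefix ^ and / with ^ within a reference token) is simple enough to sketch. This is an illustration of the rule as stated on this page, with hypothetical names, and not the project's JSON Pointer implementation:

```java
// Hypothetical sketch of reference token escaping with ^ as the escape character.
final class ReferenceTokenSketch {
    // Prefix every ^ and / with ^
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (c == '^' || c == '/') {
                sb.append('^');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Reverse operation: a ^ means "take the next character literally".
    // A lone trailing ^ is kept as is (a choice made for this sketch).
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '^' && i + 1 < escaped.length()) {
                c = escaped.charAt(++i);
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

An escaped token never contains a bare /, so splitting a pointer on unescaped / characters is unambiguous.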
