-
-
Notifications
You must be signed in to change notification settings - Fork 890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] JTD Schema Object #1446
[RFC] JTD Schema Object #1446
Conversation
This is great - I was thinking to add it, but I didn't have an idea how to make discriminator work. From briefly looking at PR - I am definitely not on this level of TS typefoo - some things there go over my head - need to read slowly :) Some comments on the raised questions:
On a similar question, when I was writing a meta-schema for JTD schemas I've made it to explicitly prevent definitions anywhere but in the root, but given that JTD itself is not designed for such purpose, it doubled meta-schema size. I avoided the repetition on the code level, but on the data level the schema is still twice bigger - potentially meaning some memory/performance penalties (although not double time) for what could have been effectively handled much simpler - by creating On the question here, I am absolutely ok with not enforcing this restriction on type level. Typescript does flag unknown properties, and we should not allow unknown properties (if possible - need to read the code attentively) but trying to prevent definitions from appearing anywhere other than in the root would also make this type heavier (not sure how much heavier though - you may know how to do it in a simple way). So happy to leave as is and happy to add this restriction too if it doesn't complicate things too much. I think it goes up to the higher level question - what is this type purpose? The options are:
same, i + ii makes sense
assuming you mean checking if the schema is valid according to JTD spec but not necessarily consistent with instance type. From goal 2 above, in isolation it seems of low value - although we could of course define a generic type for any schema as well - as
assuming you mean the type of the instance - then yes.
It should allow Date in place of timestamp type - see I would not worry about handling extensions inside metadata field on the type level. Ajv only allowed such extensions (and only inside metadata) for two reasons:
In general, I think users should really avoid using such extensions in the new schema, based on the differences between design objectives of JTD and JSON Schema. I have spent some time objecting to user defined keywords JSON Schema, but given that JSON Schema approaches the level of complexity of general purpose programming language (and you can actually use ajv with JSON Schema to interpret another JSON based general purpose language with some extension keywords - see my abandoned JSON Script), having user define keywords in JSON Schema fits this model, and it can be helpful for traversing and folding the trees with some arbitrary logic on the way. JTD on another hand is fulfilling a much smaller, but, arguably, as important objective: define a JSON based specification for defining schemas of cross-platform JSON messages, specifically in a language-heterogenous environment, in such way that a simple schema can be used to generate types for multiple languages - hence the name (which I proudly contributed to:). It is much more restrictive by design, either because certain data structures/types are not present in some important languages and using them would undermine its cross-platform-ness, or because these data structures are difficult to extend reliably, specifically:
So in a way, JTD guides your JSON message / API modelling in such a way that you would not have challenges adding services in any language later. So I'd strongly discourage using any extensions and instead move any value/semantic validation logic to the application code.
Yep, I had the same issue in JSONSchemaType - didn't realise it is solvable with restrictions. It is actually not such a bad idea - maybe it can be a separate version of this type (or extra parameter) or we could just make this call that this type in opinionated in this way - some people enforce sorting of everything via linting and I tend to gravitate towards sorting things - it makes them easier to find in large lists and avoids large diffs at a point when the unsorted list grows out of control and you eventually sort it... Maybe it is better to start with sorted. Up to you, maybe deserves a separate PR, but I am on board with enforcing sorting of enums in schemas - unless it is super complex. I would do it in JSONSchemaType too then (in v8). Anyway, sorry for the brain dump, what you've done is great - I'll read the code carefully - it'll take some effort :) - and comment. All your tradeoffs make sense to me based on the above. |
lib/types/jtd-schema.ts
Outdated
type AllExtend<T, E> = false extends AllExtend_<T, E> ? false : true | ||
|
||
/** type is true if type is identically string */ | ||
type IsString_<T> = T extends string ? (string extends T ? true : false) : false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering why the second line is necessary here? The first line seems to say that for T to be identically string, T should extend string and string should extend T - which I thought would be enough, but it is probably not correct given that the second line is needed... And I think for majority of T the first line is enough, but there are some Ts when it is not - what are they?
As a side comment - Typescript is weird really if you just can have type level equality without these trick :)
AllExtend I understand even less, but it looks similar - what is the example there?
Maybe it is worth defining a utility type TypeEquality<T1, T2>
that would only evaluate to true only if T1 is identical to T2 - although I am not sure if what happens here with string can be generalised - maybe not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The answer ties to a lot of the choices I've made, and it's a design decision, one I'm open to having. In this case it has to do with how typescript handles X extends Y ? ... : ...
when X is a union. In that case typescript distributes the conditional across every element of the union (see distributive conditional types). In this case, the decision that I've made is that that the JDTSchema<number | string>
shouldn't create a valid schema. However, if you write T extends string ? string extends T ? MyType<T> : never : never
of string | number
because of distributive conditional types, that will result in string | never
which is just string
. Thus, this solution uses this trick to assert that the type isn't a union with string as a member, but is specifically only the type string.
Yes, it is a little weird, and mostly has to do with the way they wrote the conditional type system. I think the general principle is that typescript is meant to mimic javascript, so you care more about extending (e.g. having the relevant properties) than you care about strict matching, e.g. only having the specific properties.
All extend is similar but instead is saying, every element of the union extends the type. This way AllExtend<1 | 2 | 3, number>
is still true, and AllExtend<true, bool>
is still true. This is "necessary" because JTD only defines enums for strings, not for numbers or bools, so we need to extend them. Arguably, AllExtend
is an interesting design decision. I could see the argument that JDTSchemaType
should reject that typing because JTD can't actually verify that type.
TypeEquality should work in place of IsString, so I can change that. Depending on the answer / discussion about the above question, they might all become type equality.
Finally, for simple types like this, the typescript playground is a good place to test out how things are resolving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot - got it - need to get into habit of using the playground indeed.
I could see the argument that JDTSchemaType should reject that typing because JTD can't actually verify that type.
Interesting. I think the current trade-off is better - to align JTD Schema with the existing type, rather than to restrict type to JTD schema capabilities (there is type code generation approach for the latter). We could add JTDSchemaStrict that would be stricter in that regard
spec/types/jtd-schema.spec.ts
Outdated
describe("JTDSchema typechecks", () => { | ||
it("should typecheck number schemas", () => { | ||
const numf: JTDSchema<number> = {type: "float64"} | ||
should.exist(numf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the only reason why should.exist is needed is to prevent typescript complaining could we prefix with underscores and remove all should.exist to reduce the noise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both eslint and typescript complained, and I wasn't sure of a good solution. If both are set to ignore underscores, then yeah, that makes much more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eslant is definitely set to ignore names that start from underscore, I "think" typescript would ignore them too.
That's very cool. Some comments is just me not understanding things and maybe not possible to achieve - probably need to just read slower - but any comments would be great. Thank you! |
It's also worth supporting readonly arrays - it was just added to JSONSchemaType. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to try and summarize some of the things I mentioned in some of the comments here, and may contradict some of what I wrote. Thanks for going over this so carefully. I think you ultimately caught some stuff I missed.
I want to make sure to hammer out this question of the types purpose to make sure we're on the same page. The first I expressed in one of the comments, but I'll express it again here. I ultimately this is the most important quality of the JTDSchemaType
and that is that this should be true:
ajv.compile<T>(schema: JTDSchema<T>): (obj:unknown) => obj is T
This means that empty schemas should only be allowed for empty objects. values should only be allowed for types that are indexed by any string (contrary to what I wrote earlier), types like 1 | 2 | 3
should not be allowed as JTD can not properly verify them, etc.
The second property that I think is good, but parts can be done gradually, is that the type should attempt to prevent unexpected errors, e.g. the schema should try and match the actual type as much as possible. This means that while the having the schema { type: "string" }
for the type number | string
is valid, i.e. if the validator returns true, then the object is that type, the schema was likely written improperly, and thus shouldn't be allowed. Similarly, if possible, definitions should only be allowed in the root, discriminator fields shouldn't be allowed in mapping objects, enumerated types should make sure that every element of the union is specified, etc. Most of these are somewhat hard to guarantee. I think in scope of this first commit is just union checking, but the others could be added later as they make sense.
- definitions in nested schemas: for now I'm going to not focus on disallowing this since it should fail fast and not provide unexpected errors. I'll look into see how difficult it will be / how complicated it will make the type, but I don't think it's that high priority.
- dates: I'm going to add support, shouldn't be hard.
- metadata: since this is mainly for migration purposes, and is complicated, and may not be able to typecheck, this seems out of scope for now. Also, the general notes around extensions seem to indicate that this should ideally be targeted at first class JTD users, not migrators.
- some schema: I think there's use for this, but I think it's somewhat out of scope. It seems like that may be better suited to being part of the vocabularies in some way, since internally they may want to use those types anyway.
- union to tuple: See this issue for the long discussion about it. There seem to be some solutions that generate all permutations. In some sense, that's the ideal solution, but obviously without a compact way to represent all permutations, that type is not feasible for enumerations larger than a handful. This seems worth exploring, but as noted above, probably an extension.
Finally, what do you mean by readonly arrays, just readonly T[]
? I think that should work fine, but I'll need to add tests to assert that.
spec/types/jtd-schema.spec.ts
Outdated
describe("JTDSchema typechecks", () => { | ||
it("should typecheck number schemas", () => { | ||
const numf: JTDSchema<number> = {type: "float64"} | ||
should.exist(numf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both eslint and typescript complained, and I wasn't sure of a good solution. If both are set to ignore underscores, then yeah, that makes much more sense.
yes
yes
based on validation === type guard statement, yes, although in some comment I wrote that maybe there should be a separate strict version of this type, better to keep things simple - if application code does some additional validation further down it should "upgrade" the type (rather than have the same type). agreed
yes
if not too complex - ok
yes, but not as important, as it would fail compilation - data type would still be correct.
I am in support of "sorted enum" requirement if this type is used
yes, let's ship it - it's good
yes. see #1447 agreed with all points, need to read "union to tuple" but ok too :)
That's what I thought. It's just scoping rules are not intuitive... I am not anti-words in type parameters :)
yes |
Everything should be addressed, check the tests to make sure things are as you expect. What's implemented here is essentially the JTDSchemaStrict. That is, it won't accept
which would allow you to know the type that was validated, but also assert afterwards, or just cast. I'm definitely open to other suggestions too, but my general thinking was, if it starts strict, the type is easier to loosen than it is to make more strict and make previously passing schemas fail. Other miscellaneous updates:
|
Also, looking into it more, it seems that there is no way to compare type literals in typescript, so there isn't a good way to do union to tuple 😢 |
agreed looks good to me! ok to merge/release? |
Yeah, if the tests seem to cover everything, this should be good to squash and merge |
@erikbrinkman I was wondering - if you have a recursive type that represents say a tree, then JTDSchemaType would probably require a recursive schema and it should all type check? Or would typescript just blow up? I didn't try it yet, but it probably should work? Also, it would be great to add some advanced demos/examples in the tests to showcase what is possible. |
It seems like it should generally work, because the schema only needs to check one branch, e.g. as since the ref type checks, the rest works. I tested it and it seems to work:
typechecked fine in the playground. I'm happy to add some demos / examples. Do you have a pointer to what you're thinking of? Finally, this should be its own issue, but it's small, so I'll just mention it here. I missed one case, which is the type Also, do you want to open up an issue to discuss what the relaxed schema should look like? Do you want to wait until people try the current schema and raise particular issues? |
just some more relatable complex example combining multiple primitives in one schema. Recursive case would also be good (it was unexpected btw that you need to explicitly pass definitions as parameters - is it maybe avoidable?)
I don’t think it’s needed to be honest. It would encourage having one type where there should be two - one after JTD validation and another after additional application level validation (anything that cannot be expressed with JTD)
indeed |
@epoberezkin my real question was where. Is there a place where you show some examples of JSONSchemaType that I could append with JTDSchemaType? As for the question of needing to type the definitions, to me it makes sense. You current specify the type for the schema, why wouldn't you specify the type for the definitions, as they're basically mini schemas in and of themselves. If these go untyped, it becomes a little less clear to me how a recursive schema, or any complex definitions would work. They work now because you essentially specify the recursion, and typescript verifies that it's consistent. Without that I'm not sure. It does bring up this idea though. Currently JTDSchemaType type takes a typescript type, and converts it to the type of an valid schema for that type. This is useful for error checking. However, another route would be to go the opposite direction, e.g. allow any schema type, and convert the schema type to a typescript type. Then you could just type I think it should be pretty easy to do something like this, but part of the utility depends on the ability to customize the signature of compile (the other question you tagged me in). It'd be easier if we kind of assumed that it'd only be used for inferred types, and therefore wouldn't need to handle unions in the same way as a user specified type. The major problem I see is that a lot of the types won't get inferred correctly due to the way typescript works. If you write: I decided to play around with this, and it seems to generally work. See the playground here. It looks like typescript is smart in that it handles infinite recursion, but not in a human readable way, e.g. it seems to lazily expand the type as necessary. |
I think it should be in "getting started" in readme, but I was thinking to add to the tests too, similar to json-schema types test. It all deservers a separate big section in json-type-definition.md, particularly if data type is added (probably JTDDataType?)
That's the idea of JTD I think - then you don't really need to define the type.
yes - sounds ok.
You can make the whole schema It works in your playground too It's all good - it's worth adding it and it can be used it type signatures. |
I did not know about const assertions, that's really useful and undoes a lot of that frustration. This might not be too bad afterall. |
It would be cool to have type for SomeJTSSchemaType to make sure the schema is valid (maybe it is just JSONSchemaType without parameters) and then JTDDataType that computes type for the data from the schema. And actually using JSONSchemaType might also force the types to be correct without So it would work like this:
Probably JTDDataType can make sure that the schema is valid and it removes the need for schema validation actually... |
yeah, I think with a fully defined JTDDataType, that should work to both validate and produce the correct type. The downside being that incorrect types will just be never instead of throwing an immediate error. There's some talk about adding an invalid type to make things like this more obvious, but nothing currently exists. |
What issue does this pull request resolve?
Adds a type like the
JSONSchemaType
but for JTD schemas. Notably, because unions are tagged in JTD by default, the discriminator type can determine if the sub schemas match each of the union elements, and there's no indecisiveness betweenundefined
andnull
.This is very much an RFC in that there's a few design decisions that should probably be made before adding this, and there's a few behaviors of this current submission that aren't ideal.
What changes did you make?
Added the type, and tests to verify typescripts behavior, although some of the verified behavior is not ideal.
Is there anything that requires more attention while reviewing?
There are three major decisions that should probably be decided on, and then there's a number of behaviors that I've called out as odd in the code that should probably also be resolved.
definitions
is only allowed in a root schema, but typescript doesn't like forbidding properties (but it can be done). Should this definition error if anything extra is specified in a schema (at least anything that's expressly forbidden).iii
, but that's a preference. My thoughts here are messy, but basically the type could:Schema<number | string> = { type: "string" }
on other words, validating the schema would produce an instance of the presented type, even if it's narrower.never
, like the abovenumber | string
since it's not a tagged union.Date
being a validtimestamp
or specific fields inmetadata
like untagged unions that I'm not sure can be done well.Other notes from the code that don't fall under the above decisions:
enum
types there's no way to make sure that everything was specified. This technically means that onlyii
above is possible, but could potentially be overlooked. There might be a way to create a sorted tuple type, but it would add the complexity that enum type must be in sorted order. Not clear how feasible this is or what the runtime of checking it would be for large enumerations.