Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema fixups #10

Closed
wants to merge 1 commit into from
Closed

schema fixups #10

wants to merge 1 commit into from

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented May 2, 2019

Take 2, or 3 or something. Trying to narrow down the parsing rules and the AST (not technically an AST but let's call it that until we have a better term).

My first take at this had "representation":{"map":{}} showing up for all structs and maps (and other things which it shouldn't have) but as I expanded into the other base types it made less sense. Do you also need "representation":{"list":{}} for a list type, "representation":{"int":{}} for an int? So I went with a rule: where there is a default representation for a type (map for struct, string for enum, list for list, bytes for bytes etc.) as described in the schema spec PR, don't include it if it doesn't add anything useful.

Also, I've seen you talk about field aliases in map representations in a couple of places now but they aren't in schema-schema.ipldsch or the spec PR, are you intending these to be included now or something to expand to later?

@warpfork
Copy link
Collaborator

warpfork commented May 2, 2019

So I went with a rule: where there is a default representation for a type (map for struct, string for enum, list for list, bytes for bytes etc.) as described in the schema spec PR, don't include it if it doesn't add anything useful.

👍

An alternative but very similar rule we could apply is "for those that don't yet have more than one representation, we don't store a field for it". The effect would be string and int and such don't get anything, but structs would still be explicit about representing as maps in the JSON form even though that's the implied default in the DSL.

Either way we should be making sure the schema-schema self-describes properly; and the higher level rule we're talking about here is just how we decide which fields on types are actually present in the schema-schema and whether those fields are optional / have default values.

And I guess it's the case that the DSL has some slightly stronger feelings about implied defaults than the schema-schema itself does. This isn't explicitly said anywhere yet, so maybe we should fine a place to document that. Basically, I want the IPLD format matching the schema-schema to stay extremely stable over time (and if anything, use the schema system's own migration mitigation tools over time), and I care a fair bit less about the stability of the human-convenience DSL (which should still be "stable" for our sanity, of course, but having a "2.0" heyday in the DSL that changes some defaults is going to have lot more managable splash radius than doing so in the IPLD format that we'll orient most tools around).

@@ -10,47 +10,43 @@
"kind": "struct",
"fields": {
"fooField": {
"valueType": {
"type": "map",
"type": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, jeesh. Yep, this whole block of changes is a fair cop.

"fields": {},
"representation": {
"map": {}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all these, whether or not we want them here depends on which of the two rules we prefer.

But right now, the final word should match the field defns for TypeStruct -- which have representation not being an optional field, so these should stay in by that logic.

@warpfork
Copy link
Collaborator

warpfork commented May 2, 2019

Also, I've seen you talk about field aliases in map representations in a couple of places now but they aren't in schema-schema.ipldsch

Yes they are :)

@warpfork
Copy link
Collaborator

warpfork commented May 2, 2019

So I went with a rule: where there is a default representation for a type (map for struct, string for enum, list for list, bytes for bytes etc.) as described in the schema spec PR, don't include it if it doesn't add anything useful.

An alternative but very similar rule we could apply is "for those that don't yet have more than one representation, we don't store a field for it".

which fields on types are actually present in the schema-schema and whether those fields are optional / have default values.

Continuing this train of thought... I'm not sure which ideas are going to be best future-proofed and consistent here.

If we use the second rule (and "representation" fields in the schema schema are non-optional when present, and otherwise absent entirely)... this might result in some weirdness down the road if we do introduce new representation strategies for kinds that don't currently have any: in order to be backwards compatible with existing IPLD-serialized schemas, the "representation" fields for those would now have to be 'optional' or have a 'default'.

So maybe it would be wise for our future consistency to bite the bullet and try to do everything with optional or default now?

(There's another concern to recurse into if we apply the 'default' feature here -- that there's currently no definition for how default values that aren't basic scalars should look, and we've got keyed unions for representations. But that's a bridge we should cross at some point anyway, I suppose.)

(A totally different way to handle the backwards compatibility in case of such a change in the schema-schema is of course to just start versioning the schema-schema, and the newer one has additional "representation" fields, and they're not optional nor have defaults... and we use the "schema try-stack" idea to probe any IPLD-serialized schema we're parsing; if it's missing one of the "representation" fields that's required in the newest version of the schema, the parser goes "welp" and falls back to the older schema version, and then we have code to do the migration semantically. Tada; this all works fine. It's a higher-firepower path though.)

@rvagg
Copy link
Member Author

rvagg commented May 3, 2019

So maybe it would be wise for our future consistency to bite the bullet and try to do everything with optional or default now?

You're saying that maybe everything should have "representation":{...}, right? I'm fine with that for consistency sake, and to ensure the above point about encoded hash consistency. But it does blow out the size (in bytes) of the JSON form. And surely there's cases where that matters? It also introduces more human readability problems and it'd be nice if the JSON/IPLD/AST form was almost as easy to grok as the DSL.

Regarding future-proofing and making default representations not appear in the JSON: I'm not sure if that's a real problem as long as we retain the concept of a "default" representation into the future. Maybe we add like type StringyFloat float representation string, but that doesn't have to change the fact that the default representation for float is "float", you'd just be adding "representation":{"string":{}} to your StringyFloat and leaving normal float's without.

I think the two clear choices here are:

  1. Don't include a "representation" block where the representation being used is the only for that type, or the default for that type and doesn't include additional options (like field customisations)
  2. Include a "representation" for everything.

The in-between one of: only have a "representation" block for types that have more than one way of representing them has future proofing and consistency problems.

I'm fine with either 1 or 2, but for different reasons. 1 for efficiency and lack of verbosity (i.e. easier for humans to read) and 2 for consistency and ease of explaining it in a spec. I think you're going to have to generate a strong feeling either way to help this out.

@warpfork
Copy link
Collaborator

Okay, housecleaning day! :D I'm gonna land this by cherry-picking.

There's a couple parts of this diff that are definitely fixes of flat out typos; and I'm gonna cherry-pick and merge those.

There's a couple other parts that are probably good things to give some more consideration to and suss out; but I'd like to do that in another PR if that's alright with you (and ideally, after flipping these files over to the specs repo -- it might've been convenient to incubate them in this repo, but it's getting somewhat silly to let them reside here any further).

@warpfork
Copy link
Collaborator

(I do think you've probably got a point about the need to further clarify where representation fields should be present or not... but the schema for schemas should make that declaration, and be itself formatted correspondingly at the same time, and that's giving me enough of a headache that it's really really hard to deal with it as anything other than a completely standalone diff.)

warpfork pushed a commit that referenced this pull request Jun 25, 2019
Cherry-picked by eric with several changes.

- The fixups of missing bits of schema-schema are a fair cop.

- Several critical typos in the example.ipldsch.json file are fixed.

- I tweaked the syntax for aliases in the example file; we've had the
  idea since this was originally posted of putting representational
  concerns that are clearly associated with a field inline with the
  fields for convenience, but in parens to distinguish that they're
  indeed representation concerns rather than type/cardinality concerns,
  so this is the expression of that idea.

- I actually walked back the removal of representation blocks in the
  json.  There's probably valid considerations to be made on that, but
  it deserves separate handling.  (Disconcertingly, I actually added
  two *new* representation blocks that seem to previously have been
  missing due to typos.)

Any new errors are presumably mine.

Original: #10
@warpfork
Copy link
Collaborator

Cherry-picked in the above automatically referenced commit: 54ac969

Of course, this means this PR is now in "conflict" (since I took half of it!), so... Will now close this :)

Thanks for catching all these details! ❤️

@warpfork warpfork closed this Jun 25, 2019
vmx pushed a commit to ipld/specs that referenced this pull request Jun 25, 2019
Cherry-picked by eric with several changes.

- The fixups of missing bits of schema-schema are a fair cop.

- Several critical typos in the example.ipldsch.json file are fixed.

- I tweaked the syntax for aliases in the example file; we've had the
  idea since this was originally posted of putting representational
  concerns that are clearly associated with a field inline with the
  fields for convenience, but in parens to distinguish that they're
  indeed representation concerns rather than type/cardinality concerns,
  so this is the expression of that idea.

- I actually walked back the removal of representation blocks in the
  json.  There's probably valid considerations to be made on that, but
  it deserves separate handling.  (Disconcertingly, I actually added
  two *new* representation blocks that seem to previously have been
  missing due to typos.)

Any new errors are presumably mine.

Original: ipld/go-ipld-prime#10
@rvagg rvagg deleted the rvagg/schema-fixups branch June 26, 2019 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants