json schema validation: enforce lowerCamelCase keys #987

Insti · 2017-10-30T15:29:50Z

There is a json schema definition for the canoncial-data.json file.
https://github.com/exercism/problem-specifications/blob/master/canonical-schema.json

However this still allows for arbitrary keys containing spaces.
These keys are often used for specifying test input arguments.

problem-specifications/exercises/food-chain/canonical-data.json

Line 21 in 637cca6

"start verse": 1,

Task

Update the https://github.com/exercism/problem-specifications/blob/master/canonical-schema.json to require all keys to be lowerCamelCase

The text was updated successfully, but these errors were encountered:

Insti · 2017-10-30T15:31:54Z

Is this a good idea?

I'm personally fine with keys that contain spaces.
It is only food-chain and house that currently contain keys that contain spaces. (there are pull requests open that may change this: house: Update to test property recite #985, food-chain: Make properties less ambiguous #973 )
No-one has (yet) raised any issues with parsing the json for these exercises.
For some languages being able to assume that json keys will not contain spaces will make parsing easier.

NobbZ · 2017-10-30T15:43:45Z

I'd not say that parsing is easier or harder, but usage.

In some languages keys are transformed in a way that we can use the keys name as an accessor of a struct/object/record like thing (eg. case.startVerse). If though they contain spaces they need to be often treated differently by using array like accessors (eg case["start verse"]).

I'm not affected by that in erlang anyway. But I have to match on the keys using atoms, and usually atoms in erlang are alphanumerics + underscore, beginning with lowercase alpha. I can though construct atoms that contain all kind of characters (in modern erlang even unicode characters) as long as they do not exceed 256 (or was it 255) bytes in their UTF8 representation. But I have to quote them then using single-quote character: (startVerse vs. 'start verse'). Even though quoted atoms are possible, a rule of thumb is to avoid them, since the quoting often leads to problems in IDEs and when talking with people from the outside world. They often consider them plain strings which is wrong.

So even here I'd prefer a way that avoids quoting the key.

ErikSchierboom · 2017-11-08T08:56:17Z

So even here I'd prefer a way that avoids quoting the key.

Agreed. I'm also in favor of enforcing lowerCamelCase keys.

per exercism#987

rbasso · 2018-01-28T18:40:18Z

Nice change, @Insti. We didn't enforced that in the schema in #602 at the time because my json-schema-fu [is|was] too weak.

ErikSchierboom · 2022-01-14T12:38:44Z

@Insti Are you still interested in working on this issue?

ErikSchierboom · 2022-01-19T15:20:00Z

Looking at the current canonical-data schema, it looks like this has been implemented.

petertseng · 2022-01-19T17:49:16Z

As you can see in #1925, what has been asked for in this issue has not been implemented, because #1925 passed CI.

So if this issue is to be closed, the reason can't be "it has already been implemented" (because it hasn't); the reason has to be "we chose not to implement it".

This comment is neither a recommendation for or against closing this issue.

ErikSchierboom · 2022-01-20T12:29:20Z

I misread the issue and assumed this was about the property key, and not the input/expected keys.
I've just ran the following bash script to figure out if any keys have a space:

for file in exercises/*/canonical-data.json; do
    if [ $(jq -e -r '[paths | join(".") | select(. | contains(" "))] | length > 0' $file) == "true" ]; then
        echo $file
    fi
done

This did not output any file, so it looks like we don't yet have any keys with spaces.
The question then becomes: how can we update the schema file to enforce this?
What must be considered is that the input/expected keys can be objects, but could also be literal values like booleans, integers or strings.

rbasso · 2022-01-20T22:07:40Z

The question then becomes: how can we update the schema file to enforce this?

Hi, @ErikSchierboom .

I think you could use propertyNames to enforce a pattern to the keys...

{
  "type": "object",
  "propertyNames": {
    "pattern": "^[a-z]+([A-Z][a-z]+)*[A-Z]?$"
  }
}

...but that would not enforce the pattern inside nested objects automatically, so the following would be valid:

{
  "validKey": {
    "Invalid Key": null
  }
}

Edit - Below is the simplest solution I was able to write to restrict all keys to a regexp. I divided it in three definions, instead of just one, to make usage more flexible:

$defs:

  lowerCamelCaseArray:
    type: array
    items:
      $ref: "#/$defs/lowerCamelCaseValue"

  lowerCamelCaseObject:
    type: object
    patternProperties:
      "^[a-z]+([A-Z][a-z]+)*[A-Z]?$":
        $ref: "#/$defs/lowerCamelCaseValue"
    additionalProperties: false

  lowerCamelCaseValue:
    anyOf:
    - type: "boolean"
    - type: "null"
    - type: "number"
    - type: "string"
    - $ref: "#/$defs/lowerCamelCaseArray"
    - $ref: "#/$defs/lowerCamelCaseObject"

$ref: "#/$defs/lowerCamelCaseObject"

Please, check if it works as expected!

Right now, I don't have the time to write the PR, but I hope this helps. 😄

rbasso · 2022-01-22T16:24:20Z

I just wrote #1929 to check the possible impact of the change.

The following exercises have nonLowerCamelCase keys:

alphametics
circular-buffer
clock
complex-numbers
custom-set
etl
hamming
list-ops
nucleotide-count
queen-attack
rational-numbers
react
rest-api
sgf-parsing
word-count

At first sight, there are different problems:

Inputs and/out outputs encoded as objects. (~~hard to fix: change output encoding to avoid using objects, otherwise no rules are enforceable on input or expected~~)
- alphametics
- etl
- nucleotide-count
- sgf-parsing
- word-count
Keys in snake_case. (~~easy to fix: change keys to lowerCamelCase~~)
- circular-buffer
- queen-attack
- react
- rest-api
Keys containing numbers. (~~easy to fix: change keys or change pattern to accept numbers~~)
- clock
- complex-numbers
- custom-set
- hamming
- list-ops
- rational-numbers

ps: We should probably consider upgrading ajv to a newer version and the JSON Schema to Draft 2020-12.

Edit:

I just saw the README.md, and I'm a little bit confused. If test cases are immutable, how could we change an offending key?

Test cases are immutable, which means that once a test case has been added, it never changes.

It's been a while since the time I was active here, so maybe I'm getting something wrong, but I think this design rules out enforcing new schema restrictions on input and expected.

ErikSchierboom · 2022-01-25T08:33:02Z

Wow, so many keys in a different format. That is problematic.

It's been a while since the time I was active here, so maybe I'm getting something wrong, but I think this design rules out enforcing new schema restrictions on input and expected.

You are totally correct. The only option we have is to add new test cases that reimplement existing ones, but such a change shouldn't change the actual keys as it will still break things that way. I think we'll have to consider not enforcing lowerCamelCase for input and expected values, unless someone has a great idea?

petertseng · 2022-01-25T09:00:08Z

Ah, right, because in word-count a track's test generator is never meant to read the key and values of the expected; it's just meant to treat the expected as an object to compare the answer to. As opposed to something like zipper where a track's test generator does want to read the keys and values of expected.

What is the minimum we could do? Something like:

Only enforce the first level of keys under input. Do not descend into objects nested underinput to validate their keys, nor enforce anything in expected.

Would that still have value?

ErikSchierboom · 2022-01-25T09:07:40Z

Only enforce the first level of keys under input. Do not descend into objects nested underinput to validate their keys, nor enforce anything in expected.

Would that still have value?

I think so, as the keys on the first level of input are usually directly converted to parameter names. This is not the case for expected.

@rbasso Could you check to see if there are any first-level input keys that don't use lowerCamelCase?

rbasso · 2022-01-25T15:03:58Z

@rbasso Could you check to see if there are any first-level input keys that don't use lowerCamelCase?

@ErikSchierboom , changing the schema to enforce ~~lowerCamelCaseObject~~ the rule (edit) on input we get the following errors:

clock           : property name 'clock1'      is invalid
complex-numbers : property name 'z1'          is invalid
custom-set      : property name 'set1'        is invalid
hamming         : property name 'strand1'     is invalid
list-ops        : property name 'list1'       is invalid
queen-attack    : property name 'white_queen' is invalid
rational-numbers: property name 'r1'          is invalid

Anyway, I think this PR should be closed for now, because the inability to enforce new schema rules is a limitation implied by design choices, which cannot be simply fixed without a new design.

If I remember correctly, when the schema was created (#602) - after a long and heated discussion (#336) I partially regret 😬 - we allowed nested groups of tests forming a tree-like structure, to model the existing data, and the entire file was versioned to allow test generators to detect API-breaking and content-only changes.

If I understand correctly #1674, test cases now have UUIDs and are treated as immutable entities, which tracks may implement.

I think #602 and #1674 have underlying conflicting views, because, in #602, the canonical-data.json represent the latest version of a test tree, but, after #1674, it is also a storage of all test versions.

I think the current canonical-data.json format does both tasks badly:

If it models a tree of tests, it should not include older test versions.
If it is a storage of tests, what is the meaning of a tree-like structure?

I didn't read all the discussion behind #1674, so take anything I wrote with a grain of salt, but I guess this is a "design smell"

ErikSchierboom · 2022-01-25T15:09:40Z

Thanks for the writeup. We'll need to consider these things in more detail.

ErikSchierboom mentioned this issue Nov 14, 2017

simple-cipher: Add canonical data #966

Closed

Insti mentioned this issue Nov 15, 2017

documentation: README.md - Document lowerCamelCase keys #1008

Closed

ErikSchierboom mentioned this issue Nov 21, 2017

series: Improve documentation #1020

Merged

devkabiir mentioned this issue Dec 6, 2017

binary-search-tree: implement canonical-data.json #940

Merged

rpottsoh added a commit to rpottsoh/exercism-problem-specifications that referenced this issue Jan 8, 2018

bowling: Apply lowerCamelCase policy

737ee61

per exercism#987

rpottsoh mentioned this issue Jan 8, 2018

bowling: update test data for recent policy changes #1088

Merged

ErikSchierboom mentioned this issue Jan 8, 2018

gigasecond: Apply new "input" policy #1080

Merged

rpottsoh added a commit to rpottsoh/exercism-problem-specifications that referenced this issue Jan 18, 2018

two-bucket: apply lowerCamelCase rule to variables

edbc86b

per exercism#987

rpottsoh mentioned this issue Jan 18, 2018

two-bucket: apply lowerCamelCase rule to variables #1136

Merged

Insti added the policy label Jan 25, 2018

This was referenced Feb 4, 2018

crypto-square: apply lowerCamelCase policy #1168

Closed

all-your-base: apply lowerCamelCase policy #1169

Merged

ErikSchierboom closed this as completed Jan 19, 2022

ErikSchierboom reopened this Jan 20, 2022

rbasso mentioned this issue Jan 22, 2022

Enforce localCamelCase keys on canonical_data.json #1929

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

json schema validation: enforce lowerCamelCase keys #987

json schema validation: enforce lowerCamelCase keys #987

Insti commented Oct 30, 2017

Insti commented Oct 30, 2017 •

edited

Loading

NobbZ commented Oct 30, 2017

ErikSchierboom commented Nov 8, 2017

rbasso commented Jan 28, 2018

ErikSchierboom commented Jan 14, 2022

ErikSchierboom commented Jan 19, 2022

petertseng commented Jan 19, 2022

ErikSchierboom commented Jan 20, 2022

rbasso commented Jan 20, 2022 •

edited

Loading

rbasso commented Jan 22, 2022 •

edited

Loading

ErikSchierboom commented Jan 25, 2022

petertseng commented Jan 25, 2022

ErikSchierboom commented Jan 25, 2022

rbasso commented Jan 25, 2022 •

edited

Loading

ErikSchierboom commented Jan 25, 2022

json schema validation: enforce lowerCamelCase keys #987

json schema validation: enforce lowerCamelCase keys #987

Comments

Insti commented Oct 30, 2017

Insti commented Oct 30, 2017 • edited Loading

NobbZ commented Oct 30, 2017

ErikSchierboom commented Nov 8, 2017

rbasso commented Jan 28, 2018

ErikSchierboom commented Jan 14, 2022

ErikSchierboom commented Jan 19, 2022

petertseng commented Jan 19, 2022

ErikSchierboom commented Jan 20, 2022

rbasso commented Jan 20, 2022 • edited Loading

rbasso commented Jan 22, 2022 • edited Loading

ErikSchierboom commented Jan 25, 2022

petertseng commented Jan 25, 2022

ErikSchierboom commented Jan 25, 2022

rbasso commented Jan 25, 2022 • edited Loading

ErikSchierboom commented Jan 25, 2022

Insti commented Oct 30, 2017 •

edited

Loading

rbasso commented Jan 20, 2022 •

edited

Loading

rbasso commented Jan 22, 2022 •

edited

Loading

rbasso commented Jan 25, 2022 •

edited

Loading