
canonical-data.json standardisation discussion (was: Malformed data?) #336

Closed · zenspider opened this issue Aug 13, 2016 · 70 comments

zenspider commented Aug 13, 2016

It appears that all-your-base.json is malformed. Where allergies.json has the structure of:

{
    "allergic_to": {
        "description": [ ... ],
        "cases": [ { "description": "...", ... } ... ]
            }, ...
}

all-your-base.json has:

{
  "#": [ ... ],
  "cases": [ ... ]

cases should be wrapped in a function name, yes?

It appears that bin/jsonlint only checks that the JSON parses, not that it has good structure.

At the very least, I think this should be patched up and the README expanded to actually show the desired structure. Happy to do a PR for that, assuming I understand it already. 😀

@zenspider (Author)

Looks like there are a lot of different structures involved. Please provide hints as to the correct syntax so I can parse this stuff.

@kytrinyx (Member)

Looks like there are a lot of different structures involved.

Yeah, this sort of happened a bit at a time, and we weren't sure what the various needs of this data were going to be.

We now have enough data to decide on a file format, but I don't think anyone has gone through and figured out what the syntax should be yet.

@Insti (Contributor) commented Aug 14, 2016

@zenspider sounds like you're writing a parser, perhaps you can look through the existing data and tell us what structure we should be using to make parsing convenient.
Then we can document it and create issues to update the old data.

@kytrinyx (Member) commented Sep 9, 2016

@devonestes This is the issue we were talking about on twitter.

@catb0t (Contributor) commented Sep 21, 2016

I'm just gonna collect my thoughts from #376 here, because I think this needs fleshing out.

I believe we can make the JSON easier for both humans and programs to read at the same time; the way it is now makes it very hard to write a generalising program.

@petertseng linked to examples of code in various tracks using canonical-data.json to generate exercises, and I feel they all share a common problem: because each exercise has a different structure, each exercise needs its own separate, different test generator program.

My goal with exercism.autogen-exercises is to generate all the tests for all the exercises at once which should be trivially possible. I don't want a different ${exercisename}-testgen.factor for each different JSON structure.

As it is right now, I could theoretically write code to map x-common's JSON keys to my own internal structure, but this requires duplication across every program that reads this data. It's also not scalable, so it would be genuinely beneficial to everyone to standardise the keys and their meanings.

I am personally willing to manually rewrite all the JSON in this repository to fit a predictable format, but I won't until we have a consensus.

@NobbZ (Member) commented Sep 21, 2016

I'd fully support a more generic structure which would make it unnecessary to have a generator for each exercise.

But I have to admit, I have no idea what it could look like. Since you already said you would change them, do you have an idea about the structure already, @catb0t?

Also since it seems to be the right time, I want to request a feature for this generic format:

I had a sleepless night over how I should handle changes in the canonical data, as I wanted to have some versioning test. First I thought I could just use the date of the last change, but that would mean that even whitespace changes would "invalidate" all earlier submissions. Therefore I think it would be a good idea to version the canonical data as well.

@catb0t (Contributor) commented Sep 21, 2016

@catb0t wrote:

I'm thinking something like:

  • For exercises with one input translating to one output, description, input and output.
  • For exercises with multiple inputs / multiple outputs, description, input_N, output_N.

Note that it would be disadvantageous to use an array for multiple inputs / outputs where an array is not part of the exercise because it would be hard or impossible to tell the difference between multiple inputs and an actual array. We could have keys like input_multi which is an array of inputs, I suppose?

@petertseng wrote

For exercises with multiple inputs / multiple outputs, description,input_N, output_N.

[ ... ] Can we simultaneously make it easy for a human to read as well? [ ... ] in e.g. all-your-base's JSON, [ ... ] many tracks will pass in three inputs: input_base, input_digits, output_base, and then check that the output digits are as specified in output_digits. If the data then simply looked like "input_1": 2, "input_2": [1], "input_3": 10, "output": [1] I think it might not be clear what is the difference between input_1 and input_3 to a human, and I consider this important for being able to understand PRs that propose to change the test cases.

@petertseng makes a good point that input_N, etc, might harm readability especially since there are no comments in JSON, and I'm not really sure what to do about that.

I don't have a firm idea of what keys would fix Peter's point, which is a reason I haven't started rewriting it all myself yet.

Using descriptive English names makes it hard to access them programmatically, but using numbered keys makes it hard for people (not me, but other maintainers) to read. What strikes a balance?

This might be a little bit wild, so bear with me: what if we add a top-level key metadata, and it has this structure:

"cases": { "cases data..." }
"metadata": {
    "input_keys": [ "input_key1", "input_key2", "input_key3" ],
    "output_keys": [ "output_keyN" ]
}

That moves the mapping of human-readable keys from each track's generation code to the JSON itself. Then autogeneration code can read metadata to get the list of keys that are used in this cases structure.
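
For illustration only, a minimal sketch (Python, with made-up file and key names) of how a generic generator could consume such a metadata block:

import json

def load_cases(path):
    # Read a canonical-data.json that carries the hypothetical "metadata" block above.
    with open(path) as f:
        data = json.load(f)
    input_keys = data["metadata"]["input_keys"]
    output_keys = data["metadata"]["output_keys"]
    for case in data["cases"]:
        # Inputs and outputs are looked up purely via the declared key lists,
        # so the generator needs no exercise-specific knowledge.
        inputs = [case[key] for key in input_keys]
        outputs = [case[key] for key in output_keys]
        yield case.get("description", ""), inputs, outputs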

[ ... ] How should I handle changes in the canonical data, as I wanted to have some versioning test? [ ... ] I could just use the date of the last change, but this would mean that because of whitespace changes all earlier submissions would get "invalidated". Therefore I think it would be a good idea to version the canonical data as well.

We could perhaps end up with:

"#": "..."
"cases": { "cases data..." }
"metadata": { "..." }
"version": {
    "version_hash": "shasum of minified version of this file",
    "version_time": "seconds since 1 Jan 1970 here"
}

And you can read the version key. Or perhaps I'm misunderstanding your point.
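
And as a sketch of that versioning idea (Python; the exact scheme would still need to be agreed on), the hash could be taken over a deterministically minified copy of the file so that whitespace-only edits don't change it:

import hashlib
import json
import time

def version_info(path):
    with open(path) as f:
        data = json.load(f)
    # Drop any existing "version" block so the hash doesn't depend on itself.
    data.pop("version", None)
    # Minify deterministically (no whitespace, sorted keys) so that
    # formatting-only changes do not alter the hash.
    minified = json.dumps(data, separators=(",", ":"), sort_keys=True)
    return {
        "version_hash": hashlib.sha1(minified.encode("utf-8")).hexdigest(),
        "version_time": int(time.time()),
    }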

@NobbZ (Member) commented Sep 21, 2016

I do not understand the input_N stuff, but something came to my mind.

{
  "exercise": "repeat",
  "examples": [
    {
      "function": "repeat",
      "description": "tests valid stuff",
      "input_count": 5,
      "input_string": "foo",
      "expected": "foofoofoofoofoo"
    },
    {
      "function": "repeat",
      "description": "tests failure",
      "input_count": -5,
      "input_string": "foo",
      "expected": { "error": "no negatives allowed" }
    }
  ]
}

Perhaps we can use this as a base, or throw it away instantly?

@zenspider (Author) commented Sep 21, 2016

@NobbZ:

    {
      "function": "repeat",
      "description": "tests failure",
      "input_count": -5,
      "input_string": "foo",
      "expected": { "error": "no negatives allowed" }
    }

and what ensures the order of the args? There's no metadata in place to declare argument names.

@zenspider (Author)

@catb0t:

My goal with exercism.autogen-exercises is to generate all the tests for all the exercises at once which should be trivially possible [emphasis mine]. I don't want a different ${exercisename}-testgen.factor for each different JSON structure.

I don't think it is. I think you can get a good start on it for most languages, but that idea doesn't take into consideration language call semantic differences (factor/forth vs assembly vs algol-based languages vs keyword arguments (smalltalk, ruby) as an example). Nor is it realistic about the level of finality. I think you can easily generate a rough draft for every exercise for a language, but it still needs to be reviewed, finalized, and styled by a human to be a good example to learn from.

@NobbZ (Member) commented Sep 21, 2016

I do not see any sense in specifying the order of arguments in the canonical test data. There are different idioms and necessities in the various tracks.

Let's assume we have some data type and we write functions around it. Let's call it list. In object-oriented languages it will be the object we call a method on, so it will be completely outside the order of arguments. In Elixir we like to have this object-like argument in the first position to be able to pipe it around, while in Haskell it is preferred to have it last, to be able to use point-free style and partial application.

So as you can see, the order of arguments has to be specified by the track's maintainer anyway.


@rbasso (Contributor) commented Sep 22, 2016

Maybe I'm a little late and out of topic, but I'll try anyway...

About automatically generated test suites

I know that it makes sense in some languages to think about automatically generating tests, but I believe that this is not a goal shared by all tracks.

I think it is impossible, in the general case, to auto-magically generate the test suite, unless we collapse all the types into the ones representable in JSON. I know that, at least in Haskell, that would be bad and wrong! 😄

That said, it is certainly possible to have a generator to automatically update a specific exercise, if the JSON structure is not changed.

Is it worth it?

That depends on how frequently the data and the structure are updated, but mostly on how much fun the process of writing and maintaining it is. So I think it is not unreasonable. 👍

Alternatively - if the desire is really to have auto-magic test suites - it would be more compatible if the exercises were specified as stdin-stdout mappings. That would be similar to how online judge systems work, but I don't think it is exercism's destiny to follow that path.

About readability for humans and software

Considering that it is generally impossible to automatically generate test suites, I think it doesn't make sense to sacrifice human-readability too much, forging a JSON that is convenient for software but inconvenient for humans.

That doesn't mean we shouldn't standardize the files. We should, but remembering that the files are meant to be read first by humans, and then by software.

About the format

Maybe I'm the only one who doesn't get what is going on here, but I think that, until it is clear what our goal here is, we should avoid getting into the details of the specification.

Edit: Ok. I think I got it! 😄

@rbasso (Contributor) commented Sep 22, 2016

What about something like this:

{
  "exercise": "cipher",
  "version": "0.1.0 or an object with more detailed information",
  "comments": [
    "Anything you can think of",
    "as a list of strings"
  ],
  "tests": [
    {
      "name": "encode",
      "description": "Encodes plaintext",
      "cases": [
        {
          "description": "Encodes simple text",
          "plaintext": "Secret message",
          "key": "asdf1234",
          "expected": "qwertygh"
        },
        {
          "description": "Encodes empty string",
          "plaintext": "",
          "key": "test1234",
          "expected": ""
        }
      ]
    },
    {
      "name": "decode",
      "description": "Decodes plaintext",
      "cases": [
        {
          "description": "Decodes simple text",
          "ciphertext": "qwertygh",
          "key": "asdf1234",
          "expected": "Secret message"
        },
        {
          "description": "Decodes empty string",
          "ciphertext": "",
          "key": "test1234",
          "expected": ""
        }
      ]
    }
  ]
}

  • 👍 Allows groups of tests.
  • 👍 Uses standard names where it doesn't affect human-readability.
  • 👍 Captures most of the structure of existing tests.
  • 👍 Doesn't damage readability too much.
  • 👎 Mixes description, inputs and output in the same object.
  • 👎 Gives special meaning for two case's keys: description and expected.
  • 👎 Has no explicit ordering of cases' input data.

The descriptions could be mandatory or optional.

It would be possible to use multilevel grouping of tests, but I don't think that would be used frequently.

Keeping the description, inputs and the expected output together, we have a structure that is more human-friendly, but not so convenient for processing.

@zenspider and @catb0t, would it be too difficult to separate description and expected from the other keys? Would it be reasonable for you to use an implicit alphabetic ordering for the remaining keys, instead of adding metadata?

@devonestes (Contributor)

I've been thinking about this a bit recently, and I think the most generalized version of this we can get might be the best for as many different needs as possible. What we're really doing in most of these exercises is basically testing functions. There's input, and there's output. Using keys in our JSON objects like "plaintext" and "key" creates a need for knowledge about the exercise to accurately understand how those parts interact.

I think if we can generalize on that concept of a function that we're testing, that might be helpful both for human readability, and also for machine readability so we can possibly use this data for automatic tests.

So, here's my example:

{
  "exercise": "cipher",
  "version": "0.1.0 or an object with more detailed information",
  "comments": [
    "Anything you can think of",
    "as a list of strings"
  ],
  "tests": [
    {
      "description": "encodes simple text",
      "function": "encode",
      "input": ["Secret message", "asdf1234"],
      "output": "qwertygh"
    },
    {
      "description": "encodes empty string",
      "function": "encode",
      "input": ["", "test1234"],
      "output": ""
    },
    {
      "description": "decodes simple string",
      "function": "decode",
      "input": ["qwertygh", "asdf1234"],
      "output": "Secret message"
    }
  ]
}

I don't think there are any exercises that require anything other than input and output, but I haven't done too deep of an analysis on that. I'd love any feedback if there are edge cases that would need to be taken care of here. I know that based on the structure above I can think of reasonable ways to parse that and automatically create some skeletons for tests in Ruby, Elixir, Go, JavaScript and Python, but that's really all I can reasonably speak to since those are the only languages I have a decent amount of experience with.
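
As a very rough illustration of the kind of skeleton generation described above (a Python sketch with made-up names; each track would of course still polish the output by hand):

import json

TEMPLATE = """def test_{name}():
    assert {function}({args}) == {expected!r}
"""

def generate_skeletons(path):
    with open(path) as f:
        data = json.load(f)
    for case in data["tests"]:
        # Turn the human description into a test name and the input list
        # into a positional argument list.
        name = case["description"].lower().replace(" ", "_")
        args = ", ".join(repr(arg) for arg in case["input"])
        yield TEMPLATE.format(name=name,
                              function=case["function"],
                              args=args,
                              expected=case["output"])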

Also, I sort of like the stripped down way of looking at this - when I look at that data I don't need to know the context of the exercise to know what's going on. I just know there's a thing called encode, and that takes some input and returns some output, and there's a text description of what's going on.

I'm not really 100% sure that this would give us everything we want, but I wanted to at least throw this idea out there to get feedback and see if it might be a starting point for an actually good idea!

@rbasso (Contributor) commented Sep 24, 2016

What we're really doing in most of these exercises is basically testing functions. There's input, and there's output.

I think that the general case would be to test assertions...

      "name": "reversibility",
      "description": "Decoding a text encoded with the same key should give the original plaintext",
      "cases": [
        {
          "description": "Only letters",
          "plaintext": "ThisIsASecretMessage",
          "key": "test1234",
        },

... that can be general - like properties, in QuickCheck - or specific, like our common tests.

But I agree that most - if not all - tests are in the form: function inputs == output.

Also, I sort of like the stripped down way of looking at this - when I look at that data I don't need to know the context of the exercise to know what's going on.

This is probably where I disagree...

Maybe we don't need to know the context, but sometimes we want to.

The ability to group tests is so pervasive that I cannot find a single test framework in Haskell that doesn't allow it:

  • HUnit
  • HSpec
  • Tasty

I just know there's a thing called encode, and that takes some input and returns some output, and there's a text description of what's going on.

Exactly! Replacing the keys with a list of arguments, the only thing we know is that there is something that takes inputs and gives an output. We don't know the meaning of those things anymore!

I understand that your proposal makes automatic generation of tests easier while keeping reasonable readability, @devonestes, but that still comes at a price!

The real question

Seems to me that the question we have to answer is:

  • Are we willing to exchange a high-level description of tests for a low-level description of function calls, in order to make completely automated test generation feasible in some tracks?

@devonestes (Contributor) commented Sep 25, 2016

@rbasso I see your points, and I actually think we can get a little more of the benefit that you mention. How about something like this:

{
  "exercise": "cipher",
  "version": "0.1.0 or an object with more detailed information",
  "comments": [
    "Anything you can think of",
    "as a list of strings"
  ],
  "tests": [
    {
      "description": "encodes simple text",
      "function": "encode",
      "input": {
        "plaintext": "Secret message",
        "key": "asdf1234"
      },
      "output": "qwertygh"
    }
  ]
}

In the interest of programmatically generating tests, we know what our inputs are (and we can easily ignore the human-specific context in the keys of that object and just look at the values), but for the purpose of assigning some meaning to this data, we can give some context-specific information by adding those keys to the input object.

I think with the above structure we still don't need to understand the context to figure out what's going on, but if we want context it's there for us. I actually think this is a much better version than the original one!

I guess if I were to generalize the structure of a test object in that JSON, it would be this:

{
  "description": "description of what is being tested in this test",
  "function": "name of function (or method) being tested",
  "input": {
    "description of input": "actual input (can be string, int, bool, hash/map, array/list, whatevs)"
  },
  "output": "output of function being tested with above inputs"
}

So, I actually kind of like that. What does everyone else think?

@Insti changed the title from "Malformed data?" to "canonical-data.json standardisation discussion (was: Malformed data?)" on Oct 1, 2016
@behrtam (Contributor) commented Nov 3, 2016

I especially like the idea of adding the version and the function key. I'm currently working on adding test data versioning (which ruby and go already have) and test generation to the Python track, so it would be great if we could agree on a standard format.

@catb0t (Contributor) commented Nov 21, 2016

The reason I stopped commenting despite the fact that I'm the one who re-kindled this thread is that these replies really disheartened me:

I think you can get a good start on it for most languages, but that idea doesn't take into consideration language call semantic differences (factor/forth vs assembly vs algol-based languages vs keyword arguments (smalltalk, ruby) as an example). ... I think you can easily generate a rough draft for every exercise for a language, but it still needs to be reviewed, finalized, and styled by a human to be a good example to learn from.

...I know that it makes sense in some languages to think about automatically generating tests, but I belive that this is not a goal shared between all tracks. I think it is impossible, in the general case, to auto-magically generate the test suite...

Then what is the goal of this discussion about JSON format at all, if you're not interested in programmatically processing the JSON data to generate the unit tests?

Moreover, I don't see why language-specific differences matter here -- my point was that, totally disregarding ALGOL syntax and Ruby keyword arguments and Haskell data types, if everything is just a string you can write a generator to write out test files (and example files too), and since there are already exercise-specific test generators, why not save yourselves the work and write a generic one with better-designed data? (Yes, you should still read and comment on the output of the generator for good measure.)

@zenspider (Author)

I'm sorry you found my comments disheartening. I just think that your notion: "to generate all the tests for all the exercises at once which should be trivially possible" ignores the fact that you're mechanically generating tests for consumption across a bunch of languages with widely different styles and semantics.

That is going to wind up with "least common denominator" tests. All I was suggesting is that mechanically generated tests will be a good rough draft, but that they should be worked on by humans so that they are good pedagogical examples for each language. To skip out on that is to kinda miss the point of exercism in the first place.

For example, in Rust's tests I have found a world of difference in quality and in their ability to help teach me the language and assist my understanding. Some of them are night and day apart, and the worst ones were the ones that took a bare-minimum "least common denominator" approach.

@rbasso (Contributor) commented Nov 22, 2016

I'm the author of one of the disheartening comments, @catb0t, so I think I owe some explanations.

First of all, I believe that it is good to standardize the structure of a JSON. I just disagree a little in the goals.

Then what is the goal of this discussion about JSON format at all, if you're not interested in programmatically processing the JSON data to generate the unit tests?

I believe that the JSON data has two complementary goals:

  • Communicate which cases should be considered when writing the test suite.
  • Make implementing/updating the exercises easier, automatically or manually.

I still disagree about oversimplifying the format to make it easy to automatically generate the tests. This may be extremely valuable in an online judge, because it needs to automatically generate identical tests for a bunch of languages, but it would probably make the exercises less interesting in some languages, as @zenspider already said.

Moreover, I don't see why language-specific differences matter here -- my point was that totally disregarding ALGOL syntax and Ruby keyword arguments and Haskell data types, if everything is just a string you can write a generator to write out tests files (and example files too)...

You are right, if everything is just strings!

But I'm not sure if people here like the idea of having all the exercises as stdin-stdout filters.

@devonestes (Contributor) commented Nov 22, 2016

Ok, it seems to me like we've all sort of agreed (in our own ways) that this is a rather difficult problem to solve - so how about we try to make this into a couple smaller problems and tackle them individually? 😉

From what I see, we have two distinct goals we're trying to achieve here:

  1. Consistency in format allows for easier human readability of the files, which means an easier time understanding and maintaining them.

  2. It's possible that if things are consistent enough and we come up with a good enough abstraction, we could programmatically generate the beginnings of test files for some types of language tracks.

Both are indeed noble goals with clear value, and I totally think we should strive to achieve them both - just maybe not at the same time?

Since goal number 2 is clearly really hard, how about we try and get something that's at least solving goal number 1, and then once that's done we can try and refine it further to accomplish goal number 2? I think limiting the scope of what we're trying to accomplish (with an eye towards the future, of course) will be really helpful in actually getting something shipped here.

Excelsior

@rbasso (Contributor) commented Feb 15, 2017

So many questions... Great! 😃

About incorporating metadata.yml

The purpose of the two files is different. One is used to be able to talk about the exercise, the other is used to be able to produce an implementation.

I agree!

I would hesitate to conflate the two, but am open to discussing it if any of you have strong feelings about it.

I just imagined that it would be really convenient to pave the way to merging them into a single file describing the exercise, @kytrinyx, as that would allow us to later validate everything together in Travis-CI. I saw it as a low-hanging fruit, so I took it.

But this was just a secondary target. I'll remove it in the next version of the proposal.

The rationale for this proposal

Based on previous proposals, I tried to revive this issue in this message, discussing what would be the minimal structure needed to capture our current tests.

After cluttering this issue with really long messages describing a lot of alternatives I tested, I think we have boiled it down almost to its essence, so now we have to talk about test groups and test types.

About the tests' types and test groups

I was also slightly confused by the "type" field, but that may be because I also missed the discussion.

At least implicitly, @ErikSchierboom, any test case has a type that identifies a property being tested, most of the time the name of a test function.

To algorithmically generate test suites, we need a way to unambiguously attach a type to each test in a test suite.

I am confused by "type". Does it mean a class, a method, just a simple stand along function, is it just a variable?

The test's type is just a unique identifier - a string - attached to each test, identifying what is being tested. It can be a function, a class, a property or anything else that uniquely identifies the test's type in the test suite.

This should not be confused with the test data. The test's type is a generic reference to something outside the canonical-data.json file, which cannot be encoded in a language-neutral way: the test logic.

Each test type in a test suite is a reference to a specific way of turning test data into a test, @rpottsoh.

If the json file is intended to also be human readable why not keep it that way?

There are a lot of "that way"s in x-common, so I'll have to guess a little in this
answer.

We strive for three distinct goals here:

  • Make it human-readable.
  • Make it machine-readable.
  • Make it flexible enough to capture any reasonable test suite.

In most of the test suites with more than one type of test, the test's type is encoded in a property-key, in an object describing a test group.

There are a few problems with that approach:

  • It mixes two different concepts regarding the tests: grouping and identification.
  • It doesn't allow nesting of test groups. That would be nice to have, but is not really needed.
  • It doesn't allow grouping of tests of different types, which would be really great.

Moving the test type near the test data, we solve all the above problems easily. It is theoretically sound and adds functionality.

Test grouping is really important because it helps organize the test suites and increases human-readability. We already use grouping in a lot of test suites.

The remaining problem was to decide how to put the test type and the test data together. Two options were recently listed in this message; other ideas were presented in previous messages.

Why not instead of "type" could it be called "testof".

I guess we need something more descriptive than type...

I would like to keep the Google JSON Style convention for naming properties, so here are a few options:

  • type
  • testOf
  • testType
  • property -- This is my favorite for now, just because it feels general enough. (edit)
  • function -- I'm strongly against it because it doesn't apply to some kinds of tests, isn't language-neutral and is conceptually wrong, IMHO. (edit)

Which one is better? Any other suggestion?

About a detailed test data standardization

The updated schema looks great! I was wondering if now is the time to also specify how to handle errors in the schema.

After seeing more than two months of absolute silence in this issue, I decided to aim a little lower, so that we can hope to deliver something. I'm intentionally avoiding any discussion about inputs and outputs.

If we jump back to the discussion about how to encode the test data, I'm sure this schema will not be approved, as we are far from a consensus about it. Also, we could use some time to calmly think about it.

After - if - we approve a general schema for the test suite, I think we can open a new issue to discuss test data standardization. Meanwhile, I think it would be more productive to discuss what the best practices in test data encoding would be, and then we can see whether that can be standardized.

Do you agree with postponing that discussion, @ErikSchierboom?

@rbasso (Contributor) commented Feb 15, 2017

The updated schema looks great! I was wondering if now is the time to also specify how to handle errors in the schema.

Oops! I just noticed, after reading #551, that there seems to be an agreement about a way to encode error messages, so I don't see any problem in including the restrictions on the expected property mentioned there. Is this what you had in mind, @ErikSchierboom?

@ErikSchierboom (Member)

@rbasso It is! I think we have something like three options for the expected result of a test:

  1. A concrete value.
  2. A missing value (null/optional).
  3. An error.

Obviously, item 1 is trivial: just put the expected value in the JSON data. For 2, we could agree upon a standard value. I think null would be most suitable. As for three, we could do what #551 suggests and return a special error object.
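
As a small illustration (a Python sketch with a hypothetical helper name, assuming the error-object convention from #551), a generator could distinguish the three cases like this:

def classify_expected(case):
    # Returns one of ("value", v), ("none", None) or ("error", message),
    # following the three options listed above.
    expected = case.get("expected")
    if isinstance(expected, dict) and set(expected) == {"error"}:
        return ("error", expected["error"])
    if expected is None:
        return ("none", None)
    return ("value", expected)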

For your type replacement suggestions, I really like the suggested property name, as I think it is most clear. 👍

@rbasso (Contributor) commented Feb 16, 2017

Ok. I'll write a new version of the proposal so that:

  • No properties are taken from metadata.yml 😢 . Should we remove the exercise property also?
  • cases becomes mandatory again. (edit)
  • cases stays, instead of tests, unless anyone is against it.
  • type will be renamed to property, as @ErikSchierboom and I agree, and for lack of another name that is sufficiently general and descriptive.
  • If a test's expected value is an object with an error property, that must be its only property and its value must be a string.

Anything else to remove/add/change?

@petertseng (Member) commented Feb 16, 2017

three options for the expected result of a test:

  1. A concrete value.
  2. A missing value (null/optional).
  3. An error.

Here's an interesting question I have... do you intend that there will be some property for which expected takes on all three of these types of values? Based on the answer...


If the answer is "yes", you don't necessarily have to give an example of a property, though it can be helpful to try to think of one. I then ask:

How might languages faithfully represent the tests? Sorry but I'm going to pick on a specific language: Would Haskell for example have to use an Either a (Maybe b) or Maybe (Either a b)? What does it mean to have both Either and Maybe?

Or would we have a sum type with three variants?


If the answer is "no", I assume that means every property will only exhibit expected values of either 1+2, or 1+3 (we may also consider a property that should only exhibit 2+3, but let's just talk about 1+2 vs 1+3 for now).

I interpret either combination as "the property may fail to produce a value". I then ask:

Given that these combinations both mean that, how might I choose which one of 1+2 vs 1+3 to use in a given json file? (For example, is it "use missing values when the computation is valid but there might be no answer for the input given, and use errors when the input might be malformed"?)

Edit: ... come to think of it this question is important for the "yes" case as well.


I will, of course, use the answer to the question to guide what happens at #551 (comment)

@petertseng (Member) commented Feb 16, 2017

If the answer is "yes", you don't necessarily have to give an example of a property,

Well, although I do have one that might fit.

Let's consider change: Given some coins and a target, we want to find the way to make that target with the fewest coins.

We could certainly have cases where the input is perfectly valid, but there is no way to reach the target. So let's say that that's represented as null in JSON.

And then we could have cases where the input is obviously invalid, such as a negative target*. So maybe you would say we should call this an error case, with some appropriate representation in JSON (I don't care what, but I used #401 because why not).

Is this an example of what we had in mind? Given just these three cases, is it understandable how to have the tests in a target language:

{ "exercise" : "change"
, "version"  : "0.0.0"
, "comments":
    [ "showing all three possible types of values for `expected`"
    ]
, "cases":
    [ { "description": "Make change"
      , "comments":
          [ "All in one group in this example, but can be split apart later"
          ]
      , "cases":
          [ { "description": "Can make change"
            , "type"       : "change"
            , "coins"      : [1]
            , "target"     : 3
            , "expected"   : [1, 1, 1]
            }
          , { "description": "Can't make change"
            , "type"       : "change"
            , "coins"      : [2]
            , "target"     : 3
            , "expected"   : null
            }
          , { "description": "Negative targets are invalid"
            , "type"       : "change"
            , "coins"      : [1]
            , "target"     : -1
            , "expected"   : {"error": "negative target is invalid"}
            }
          ]
      }
    ]
}

Or, have I missed the mark completely with when to use null versus an error? In which case please correct me.

* = Let's leave aside a ternary coin system where you might have "negative coins", representing the party for whom change is being made giving the coins to the party making the change, rather than the other way around... because in this situation (as well as any others with negative coins), you can reach negative targets.
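
Purely as an illustration of how one track might realize those three cases (Python/pytest here; the module, function, and exception names are assumptions, not part of any proposal):

import pytest

from change import find_fewest_coins  # hypothetical solution module


def test_can_make_change():
    assert find_fewest_coins(coins=[1], target=3) == [1, 1, 1]


def test_cannot_make_change():
    # "expected": null in the JSON -> no value in this track
    assert find_fewest_coins(coins=[2], target=3) is None


def test_negative_targets_are_invalid():
    # "expected": {"error": ...} in the JSON -> an exception in this track
    with pytest.raises(ValueError):
        find_fewest_coins(coins=[1], target=-1)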

@ErikSchierboom (Member)

@petertseng Your example is spot on, it is exactly how I expect it to be. And to me, it looks very elegant and is very easy to understand. I like it a lot!

I noticed you still use "type", do you prefer that over "property"?

@petertseng (Member)

Your example is spot on, it is exactly how I expect it to be.

Good to know my rubber-ducking got me onto the right track eventually!

I noticed you still use "type", do you prefer that over "property"?

Oh! No preference =D I just used it because I built on the existing examples with type and didn't want to re-space them to property.

Property actually seems preferable, since type might get confused with the concept of a type in a type system.

(Okay, fine, slight danger that property will also get confused with properties in OO languages that have them, but it seems to work better than type)

@ErikSchierboom (Member)

Good to hear! I'm really pleased with the result.

@zenspider (Author)

Despite being the OP, I've pretty much checked out of this discussion... Maybe I've missed it as this commentary is nine miles long... I've only skimmed from the bottom, but I still see nothing suggested that provides either a uniform enough interface or enough metadata such that a program can generate tests for any/every exercise.

With this requirement going unaddressed, there's really no point IMHO. My entire goal was to make it easier for a new language to bootstrap up to a publishable state (as I was working on racket at the time, and have stopped because of this exact problem).

If each and every problem needs to be hand written, then the JSON is just a suggestion and isn't actually all that valuable. I want a new language to have to write a simple generator and get 50 problems spat out that only really need stylistic changes before publishing. It's up to the generator author to make it generate idiomatic code that uses the test framework / language well... but there needs to be enough information to actually generate that and I don't see that addressed.

Looking at @petertseng's example from yesterday... I have to write a custom generator. It's unique enough that I might as well write the tests by hand.

@rbasso (Contributor) commented Feb 17, 2017

Despite being the OP, I've pretty much checked out of this discussion... Maybe I've missed it as this commentary is nine miles long...

True. It is too long. Perhaps I should have opened a new issue with a more detailed title.

I've only skimmed from the bottom, but I still see nothing suggested that provides either a uniform enough interface or enough metadata such that a program can generate tests for any/every exercise.

You are right.

A while ago, in November, we failed this objective for lack of consensus. It was not clear if it was possible and/or desirable to follow the path of completely automated test generation.

Nine days ago, I started to discuss a less ambitious goal: standardizing as much as possible without sacrificing readability and/or flexibility in a significant way. It appears that we now have enough support to follow this path.

With this requirement going unaddressed, there's really no point IMHO.
My entire goal was to make it easier for a new language to bootstrap up to a publishable state (as I was working on racket at the time, and have stopped because of this exact problem).

We still want that, and the proposal that we have makes it easier to generate tests but, as you said, it doesn't allow fully automatic test generation.

If each and every problem needs to be hand written, then the JSON is just a suggestion and isn't actually all that valuable.

Yes, the JSON is a suggestion. Tracks can diverge, but try not to.

Test suites need to be partially hand-written, not completely hand-written.

I want a new language to have to write a simple generator and get 50 problems spat out that only really need stylistic changes before publishing. It's up to the generator author to make it generate idiomatic code that uses the test framework / language well... but there needs to be enough information to actually generate that and I don't see that addressed.

I understand that this is not what you wanted, but it can at least reduce the unneeded diversity in structure that we now have in x-common.

For lack of consensus, we are not following that path of encoding the test logic in the canonical-data.json.

The current objective is to write a schema that captures the structure of the test suite. We are avoiding discussing the data representation of the input data and the test logic (how to turn the test data into a test).

Looking at @petertseng's example from yesterday... I have to write a custom generator. It's unique enough that I might as write the tests by hand.

A small piece of code needs to be written for each exercise. Most of it can be factored out, as I exemplified with code somewhere back in this issue.
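
As a sketch of that factoring (Python, with made-up names): a generic walker over the nested cases tree, where only the final rendering step is exercise-specific:

import json

def walk_cases(node, groups=()):
    # Yield every leaf test case, remembering the descriptions of the
    # groups it is nested in.
    for case in node.get("cases", []):
        if "cases" in case:
            yield from walk_cases(case, groups + (case.get("description", ""),))
        else:
            yield groups, case

def generate(json_path, render_case):
    # render_case(groups, case) is the only exercise-specific piece.
    with open(json_path) as f:
        data = json.load(f)
    return [render_case(groups, case) for groups, case in walk_cases(data)]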

Think of the current proposal as a partial solution to what you wanted, @zenspider. If in the future people agree that we should follow the path you pointed, we can easily patch the schema to enforce the input data structure. Everything else will be reused!

What I would really like to avoid is finishing this discussion - as in November - without moving a single step further. So I ask...

Can we compromise with a partial standardization?

The way I see the situation now, we're gonna get this or nothing...

@rbasso (Contributor) commented Feb 17, 2017

JSON Schema for 'canonical-data.json' files

Changes:

  • cases is mandatory again.
  • type renamed to property.
  • No properties taken from metadata.yml anymore.
  • Add restriction: If there is an expected object and it has an error property, it must be the only property and also be of type string

The schema may need some minor adjustments - because I probably made a few mistakes - but I don't see how to improve it further without losing flexibility and/or readability.

I think we finally got something good enough here, people! 👍

Edit: I also wrote a test suite as a proof-of-concept in Haskell. It is not beautiful code, but it showcases that we in fact don't need much exercise-specific code to parse and create the tests.

@ErikSchierboom (Member)

I completely agree with what @rbasso is saying. Yes, this is "only" a partial solution to a much harder problem, but I feel the partial solution by itself has more than enough merit to warrant going through with it.

With the currently suggested format, test generators are definitely possible, although not without some exercise-specific code. I think this is fine, but as said, we can always look to improve this format later to suit test generation even better.

So all in all, I really like what we have now and I think we should go with that, at least for now.

@rbasso (Contributor) commented Feb 18, 2017

If this proposal gets accepted, where should we put the canonical-data-schema.json?

@ErikSchierboom (Member)

Great question. I don't really have a good answer. Perhaps in the root of the exercises directory? Or else in a separate root schema directory?

petertseng added a commit that referenced this issue Sep 24, 2018
change 1.3.0

As proposed in
#1313
#905 (comment)

In contrast to the proposal in
#336 (comment)

Although -1 is a sentinel value, the sentinel value had been overloaded
in this JSON file to mean two separate conditions:

* "Can't make target" (ostensibly, we might be able to make the target
  with a different set of coins)
* "Target is negative" (no matter what coins we provide, this target is
  always invalid)

To make clear that these two conditions are different, we use an error
object describing each.

This error object was defined in
#401

Note that this commit is not a decree that all languages must represent
these conditions as errors; languages should continue to represent the
conditions in the usual way for that language. It is simply a
declaration that these two conditions bear enough consideration that
we'll represent them with a different type and also clearly
differentiate between the two.

Closes #1313
Checks the related box in #1311