canonical-data.json standardisation discussion (was: Malformed data?) #336
Looks like there are a lot of different structures involved. Please provide hints as to the correct syntax so I can parse this stuff. |
Yeah, this sort of happened a bit at a time, and we weren't sure what the various needs of this data were going to be. We now have enough data to decide on a file format, but I don't think anyone has gone through and figured out what the syntax should be yet. |
@zenspider sounds like you're writing a parser, perhaps you can look through the existing data and tell us what structure we should be using to make parsing convenient. |
@devonestes This is the issue we were talking about on twitter. |
I'm just gonna collect my thoughts from #376 here, because I think this needs fleshing out. I believe we can simultaneously make the JSON easier for humans and programs to read, but the way it is now makes it very hard to write a generalising program. @petertseng linked to examples of code in various tracks that consume this data. My goal with exercism.autogen-exercises is to generate all the tests for all the exercises at once, which should be trivially possible; I don't want a different generator for each exercise. As it is right now, I could theoretically write code to map x-common's JSON keys to my own internal structure, but this requires duplication across programs that read this data. It's also not scalable, so it would be genuinely beneficial to everyone to standardise the keys and their meanings. I am personally willing to manually rewrite all the JSON in this repository to fit a predictable format, but I won't until we have a consensus. |
I'd fully support a more generic structure which would make it unnecessary to have a generator for each exercise. But I have to admit, I have no idea what it could look like. Since you already said you would change them, do you have an idea about the structure already, @catb0t? Also, since it seems to be the right time, I want to request a feature for this generic format: I had a sleepless night over how I should handle changes in the canonical data, as I wanted to have some versioning test. First I thought I could just use the date of the last change, but this would mean that, because of whitespace changes, all earlier submissions would get "invalidated". Therefore I think it would be a good idea to version the canonical data as well. |
@petertseng makes a good point, and I don't have a firm idea of what keys would fix it, which is a reason I haven't started rewriting it all myself yet. Using descriptive English names makes it hard to access them programmatically, but using numbered keys makes it hard for people (not me, but other maintainers) to read. What strikes a balance? This might be a little bit wild, so bear with me: what if we add a top-level `metadata` key?

```json
"cases": { "cases data..." },
"metadata": {
  "input_keys": [ "input_key1", "input_key2", "input_key3" ],
  "output_keys": [ "output_keyN" ]
}
```

That moves the mapping of human-readable keys from each track's generation code to the JSON itself. Then autogeneration code can read `metadata` to learn which keys in each case are inputs and which are outputs.
We could perhaps end up with:

```json
"#": "...",
"cases": { "cases data..." },
"metadata": { "..." },
"version": {
  "version_hash": "shasum of minified version of this file",
  "version_time": "seconds since 1 Jan 1970 here"
}
```

And you can read the `version` block programmatically to decide whether earlier submissions are still valid. |
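As a rough illustration only, a generic reader for this hypothetical layout might look like the Python sketch below. The key names, and the assumption that the hash covers only `"cases"` (hashing the whole file would include the hash itself), are taken from the proposal above, not from any agreed format.

```python
import hashlib
import json

def load_canonical(path):
    """Hypothetical reader for the metadata/version layout sketched above."""
    with open(path) as f:
        data = json.load(f)

    # Recompute the hash over a minified dump of the cases and compare it
    # with the stored version_hash (assumption: only "cases" is covered).
    minified = json.dumps(data["cases"], separators=(",", ":"), sort_keys=True)
    assert hashlib.sha1(minified.encode("utf-8")).hexdigest() == \
        data["version"]["version_hash"]

    # The metadata tells a generic generator which case keys are inputs and
    # which are outputs, without hard-coding per-exercise knowledge.
    return (data["cases"],
            data["metadata"]["input_keys"],
            data["metadata"]["output_keys"])
```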
I do not understand the `metadata` proposal. Here is a concrete example of the kind of structure I have in mind:

```json
{
  "exercise": "repeat",
  "examples": [
    {
      "function": "repeat",
      "description": "tests valid stuff",
      "input_count": 5,
      "input_string": "foo",
      "expected": "foofoofoofoofoo"
    },
    {
      "function": "repeat",
      "description": "tests failure",
      "input_count": -5,
      "input_string": "foo",
      "expected": { "error": "no negatives allowed" }
    }
  ]
}
```

Perhaps we can use this as a base, or throw it away instantly? |
And what ensures the order of the args? There's no metadata in place to declare argument names. |
I don't. I think you can get a good start on it for most languages, but that idea doesn't take into consideration differences in call semantics across languages (Factor/Forth vs. assembly vs. ALGOL-based languages vs. keyword arguments in Smalltalk and Ruby, as an example). Nor is it realistic about the level of finality. I think you can easily generate a rough draft for every exercise for a language, but it still needs to be reviewed, finalized, and styled by a human to be a good example to learn from. |
I do not see any sense in specifying the order of arguments in the canonical data. Let's assume we have some data type and we write functions around it: depending on a language's conventions, the value being operated on may come first, last, or as a named argument. So, as you can see, the order of arguments has to be specified by the tracks themselves. |
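A sketch of what that track-side responsibility could look like: each track keeps its own mapping from function name to argument order, since the canonical data leaves it unspecified. All names here are illustrative, not part of any proposal.

```python
# Hypothetical per-track mapping from function name to argument order;
# the canonical data deliberately does not fix one.
ARGUMENT_ORDER = {
    "encode": ["plaintext", "key"],
    "decode": ["ciphertext", "key"],
}

def ordered_args(function, case):
    """Pull named inputs out of a case dict in this track's expected order."""
    return [case[name] for name in ARGUMENT_ORDER[function]]

# e.g. ordered_args("encode", {"plaintext": "Secret message", "key": "asdf1234"})
# -> ["Secret message", "asdf1234"]
```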
Maybe I'm a little late and off topic, but I'll try anyway...

### About automatically generated test suites

I know that it makes sense in some languages to think about automatically generating tests, but I believe that this is not a goal shared between all tracks. I think it is impossible, in the general case, to auto-magically generate the test suite, unless we collapse all the types into the ones representable in JSON. I know that, at least in Haskell, that would be bad and wrong! 😄 That said, it is certainly possible to have a generator to automatically update a specific exercise, if the JSON structure is not changed. Is it worth it? That depends on how frequently the data and the structure are updated, but mostly on how fun the process of writing and maintaining it is. So I think it is not unreasonable. 👍 Alternatively, if the desire is really to have auto-magic test suites, it would be more compatible if the exercises were specified as stdin-stdout mappings. That would be similar to how online judge systems work, but I don't think it is exercism's destiny to follow that path.

### About readability for humans and software

Considering that it is generally impossible to automatically generate test suites, I think it doesn't make sense to sacrifice human-readability too much, forging a JSON that is convenient for software but inconvenient for humans. That doesn't mean we shouldn't standardize the files. We should, but remembering that the files are meant to be read first by humans, and then by software.
|
What about something like this?

```json
{
  "exercise": "cipher",
  "version": "0.1.0 or an object with more detailed information",
  "comments": [
    "Anything you can think of",
    "as a list of strings"
  ],
  "tests": [
    {
      "name": "encode",
      "description": "Encodes plaintext",
      "cases": [
        {
          "description": "Encodes simple text",
          "plaintext": "Secret message",
          "key": "asdf1234",
          "expected": "qwertygh"
        },
        {
          "description": "Encodes empty string",
          "plaintext": "",
          "key": "test1234",
          "expected": ""
        }
      ]
    },
    {
      "name": "decode",
      "description": "Decodes ciphertext",
      "cases": [
        {
          "description": "Decodes simple text",
          "ciphertext": "qwertygh",
          "key": "asdf1234",
          "expected": "Secret message"
        },
        {
          "description": "Decodes empty string",
          "ciphertext": "",
          "key": "test1234",
          "expected": ""
        }
      ]
    }
  ]
}
```

The descriptions could be mandatory or optional. It would be possible to use multilevel grouping of tests, but I don't think that is used frequently. Keeping a structure like this, @zenspider and @catb0t, would it be too difficult to separate the exercise-specific parts of a generator from the generic ones? |
I've been thinking about this a bit recently, and I think the most generalized version of this we can get might be the best for as many different needs as possible. What we're really doing in most of these exercises is basically testing functions. There's input, and there's output. By using keys in our JSON objects like "plaintext" and "key", we create a need for knowledge about the exercise to accurately understand how those parts interact. I think if we can generalize on that concept of a function that we're testing, that might be helpful both for human readability and for machine readability, so we can possibly use this data for automatic tests. So, here's my example:

```json
{
  "exercise": "cipher",
  "version": "0.1.0 or an object with more detailed information",
  "comments": [
    "Anything you can think of",
    "as a list of strings"
  ],
  "tests": [
    {
      "description": "encodes simple text",
      "function": "encode",
      "input": ["Secret message", "asdf1234"],
      "output": "qwertygh"
    },
    {
      "description": "encodes empty string",
      "function": "encode",
      "input": ["", "test1234"],
      "output": ""
    },
    {
      "description": "decodes simple string",
      "function": "decode",
      "input": ["qwertygh", "asdf1234"],
      "output": "Secret message"
    }
  ]
}
```

I don't think there are any exercises that require anything other than input and output, but I haven't done too deep an analysis on that. I'd love any feedback if there are edge cases that would need to be taken care of here. I know that based on the structure above I can think of reasonable ways to parse that and automatically create some skeletons for tests in Ruby, Elixir, Go, JavaScript and Python, but that's really all I can reasonably speak to, since those are the only languages I have a decent amount of experience with. Also, I sort of like the stripped-down way of looking at this: when I look at that data I don't need to know the context of the exercise to know what's going on. I just know there's a thing called `encode` that takes some input and gives some output. I'm not really 100% sure that this would give us everything we want, but I wanted to at least throw this idea out there to get feedback and see if it might be a starting point for an actually good idea! |
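As a rough illustration, a generic skeleton generator consuming this flat layout might look like the Python sketch below (all names hypothetical); per @zenspider's point, a human would still review and style the output.

```python
import json

def render_case(case):
    """Render one {description, function, input, output} case as a test stub."""
    name = case["description"].replace(" ", "_")
    args = ", ".join(repr(arg) for arg in case["input"])
    return (f"def test_{name}():\n"
            f"    assert {case['function']}({args}) == {case['output']!r}\n")

def render_suite(path):
    """Turn a whole canonical-data file into a rough-draft test suite."""
    with open(path) as f:
        data = json.load(f)
    return "\n".join(render_case(case) for case in data["tests"])
```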
I think that the general case would be to test assertions...

```json
"name": "reversibility",
"description": "Decoding a text encoded with the same key should give the original plaintext",
"cases": [
  {
    "description": "Only letters",
    "plaintext": "ThisIsASecretMessage",
    "key": "test1234"
  }
]
```

...that can be general, like properties in QuickCheck, or specific, like our common tests. But I agree that most, if not all, tests are in the form `function(input) == expected`.

This is probably where I disagree... Maybe we don't need to know the context, but sometimes we want to. The ability to group tests is so pervasive that I cannot find a single test framework in Haskell that doesn't allow it.

Exactly! Substituting the keys with a list of arguments, the only thing we know is that there is something that takes inputs and gives an output. We don't know the meaning of those things anymore! I understand that your proposal makes automatic generation of tests easier while keeping reasonable readability, @devonestes, but that still comes at a price!

### The real question

Seems to me that the question that we have to answer is: how much human-readability are we willing to trade for easier test generation?
|
@rbasso I see your points, and I actually think we can get a little more of the benefit that you mention. How about something like this:

```json
{
  "exercise": "cipher",
  "version": "0.1.0 or an object with more detailed information",
  "comments": [
    "Anything you can think of",
    "as a list of strings"
  ],
  "tests": [
    {
      "description": "encodes simple text",
      "function": "encode",
      "input": {
        "plaintext": "Secret message",
        "key": "asdf1234"
      },
      "output": "qwertygh"
    }
  ]
}
```

In the interest of programmatically generating tests, we know what our inputs are (and we can easily ignore the human-specific context in the keys in that object and just look at the values), but for the purpose of assigning some meaning to this data, we can give some context-specific information by adding those keys to the `input` object. I think with the above structure we still don't need to understand the context to figure out what's going on, but if we want context it's there for us. I actually think this is a much better version than the original one! I guess if I were to generalize the structure of a test case, it would be:

```json
{
  "description": "description of what is being tested in this test",
  "function": "name of function (or method) being tested",
  "input": {
    "description of input": "actual input (can be string, int, bool, hash/map, array/list, whatevs)"
  },
  "output": "output of function being tested with above inputs"
}
```

So, I actually kind of like that. What does everyone else think? |
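A quick sketch of the dual reading this enables, assuming (as in the example above) the `input` keys are listed in call order: generators can take the values positionally and ignore the key names, while the keys keep the context for human readers.

```python
def positional_inputs(case):
    # JSON objects parsed by Python's json module preserve key order, so a
    # generator can ignore the human-readable names entirely.
    return list(case["input"].values())

# positional_inputs({"input": {"plaintext": "Secret message", "key": "asdf1234"}})
# -> ["Secret message", "asdf1234"]
```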
I especially like the idea of adding the descriptive keys to the `input` object. |
The reason I stopped commenting despite the fact that I'm the one who re-kindled this thread is that these replies really disheartened me:
Then what is the goal of this discussion about the JSON format at all, if you're not interested in programmatically processing the JSON data to generate the unit tests? Moreover, I don't see why language-specific differences matter here -- my point was that, totally disregarding ALGOL syntax and Ruby keyword arguments and Haskell data types, if everything is just a string you can write a generator to write out test files (and example files too), and since there are already exercise-specific test generators, why not save yourselves the work and write a generic one with better-designed data? (Yes, you should still read and comment the output of the generator for good measure.) |
I'm sorry you found my comments disheartening. I just think that your notion, "to generate all the tests for all the exercises at once which should be trivially possible", ignores the fact that you're mechanically generating tests for consumption across a bunch of languages with widely different styles and semantics. That is going to wind up with "least common denominator" tests. All I was suggesting is that mechanically generated tests will be a good rough draft, but that they should be worked on by humans so that they are good pedagogical examples for each language. To skip out on that is to kinda miss the point of exercism in the first place. For example, in Rust's tests I have found a world of difference in quality and in their ability to help teach me the language and assist my understanding. Some of them are night and day apart, and the worst ones were the ones that took a bare-minimum "least common denominator" approach. |
I'm the author of one of the disheartening comments, @catb0t, so I think I owe some explanations. First of all, I believe that it is good to standardize the structure of the JSON. I just disagree a little about the goals.
I believe that the JSON data has two complementary goals:
I still disagree about oversimplifying the format to make it easy to automatically generate the tests. This may be extremely valuable in an online judge, because it needs to automatically generate identical tests for a bunch of languages, but it would probably make the exercises less interesting in some languages, as @zenspider already said.
You are right, if everything is just strings! But I'm not sure if people here like the idea of having all the exercises as stdin-stdout filters. |
Ok, it seems to me like we've all sort of agreed (in our own ways) that this is a rather difficult problem to solve - so how about we try to break it into a couple of smaller problems and tackle them individually? 😉 From what I see, we have two distinct goals we're trying to achieve here:
Both are indeed noble goals with clear value, and I totally think we should strive to achieve them both - just maybe not at the same time? Since goal number 2 is clearly really hard, how about we try and get something that at least solves goal number 1, and then once that's done we can try and refine it further to accomplish goal number 2? I think limiting the scope of what we're trying to accomplish (with an eye towards the future, of course) will be really helpful in actually getting something shipped here. |
So many questions... Great! 😃 About incorporating error-message restrictions into the schema...
|
Oops! I just noticed that there seems to be an agreement about a way to encode error messages after reading #551, so I don't see any problem in including those restrictions on the `expected` values. |
@rbasso It is! I think we have something like three options for the expected result of a test: a valid result value, an indication that no valid result exists, or an error.
Obviously, item 1 is trivial: just put the expected value in the JSON data. For 2, we could agree upon a standard value; I think `null` is a natural fit. For your third option, there is the error-object convention from #401. |
Ok. I'll write a new version of the proposal so that:
Anything else to remove/add/change? |
Here's an interesting question I have... do you intend that there will be some exercise that needs both option 2 (`null`) and option 3 (an error object) for its `expected` values? If the answer is "yes", you don't necessarily have to give an example of a property, though it can be helpful to try to think of one. I then ask: how might languages faithfully represent the tests? Sorry, but I'm going to pick on a specific language: would Haskell, for example, have to use an `Either`? Or would we have a sum type with three variants? If the answer is "no", I assume that means every exercise combines plain values with at most one of the other two. I interpret either combination as "the function can fail on some inputs". Given that these combinations both mean that, how might I choose which one of 1+2 vs 1+3 to use in a given exercise? Edit: ... come to think of it, this question is important for the "yes" case as well. I will, of course, use the answer to the question to guide what happens at #551 (comment). |
Well, although I do have one that might fit. Let's consider `change`. We could certainly have cases where the input is perfectly valid, but there is no way to reach the target. And then we could have cases where the input is obviously invalid, such as a negative target. So maybe you would say we should call this an error case, with some appropriate representation in JSON (I don't care what, but I used #401 because why not). Is this an example of what we had in mind? Given just these three cases, is it understandable how to have the tests in a target language?

```json
{ "exercise" : "change"
, "version"  : "0.0.0"
, "comments" :
  [ "showing all three possible types of values for `expected`"
  ]
, "cases"    :
  [ { "description": "Make change"
    , "comments":
      [ "All in one group in this example, but can be split apart later"
      ]
    , "cases":
      [ { "description": "Can make change"
        , "type"     : "change"
        , "coins"    : [1]
        , "target"   : 3
        , "expected" : [1, 1, 1]
        }
      , { "description": "Can't make change"
        , "type"     : "change"
        , "coins"    : [2]
        , "target"   : 3
        , "expected" : null
        }
      , { "description": "Negative targets are invalid"
        , "type"     : "change"
        , "coins"    : [1]
        , "target"   : -1
        , "expected" : {"error": "negative target is invalid"}
        }
      ]
    }
  ]
}
```

Or have I missed the mark completely on when to use `null` versus an error object?
|
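For illustration, a generator might branch on the three kinds of `expected` values in that example along these lines. This is only a sketch; how each branch is rendered is entirely up to the track, and the helper names are hypothetical.

```python
def render_expectation(case):
    """Sketch: turn the three kinds of `expected` values into assertions."""
    expected = case["expected"]
    if isinstance(expected, dict) and "error" in expected:
        # Invalid input: render however errors are idiomatic in the track.
        return f"assert_raises({expected['error']!r})"
    if expected is None:
        # Valid input, but no answer exists (e.g. change cannot be made).
        return "assert result is None"
    # Ordinary case: a concrete expected value.
    return f"assert result == {expected!r}"
```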
@petertseng Your example is spot on; it is exactly how I expect it to be. And to me, it looks very elegant and is very easy to understand. I like it a lot! I noticed you still use "type"; do you prefer that over "property"? |
Good to know my rubber-ducking got me onto the right track eventually!
Oh! No preference =D I just used it in the examples because I started from the existing examples, which used "type". "property" actually seems preferable, since "type" already carries meaning in most languages. (Okay, fine, there is a slight danger that "property" gets read as a QuickCheck-style property, but I think context makes it clear.) |
Good to hear! I'm really pleased with the result. |
Despite being the OP, I've pretty much checked out of this discussion... Maybe I've missed it, as this commentary is nine miles long... I've only skimmed from the bottom, but I still see nothing suggested that provides either a uniform enough interface or enough metadata such that a program can generate tests for any/every exercise. With this requirement going unaddressed, there's really no point, IMHO. My entire goal was to make it easier for a new language to bootstrap up to a publishable state (as I was working on racket at the time, and have stopped because of this exact problem). If each and every problem needs to be hand-written, then the JSON is just a suggestion and isn't actually all that valuable. I want a new language to have to write a simple generator and get 50 problems spat out that only really need stylistic changes before publishing. It's up to the generator author to make it generate idiomatic code that uses the test framework / language well... but there needs to be enough information to actually generate that, and I don't see that addressed. Looking at @petertseng's example from yesterday... I have to write a custom generator. It's unique enough that I might as well write the tests by hand. |
True. It is too long. Perhaps I should have opened a new issue with a more detailed title.
You are right. A while ago, in November, we failed this objective for lack of consensus. It was not clear if it was possible and/or desirable to follow the path of completely automated test generation. Nine days ago, I started to discuss a less ambitious goal: standardizing as much as possible without sacrificing readability and/or flexibility in a significant way. It appears that we now have enough support to follow this path.
We still want that, and the proposal that we have makes it easier to generate tests but, as you said, it doesn't allow fully automatic test generation.
Yes, the JSON is a suggestion. Tracks can diverge, but try not to. Test suites need to be partially hand-written, not completely hand-written.
I understand that this is not what you wanted, but it can at least reduce the unneeded diversity in structure that we now have in the repository. For lack of consensus, we are not following the path of encoding the test logic in the JSON files. The current objective is to write a schema that captures the structure of the test suite. We are avoiding discussing the data representation of the input data and the test logic (how to turn the test data into a test).
A small piece of code needs to be written for each exercise. Most of it can be factored out, as I exemplified with code somewhere back in this issue. Think of the current proposal as a partial solution to what you wanted, @zenspider. If in the future people agree that we should follow the path you pointed out, we can easily patch the schema to enforce the input data structure. Everything else will be reused! What I would really like to avoid is finishing this discussion - as in November - without moving a single step further. So I ask... Can we compromise on a partial standardization? The way I see the situation now, it's going to be this or nothing... |
JSON Schema for `canonical-data.json` files. Changes:

The schema may need some minor adjustments - I probably made a few mistakes - but I don't see how to improve it further without losing flexibility and/or readability. I think we finally got something good enough here, people! 👍 Edit: I also wrote a test suite as a proof of concept in Haskell. It is not beautiful code, but it showcases that we in fact don't need much exercise-specific code to parse and create the tests. |
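For reference, a schema like this can be checked mechanically, going beyond what bin/jsonlint does today (it only checks that the JSON parses). A sketch using Python's third-party jsonschema package; the file paths here are hypothetical.

```python
import json
from jsonschema import validate  # third-party: pip install jsonschema

with open("canonical-schema.json") as f:  # hypothetical schema path
    schema = json.load(f)
with open("exercises/change/canonical-data.json") as f:  # hypothetical data path
    data = json.load(f)

# Raises jsonschema.exceptions.ValidationError if the file doesn't match.
validate(instance=data, schema=schema)
```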
I completely agree with what @rbasso is saying. Yes, this is "only" a partial solution to a much harder problem, but I feel the partial solution by itself has more than enough merit to warrant going through with it. With the currently suggested format, test generators are definitely possible, although not without some exercise-specific code. I think this is fine but, as said, we can always look to improve this format later to suit test generation even better. So all in all, I really like what we have now and I think we should go with it, at least for now. |
If this proposal gets accepted, where should we put the schema file? |
Great question. I don't really have a good answer. Perhaps in the root of the repository? |
change 1.3.0

As proposed in #1313 and #905 (comment), in contrast to the proposal in #336 (comment). Although -1 is a sentinel value, the sentinel value had been overloaded in this JSON file to mean two separate conditions:

* "Can't make target" (ostensibly, we might be able to make the target with a different set of coins)
* "Target is negative" (no matter what coins we provide, this target is always invalid)

To make clear that these two conditions are different, we use an error object describing each. This error object was defined in #401. Note that this commit is not a decree that all languages must represent these conditions as errors; languages should continue to represent the conditions in the usual way for each language. It is simply a declaration that these two conditions bear enough consideration that we'll represent them with a different type and also clearly differentiate between the two. Closes #1313. Checks the related box in #1311.
It appears that all-your-base.json is malformed. Where allergies.json has the structure of:

all-your-base.json has:

`cases` should be wrapped in a function name, yes? It appears that bin/jsonlint only checks that the JSON parses, not that it has good structure. At the very least, I think this should be patched up and the README expanded to actually show the desired structure. Happy to do a PR for that, assuming I understand it already. 😀