Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add canonical data "identifier" field to separate concern from test "description" #1473

Closed
bencoman opened this issue Mar 1, 2019 · 36 comments

Comments

@bencoman
Copy link
Contributor

bencoman commented Mar 1, 2019

[Edit:] Some languages don't individually identify tests and use the canonical data "description" field to generate comments into their tests code. Other languages individually identify tests, but all that is available for generating a test-name identifiers is the "description" field. This results in some extremely long test-method-names. e.g. For robot-simulator Pharo has...
test18_WhereRTurnRightLTurnLeftAndAAdvanceTheRobotCanFollowASeriesOfInstructionsAndEndUpWithTheCorrectPositionAndDirectionInstructionsToMoveEastAndNorth

Can an optional identifier field be introduced for cases where the description doesn't make a good identifier?

For example, for the Bowling exercise...

{
    "description": "Bonus rolls for a strike in the last frame must be rolled before score can be calculated",
    "property": "score",
    "input": {
      "previousRolls": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
}

generating tests for different tracks produces the following...

CSharp:
public void Bonus_rolls_for_a_strike_in_the_last_frame_must_be_rolled_before_score_can_be_calculated()
https://github.com/exercism/csharp/blob/master/exercises/bowling/BowlingTest.cs#L250

Delphi:
procedure BowlingTests.Bonus_rolls_for_a_strike_in_the_last_frame_must_be_rolled_before_score_can_be_calculated;
https://github.com/exercism/delphi/blob/master/exercises/bowling/uBowlingTests.pas#L324

FSharp:
let ``Both bonus rolls for a strike in the last frame must be rolled before score can be calculated`` () =
https://github.com/exercism/fsharp/blob/master/exercises/bowling/BowlingTest.fs#L177

Adding a optional identifier like below would be useful.

{
    "identifier": "roll_bonus_balls_before_scoring"
    "description": "Bonus rolls for a strike in the last frame must be rolled before score can be calculated",
    "property": "score",
    "input": {
      "previousRolls": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
}
@bencoman
Copy link
Contributor Author

bencoman commented Mar 1, 2019

Further to this, it seems that description ends up trying to serve two purposes:

  • method comment
  • method name

and ends up serving neither well. It constrains the length of method comments and encourages too-long method names.

@bencoman
Copy link
Contributor Author

bencoman commented Mar 1, 2019

Reading more about the schema... https://github.com/exercism/problem-specifications/blob/master/canonical-schema.json

and also that "Comments ... can be used almost anywhere" https://github.com/exercism/problem-specifications/blob/master/README.md#test-data-format-canonical-datajson

to fit current schema perhaps it would be reasonable for me to shorten the description like this...

{
    "description": "roll bonus balls before scoring"
    "comments": 
           [ "Bonus balls for a strike in the last frame must be ",
           , "rolled before score can be calculated" ],
    "property": "score",
    "input": {
      "previousRolls": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
}

[EDIT:] In practice this seems to be turning out to be a bad idea. I now expect there will be a lot of push back trying to slim down the descriptions, and conversation will just bog down going in circles trying to reconcile dual purposes (description and identifier) from the one field. This PR for example... #1525

@rpottsoh
Copy link
Member

rpottsoh commented Mar 1, 2019

Using your example as an example one might consider it easier to just adjust the ordering in the sentence, to simplify it and thus shorten it:

"description": "Score calculated after bonus rolled strike in last frame"

This new description retains the meaning of the original description. There is no guarantee that the comments will translate into a track's test suite. The comments are there for the maintainers benefit in building the test suite for their track.

Instead of reworking the schema for the canonical data I think the first step ought to be seeing if it is possible to refactor the "offensive" descriptions to be less offensive, to be more concise but without making the language too sophisticated. For many people English is not their first language, so one has to be careful to not make the language too efficient so as to be difficult for a non-native English speakers to understand easily.

Don't take what I have said as meaning I am against changing the schema, quite the opposite. I just don't think changing the schema is necessarily the solution we need to rush too.

@ErikSchierboom
Copy link
Member

Instead of reworking the schema for the canonical data I think the first step ought to be seeing if it is possible to refactor the "offensive" descriptions to be less offensive, to be more concise but without making the language too sophisticated.

This would also be my preference. The example you presented (bowling), is one of the test suites with the longest descriptions. So maybe if we can shorten those, the problem goes away?

@macta
Copy link
Contributor

macta commented Mar 7, 2019

We can certainly see if we can improve the descriptions to generate better. Also worth noting that some test cases have nested descriptions - e.g. and often these repeat what the parent description say vs. refining it. Triangle is an example of this and it should be corrected (along with checking all the other sub-cased examples).

(the geek in me did think it would be fun to take a munged description and hit some web service to summarise it - but that's where the angel on my other shoulder shouts....stoppppp)

@bencoman bencoman changed the title Introduce problem specification test name field Introduce field for method name of each test May 3, 2019
bencoman added a commit that referenced this issue Jun 5, 2019
Convert long descriptions into comments per...
per #1473
bencoman added a commit that referenced this issue Jun 5, 2019
Such a long test is really a comment, not a description to be used as a test identifier.
Make it so, per #1473
bencoman added a commit that referenced this issue Jun 5, 2019
Such a long test is really a comment, not a description to be used as a test identifier.
Make it so, per #1473
@bencoman bencoman changed the title Introduce field for method name of each test Add "identifier" field to simplify test method name generation Jun 6, 2019
rpottsoh pushed a commit that referenced this issue Jun 7, 2019
Such a long test is really a comment, not a description to be used as a test identifier.
Make it so, per #1473
@bencoman
Copy link
Contributor Author

I feel like modifying the descriptions to serve the purpose of being an identifier is like forcing a square peg into a round hole by cutting off the square corners.

For many people English is not their first language, so one has to be careful to not make the language too efficient so as to be difficult for a non-native English speakers to understand easily.

What is "too efficient" is highly judgmental with a high-bike-shedding-factor. I fear slimming down descriptions trying to make them more identifier-like will burn everyone's time better spent on other things.

@bencoman bencoman changed the title Add "identifier" field to simplify test method name generation Proposal: Add canonical data "identifier" field to separate concern from test "description" Jun 10, 2019
@SleeplessByte
Copy link
Member

What is "too efficient" is highly judgmental with a high-bike-shedding-factor. I fear slimming down descriptions trying to make them more identifier-like will burn everyone's time better spent on other things.

I think this is a fair assessment. Perhaps we can take this new field under consideration? cc @ErikSchierboom, #1496

@ErikSchierboom
Copy link
Member

ErikSchierboom commented Jun 11, 2019

Somehow I'm a bit hesitant to include this new field, even though I understand the reasoning behind it. My main argument against it is that the current setup actually works fine for virtually all exercises/tracks. Why then change it? As far as I understand, you want to include the description in the code, which I think is something that tracks normally don't do (it is not wrong of course, just unusual).

Let's gather some more opinions to see if I'm the odd man out here. @iHiD @kytrinyx @rpottsoh @petertseng what do you feel about the proposal to add a new "identifier" property that contains the name of the tested method, instead of using the "description" property?

@coriolinus
Copy link
Member

FWIW I'm inclined to agree with @ErikSchierboom: the description field works well enough. Adding an identifier field is not necessary.

Context: the Rust generator does pretty well with a very simple function to convert descriptions to identifiers:

https://github.com/exercism/rust/blob/328e846f193207de9503fdda40ce0324bbc8f6ea/util/exercise/src/lib.rs#L148-L155


The advantage of adding an identifier field which would overwrite such description-based identifier munging isn't obvious to me. Are there languages for which humans must type out the full identifier of a test case to run? If not, who cares how long they are?

The disadvantage is that if we add the new field, it imposes a maintenance burden. All tracks' test generators must handle the field, or be in (harmless-for-now) noncompliance with the canonical data spec. Everyone writing a PR updating the canonical data must decide whether or not to include the optional field. All the reviewers have to decide whether or not it's worth dogshedding it in this case. In my opinion, the cost of this maintenance burden outweighs the benefit of sometimes having cleaner identifiers.

@SleeplessByte
Copy link
Member

@bencoman 's main concern I think is that the cascading descriptions as is are huuuuuge. Their language (Pharo (?)) I think is a visual one, so these huuuuge names show up all over and are not actually very "workable".

I think he's trying to minimise the bikeshedding by having both: a name and a description.

@ErikSchierboom
Copy link
Member

I think he's trying to minimise the bikeshedding by having both: a name and a description.

And my argument is that instead of doing all this bikeshedding, why not just use shorter descriptions?

@rpottsoh
Copy link
Member

And my argument is that instead of doing all this bikeshedding, why not just use shorter descriptions?

Bingo. My #1473 (comment) from the beginning of March.

@rpottsoh
Copy link
Member

For those of us unfamiliar with Pharo (me) perhaps a screen shot or something that illustrates how/why long descriptions work in a negative way in that language / development environment. If that has been provided somewhere else then please supply a link.

It is easier to say no to any changes than it is to say yes when it isn't easy to appreciate what the issue is being faced.

@bencoman
Copy link
Contributor Author

Here are a couple of generic Pharo workflow videos for context...

and here is my Exercism specific example...
ExercismPharoLongMethodNamesDemo

@bencoman
Copy link
Contributor Author

bencoman commented Jun 11, 2019

Context: the Rust generator does pretty well with a very simple function to convert descriptions to identifiers

The first sample I checked... "description": "instructions to move east and north"
that rust function produces "instructions_to_move_east_and_north"
which differs from "follow_instructions_to_move_east_and_north" of the matching rust exercise...
So I'm curious how the word "follow" was introduced by the rust generator.

Also that function doesn't help generate sufficiently short strings to make a good identifier. For example...

pub fn format_exercise_description(description: &str) -> String {
    description
        .chars()
        .filter(|c| c.is_alphanumeric() || *c == ' ')
        .collect::<String>()
        .replace(" ", "_")
        .to_lowercase()
}

fn main() {
    let description = "Where R = Turn Right, L = Turn Left and A = Advance, the robot can follow a series of instructions and end up with the correct position and direction";
    println!("{}", format_exercise_description(description));

    let description = "A dart whose coordinates sum to > 1 but whose radius to origin is <= 1 is scored in the inner circle";
    println!("{}", format_exercise_description(description));

    let description = "instructions to move east and north";
    println!("{}", format_exercise_description(description));
}

produces...

where_r__turn_right_l__turn_left_and_a__advance_the_robot_can_follow_a_series_of_instructions_and_end_up_with_the_correc
t_position_and_direction

a_dart_whose_coordinates_sum_to__1_but_whose_radius_to_origin_is__1_is_scored_in_the_inner_circle

instructions_to_move_east_and_north

The advantage of adding an identifier field which would overwrite such description-based identifier munging isn't obvious to me. Are there languages for which humans must type out the full identifier of a test case to run? If not, who cares how long they are?

That is off point. The full identifier doesn't need to be typed in Pharo where individual tests are run with a single click. The issue is super long method names being awkward in the navigation pane.

@ErikSchierboom
Copy link
Member

The issue is super long method names being awkward in the navigation pane.

Hence my @rpottsoh' suggestion to just shorten the descriptions (which really are excessively long). Once again, why should we add a new field that would require maintenance work for many people, for a problem that appears to only really be problematic in one language track and for which there is a relatively easy fix (shorten the descriptions)?

p.s. in the time we spent discussing this new field we could have done enough bikeshedding to come up with suitably short descriptions I feel...

@bencoman
Copy link
Contributor Author

As far as I understand, you want to include the description in the code,

Including existing descriptions in generated code comments is a secondary concern.
My primary concern is reducing the length of identifiers used for naming test methods.

which I think is something that tracks normally don't do (it is not wrong of course, just unusual).

At least @exercism/scala seems to concatenate nested descriptions into comments.
and @exercism/php nicely nests descriptions as comments
and may be impacted by shortening the description field.

@bencoman
Copy link
Contributor Author

Hence my @rpottsoh' suggestion to just shorten the descriptions (which really are excessively long).

Yeah I'm also trying that path. Some support there would be nice.

p.s. in the time we spent discussing this new field we could have done enough bikeshedding to come up with suitably short descriptions I feel...

Instead the discussion at PR #1525 will be repeated a dozen times - for each exercise needing shorter descriptions.

@macta
Copy link
Contributor

macta commented Jun 11, 2019

I think this comes down to what is the true intent of having problem descriptions in a machine readable format? Why not have an ambiguous descriptive text description and be done with it?

I had understood it was to ease automatic generation of consistent descriptions across languages - at least the intent of tests could be consistent across languages regardless of implementation. You could read a test name and understand what you had to do.

This is bollocks - it’s badly conceived and should be sorted . The descriptions have grown organically without much thought to autogeneration and we are left to muddle through?

This isn’t a language problem or ide problem at all!

Can someone clearly in pseudo code tell me how to unambiguously generate clear and concise test names for all the exercises without having to adjust anything? Ideally so I can read the same test name in any language and compare the unique implementation in that language to any other?

If so, we will change Pharo to that - but I suspect it’s not the case, we’ve all worked around this nonsense and it’s a damn shame. It takes the steam out of wanting to improve this in my opinion , I’ve certainly backed away - I’ll solve a few more interesting problems with the descriptions we’ve got and maybe create a few custom exercises but it saps too much energy to correct things up stream in my opinion.

The interesting thing was that problems had a definition, but this looks rather accidental from my experience so far.

@macta
Copy link
Contributor

macta commented Jun 11, 2019

The pseudo code should be as simple as:

for every "case" (descending in pre-order)
if a top level case and language ide sorts by name, prepend a "counter prefix" (e.g. testxx_)
append the adjusted case description*
end for

*adjusted description:
if the description contains <,>, >= etc (convert to expanded descriptive text like "greater than")
if the description contains invalid language method name characters strip them out (eg. &, + etc)

the following additional rules are a bit empirical (and shouldn't be needed if readability was considered when nesting cases)

if the description is all digits prepend "at "
if the description begins with a digit prepend "and "

The issue is typically nested cases where the author hasn't thought about a generation algorithm (possibly the above) and needlessly repeats information, so concatenation is repetitive and overly verbose.

Beyond a certain size, I don't think students actually read the test name, or struggle to find it useful (regardless of the language or IDE - so you have to manually customise it)

We've painfully corrected some specifications, there are still many that are sub optimal (look at robot-simulator, which starts out ok and then gets a bit silly)

@petertseng
Copy link
Member

petertseng commented Jun 12, 2019

In this comment, in the first section I share my view of our current purpose for descriptions (maintainer-centric, intended to help us evaluate coverage of test suites). The second is to examine the effect that some of these choices (nesting, descriptions) have on various test suites (it appears the original design was only made to support test suites that have nesting. It should be decided what sort of test suites problem-specifications is meant to encourage and make easy).

I am sorry to say that because of travel, I currently do not personally have a recommendation to provide, but I hope some information helps to understand what the elements of a desirable recommendation would be.


First, regarding the intent of descriptions:

I think this comes down to what is the true intent of having problem descriptions in a machine readable format? Why not have an ambiguous descriptive text description and be done with it?

I had understood it was to ease automatic generation of consistent descriptions across languages - at least the intent of tests could be consistent across languages regardless of implementation. You could read a test name and understand what you had to do.

I would say I agree with that understanding. I'll give my own take on the previously-quoted README, the quote being:

the description should not simply explain what each case is (that is redundant information) but also why each case is there. For example, what kinds of implementation mistakes might this case help us find?

... actually I'm almost obligated to give my own take on this, since any examination of the history would show that I wrote those sentences in add5995#diff-04c6e90faac2675aa89e2176d2eec7d8, following the discussion in #451 (comment).

We have a desire, as collective maintainers of the various canonical-data.json, to retain information about why each test case exists, so that we as maintainers can evaluate what coverage we might lose if a test case is proposed to be removed. What mistaken solutions might now be allowed if we remove a test case?

For the sake of coverage of the test suites, I suggest that it is useful to keep that information somewhere, but if the allegation, discussion, and subsequent consensus is that the description is not the right place for it, then perhaps it is time to find a better place for it.


So, given this information we have, with each test case given a description and possibly nested in other groups of cases each with their own description, can we predict how a track's maintainers might choose to render these cases in the track's choice of test suite?

It is hard to say, but I have a suspicion the current structure was written in mind with test suites that support some form of nesting/grouping for their tests. As an example, I show the robot-simulator for Haskell (note that Haskell's tests were not generated as far as I can tell, but it is still a fine example to show the nesting): https://travis-ci.org/exercism/haskell/jobs/542405000#L3389-L3435. Only the leaves (items without further items grouped under them) are individual test cases.

For convenience it'll also be reproduced at the bottom of this comment.

I would say that since guidelines on nesting and descriptions implicitly only considered this style (I don't believe anything was discussed), other styles were left by the wayside.

So I would say something useful is to decide what sorts of test suites the canonical data is meant to encourage. What sort of information should be shown to the student trying to solve the exercise. Based on that, then it is possible to come up with guidelines on what sorts of nesting and descriptions should be done. Without knowing what we are trying to support and encourage, it is difficult to make recommendations if we don't understand what the goal is.


Nested descriptions, reproduced here for convenience (but unfortunately without the proper colour-coding shown in the Travis CI link)

mkRobot
  A robot is created with a position and a direction
  Negative positions are allowed
turning right rotates the robot's direction 90 degrees clockwise
  from North
    should change direction
    shouldn't change position
  from East
    should change direction
    shouldn't change position
  from South
    should change direction
    shouldn't change position
  from West
    should change direction
    shouldn't change position
turning left rotates the robot's direction 90 degrees counter-clockwise
  from North
    should change direction
    shouldn't change position
  from West
    should change direction
    shouldn't change position
  from South
    should change direction
    shouldn't change position
  from East
    should change direction
    shouldn't change position
advancing
  North from (0,1)
    shouldn't change direction
    increases the y coordinate one when facing north
  South from (0,-1)
    shouldn't change direction
    decreases the y coordinate by one when facing south
  East from (1,0)
    shouldn't change direction
    increases the x coordinate by one when facing east
  West from (-1,0)
    shouldn't change direction
    decreases the x coordinate by one when facing west
move
  instructions to move east and north from README
  instructions to move west and north
  instructions to move west and south
  instructions to move east and north

@macta
Copy link
Contributor

macta commented Jun 12, 2019

@petertseng thanks for a thoughtful reply (and apologies for my frustration). Your explanation clarifies many things, and I think it does confirm that we should offer some pseudo code for track authors about how they can approach generation. A reference implementation that goes as far as spitting out test descriptions and pseudo code for a test - would go a long way to helping everyone improve the current specifications as well as think about what to write for future specs.

The robot example you give, is much better that the current specification (and more in line with what I would expect to generate. The current one has (and there are lots of other spec like this too):

"Where R = Turn Right, L = Turn Left and A = Advance, the robot can follow a series of instructions and end up with the correct position and direction",
"cases": [
{
"description": "instructions to move east and north from README",

@bencoman
Copy link
Contributor Author

If my shortened descriptions in #1525 are accepted, I'll ease off on this issue and plug away at other shortenings, although I still think its the wrong way to go.

The descriptions have grown organically without much thought to autogeneration and we are left to muddle through

I'm not sure how it aligns with the growth of the long descriptions, but I noticed that...

description: which will be used to name each generated test

was introduced four years ago and removed two years ago
There is now only the following which encourages verbosity unsuited to identifier generation:

the description should not simply explain what each case is (that is redundant information) but also why each case is there. For example, what kinds of implementation mistakes might this case help us find?

So it seems the description field has evolved over time and ended up with a mixed purpose pulling it in two directions - short versus long.

@bencoman
Copy link
Contributor Author

What sort of information should be shown to the student trying to solve the exercise

I myself got confused about the purpose of the "comment" field. In a greenfields scenario a better choice might be "maintainer_notes" and "student_notes" - but thats easy to see the distinction in hindsight.

Even though it duplicates/summarises the description.md , the following might fit well in "student_notes".

Where R = Turn Right, L = Turn Left and A = Advance, the robot can follow a series of instructions and end up with the correct position and direction"

@ErikSchierboom
Copy link
Member

I myself got confused about the purpose of the "comment" field. In a greenfields scenario a better choice might be "maintainer_notes" and "student_notes" - but thats easy to see the distinction in hindsight.

This is indeed a bit confusing.

So it seems the description field has evolved over time and ended up with a mixed purpose pulling it in two directions - short versus long.

Well spotted. I don't know why we changed this. Anybody has any idea?

SleeplessByte pushed a commit that referenced this issue Jun 12, 2019
* Robot simulator descriptions too long

Convert long descriptions into comments per...
per #1473
@SaschaMann
Copy link
Contributor

Well spotted. I don't know why we changed this. Anybody has any idea?

#336 seems to be the discussion around it (Ctrl-F'ing description shows some of the arguments).


I myself got confused about the purpose of the "comment" field. In a greenfields scenario a better choice might be "maintainer_notes" and "student_notes" - but thats easy to see the distinction in hindsight.

Changing the field name would be a breaking change, but I think this could definitely be clarified in the README.

@sshine
Copy link
Contributor

sshine commented Jun 26, 2019

@bencoman wrote:

So it seems the description field has evolved over time and ended up with a mixed purpose pulling it in two directions - short versus long.

I didn't know this. This is a very reasonable concern.

To grow on your idea, perhaps rather than an "identifier": "roll_bonus_balls_before_scoring" one could also have a "name": "Roll bonus balls before scoring" to have tracks generate whatever snake-case or camel-case identifier they need.

If my shortened descriptions in #1525 are accepted, I'll ease off on this issue and plug away at other shortenings, although I still think its the wrong way to go.

While this issue does not affect any tracks I am involved with personally, in the interest of moving forward with #1492, I would like to know from everyone who is still involved in the discussion of an "identifier" field how they feel about taking another iteration of this proposed schema change:

  1. Does @bencoman still find it interesting to grow on the proposal?
  2. Are there still opponents to the idea of two separate description fields who think we can determine when a description fits in the middle?
  3. Should it be "identifier" or "name" or "short_description" or what?
  4. Is its exact purpose to fill the gap that "description" once did cf. @bencoman's digging?
  5. Should there be any fixed requirements to its length, like git commit headers?
  6. Should it be encoded in the canonical schema?
  7. If we choose to implement an additional key, how do we deal with exercises that don't have it yet?

We may end up with this feature and we may not.

But I'd like for it to 1) be designed well, if at all, and 2) not block #1492.

@bencoman
Copy link
Contributor Author

bencoman commented Jun 27, 2019

Even if its useful to group addition of the other keys together, its also useful to concentrate on a single case for the first one.

  1. I'm having some progress with the alternative of slimming down the descriptions, but although the feedback is worthwhile its slow to work through. It end up this is not required, but I'm not yet ready to let go of it. I think Proposal: Add flag to mark test cases as optional to canonical schema #1492 should proceed independent of this one, otherwise getting this one right would unnecessarily hold up Proposal: Add flag to mark test cases as optional to canonical schema #1492.

  2. btw, By "identifier" I mean http://aboutc.weebly.com/identifiers.html not UUID. Perhaps "name" would better distinguish the intent, but not "short_description".

  3. An absolute limit isn't required, but some guidance via a "recommended" length would be useful to get everyone on the same page. What conventions do people use for manually writing individual unit tests? btw, here is an analysis of test methods (~20,000) in Pharo's own system code...
    PharoMethodNameLength

  4. Make this an optional key in canonical data. Generators can fall back to their existing generation from "description". Indeed some languages don't generate individual methods/functions for each test case, so would this key would not apply to them.

@ErikSchierboom
Copy link
Member

Are there still opponents to the idea of two separate description fields who think we can determine when a description fits in the middle?

Raises hand I still believe that we can shorten almost all descriptions enough to have reasonable length identifiers. Note that if I am the only one objecting, I'd of course get in line and follow the majority.

I do wonder what @kytrinyx's and @iHiD's opinions are on this matter.

@sshine
Copy link
Contributor

sshine commented Jun 27, 2019

I also believe that we can shorten almost all descriptions enough to have reasonable length identifiers.

But descriptions will diverge and become long again unless addressed.

Can the following statements highlighted by @bencoman both be true?

description: which will be used to name each generated test

the description should not simply explain what each case is (that is redundant information) but also why each case is there. For example, what kinds of implementation mistakes might this case help us find?

That is, a description that can be used to name each test that explains what each case is and also why it's there?

Is there an ideal soft max length of N characters, e.g. 60?

@kytrinyx
Copy link
Member

Most tracks use the description to generate the test name, if I understand correctly. Also, it seems like there's no current recommendation for an alternate method for generating names. As such, it is my recommendation that we continue to generate names based on the description, and that we make those descriptions short, allowing for ambiguity.

That said, shorter doesn't really mean more ambiguous most of the time, I think. @bencoman's example earlier, I find roll bonus balls before scoring to be much more understandable than Bonus balls for a strike in the last frame must be rolled before score can be calculated.

Where possible, I think it makes sense to base descriptions on the why if possible, since the how already exists in the implementation, but I don't think we should mandate that with any force, because there are just too many cases where the why is "duh". I suspect that any time we try to make the rules too rigid, we end up with silly edge cases. Since we have people writing these specs, not machines, it seems like a perfectly good time to allow for judgement calls.

I'm not opposed to adding or changing the spec in any way, but at the moment I don't understand how the long descriptions and a new short identifier would be better than short descriptions.

If we do find that we need maintainer notes and student notes, I think it's worth breaking the spec and spending some concerted effort helping those tracks whose generators break when new fields are added. (I would recommend, in that case, leaving "comment" as an empty field until generator maintainers can remove the reliance on it).

@iHiD
Copy link
Member

iHiD commented Jun 27, 2019

I'm going to try and give some thought from a product POV rather than a track maintainer POV (because I don't maintain any tracks and don't know anything about how this part of exercism works).

The most important thing in this is that we consider what is clear to the user's learning. This should be the main guiding principle.

The second most important thing is that maintainers can maintain the tracks easily without lots of extra work. Reducing unnecessary effort maintainers keeps everyone happy and means maintainers can do the most important things.

It's also important to raise that TDD is Exercism's preferred method of solving exercises, so the tests should be clear. I rarely even read the README when solving exercises and instead just attack the test-suite.

My instinct says that there are two clear things here - the name of the test, and some information that explains what's going on with the test. At the moment, these are combined into one field (description), which has worked fine because we've kept a middle ground.

Wearing my product hat, I would fine with splitting these out explicitly into two fields, with the following conditions:

  1. These fields should be explicitly different with their separate purposes clear, I would want them to diverge significantly. e.g.
  • Names are designed to be just test names. Something recognisable when reading output but not verbose or particularly descriptive.
  • Descriptions are designed to be included in the test suites as comments in each tests that explain a bit more about the test. I think this has lots of benefits to students who might want to read more about what the test is doing in "plain english". These could also give hints, or for optional tests explain why they're optional. These could be a paragraph long. (Below I refer to these as information)
  1. I don't want to arbitrarily cause work for the test-suite maintainers, especially as these people tend to also be doing things like auto-analysis atm so are already thinly spread. I think if this is agreed it should be done as follows:
  • a) A new field called name is added, which is populated with the current description.
  • b) A new field called information is added, which is null on all tests.
  • c) Those pushing for this change and shorten names and add information fields.
  • d) We give track maintainers 6 months to change their test generators to use name instead of description. During that period, new exercises must have description and name fields added.
  • e) Once all test generators are updated we remove description.

@bencoman
Copy link
Contributor Author

@iHiD's proposal would be an excellent outcome for me. I'd tend towards a more explicit studentNotes over information but the purpose of the latter might be reasonable specified in the repo README.

I'm am willing to keep on with shortening then existing description if that ends up the only path forward, but I think @sshine raise a significant point, that...

descriptions will diverge and become long again unless addressed.

...because that is the nature of a "description".

@ErikSchierboom
Copy link
Member

I'm am willing to keep on with shortening then existing description if that ends up the only path forward, but I think @sshine raise a significant point, that...

Well, @sshine also suggested a possible fix, by limiting the number of characters that can be used for the description.

It looks like opinions vary a bit on this subject. Reading this thread, it looks like there are basically two options:

  1. Keeping the existing structure and keep on shortening descriptions.
  2. Introduce a new field/rename the existing one and use a separate field for comments/description/notes.

So how to continue? In another thread, it was implied that if no consensus was reached, the Exercism team would make a decision. In this case though, it looks like @kytrinyx and @iHiD have slightly different opinions :) Can we reach a consensus here or can otherwise a decision be made for us? I feel passionate about this issue, but I'm even more passionate about being able to continue and start working or closing this issue :)

SleeplessByte pushed a commit that referenced this issue Jun 28, 2019
…1533)

The purpose of the "description" field is to derive identifiers for naming generated test methods, but some descriptions have tended to verbosity ending up with long identifiers (up to 150 characters). Such long method names are harder for students to work with and also are awkward for some IDE GUIs.

Issue #1473 proposed adding a separate "identifier" field, but I was required to first try improving "descriptions" to better suit "identifier" generation. This PR aims to slim down the generated identifiers without losing substantive information about the test.

To start with, "Check if the given string is a pangram" is a rather banal redundant description unworthy of nesting.
SleeplessByte added a commit that referenced this issue Jul 1, 2019
The purpose of the "description" field is to derive identifiers for naming generated test methods, but some descriptions have tended to verbosity ending up with long identifiers (up to 150 characters).  Such long method names are harder for students to work with and also are awkward for some IDE GUIs.

Issue #1473 proposed adding a separate "identifier" field, but the consensus there was to first try improving "descriptions" to better suit "identifier" generation.    This PR attempts to slim down the generated identifiers without losing substantive information about the test.

For the longest descriptions here, I couldn't recognize the significance of...
```
dart whose coordinates sum to > 1 but whose radius to origin is <= 1
```
While considering a shorter alternative, it helped consistency to also rearranged the shorter descriptions to be more explicit about each test condition, and add a couple of tests each side of boundary conditions.

Co-Authored-By: Erik Schierboom <[email protected]>
Co-Authored-By: Derk-Jan Karrenbeld <[email protected]>
@petertseng
Copy link
Member

In three parts: My personal recommendations, explanation of why I haven't been helping, and some statistics about the current description lengths that may help a decision.


Personal recommendations:

I observe that there are two questions whose answers may be decided on completely separately.

  1. Should there be a length limit on certain fields
  2. Should there be new fields added

They are separate because the adding of new fields clearly defines a translation path between the old schema and the new schema, so if a description is to be shortened, it can be shortened in the old schema or the new schema and the result will be the same. (Mathematically, the two functions commute with one another)

If there should be a length limit, since we are only human, we will inevitably violate it unless a machine checks it for us. The statistics may help understand whether there can/should be a length limit. If there should be one, code should be written to check it.

If a new field is introduced that is suited to contain long descriptions, please do one of the two:

  1. clearly document the difference between the new field and the already-existing comments field.
  2. deprecate the already-existing comments field.

Otherwise, I will not be able to understand the difference between comments vs the new field and I will act unpredictably if I write any future canonical-data.json files.


You may have seen that after I posted my view of why things are the way they are, then I didn't help. It seemed to me that I wouldn't be affected either way by any of the decisions so I had nothing to add. I think that is the natural state of affairs though, so it is not too surprising that we see those unaffected by the decision tend not to make the decision or take action.

The incentives are also not aligned very well because a decision in this issue would benefit the students or the mentors of a track, and there are maintainers that are currently neither students nor mentors. That includes myself. I have historically found it difficult to convince people to do something that doesn't benefit them in some way. But that is a discussion for another issue entirely so I'll get back on topic.


Here is some code that reads all our canonical-data.json files and outputs:

  • the frequency distribution of description lengths. I chose arbitrary bucket sizes of 5 for this
  • the top 10 longest description lengths among the single longest descriptions of each exercise. Note carefully that this is different than if I said "the top 10 longest descriptions". If I said the latter, an exercise might appear multiple times in this list. Since I said the former, instead an exercise will only appear once in this list.

It is capable of concatenating the descriptions from all the nested levels above a case (if you pass the -r command-line flag), but since I've already argued that that doesn't make sense to include these as part of the test identifier (they are shared between many cases, therefore the nesting/grouping features of the test suite should be used instead), I will not show the results of the -r flag and will instead just show the result of running it without arguments.

the code

require 'json'

def cases(h)
  # top-level won't have a description, but all lower levels will.
  # therefore we have to unconditionally flat_map cases on the top level.
  # on lower levels we either flat_map cases or just return individuals.
  cases = ->(hh, path = [].freeze) {
    hh['cases']&.flat_map { |c|
      cases[c, (path + [hh['description'].freeze]).freeze]
    } || [hh.merge(path: path)]
  }
  h['cases'].flat_map(&cases)
end

maxes = []
freq = Hash.new(0)

recurse = ARGV.delete('-r')

Dir.glob("#{__dir__}/exercises/*/canonical-data.json") { |f|
  exercise = File.basename(File.dirname(f))
  lens = cases(JSON.parse(File.read(f))).map { |c|
    arbitrary = ?/
    ((recurse ? c[:path] : []) + [c['description']]).join(arbitrary).size
  }
  maxes << [lens.max, exercise]
  lens.each { |l| freq[l] += 1 }
}

n = ARGV.empty? ? 10 : Integer(ARGV.first)

bucket_size = 5

(0..maxes.map(&:first).max).step(bucket_size) { |left|
  right = left + bucket_size
  v = freq.values_at(*(left...right).to_a).sum
  puts '%3d - %3d: %d' % [left, right, v]
}

puts "total: #{freq.values.sum}"

puts "top #{n}:"
maxes.max_by(n, &:first).each { |x| puts '%3d %s' % x }

the results

  0 -   5: 18
  5 -  10: 55
 10 -  15: 165
 15 -  20: 170
 20 -  25: 277
 25 -  30: 208
 30 -  35: 186
 35 -  40: 147
 40 -  45: 89
 45 -  50: 100
 50 -  55: 66
 55 -  60: 49
 60 -  65: 31
 65 -  70: 18
 70 -  75: 17
 75 -  80: 7
 80 -  85: 15
 85 -  90: 6
 90 -  95: 5
 95 - 100: 2
100 - 105: 0
105 - 110: 2
110 - 115: 0
115 - 120: 0
120 - 125: 0
125 - 130: 0
130 - 135: 0
135 - 140: 0
140 - 145: 2
total: 1635
top 10:
143 simple-cipher
107 two-bucket
107 bowling
 98 poker
 90 saddle-points
 85 react
 84 rational-numbers
 84 custom-set
 81 raindrops
 81 grep

@ErikSchierboom
Copy link
Member

Each test case now has an uuid field! Therefore, this issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests