updated description of anagrams exercise #1928
exercises/anagram/description.md
Outdated
@@ -1,8 +1,21 @@
# Description

An anagram is a rearrangement of letters to form a new word.
Given a word and a list of candidates, select the sublist of anagrams of the given word.
An anagram is a rearrangement of letters to form a new word:
Our Markdown style guide has a one sentence per line rule.
It is true that not many exercises do yet follow this guideline, but it might be good to apply that rule to this PR.
See https://github.com/exercism/problem-specifications/pull/1911/files for an example.
15ee201 to 0f4aa5b
Sure, I guess. Here is Python to partly automate this process. Text editors really don't deal with it. You should add a check to your CI Markdown linter if you want to enforce this behavior.
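For illustration only, a minimal sketch of what such a helper might look like. This is not the script referred to above: the name `one_sentence_per_line` and the sentence-boundary rule (a `.`, `!` or `?` followed by whitespace) are assumptions, and abbreviations such as "e.g. this" will be split incorrectly.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(paragraph: str) -> str:
    """Reflow one paragraph so that each sentence sits on its own line."""
    # Collapse existing line breaks first so wrapped sentences are rejoined.
    text = " ".join(paragraph.split())
    return "\n".join(_SENTENCE_END.split(text))


print(one_sentence_per_line(
    "An anagram is a rearrangement of letters. Given a word and\n"
    "a list of candidates, select the anagrams."
))
```

The helper works on a single paragraph; headings, lists and code fences would need to be skipped by whatever calls it.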
exercises/anagram/description.md
Outdated
The target and candidates are words in ASCII alphabetic characters (`A`-`Z` and `a`-`z`).
Lowercase and uppercase characters are equivalent: for example, `"PoTS"` is an anagram of `"sTOp"`.
The candidate set is represented as an unordered list.
The anagram set must be the sublist of all anagrams in the candidate set (in the same order).
I can show that at least two tracks do not enforce that the elements are in the same order as in the original:
https://github.com/exercism/ruby/blob/main/exercises/practice/anagram/anagram_test.rb#L19 - since it sorts the list, the list may be in any order
https://github.com/exercism/rust/blob/main/exercises/practice/anagram/tests/anagram.rs#L6 - since it compares the sets, the sets need only contain the same elements, as sets have no concept of ordering
How shall we handle this? Shall we declare those two tracks, and all others who do the same, to be out of compliance? Or shall we allow for the possibility?
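To make the difference concrete, here is a hedged Python sketch of the three comparison styles. The values of `expected` and `actual` are made up, and the comparisons only mirror the idea behind the linked Ruby and Rust tests, not their actual code.

```python
# Hypothetical result lists: same elements, different order.
expected = ["lemons", "melons"]
actual = ["melons", "lemons"]

# Order-sensitive comparison: fails when only the order differs.
same_order = actual == expected

# Sort both sides before comparing: any order passes.
same_when_sorted = sorted(actual) == sorted(expected)

# Compare as sets: order and duplicates are both ignored.
same_as_sets = set(actual) == set(expected)

print(same_order, same_when_sorted, same_as_sets)  # False True True
```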
Good point. I'm fine with whatever.
I do think we should be clear in the generic description about whether we're thinking of the candidates and the result as a sequence, set, multiset or something else. If the result is to be a set or multiset, I think we should also specify that the candidates are: it would be confusing to change representations between the input and output. Having the input and output be "a sequence representing a set" seems to me to be a bit weird: if we choose a set representation, languages that don't support sets can always choose to represent them as lists and the tests can reflect that (by sorting or whatever).
One of my goals in rewriting the description was to not have to do it again, by being precise enough to clearly indicate exactly what is expected. If we choose to make the input and output a sequence, we need to specify:
- Whether duplicates can appear in the candidates
  - If so, whether duplicates should appear in the result
  - If not, whether words that differ only by case are different?
- What constraints, if any, there are on result ordering
Regardless, we need to clearly specify:
- What constraints, if any, there are on result letter-case
- In particular, should the result case match the candidate case?
I'll wait for thoughts from others before trying again on this.
(Another thing I realized this morning while staring at my work is that I never really directly answered the question "Is 'StOp' an anagram of 'stop'?". My intent was no, but with your blessing I'll modify the first example or add a later example to reflect this.)
> How shall we handle this? Shall we declare those two tracks, and all others who do the same, to be out of compliance? Or shall we allow for the possibility?
I'd say that the set-based solution used by Rust actually makes the most sense. Why would there be an ordering in the results? That said, apparently the set-based approach is fairly unique. Personally, I think that we should probably allow for tracks choosing their own implementation.
> I do think we should be clear in the generic description about whether we're thinking of the candidates and the result as a sequence, set, multiset or something else. If the result is to be a set or multiset, I think we should also specify that the candidates are:
This is a great goal, but not always as easy to achieve due to both historic reasons and tracks implementing exercises differently (which is often totally valid). Basically, any time you want to be really specific in the description, it makes it more likely that the description won't correspond to a track's actual implementation. I don't have a great solution unfortunately.
> Another thing I realized this morning while staring at my work is that I never really directly answered the question "Is 'StOp' an anagram of 'stop'?". My intent was no, but with your blessing I'll modify the first example or add a later example to reflect this.
I think it shouldn't be an anagram. That seems to be the most consistent with the specs.
This problem is especially awkward because it has a whole bunch of implicit ASCII-English linguistics assumptions. The attempt to define these away by providing an explicit dictionary is an approach, but it does require establishing what the dictionary is and does. Dictionaries are fundamentally sets (really maps to definitions, but the definitions are usually irrelevant in word games): perhaps the right approach is to describe the input and output as sets here, and allow the various language versions to use implicit or explicit unordered lists as sets. That said, if the dictionaries are sets that means that the same word will not appear in them more than once, so the question of lettercase folding has to be settled: can a word appear multiple times in the dictionary with different lettercase combinations?
If the principals here reach consensus on these kinds of issues I'm happy to draft English as desired. I don't feel like it is good practice to have problem descriptions that conceal details of the actual problem, especially for new coders attempting simple-looking problems. This can lead to a ton of frustration as code fails tests for properties the programmer didn't know were being enforced and that they don't necessarily understand.
Sorry I couldn't be of more help clearing this up.
If the decision were in my hands alone, I would say in the description that the candidate words are a set. If there is a track that makes a decision to instead use their ordered sequence type instead of their set type (for example, the ordered sequence type is more ergonomic to use, or a set simply isn't provided in the language or its libraries), I don't really foresee this drawing complaints of noncompliance, but I can't really say that my predictive power is any better than any average person's.
Do note that in https://github.com/exercism/problem-specifications/blob/main/exercises/anagram/canonical-data.json#L151 we already have a case where `"candidates": ["BANANA", "Banana", "banana"]`, so we are allowing candidate words with different case to be considered distinct from each other.
I do note that we've never had a case where the expected result has ever needed to contain something like `["BANANA", "Banana", "banana"]`, so we're free to make whatever decision is easiest, or defer that particular decision.
We do say that `BANANA` is still not considered an anagram of `banana`, them being different casings of the same word.
It does occur to me, typing this now, that I see that there is an inconsistency.
- For set membership, word equality is case-sensitive, because the same word with different casing can exist in the same set.
- For the anagram test, word equality is case-insensitive, both because `stop` should not be an anagram of `STOP` and `stop` should be an anagram of `POTS`.
That is an interesting inconsistency, but I don't recall anyone having been confused by that, since they apply in different situations.
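The two rules can be seen side by side in a short Python sketch. It assumes ASCII words, so `str.lower()` is enough for case folding, and the helper name `is_anagram` is hypothetical rather than taken from any track.

```python
def is_anagram(target: str, candidate: str) -> bool:
    """Case-insensitive anagram test; a word is not an anagram of itself."""
    t, c = target.lower(), candidate.lower()
    return t != c and sorted(t) == sorted(c)


# Set membership uses ordinary string equality, so it is case-sensitive:
# all three casings are kept as distinct elements.
candidates = {"BANANA", "Banana", "banana"}
print(len(candidates))  # 3

# The anagram test ignores case in both of its checks.
print(is_anagram("stop", "STOP"))  # False: same word, different casing
print(is_anagram("stop", "POTS"))  # True: same letters, different word
```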
> Sorry I couldn't be of more help clearing this up.
Oh but this is very helpful! We're just in the process of figuring out the right words, which is hard.
> I don't feel like it is good practice to have problem descriptions that conceal details of the actual problem, especially for new coders attempting simple-looking problems. This can lead to a ton of frustration as code fails tests for properties the programmer didn't know were being enforced and that they don't necessarily understand.
Yeah that can be very frustrating.
> If the decision were in my hands alone, I would say in the description that the candidate words are a set.
Ah good idea, we omit the ordered-ness property and just refer to it as a "set". What do you think @BartMassey?
Making them be sets is fine with me. I think we should then correct `["BANANA", "Banana", "banana"]` in the Javascript version, though. @petertseng did you want to file an issue and PR there, or shall I?
Excellent 👍
good idea to handle the `["BANANA", "Banana", "banana"]` as an issue following this one. I'm not picky on who does it. If I don't see one a few days after this PR is merged I'll do it.
Short version:

The focal question of this commit: **Should determining set membership be case-insensitive?**
If yes, this commit makes it so.
If no, this commit should be rejected.

Long version:

In exercism#1928 it was codified that the input `candidates` is a set of words.
Recall that a set contains no duplicates.

Recall that ever since July 2013, it has been the intent of this exercise that determining whether two words are anagrams of each other should be case-insensitive:
exercism/exercism@bf3e011

Recall that ever since January 2014, it has also been the intent that determining whether a word is an anagram of itself should be case-insensitive:
exercism/exercism#1266

The next step is to consider the focal question of this PR: **Should determining set membership be case-insensitive?**

If yes, that is consistent with the anagram relation.
Then, two current cases are in violation because they contain multiple instances of the same word.
These instances are the same, even though they differ in letter case, because set membership is case-insensitive.
If this is so, this commit rectifies that, while preserving tested functionality.

If no, then we are declaring that this inconsistency is acceptable because it simply means two different rules are applied in two different situations, but they are both done consistently:
Set membership is case-sensitive.
The anagram relation is case-insensitive.
If this is so, then this PR should be closed, with the discussion serving as affirmation that the question has been duly considered.

Please understand that the opinion of the author of this commit is deliberately and explicitly left unspecified.
The reason for that isn't to avoid unfair influence, but instead because the author of this commit does not anticipate having an opinion on this matter either way.
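As a rough illustration of what "in violation" would mean under the "yes" answer, the following Python sketch flags candidate lists that contain the same word more than once when letter case is ignored. The function name `case_insensitive_duplicates` is hypothetical, and `str.lower()` is assumed to be sufficient because the exercise restricts words to ASCII letters.

```python
from collections import defaultdict


def case_insensitive_duplicates(candidates):
    """Group candidates that are equal once letter case is ignored."""
    groups = defaultdict(list)
    for word in candidates:
        groups[word.lower()].append(word)
    # Only groups with more than one spelling break set uniqueness.
    return [words for words in groups.values() if len(words) > 1]


print(case_insensitive_duplicates(["BANANA", "Banana", "banana"]))
# [['BANANA', 'Banana', 'banana']]
print(case_insensitive_duplicates(["stop", "pots"]))  # []
```

Under the "no" answer the same lists are valid sets, because plain string equality already treats the three spellings as distinct.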
* Format using prettier (exercism#1917) Format using prettier
* updated description of anagrams exercise (exercism#1928)
* updated description of anagrams
* changed anagram description to be one-sentence-per-line
* updated description of anagrams to use sets
* Update Licence Give a look at the discussion in BR exercism#1930
* rational-numbers: test to reduce abs value (exercism#1938)
* Change saddle point references to row, column (exercism#1948)
* word-search: Add test case
* Update exercises/word-search/canonical-data.json Agreed. Co-authored-by: Erik Schierboom <[email protected]>
* meetup: improve descriptions by saying why each case is tested (exercism#1919) descriptions show whether a date is the first, last, or an arbitrary middle date of the week. This helps understand why certain cases are selected. Closes exercism#974
* word-search: Add cases checking for concatenation and wrapping The author of this commit thinks that concatenation is highly unlikely, but the wrapping might be useful to check in languages that allow negative indices.
* `flatten-array` Add additional test cases (exercism#1953)
* Add additional test cases to flatten-array
* Update exercises/flatten-array/canonical-data.json Co-authored-by: Peter Tseng <[email protected]> Co-authored-by: BethanyG <[email protected]> Co-authored-by: Peter Tseng <[email protected]>
* Fix bowling game copy (exercism#1955) Fixes exercism#1954
* Add action to format code (exercism#1941)
* build(deps): bump DavidAnson/markdownlint-cli2-action (exercism#1952) Bumps [DavidAnson/markdownlint-cli2-action](https://github.com/DavidAnson/markdownlint-cli2-action) from 5.0.0 to 5.1.0. - [Release notes](https://github.com/DavidAnson/markdownlint-cli2-action/releases) - [Commits](DavidAnson/markdownlint-cli2-action@b3c3b40...744f913) --- updated-dependencies: - dependency-name: DavidAnson/markdownlint-cli2-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Reduced rational nr. should be in standard form. (exercism#1958)
* Reduced rational should be in standard form. The current instructors fail to mention that a reduced rational number should always be rendered in standard form (without any negative value at the denominator).
* remove superflous blank lines; fix wording
* scale-generator: use flat and sharp symbols (exercism#1942)
* Update configlet part in README (exercism#1949) Co-authored-by: ee7 <[email protected]>
* phone number: only one problem per test input (exercism#1959)
* [Phone Number] Only one problem per test input Because the area code is not allowed to start with 0 or 1, inputs designed to elicit other errors should not use area codes that start with either of those digits.
* Respect immutability
* Correct field name: s/comment/comments/
* Comments should contain a list.
* Allow prettier to improve comments
* book-store: reorder keys
* darts: reorder keys
* grade-school: reorder keys
* hamming: reorder keys
* high-scores: reorder keys
* largest-series-product: reorder keys
* list-ops: reorder keys
* luhn: reorder keys
* triangle: reorder keys
* scale-generator: reorder keys
* saddle-points: reorder keys
* diffie-hellman: reorder keys
* collatz-conjecture: reorder keys
* anagram: reorder keys
* accumulate: reorder keys
* Add CI script to check correct order of keys

Co-authored-by: Bart Massey <[email protected]>
Co-authored-by: y8l <[email protected]>
Co-authored-by: Ivan Ivanov <[email protected]>
Co-authored-by: Damian C. Rossney <[email protected]>
Co-authored-by: mariohuq <[email protected]>
Co-authored-by: mariohuq <[email protected]>
Co-authored-by: Peter Tseng <[email protected]>
Co-authored-by: Peter Tseng <[email protected]>
Co-authored-by: AH WEI <[email protected]>
Co-authored-by: BethanyG <[email protected]>
Co-authored-by: Cedd Burge <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Davide Alberto Molin <[email protected]>
Co-authored-by: wolf99 <[email protected]>
Co-authored-by: June <[email protected]>
Co-authored-by: ee7 <[email protected]>
Co-authored-by: Leah Hanson <[email protected]>
closes #1556, affects exercism/rust#86, exercism/rust#690