crypto-square implementations disagree with README in many tracks #190
I remember having trouble with inconsistencies on this before, but I've never taken the time to go through everything and tidy it up. I'd love to have a canonical list of test cases.
Pinging @exercism/track-maintainers in case you're not watching the repo :)
Just to be clear, the linked post shows several variants of how tracks implement the algorithm for the strings "Madness, and then illumination." and "Vampires are people too!". What is the correct output for those strings?
There are two fundamental choices to be made in this problem: which way the rectangle is oriented (more rows than columns, or more columns than rows), and how the output is spaced.

I believe all tracks have reached consensus on the first question, but not on the second (but it would be good to check that). Let's examine the first question: r >= c, or c >= r? Looking at the .yml file for the exercise, it lists a source of http://users.csc.calpoly.edu/~jdalbey/103/Projects/ProgrammingPractice.html, and the example on that page is a c >= r rectangle (8 columns, 7 rows).
This is supported by the README's description of how the rectangle is filled. So the message "abcde" fills in this way (3 columns, 2 rows):

```
abc
de
```

Rather than this way (2 columns, 3 rows):

```
ab
cd
e
```
Other portions of the README also support c >= r.

On a related note, I found a contradictory sentence in the README. It is contradictory because if a column has 8 characters, that implies there will be 8 rows, by definition. The sentence should read something similar to "Broken into 8 columns, it yields 7 rows" (this is a c >= r rectangle). This would bring it in line with the example right below that sentence (the example from the page linked above, in fact!).
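For concreteness, the c >= r dimension rule can be sketched in Python (my own illustration, not any track's code; `rectangle_size` is a made-up name):

```python
import math

def rectangle_size(length):
    """Smallest rectangle with c columns and r rows such that
    c >= r, c - r <= 1, and c * r >= length (the c >= r convention)."""
    c = math.ceil(math.sqrt(length))
    r = c if c * (c - 1) < length else c - 1
    return c, r

print(rectangle_size(54))  # (8, 7): the calpoly example
print(rectangle_size(5))   # (3, 2): "abcde" fills as abc / de
```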
So what are the expected outputs, you ask? "Madness, and then illumination.", normalized, has 26 characters, so we want to decide between 5x6 and 6x5. "Vampires are people too!", normalized, has 20 characters, so we want to decide between 4x5 and 5x4.

If we believe that it should be c >= r, the outputs are "msemoaanindninndlaetltshui" and "vrelaepemsetpaooirpo". If we believe that it should be r >= c, the outputs are "mstlnnashladaeutnnnmiediio" and "viaeearrotmeepopsplo". Notice the choice we make significantly changes the order of letters in the output. Also, I didn't add spaces to the output, because that's a decision to be made later. I used the code I had written to solve this problem to generate these.

It seems to me that the various outputs pointed out at exercism/exercism#2432 (comment) are all consistent with c >= r.

Spacing of output
--

As long as the order of the letters is correct, I'm much less fussed about how they are spaced. In fact, "no spacing" is a perfectly valid option to me. However, there are obvious disadvantages of choosing to space out "msemoaanindninndlaetltshui" as "msemo aanin dnin ndla etlt shui", since it trivially allows recovery of the plaintext: read the first letter of every word, read the second letter of every word, read the third... Unfortunately, such a spacing is what is currently prescribed in the readme ("Output the encoded text grouped by column."), and is also what is used in the examples in http://users.csc.calpoly.edu/~jdalbey/103/Projects/ProgrammingPractice.html. So there is an argument for doing it this way, despite my protests. This is the chunking of 5, 5, 4, 4, 4, 4, which actually is the behavior specified in the readme (since the number of chunks equals the number of columns).

Before 89d1274, the old readme just said to always output in chunks of five (regardless of message size!), which makes it harder to recover the plaintext (unless the message size is exactly 25 characters, in which case you're out of luck!). This is where the "msemo aanin dninn dlaet ltshu i" grouping comes from.
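To make the c >= r, grouped-by-column behavior concrete, here is a small Python sketch (my own code, not taken from any track; the `encode` name is illustrative):

```python
import math

def encode(plaintext):
    # Normalize: drop non-alphanumerics, lowercase everything.
    text = ''.join(ch.lower() for ch in plaintext if ch.isalnum())
    if not text:
        return ''
    # c >= r: the number of columns is ceil(sqrt(length)).
    c = math.ceil(math.sqrt(len(text)))
    rows = [text[i:i + c] for i in range(0, len(text), c)]
    # Read down each column; the short final row contributes nothing extra.
    return ' '.join(''.join(row[i] for row in rows if i < len(row))
                    for i in range(c))

print(encode("Madness, and then illumination."))
# msemo aanin dnin ndla etlt shui
```

Note this produces one chunk per column (the 5, 5, 4, 4, 4, 4 chunking for the 26-character message), matching the spacing currently prescribed in the readme.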
It appears some tracks also have chunking where the number of letters in each chunk (as opposed to the number of chunks) is the same as the number of columns ("msemoa anindn inndla etltsh ui"). A possible solution, sure, but there need be no relation between the number of columns and the number of letters per chunk.

As for "vrelaepemsetpaooirpo" (the output for "Vampires are people too!"), the grouping consistent with the current README is "vrel aepe mset paoo irpo" (again, plaintext easily recovered, and number of chunks equal to number of columns), and the grouping for "always five" is "vrela epems etpao oirpo" (which also happens to be the "chunk size equals number of columns" grouping, since the number of columns was five).

So we should make a decision on these two things and standardize, I suppose. My personal recommendations:
Footnote: I had originally written this comment assuming the tracks were in disagreement on the first question rather than the second. That's why I went into so much detail defending c >= r.
@petertseng This is a fantastic summary of the problem, thank you so much for doing the analysis and exposé. The origin of this exercise is the "square code" exercise from here: http://users.csc.calpoly.edu/~jdalbey/103/Projects/ProgrammingPractice.html
@petertseng thanks so much. @jtigger and I have been working on a PR for x-common to clarify the parameters for this exercise. Can we continue the discussion there?
Wow, thanks a lot for the clarification @petertseng ! |
Total tangent here - sorry - but while we are talking about cryptosquares I wanted to share this phrase in Latin which reads the same even when encoded as a cryptosquare: `sator arepo tenet opera rotas`
(It was found in Pompeii, amongst a few other places.)
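For fun, the property is easy to check with a quick script (my own sketch, using the famous Sator square found at Pompeii; symmetric squares survive the column-wise reading unchanged):

```python
import math

def encode(plaintext):
    # Minimal crypto-square encode: c >= r, chunks grouped by column.
    text = ''.join(ch.lower() for ch in plaintext if ch.isalnum())
    c = math.ceil(math.sqrt(len(text)))
    rows = [text[i:i + c] for i in range(0, len(text), c)]
    return ' '.join(''.join(row[i] for row in rows if i < len(row))
                    for i in range(c))

phrase = "sator arepo tenet opera rotas"
print(encode(phrase) == phrase)  # True: the 5x5 square is symmetric
```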
Mentioned in exercism#190, but I don't see that this has its own issue.

Currently 18 tracks implement this exercise. I did [a survey](https://gist.github.com/IanWhitney/4912b100cca0b02a5c3ce21096a3e673) of their test suites and found that they fell into two types:

Type One: Test Just Encoding
--

- [python](https://github.com/exercism/xpython/blob/master/exercises/crypto-square/crypto_square_test.py)
- [lisp](https://github.com/exercism/xlisp/blob/master/exercises/crypto-square/crypto-square-test.lisp)
- [go](https://github.com/exercism/xgo/blob/master/exercises/crypto-square/crypto_square_test.go)
- [elixir](https://github.com/exercism/xelixir/blob/master/exercises/crypto-square/crypto_square_test.exs)

are the tracks that followed this approach. Just an `encode` method (or some alias) is implemented and tested.

Type Two: Test Intermediate Methods
--

The remaining 14 tracks followed this style. There's not a lot of variation between these suites. [Ruby](https://github.com/exercism/xruby/blob/master/exercises/crypto-square/crypto_square_test.rb) is a good, representative example. In these test suites several methods are implemented and tested:

- `normalized_plaintext`
- `size`
- `plaintext_segments`
- `ciphertext`
- `normalized_ciphertext`

Again, exact method names may vary.

In implementing this json file I followed the second type. I did this for a few reasons.

*It's already the majority*: With 14 of 18 tracks already implementing tests like this, there is some value in following the crowd.

*I think it's the best approach*: This one is more subjective, obviously. My problem with the Test Just Encoding approach is that there's a huge gap between starting the exercise and getting a useful passing test. Students have to implement the full algorithm to get tests passing. By breaking the steps down into smaller methods, each with their own tests, the lag time between starting to code and getting a passing test is smaller. And the tests are ordered so that each new method builds on the methods already built.

The downside of this approach, I think, is that we're doing a lot of the design up front. In the Test Just Encoding approach students can implement the algorithm using as many methods as they want. In the Test Intermediate Methods approach, students end up locked to the methods defined in the test suite. In this case I think the trade-off is worth it. But that's just my opinion.

My kata group also worked through this exercise. 3 people did it in Test Just Encoding languages (Elixir and Python). 2 people did it in Test Intermediate Methods languages (Ruby and JavaScript). Their opinions largely mirrored mine. Those that used Just Encoding found it a lot of work to get the 2nd test to pass (since the first test encodes an empty string); but once they got the 2nd test to pass, all tests passed. Those who used the Intermediate Methods approach found the steps between tests easier to manage and thought that this approach was better for learning.

Though, as an argument for Just Encoding, the Python people were impressed at the variety of designs people used to solve the problem. And our Elixir programmer liked that they could make up their own mind about internal implementations. A suggested middle ground was to have one exercise offer an Intermediate Methods test suite, while a later exercise could cover similar ground with a more free-form Just Encoding test suite.

Removed Tests
---

I dropped one set of tests that existed in the Test Intermediate Methods approach: `size`. I didn't see a reason for this method. I don't see it being used as part of a 'real' crypto library (though if your real crypto library is using Crypto Square then you probably have other problems). And I didn't see that testing it offered any useful tests not already provided by the `plaintext_segments` tests.

Tweaked Method Names
---

Method naming varies between current implementations and the Readme. I've tried to use method names that follow the readme.

My methods:

- `normalized_plaintext`
- `plaintext_segments`
- `encoded`
- `ciphertext`

Terms used in the Readme:

- 'input is normalized'
- 'plaintext in a rectangle'
- 'encoded text'
- 'cyphertext' or 'encoded text chunks'

`plaintext_segments` is the method name that deviates most from the readme. It comes from the current implementations and I could not think of a better name. Names = hard.
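As a sketch of what the Test Intermediate Methods surface looks like in practice (hypothetical Python, loosely mirroring the method names above; not any track's actual suite):

```python
import math

class CryptoSquare:
    def __init__(self, plaintext):
        self.plaintext = plaintext

    def normalized_plaintext(self):
        # Lowercase, alphanumerics only.
        return ''.join(ch.lower() for ch in self.plaintext if ch.isalnum())

    def plaintext_segments(self):
        # Rows of the c >= r rectangle.
        text = self.normalized_plaintext()
        if not text:
            return []
        c = math.ceil(math.sqrt(len(text)))
        return [text[i:i + c] for i in range(0, len(text), c)]

    def normalized_ciphertext(self):
        # Column-wise reading, one chunk per column.
        segments = self.plaintext_segments()
        if not segments:
            return ''
        width = len(segments[0])
        return ' '.join(''.join(seg[i] for seg in segments if i < len(seg))
                        for i in range(width))

    def ciphertext(self):
        # Same letters, no spacing.
        return ''.join(self.normalized_ciphertext().split())
```

Each method builds on the previous one, which is exactly what lets the test suite introduce them in order.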
I believe this can be closed.
Taken from xjava:
Chunk sizes are wrong: The README gives several examples of how to deal with strings of text that can be made into perfect (i.e. n x n) squares, and those that can be made into imperfect (r x c) squares. In all the README examples, the imperfect squares are r x c with the additional constraint that r >= c.
As documented by @jtigger here, most of the tracks take a 26-letter string and chunk it into 5, 5, 4, 4, 4, 4. This is not the behavior specified in the README, and seems like an error from early on that got propagated to many tracks.
There's no json file for this in x-common, but it appears this needs to be changed in many places, not just on the Java track.
If I'm missing something please let me know, but it seems like this is a good case for common test data.