-
-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Word Count - Adds multiple apostrophe test case #1628
Conversation
This is something of an oversight, however I believe merging this is currently blocked by #1560 |
Fair enough, wasn't sure if this fell under the scope of "test cases being wrong" as mentioned in that post, but I'll look to send it over to the language-specific repos and see what trickles down until the lock is lifted. |
Yeah, I'm a little uncomfortable with a missing test being the same thing as a test being wrong ... the description.md does mention just a single intervening apostrophe, but we also don't test things like "0th" or other interpolations of digits between letters, or hyphenated words ... I mean what should the word count be for |
Sure, makes sense. Should we close this for now or leave this on hold? I'm not sure how long it is planned for the lock-down to occur. I don't want to unnecessarily clutter the pull request queue, especially since it sounds like the maintainers are incredibly busy. Thanks for your consideration! |
Leave it open on hold. Thank you :) |
I'm not sure that the additional test case is something that we'd need to test for. How likely is it that someone writes an implementation that passes single quotes but fails with double quotes? Similarly, is it reasonable to expect double quotes in input? |
CC @exercism/reviewers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
If implementations are unlikely to make this mistake then they will pass implicitly anyway.
If they do make the mistake then it is good to catch it as it may indicate an overly convoluted implementation.
For what it's worth, my current implementation wouldn't have handled this properly since I'm only removing exactly one pair of surrounding single quotes from a given word. And since no test currently has two pairs of surrounding single quotes, this test would add value. If our tests only ever have words surrounded by one pair, no incentive to handle more than one pair, and I predict (but have no hard data to back up) that the path of least resistance is to in fact just handle one pair (because no looping required). Is there a limited scope of what inputs we would want to test for this problem, given its goals (I don't actually know its goals). You may recall that I expressed a preference in Acronym #1436 that the test cases be in a form that someone might expect to see in natural language. Although in this case of word-count, that ship has already sailed because Not yet fully decided but my current stance is best described as: No real desire to add it, but won't stand in the way if the majority decision is to add it. |
We currently don't have explicit goals for exercises. We'll probably add this in the future, but that is potentially problematic as different tracks could have different goals for an exercise. What we could do is add things like you mentioned, where we restrict text to "normal" text. |
@exercism/reviewers Your thoughts on the desirability of the change in this PR? |
Real-language wise, it feels like a bit of an edge case to me. Or at least it feels like an edge-case when it comes to word-counting. I think it's a useful scenario to consider from the perspective of "cleaning up" language you'd then process (count, format, analyze, etc.) "downstream". So maybe this exercise has a related predecessor or extension that deals with the difficulties of sanitizing and normalizing? I think that if we are going to add test cases like these - as well as All that being said -- if a majority decide this is indeed a test case worth adding here, I am not going to stand in their way. |
For what it is worth, I do not think that even the big players always get this right. Do a word count from various tools, such as Lotus Notes, Word, etc, and you are likely to get different answers. That said, if you are splitting words on apostrophes you are probably approaching the problem incorrectly. I guess from a view point of mentoring, I do not mind if the test is there or not. I do not know why having more than one apostrophe in a word would cause a problem for counting words, because I do not think I would have ever considered an apostrophe to be a word boundary. I also was expecting to see a test that covered multiple apostrophes, as opposed to "multiple single quotes", such as:
Where there are two apostrophes in that sentence. |
It's apparently colloquial only, but
|
And both still only one word. |
Also, the test does not test for this, it tests "''hey''" which are single quotes (as used) and not apostrophes (as used in "y'all"), even though they are the same character (though do not have to be). |
So I would say there are two aspects here, definition of "word" in regards to works that contain one or more apostrophes. The second aspect is that I think the test does not cover what the PR title states as well. |
As part of reworking the word-count problem description (#2247) I've taken a look at this PR, and my feeling is that this test case doesn't really fundamentally make the exercise more interesting, and while it's potentially more correct, the definition of "correct" is vague enough that I'm not convinced that it makes enough of a difference. The new problem definition frames the exercise in terms of subtitles for TV shows, which is always going to be fuzzy. Subtitles are not always complete sentences. They're not always correct in terms of grammar or punctuation or spelling. But in the use case described, it doesn't actually matter, because the goal is just "directionally, roughly useful", not "precisely correct". So my conclusion here is that I'd prefer to close this without merging. |
I noticed when working through the word-count problem on the Python track, I was able to pass the test suite with an incorrect solution. Effectively, the test suite does not check that users ignore non-contracting apostrophes. As such, I've added a simple test checking that the word "''hey''" (the middle quotes are apostrophes) successfully comes out to "hey". I could see this being potentially out-of-scope though.