Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Word Count - Adds multiple apostrophe test case #1628

Closed
wants to merge 1 commit into from

Conversation

c4llmeco4ch
Copy link

@c4llmeco4ch c4llmeco4ch commented Jan 7, 2020

I noticed when working through the word-count problem on the Python track, I was able to pass the test suite with an incorrect solution. Effectively, the test suite does not check that users ignore non-contracting apostrophes. As such, I've added a simple test checking that the word "''hey''" (the middle quotes are apostrophes) successfully comes out to "hey". I could see this being potentially out-of-scope though.

@yawpitch
Copy link
Contributor

This is something of an oversight, however I believe merging this is currently blocked by #1560

@yawpitch yawpitch added the hold label Jan 18, 2020
@kotp kotp changed the title Added multiple apostrophe test case Word Count - Adds multiple apostrophe test case Jan 18, 2020
@c4llmeco4ch
Copy link
Author

Fair enough, wasn't sure if this fell under the scope of "test cases being wrong" as mentioned in that post, but I'll look to send it over to the language-specific repos and see what trickles down until the lock is lifted.

@yawpitch
Copy link
Contributor

Yeah, I'm a little uncomfortable with a missing test being the same thing as a test being wrong ... the description.md does mention just a single intervening apostrophe, but we also don't test things like "0th" or other interpolations of digits between letters, or hyphenated words ... I mean what should the word count be for 'twouldn't've be, anyway? The tests we've written, generally, describe a minimally functional implementation, they don't describe a bug-free one, so while I don't mind the idea of tightening this one up a little further once the block has lifted I'd say it doesn't quite rise to the urgency that would be required to make an exception.

@c4llmeco4ch
Copy link
Author

c4llmeco4ch commented Jan 21, 2020

Sure, makes sense. Should we close this for now or leave this on hold? I'm not sure how long it is planned for the lock-down to occur. I don't want to unnecessarily clutter the pull request queue, especially since it sounds like the maintainers are incredibly busy.

Thanks for your consideration!

@iHiD
Copy link
Member

iHiD commented Jan 21, 2020

Leave it open on hold. Thank you :)

@ErikSchierboom
Copy link
Member

I'm not sure that the additional test case is something that we'd need to test for. How likely is it that someone writes an implementation that passes single quotes but fails with double quotes? Similarly, is it reasonable to expect double quotes in input?

@ErikSchierboom
Copy link
Member

CC @exercism/reviewers

Copy link
Contributor

@wolf99 wolf99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
If implementations are unlikely to make this mistake then they will pass implicitly anyway.
If they do make the mistake then it is good to catch it as it may indicate an overly convoluted implementation.

@petertseng
Copy link
Member

For what it's worth, my current implementation wouldn't have handled this properly since I'm only removing exactly one pair of surrounding single quotes from a given word. And since no test currently has two pairs of surrounding single quotes, this test would add value. If our tests only ever have words surrounded by one pair, no incentive to handle more than one pair, and I predict (but have no hard data to back up) that the path of least resistance is to in fact just handle one pair (because no looping required).

Is there a limited scope of what inputs we would want to test for this problem, given its goals (I don't actually know its goals). You may recall that I expressed a preference in Acronym #1436 that the test cases be in a form that someone might expect to see in natural language. Although in this case of word-count, that ship has already sailed because ",\n,one,\n ,two \n 'three'" and car: carpet as java: javascript!!&@$%^& are quite out there. But I think it's worth thinking about the question of what lengths we want to ask students to go to, especially if it's about handling inputs that we might not even consider realistic. Or perhaps you might consider this text realistic, in the case of someone with a limited keyboard using two pairs of single quotes to substitute for their inability to type double quotes.

Not yet fully decided but my current stance is best described as: No real desire to add it, but won't stand in the way if the majority decision is to add it.

@ErikSchierboom
Copy link
Member

Is there a limited scope of what inputs we would want to test for this problem, given its goals

We currently don't have explicit goals for exercises. We'll probably add this in the future, but that is potentially problematic as different tracks could have different goals for an exercise. What we could do is add things like you mentioned, where we restrict text to "normal" text.

@ErikSchierboom
Copy link
Member

@exercism/reviewers Your thoughts on the desirability of the change in this PR?

@BethanyG
Copy link
Member

Real-language wise, it feels like a bit of an edge case to me.

Or at least it feels like an edge-case when it comes to word-counting. I think it's a useful scenario to consider from the perspective of "cleaning up" language you'd then process (count, format, analyze, etc.) "downstream". So maybe this exercise has a related predecessor or extension that deals with the difficulties of sanitizing and normalizing?

I think that if we are going to add test cases like these - as well as car: carpet as java: javascript!!&@$%^&, we should have the problem description outline what is, and is not considered a word for counting purposes. To do otherwise is to send a student into a spiral of "what if" and (maybe) unproductive defense and checking.

All that being said -- if a majority decide this is indeed a test case worth adding here, I am not going to stand in their way.

@kotp
Copy link
Member

kotp commented Jan 12, 2022

For what it is worth, I do not think that even the big players always get this right. Do a word count from various tools, such as Lotus Notes, Word, etc, and you are likely to get different answers.

That said, if you are splitting words on apostrophes you are probably approaching the problem incorrectly.

I guess from a view point of mentoring, I do not mind if the test is there or not. I do not know why having more than one apostrophe in a word would cause a problem for counting words, because I do not think I would have ever considered an apostrophe to be a word boundary.

I also was expecting to see a test that covered multiple apostrophes, as opposed to "multiple single quotes", such as:

"You're going to count words, but you can't really do it by using the apostrophes."

Where there are two apostrophes in that sentence.

@glennj
Copy link
Contributor

glennj commented Jan 12, 2022

It's apparently colloquial only, but

  • shouldn't've => "You shouldn't've done that!"
  • y'all's

@kotp
Copy link
Member

kotp commented Jan 12, 2022

It's apparently colloquial only, but

  • shouldn't've => "You shouldn't've done that!"
  • y'all's

And both still only one word.

@kotp
Copy link
Member

kotp commented Jan 12, 2022

Also, the test does not test for this, it tests "''hey''" which are single quotes (as used) and not apostrophes (as used in "y'all"), even though they are the same character (though do not have to be).

@kotp
Copy link
Member

kotp commented Jan 12, 2022

So I would say there are two aspects here, definition of "word" in regards to works that contain one or more apostrophes. The second aspect is that I think the test does not cover what the PR title states as well.

@kytrinyx
Copy link
Member

kytrinyx commented Apr 1, 2023

As part of reworking the word-count problem description (#2247) I've taken a look at this PR, and my feeling is that this test case doesn't really fundamentally make the exercise more interesting, and while it's potentially more correct, the definition of "correct" is vague enough that I'm not convinced that it makes enough of a difference.

The new problem definition frames the exercise in terms of subtitles for TV shows, which is always going to be fuzzy. Subtitles are not always complete sentences. They're not always correct in terms of grammar or punctuation or spelling. But in the use case described, it doesn't actually matter, because the goal is just "directionally, roughly useful", not "precisely correct".

So my conclusion here is that I'd prefer to close this without merging.

@kytrinyx kytrinyx closed this Apr 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants