acronym: add leading / trailing and multiple separator case #1432

yawpitch · 2019-01-05T16:04:07Z

Currently the acronym tests do not cover inputs with leading, trailing, or repeated separator characters, and many solutions presented will fail if these are encountered.

A student suggested this as a good test string: " - Annoying string ending - with - multiple separators - " should return "ASEWMS".
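To make the failure mode concrete, here is a minimal Python sketch. The function names `abbreviate_naive` and `abbreviate` are hypothetical, not taken from any track's stub. The naive version splits on single spaces, so the empty tokens produced by leading, trailing, or repeated separators crash it. The second version treats any run of whitespace and hyphens as one separator and discards the empty tokens at the ends.

```python
import re

def abbreviate_naive(phrase):
    # Turns hyphens into spaces, then splits on single spaces only.
    # Leading, trailing, or repeated separators yield empty tokens,
    # and "".[0] raises IndexError.
    words = phrase.replace("-", " ").split(" ")
    return "".join(word[0] for word in words).upper()

def abbreviate(phrase):
    # Treat any run of whitespace and/or hyphens as a single separator,
    # and drop the empty strings re.split yields at the ends.
    words = [w for w in re.split(r"[\s-]+", phrase) if w]
    return "".join(word[0] for word in words).upper()

test_string = " - Annoying string ending - with - multiple separators - "
print(abbreviate(test_string))  # ASEWMS
try:
    abbreviate_naive(test_string)
except IndexError as err:
    print("naive version fails:", err)
```

Both versions agree on well-formed input such as "Portable Network Graphics"; only the separator handling differs.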


rpottsoh · 2019-01-05T16:28:28Z

I think a single PR that closes #1431 and #1432 would suffice.

yawpitch · 2019-01-05T16:49:41Z

Definitely. I'm just not in a position to provide it right now. Also I think we should consider what should the acronym be for inputs like "3 Men And An _nderscore"? Basically we should positively state what we consider an acronym or if we don't want to worry about those sort of inputs affirmatively say they won't be provided.


rpottsoh · 2019-01-05T17:05:12Z

> Also I think we should consider what should the acronym be for inputs like "3 Men And An _nderscore"?

Interesting. I was curious so I tried this with my solution; I get MAAN.

> Basically we should positively state what we consider an acronym or if we don't want to worry about those sort of inputs affirmatively say they won't be provided.

The description is a little weak and could probably benefit from a definition of some sort that defines what is considered a reasonable phrase from which an acronym could be derived.
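A quick way to see why a definition matters: three plausible rules, each a hypothetical sketch rather than anyone's actual solution, give three different answers for "3 Men And An _nderscore". Rule B happens to reproduce the MAAN result mentioned above, but that says nothing about how that solution actually works.

```python
import re
import string

PHRASE = "3 Men And An _nderscore"

def tokens(phrase):
    # Split on runs of whitespace and hyphens, dropping empty tokens.
    return [t for t in re.split(r"[\s-]+", phrase) if t]

def rule_a(phrase):
    # Rule A: use a token's first character only if it is an ASCII letter.
    return "".join(t[0] for t in tokens(phrase) if t[0] in string.ascii_letters).upper()

def rule_b(phrase):
    # Rule B: use the first ASCII letter found anywhere in each token.
    letters = []
    for t in tokens(phrase):
        match = re.search(r"[A-Za-z]", t)
        if match:
            letters.append(match.group())
    return "".join(letters).upper()

def rule_c(phrase):
    # Rule C: the common regex idiom, first \w character after each word boundary.
    return "".join(re.findall(r"\b\w", phrase)).upper()

for rule in (rule_a, rule_b, rule_c):
    print(rule.__name__, rule(PHRASE))
# rule_a MAA
# rule_b MAAN
# rule_c 3MAA_
```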

yawpitch · 2019-01-05T17:17:05Z

Definitely. Personally I'd say something like: "A valid input will be an all-ASCII word or phrase, possibly containing punctuation, and possibly empty. For the purposes of this exercise you can expect that any word given will begin with an ASCII letter, but may be in any case. Hyphenated words are considered distinct words, for instance 'Self-Contained Underwater Breathing Apparatus' becomes 'SCUBA'. All other punctuation should be ignored, and an empty string or string without any words should return an empty string." Does that map to all the languages that have implemented acronym though?
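Read literally, that proposed wording is small enough to sketch in a few lines of Python. This is a hypothetical implementation of the text above, not canonical behaviour. In particular, keeping apostrophes inside words (so "Halley's" stays one word) is an assumption here, since the proposal only says other punctuation is ignored.

```python
import re

# A word, per the proposed wording, starts with an ASCII letter. Letting it
# continue with ASCII letters or apostrophes is an assumption of this sketch.
WORD = re.compile(r"[A-Za-z][A-Za-z']*")

def abbreviate(phrase):
    # Hyphens, underscores, digits, and other punctuation are never part of
    # a word, so they act as separators and are otherwise ignored.
    return "".join(word[0] for word in WORD.findall(phrase)).upper()

print(abbreviate("Self-Contained Underwater Breathing Apparatus"))  # SCUBA
print(abbreviate("Halley's Comet"))                                  # HC
```

Because separators are simply "anything that isn't part of a word", leading, trailing, and repeated separators need no special handling, and a string with no words yields an empty string.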


rpottsoh · 2019-01-05T17:33:17Z

> Does that map to all the languages that have implemented acronym though?

Don't know, but likely.

I think your statement sums things up nicely though. 👍

ErikSchierboom · 2019-01-10T11:51:16Z

I think a more specific description of what defines an acronym would be much appreciated.

sshine · 2019-01-10T20:58:54Z

It's the first letter of each word. For hyphenated words, include the letter after the hyphen(s).

Why does it need to be more specific than this?

yawpitch · 2019-01-11T06:54:28Z

For one thing, that's not the meaning of "acronym" (the exercise name), and certainly not of "abbreviate" (the name of the actual property under test in the exercise), in all languages and locales. In fact it's really only the meaning in American and British English, though it's used more or less the same way in a few other territories like Russia (in Cyrillic) and Vietnam.

But fair enough: since we're already assuming ASCII, let's assume American initialism "rules" apply... what happens with non-letter characters that start words? The generally accepted "rules" are silent on this, but there are certainly initialisms with numbers (HTML5, CSS3, 3G). For instance, many solutions in Python use a regex with the \w special character, which in Python 3 matches not only digits and the underscore, but also any Unicode code point that could be part of a word in any locale. Potentially that means acronyms can include Kanji. Should the student be required to limit it to ASCII?

And what constitutes a valid separator pattern? Is it just spaces and hyphens immediately preceding a letter? Or is it any run of punctuation _except_ a single conjoining apostrophe? The tests are few and not particularly exhaustive, and the problem is loosely defined... it has already led to more wheel spinning than it deserves because it's not more clearly delineated. But if that definition is "the first ASCII letter of each word that's preceded by the start of the sentence, a single space, or a single hyphen", as implied by the tests, that's fine; we just need to state it clearly.
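To illustrate the \w point, the sketch below contrasts the common `\b\w` idiom with a literal reading of the tests-implied definition (an ASCII letter at the start of the phrase or right after a space or hyphen). Both function names are hypothetical. Note that the second pattern only inspects the one character before the letter, so it does not actually enforce "a single space".

```python
import re

def abbreviate_word_chars(phrase):
    # For str patterns, Python 3's \w is Unicode-aware: letters and digits
    # from any script, plus the underscore, all count as word characters.
    return "".join(re.findall(r"\b\w", phrase)).upper()

def abbreviate_ascii_after_separator(phrase):
    # An ASCII letter at the start of the string or right after a space/hyphen.
    return "".join(re.findall(r"(?:^|(?<=[ -]))[A-Za-z]", phrase)).upper()

sample = "HTML5 3G _private 漢字 test"
print(abbreviate_word_chars(sample))             # H3_漢T
print(abbreviate_ascii_after_separator(sample))  # HT
```

The same input yields a digit, an underscore, and a Kanji character under one reading and none of them under the other, which is exactly the ambiguity the description leaves open.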


sshine · 2019-01-12T09:16:49Z

Ok, let's state it clearly then.

rpottsoh · 2019-01-12T16:49:47Z

Note that #1436, which deals with underscores, was merged recently. I gather that this issue is maybe more for advocating a change to the description.md than to the canonical data.

sshine · 2019-01-13T21:00:02Z

The assumption of ASCII is not unique to this exercise. So "The first letter of each word" should be sufficient here.

sshine · 2019-01-13T21:01:01Z

> this issue is maybe more for advocating a change to the description.md

At least a part of the discussion has focused on that.

yawpitch · 2019-01-14T05:31:28Z

> The assumption of ASCII is not unique to this exercise. So "The first letter of each word" should be sufficient here.

I'd tend to argue that that assumption is a bug, not a feature of Exercism, and that where it's relevant to the solution the bias should be explicitly called out.

As acronym is an exercise that will very commonly be approached with regular expressions (in Python it's a core exercise and tagged as the first to involve regex), the ASCII limitation can be very important to the solution.

For instance, in Python 3, unless the regex is compiled with the re.ASCII flag, the \w special character will match all of E and È and É and Ę... should those all be included? Should they be excluded? I don't know or have a particular opinion, but our not expressing a preference for ASCII-only solutions leaves it as undefined behavior (UB), and UB is pretty confusing for a learner, especially one for whom English isn't a first language and who isn't necessarily typing in ASCII.
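A short demonstration of the flag's effect, using only the standard `re` module. The second pair of results is worth noticing: re.ASCII alone doesn't make accented input behave. Because \b also becomes ASCII-only, the accented capitals turn into word boundaries, and this sketch quietly abbreviates "École Élémentaire" (an arbitrary sample phrase) to CLM.

```python
import re

# Escapes keep the sample unambiguously precomposed:
# U+00C8 È, U+00C9 É, U+0118 Ę.
letters = "E \u00c8 \u00c9 \u0118"
print(re.findall(r"\w", letters))                  # ['E', 'È', 'É', 'Ę']
print(re.findall(r"\w", letters, flags=re.ASCII))  # ['E']

phrase = "\u00c9cole \u00c9l\u00e9mentaire"  # "École Élémentaire"
unicode_initials = re.compile(r"\b\w")
ascii_initials = re.compile(r"\b\w", re.ASCII)  # \w, \b, \d, \s become ASCII-only
print("".join(unicode_initials.findall(phrase)).upper())  # ÉÉ
print("".join(ascii_initials.findall(phrase)).upper())    # CLM
```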

If we explicitly limit the character set, we make students' lives easier, and we also get the opportunity to present a bonus exercise in which they extend their solution to handle something like "L'École Française du Bristol", which according to that school should abbreviate to EFB.
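One possible shape for that bonus exercise, as a sketch under explicitly hypothetical rules (none of them come from the exercise or from the school): drop an elided article written with an ASCII apostrophe (L', d'), skip words that start lowercase (treated as function words like "du"), and reduce each initial to its unaccented base letter via NFD decomposition from the standard `unicodedata` module.

```python
import unicodedata

def base_letter(ch):
    # NFD splits "É" into "E" + U+0301 (combining acute accent);
    # dropping the combining marks leaves the base letter.
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def abbreviate_french(phrase):
    initials = []
    for word in phrase.split():
        # Hypothetical rule: "L'École" -> "École" (drop the elided article).
        if "'" in word:
            word = word.split("'", 1)[1]
        # Hypothetical rule: skip lowercase function words such as "du".
        if not word or not word[0].isupper():
            continue
        initials.append(base_letter(word[0]))
    return "".join(initials)

print(abbreviate_french("L'École Française du Bristol"))  # EFB
```

These rules are crude: the lowercase rule would also drop a lowercase content word, and the apostrophe rule would mangle an English possessive like "Halley's". That is one more reason to keep this kind of input in a clearly separate bonus.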

emcoding · 2019-02-15T17:17:45Z

Related: #1463

rpottsoh added the new test case idea label Jan 5, 2019

siebenschlaefer mentioned this issue Jun 22, 2022

feature: added test case to acronym exercise to cover missed case exercism/cpp#523

Closed
