acronym: add leading / trailing and multiple separator case #1432
Comments
Definitely. I'm just not in a position to provide it right now.
Also I think we should consider what the acronym should be for inputs like "3 Men And An _nderscore".
Basically we should positively state what we consider an acronym, or, if we don't want to worry about those sorts of inputs, affirmatively say they won't be provided.
|
Interesting. I was curious so I tried this with my solution; I get MAAN.
The description is a little weak and could probably benefit from a definition of some sort that defines what is considered a reasonable phrase from which an acronym could be derived. |
Definitely. Personally I'd say something like: "A valid input will be an all-ASCII word or phrase, possibly containing punctuation, and possibly empty. For the purposes of this exercise you can expect that any word given will begin with an ASCII letter, but may be in any case. Hyphenated words are considered distinct words, for instance 'Self-Contained Underwater Breathing Apparatus' becomes 'SCUBA'. All other punctuation should be ignored, and an empty string or string without any words should return an empty string."
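For what it's worth, here is a minimal Python sketch of how I read that proposed wording. The function name `abbreviate` and the exact handling of punctuation (hyphens become word breaks, everything else that is not an ASCII letter or whitespace is dropped) are my assumptions, not anything taken from the canonical data.

```python
import re


def abbreviate(phrase: str) -> str:
    """Sketch of the proposed rule; assumes all-ASCII input."""
    # Hyphenated words count as distinct words, so treat "-" as a separator.
    spaced = phrase.replace("-", " ")
    # Ignore all other punctuation (this also drops digits and underscores).
    letters_only = re.sub(r"[^A-Za-z\s]", "", spaced)
    # split() with no argument discards empty pieces, so an empty or
    # punctuation-only input yields an empty string.
    return "".join(word[0].upper() for word in letters_only.split())


print(abbreviate("Self-Contained Underwater Breathing Apparatus"))  # SCUBA
```

Under this reading an empty string, or a string with no words in it, comes back as an empty string.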
Does that map to all the languages that have implemented acronym though?
M
…On Jan 5, 2019, 17:05 +0000, Ryan Potts ***@***.***>, wrote:
> Also I think we should consider what should the acronym be for inputs like "3 Men And An _nderscore"?
Interesting. I was curious so I tried this with my solution; I get MAAN.
> Basically we should positively state what we consider an acronym or if we don't want to worry about those sort of inputs affirmatively say they won't be provided.
The description is a little weak and could probably benefit from a definition of some sort that defines what is considered a reasonable phrase from which an acronym could be derived.
|
Don't know, but likely. I think your statement sums things up nicely though. 👍 |
I think a more specific description of what defines an acronym would be much appreciated. |
It's the first letter of each word. For hyphenated words, include the letter after the hyphen(s). Why does it need to be more specific than this? |
For one, that's not the meaning of "acronym" -- the exercise name -- and certainly not "abbreviate" -- the name of the actual property under test in the exercise -- in all languages and locales. In fact it's really only the meaning in American and British English, though it's used more or less the same way in a few other territories like Russia (in Cyrillic) and Vietnam. But fair enough, since we're already assuming ASCII let's assume American initialism "rules" apply... what happens with non-letter characters that start words? The generally accepted "rules" are silent on this, but there are certainly initialisms with numbers (HTML5, CSS3, 3G). For instance, many solutions in Python use a regex with the \w special character, which in Python 3 matches not only digits and the underscore, but also any Unicode code point that could be part of a word in any locale. Potentially that means acronyms can include Kanji. Should the student be required to limit it to ASCII?
And what constitutes a valid separator pattern? Is it just spaces and hyphens immediately preceding a letter? Or is it any run of punctuation _except_ a single conjoining apostrophe?
The tests are few and not particularly exhaustive and the problem is loosely defined... it's already led to more wheel spinning than it deserves because it's not more clearly delineated. But if that definition is "the first ASCII letter of each word that's preceded by the start of the sentence, a single space, or a single hyphen" as implied by the tests, that's fine, we just need to state it clearly.
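To make the difference concrete, here is a rough Python comparison of the two readings. `LOOSE`, `STRICT` and `initials` are names I made up for illustration, and the strict pattern is only one possible interpretation of "preceded by the start of the sentence, a space, or a hyphen".

```python
import re

# Loose reading: the first "word character" after a word boundary.
# In Python 3, \w on str patterns also matches digits, "_" and non-ASCII letters.
LOOSE = re.compile(r"\b\w")

# Strict reading (an assumption): an ASCII letter at the start of the
# string or immediately after a space or hyphen.
STRICT = re.compile(r"(?:^|(?<=[ -]))[A-Za-z]")


def initials(pattern: re.Pattern, phrase: str) -> str:
    """Join every match of the pattern and upper-case the result."""
    return "".join(pattern.findall(phrase)).upper()


phrase = "3 Men And An _nderscore"
print(initials(LOOSE, phrase))   # 3MAA_
print(initials(STRICT, phrase))  # MAA
```

With a phrase that starts with Kanji, the loose pattern keeps the first Kanji character while the strict one skips it, which is exactly the ambiguity the description leaves open.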
…On Jan 10, 2019, 20:58 +0000, Simon Shine ***@***.***>, wrote:
It's the first letter of each word. For hyphenated words, include the letter after the hyphen(s).
Why does it need to be more specific than this?
|
Ok, let's state it clearly then. |
I realize that #1436, which deals with underscores, has been merged recently. I am gathering that this issue is maybe more for advocating a change to the description.md than to the canonical data. |
The assumption of ASCII is not unique to this exercise. So "The first letter of each word" should be sufficient here. |
At least a part of the discussion has focused on that. |
I'd tend to argue that that assumption is a bug, not a feature of Exercism, and that where it's relevant to the solution the bias should be explicitly called out. For instance, in Python 3, without compiling the regex with the re.ASCII flag, the \w special character will match all of E and È and É and Ę... should those all be included? Should they be excluded? I don't know or have a particular opinion, but us not expressing a preference for ASCII-only solutions leaves it as UB (undefined behavior), and UB is pretty confusing for a learner, especially one for whom English isn't a first language and who isn't necessarily typing in ASCII. If we explicitly limit the character set we make the students' lives easier, and we also get the opportunity to present a bonus exercise in which they extend to handle something like "L'École Française du Bristol", which according to that school should abbreviate to EFB. |
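A quick way to see the re.ASCII difference in Python 3 is the snippet below. It only illustrates the flag; the variable names are mine, and the accented letters are written as escapes so they are definitely single precomposed code points.

```python
import re

# E, È, É, Ę as precomposed code points, separated by spaces.
text = "E \u00c8 \u00c9 \u0118"

# Default str patterns are Unicode-aware: \w matches all four letters.
unicode_matches = re.findall(r"\w", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so only the plain E matches.
ascii_matches = re.findall(r"\w", text, flags=re.ASCII)

print(len(unicode_matches))  # 4
print(ascii_matches)         # ['E']
```

So a solution that never mentions re.ASCII silently takes a position on accented letters, whether or not the student meant it to.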
Related: #1463 |
Currently the acronym tests do not cover inputs with leading, trailing, or repeated separator characters, and many solutions presented will fail if these are encountered. A student suggested this as a good test string: " - Annoying string ending - with - multiple separators - " should return "ASEWMS".
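As a rough illustration in Python, the sketch below shows a fragile approach that passes simple inputs but breaks on that string, next to a more tolerant one. Both functions are hypothetical examples written for this issue, not taken from any particular student solution.

```python
import re


def abbreviate_naive(phrase: str) -> str:
    # Hypothetical fragile approach: turn hyphens into spaces, then split
    # on single spaces. Repeated, leading or trailing separators leave
    # empty strings in the list, and word[0] raises IndexError on them.
    words = phrase.replace("-", " ").split(" ")
    return "".join(word[0].upper() for word in words)


def abbreviate_tolerant(phrase: str) -> str:
    # Find runs that start with an ASCII letter (apostrophes may follow),
    # so any amount of surrounding punctuation or whitespace is skipped.
    words = re.findall(r"[A-Za-z][A-Za-z']*", phrase)
    return "".join(word[0].upper() for word in words)


tricky = " - Annoying string ending - with - multiple separators - "
print(abbreviate_tolerant(tricky))  # ASEWMS
```

Calling `abbreviate_naive(tricky)` raises IndexError, which is the kind of failure the proposed test case is meant to surface.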