Micro-blog: Add tests for different languages
Feedback from @SaschaMann
#1509 (comment)

> I think it would be nice to add some test cases that aren't emoji or
> English - perhaps cases with germanic umlauts, cyrillic and/or greek
> letters, historic scripts etc. - because that's one of the main uses
> and goals of unicode.

I've added German, Bulgarian, and Greek examples, plus a short maths
example (a=πr²). All of them contain non-ASCII characters.
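One way to see why these phrases are useful is to count each of them in the three units a solution might accidentally measure. The sketch below uses Python only as an illustration, and `encoded_sizes` is a hypothetical helper, not part of the exercise. The phrases are written as escapes so the exact code points are unambiguous, assuming the Greek ά is the precomposed U+03AC.

```python
def encoded_sizes(phrase: str) -> tuple[int, int, int]:
    """Return (code points, UTF-16 code units, UTF-8 bytes) for `phrase`."""
    code_points = len(phrase)                           # Python str indexes code points
    utf16_units = len(phrase.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit
    utf8_bytes = len(phrase.encode("utf-8"))
    return code_points, utf16_units, utf8_bytes


# The phrases added in this commit (hypothetical dict, for illustration only).
PHRASES = {
    "German": "br\u00fche",                         # brühe
    "Bulgarian": "\u0414\u043e\u0431\u044a\u0440",  # Добър
    "Greek": "\u03c5\u03b3\u03b5\u03b9\u03ac",      # υγειά (precomposed ά)
    "Maths": "a=\u03c0r\u00b2",                     # a=πr²
}

for name, phrase in PHRASES.items():
    print(name, encoded_sizes(phrase))
```

Each phrase is 5 code points and 5 UTF-16 code units, but between 6 and 10 UTF-8 bytes, so a solution that counts bytes will over-count.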

None of these characters needs more than one UTF-16 code unit. As such,
if you use a language whose strings are UTF-8 encoded, you may first have
trouble with the German example, but if you use a UTF-16 language you
will probably not have trouble until the emoji example.
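A minimal Python sketch of that difference, assuming the 5-character limit implied by the expected outputs in this commit. The three `truncate_*` functions are hypothetical: one slices by code points, the other two mimic a naive solution that slices raw UTF-8 bytes or UTF-16 code units and silently drops any incomplete trailing sequence. The emoji are written as escapes, assuming the test phrase contains no variation selectors.

```python
LIMIT = 5  # assumed from the expected outputs, which keep five characters


def truncate_code_points(phrase: str, limit: int = LIMIT) -> str:
    # Slice by Unicode code points (what these tests expect).
    return phrase[:limit]


def truncate_utf8_bytes(phrase: str, limit: int = LIMIT) -> str:
    # Naive: keep `limit` UTF-8 bytes; errors="ignore" drops a cut-off sequence.
    return phrase.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def truncate_utf16_units(phrase: str, limit: int = LIMIT) -> str:
    # Naive: keep `limit` UTF-16 code units (2 bytes each).
    return phrase.encode("utf-16-le")[: 2 * limit].decode("utf-16-le", errors="ignore")


german = "br\u00fche"  # brühe
emoji = "\u2744\U0001F321\U0001F927\U0001F912\U0001F3E5\U0001F570\U0001F600"

print(truncate_code_points(german), truncate_utf8_bytes(german), truncate_utf16_units(german))
# brühe brüh brühe
print(truncate_code_points(emoji), truncate_utf8_bytes(emoji), truncate_utf16_units(emoji))
# ❄🌡🤧🤒🏥 ❄ ❄🌡🤧
```

The byte-based version already fails on "brühe", while the UTF-16 version passes it and only fails once characters outside the Basic Multilingual Plane appear.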

I chose not to add an example with historic scripts because I'm not
aware of any that display nicely in my terminal or text editor. Perhaps
some could be added in the future.

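I wanted another example that would be problematic in UTF-16, so I added
a poker hand example using playing cards. The playing card characters
lie outside the Basic Multilingual Plane, so each one takes two UTF-16
code units (a surrogate pair).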
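For reference, this is how a code point above U+FFFF is split into a UTF-16 surrogate pair. The sketch uses U+1F0CE PLAYING CARD KING OF DIAMONDS as the sample value; `utf16_code_units` is a hypothetical helper written out by hand rather than calling an encoder, so the arithmetic is visible.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Encode one code point as UTF-16 code units (a surrogate pair above U+FFFF)."""
    if code_point < 0x10000:
        return [code_point]               # BMP characters fit in a single unit
    offset = code_point - 0x10000         # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low (trail) surrogate
    return [high, low]


king_of_diamonds = 0x1F0CE
print([hex(unit) for unit in utf16_code_units(king_of_diamonds)])  # ['0xd83c', '0xdcce']
print(utf16_code_units(0x00FC))  # [252], the "ü" from the German example
```

A solution that counts UTF-16 code units therefore sees each card as two characters, which is exactly what the "Royal Flush?" case exercises.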
ccouzens committed Apr 20, 2019
1 parent d2f65b4 commit 786c1e6
Showing 1 changed file with 43 additions and 2 deletions.
exercises/micro-blog/canonical-data.json: 45 changes (43 additions, 2 deletions)
@@ -9,8 +9,9 @@
"This exercise is probably too easy in languages that use Unicode aware",
"string slicing.",
"",
-"When adding additional tests to the problem specification, prefer tests",
-"that pass and fail the same for UTF-8 and UTF-16.",
+"When adding additional tests to the problem specification, consider that",
+"in progress solutions might not fail due to UTF-8 and UTF-16",
+"differences.",
"",
"Avoid adding tests that involve characters (graphemes) that are made up",
"of multiple characters, or introduce them as a more advanced step.",
@@ -38,6 +39,38 @@
},
"expected": "Hello"
},
+{
+"description": "German language short (broth)",
+"property": "truncate",
+"input": {
+"phrase": "brühe"
+},
+"expected": "brühe"
+},
+{
+"description": "Bulgarian language short (good)",
+"property": "truncate",
+"input": {
+"phrase": "Добър"
+},
+"expected": "Добър"
+},
+{
+"description": "Greek language short (health)",
+"property": "truncate",
+"input": {
+"phrase": "υγειά"
+},
+"expected": "υγειά"
+},
+{
+"description": "Maths short",
+"property": "truncate",
+"input": {
+"phrase": "a=πr²"
+},
+"expected": "a=πr²"
+},
{
"description": "English and emoji short",
"property": "truncate",
@@ -61,6 +94,14 @@
"phrase": "❄🌡🤧🤒🏥🕰😀"
},
"expected": "❄🌡🤧🤒🏥"
+},
+{
+"description": "Royal Flush?",
+"property": "truncate",
+"input": {
+"phrase": "🃎🂸🃅🃋🃍🃁🃊"
+},
+"expected": "🃎🂸🃅🃋🃍"
}
]
}
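The context lines in the first hunk ask test authors to avoid characters (graphemes) made up of multiple characters. A small Python illustration with the standard `unicodedata` module shows why: the same word "brühe" can be stored precomposed (NFC, ü as the single code point U+00FC) or decomposed (NFD, u followed by U+0308 COMBINING DIAERESIS), and only the precomposed form is five code points long. This assumes the test data is stored in NFC, which the 5-character expectations suggest but this commit does not state.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "bru\u0308he")  # ü as one code point
nfd = unicodedata.normalize("NFD", "br\u00fche")   # u + U+0308 COMBINING DIAERESIS

print(len(nfc), len(nfd))  # 5 6

# Truncating the decomposed form to 5 code points drops the final "e",
# even though a reader sees only five letters.
print(nfd[:5] == unicodedata.normalize("NFD", "br\u00fch"))  # True

# Cutting at 3 code points separates the "u" from its diaeresis.
print(nfd[:3])  # bru
```

Handling such input correctly needs grapheme-aware segmentation, which is why the canonical data leaves it out or defers it to a more advanced step.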