Add micro-blog exercise (#1509)

* Add micro-blog exercise

This is an exercise requiring students to truncate unicode strings.

Solves #1507

* Micro-blog: Don't assume native English speaker

Thank you @SaschaMann for the feedback and suggestion.
#1509 (comment)

> I don't like that this assumes the perspective of a native English
> speaker. English is a foreign language to most of the world. Perhaps
> something along the lines of "text in most of the world's languages and
> scripts" would be a better description.

* Micro-blog: Add tests for different languages

Feedback from @SaschaMann
#1509 (comment)

> I think it would be nice to add some test cases that aren't emoji or
> English - perhaps cases with germanic umlauts, cyrillic and/or greek
> letters, historic scripts etc. - because that's one of the main uses
> and goals of unicode.

I've added German, Bulgarian, and Greek examples. All of them have
non-English characters. None of these characters use multiple UTF-16
codepoints. As such, if you use a UTF-8 programming language you may
first have trouble with the German example, but if you use a UTF-16
language you will probably first have trouble at the Emoji example.

I chose not to add an example with historic scripts, because I'm not
aware of any that display nicely in my terminal or text-editor. Perhaps
in future some could be added.

I wanted another example that would be problematic in UTF-16, so I added
a poker hand example using playing cards.

* Micro-blog: Add German truncated example

Comically, it goes from "bear carpet" to "beards".

@SaschaMann, thank you for finding the example for me:
#1509 (comment)

* Micro-blog: Add longer maths example

Empty set is a proper subset of the natural numbers which is a proper
subset of the integers, which is a proper subset of the rational numbers
which is a proper subset of the reals which is a proper subset of the
complex numbers.

It remains true when truncated which is quite nice
exercism · Sep 21, 2019 · f928002 · f928002
1 parent f8aaffb
commit f928002
Showing 3 changed files with 167 additions and 0 deletions.
diff --git a/exercises/micro-blog/canonical-data.json b/exercises/micro-blog/canonical-data.json
@@ -0,0 +1,125 @@
+{
+  "exercise": "micro-blog",
+  "version": "1.0.0",
+  "comments": [
+    "This exercise is only applicable to languages that use UTF-8, UTF-16",
+    "or another variable-width, Unicode-compatible encoding as their",
+    "internal string representation.",
+    "",
+    "This exercise is probably too easy in languages that use Unicode-aware",
+    "string slicing.",
+    "",
+    "When adding additional tests to the problem specification, consider that",
+    "an in-progress solution might fail a test in a UTF-8 language but pass",
+    "it in a UTF-16 language, or vice versa.",
+    "",
+    "Avoid adding tests that involve characters (graphemes) that are made up",
+    "of multiple code points, or introduce them as a more advanced step.",
+    "",
+    "Consider adding a track-specific hint.md stating whether your language",
+    "uses UTF-8, UTF-16 or another encoding for its internal string representation."
+  ],
+  "cases": [
+    {
+      "description": "Truncate a micro blog post",
+      "cases": [
+        {
+          "description": "English language short",
+          "property": "truncate",
+          "input": {
+            "phrase": "Hi"
+          },
+          "expected": "Hi"
+        },
+        {
+          "description": "English language long",
+          "property": "truncate",
+          "input": {
+            "phrase": "Hello there"
+          },
+          "expected": "Hello"
+        },
+        {
+          "description": "German language short (broth)",
+          "property": "truncate",
+          "input": {
+            "phrase": "brühe"
+          },
+          "expected": "brühe"
+        },
+        {
+          "description": "German language long (bear carpet → beards)",
+          "property": "truncate",
+          "input": {
+            "phrase": "Bärteppich"
+          },
+          "expected": "Bärte"
+        },
+        {
+          "description": "Bulgarian language short (good)",
+          "property": "truncate",
+          "input": {
+            "phrase": "Добър"
+          },
+          "expected": "Добър"
+        },
+        {
+          "description": "Greek language short (health)",
+          "property": "truncate",
+          "input": {
+            "phrase": "υγειά"
+          },
+          "expected": "υγειά"
+        },
+        {
+          "description": "Maths short",
+          "property": "truncate",
+          "input": {
+            "phrase": "a=πr²"
+          },
+          "expected": "a=πr²"
+        },
+        {
+          "description": "Maths long",
+          "property": "truncate",
+          "input": {
+            "phrase": "∅⊊ℕ⊊ℤ⊊ℚ⊊ℝ⊊ℂ"
+          },
+          "expected": "∅⊊ℕ⊊ℤ"
+        },
+        {
+          "description": "English and emoji short",
+          "property": "truncate",
+          "input": {
+            "phrase": "Fly 🛫"
+          },
+          "expected": "Fly 🛫"
+        },
+        {
+          "description": "Emoji short",
+          "property": "truncate",
+          "input": {
+            "phrase": "💇"
+          },
+          "expected": "💇"
+        },
+        {
+          "description": "Emoji long",
+          "property": "truncate",
+          "input": {
+            "phrase": "❄🌡🤧🤒🏥🕰😀"
+          },
+          "expected": "❄🌡🤧🤒🏥"
+        },
+        {
+          "description": "Royal Flush?",
+          "property": "truncate",
+          "input": {
+            "phrase": "🃎🂸🃅🃋🃍🃁🃊"
+          },
+          "expected": "🃎🂸🃅🃋🃍"
+        }
+      ]
+    }
+  ]
+}
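
A note on consuming this file: the cases are nested one level deep (a group with its own `cases` list), so a track's test generator has to descend to the leaf cases. The following is a minimal Python sketch of that walk, not part of the commit. `SAMPLE_DATA` is a hand-copied subset of the cases above, and the names `iter_leaf_cases`, `failing_cases` and `truncate_by_code_point` are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, Iterator, List

# Hand-copied subset of canonical-data.json, keeping the same nesting.
SAMPLE_DATA = {
    "exercise": "micro-blog",
    "cases": [
        {
            "description": "Truncate a micro blog post",
            "cases": [
                {"description": "English language long", "property": "truncate",
                 "input": {"phrase": "Hello there"}, "expected": "Hello"},
                {"description": "German language short (broth)", "property": "truncate",
                 "input": {"phrase": "brühe"}, "expected": "brühe"},
                {"description": "German language long (bear carpet → beards)",
                 "property": "truncate",
                 "input": {"phrase": "Bärteppich"}, "expected": "Bärte"},
                {"description": "Maths long", "property": "truncate",
                 "input": {"phrase": "∅⊊ℕ⊊ℤ⊊ℚ⊊ℝ⊊ℂ"}, "expected": "∅⊊ℕ⊊ℤ"},
            ],
        }
    ],
}


def iter_leaf_cases(node: dict) -> Iterator[dict]:
    """Yield every case without a nested 'cases' list, depth first."""
    for case in node.get("cases", []):
        if "cases" in case:
            yield from iter_leaf_cases(case)
        else:
            yield case


def failing_cases(data: dict, truncate: Callable[[str], str]) -> List[str]:
    """Return the descriptions of all leaf cases that `truncate` gets wrong."""
    failures = []
    for case in iter_leaf_cases(data):
        if truncate(case["input"]["phrase"]) != case["expected"]:
            failures.append(case["description"])
    return failures


def truncate_by_code_point(phrase: str) -> str:
    # Python's str is indexed by code point, so a plain slice is enough here.
    return phrase[:5]
```

Calling `failing_cases(SAMPLE_DATA, truncate_by_code_point)` should return an empty list, while a candidate that returns its input unchanged should fail only the three "long" cases.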
diff --git a/exercises/micro-blog/description.md b/exercises/micro-blog/description.md
@@ -0,0 +1,39 @@
+You have identified a gap in the social media market for very, very short
+posts. Now that Twitter allows 280 character posts, people wanting quick
+social media updates aren't being served. You decide to create your own
+social media network.
+
+To make your product noteworthy, you make it extreme and only allow posts
+of 5 or fewer characters. Any posts of more than 5 characters should be
+truncated to 5.
+
+To allow your users to express themselves fully, you allow Emoji and
+other Unicode.
+
+The task is to truncate input strings to 5 characters.
+
+## Text Encodings
+
+Text stored digitally has to be converted to a series of bytes.
+Three ways of mapping characters to bytes are in common use:
+* **ASCII** can encode English language characters. All
+characters are precisely 1 byte long.
+* **UTF-8** is a Unicode text encoding. Characters take between 1
+and 4 bytes.
+* **UTF-16** is a Unicode text encoding. Characters are either 2 or
+4 bytes long.
+
+UTF-8 and UTF-16 are both Unicode encodings, which means they're capable of
+representing a massive range of characters including:
+* Text in most of the world's languages and scripts
+* Historic text
+* Emoji
+
+UTF-8 and UTF-16 are both variable length encodings, which means that
+different characters take up different amounts of space.
+
+Consider the letter 'a' and the emoji '😛'. In UTF-16 the letter takes
+2 bytes but the emoji takes 4 bytes.
+
+The trick to this exercise is to use APIs designed around Unicode
+characters (code points) instead of Unicode code units.
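
To make the size claims in this description concrete, here is a small Python sketch, again not part of the commit. It measures one string in code points, UTF-8 bytes and UTF-16 bytes, then contrasts truncating by code point with truncating by UTF-16 code unit. The function names `encoded_sizes`, `truncate` and `truncate_utf16_units` are assumptions made for this illustration. The last one deliberately simulates the mistake a UTF-16 language invites; Python's own `str` slicing already works on code points.

```python
def encoded_sizes(text: str) -> dict:
    """Measure a string in code points and in bytes under two encodings."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte order mark that "utf-16" prepends.
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


def truncate(phrase: str, limit: int = 5) -> str:
    """Truncate by code point, which is what the exercise asks for."""
    return phrase[:limit]


def truncate_utf16_units(phrase: str, limit: int = 5) -> str:
    """Simulate the naive approach: keep the first `limit` UTF-16 code units.

    Each code unit is 2 bytes. A cut that lands inside a surrogate pair is
    decoded with errors="replace" instead of raising.
    """
    raw = phrase.encode("utf-16-le")[: limit * 2]
    return raw.decode("utf-16-le", errors="replace")


# The "Emoji long" phrase from the canonical data, written with escapes.
EMOJI_LONG = "\u2744\U0001F321\U0001F927\U0001F912\U0001F3E5\U0001F570\U0001F600"

print(encoded_sizes("a"))            # 1 code point, 1 UTF-8 byte, 2 UTF-16 bytes
print(encoded_sizes("\U0001F61B"))   # 1 code point, 4 UTF-8 bytes, 4 UTF-16 bytes
print(len(truncate(EMOJI_LONG)))              # 5
print(len(truncate_utf16_units(EMOJI_LONG)))  # 3
```

The naive version keeps only three of the emoji because the first is a single code unit and the next two each need a surrogate pair, which uses up all five units.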
diff --git a/exercises/micro-blog/metadata.yml b/exercises/micro-blog/metadata.yml
@@ -0,0 +1,3 @@
+---
+title: "Micro Blog"
+blurb: "Given an input string, truncate it to 5 characters."