fix: minor edits for the documentation of multi-language stats script

opengovsg · Feb 18, 2022 · commit bfdd9ea
1 parent 1997ce3
Showing 1 changed file with 5 additions and 4 deletions.
diff --git a/scripts/202202117_multi-language-stats/readme.md b/scripts/202202117_multi-language-stats/readme.md
@@ -1,6 +1,6 @@
 ## Context
 
-Script was written to understand multi-language patterns from fromSG use cases. Specifically requireent from Pearly were:
+This script was written to understand multi-language patterns in FormSG use cases. Specifically, the requirements from Pearly were:
 
 - how many forms use another language apart from english
 - number of submissions for these forms
@@ -9,11 +9,11 @@ Script was written to understand multi-language patterns from fromSG use cases.
 - some examples of these forms would be great
 - any other related data you think might be interesting!
 
-Detecting language is hard, especially since many form use multiple language for the same text blocks. Singapore has 4 primary languages, so the script attempt to look for these specifically. Chinese and Tamil are easy to spot thanks to dedicated unicode character ranges. Distinguishing English and Malay is harder, so the script uses a very crude heuristic to locate words from these languages that we expect to see in forms.
+Detecting language is hard, especially since many forms use multiple languages in the same text blocks. Singapore has 4 primary languages, so the script looks for these specifically. Chinese and Tamil are easy to identify thanks to their dedicated Unicode character ranges. Distinguishing English and Malay is harder, since both use the Latin alphabet. For these two, the script uses a very crude heuristic: it looks for words from each language that appear frequently in forms.
 
-Anything with no match is categorized as `unknown`, and is likely English with funky words, like [this form](https://form.gov.sg/#!/5e0c9534df378700118f3349).
+Anything with no match is categorized as `unknown`, and is likely English with "funky words", like [this form](https://form.gov.sg/#!/5e0c9534df378700118f3349).
 
-The scripts output some form count overall by language, and then generates more details reports (with agency name, and number of submissions) for 2 categories:
+The script outputs some form counts, overall and by language, and then generates more detailed reports (with agency name and number of submissions) for 2 categories:
 
 1. Forms where multiple languages are detected
 2. Form where only one language is detected, and it is not English
@@ -23,6 +23,7 @@ The results are printed to stdout as TSV content, so they can be copy/pasted int
 ## Install and run
 
 ```bash
+cd scripts/202202117_multi-language-stats
 cp .env.template .env.production
 
 # edit .env.production with the correct URU
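The README describes the detection heuristic only in prose. The following is a minimal TypeScript sketch of that approach, not the script's actual code. The function names `detectLanguages` and `categorize` and both keyword lists are hypothetical, chosen for illustration. It assumes that Chinese text falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that Tamil text falls in the Tamil block (U+0B80 to U+0BFF).

```typescript
// Hypothetical sketch of the README's heuristic; not the actual script.
// The keyword lists below are illustrative assumptions, not the script's real lists.

// CJK Unified Ideographs block (U+4E00 to U+9FFF) covers common Chinese characters.
const CHINESE_RE = /[\u4E00-\u9FFF]/
// Tamil Unicode block (U+0B80 to U+0BFF).
const TAMIL_RE = /[\u0B80-\u0BFF]/

// English and Malay share the Latin alphabet, so match on frequent form words instead.
const ENGLISH_WORDS = new Set(['name', 'address', 'email', 'date', 'please', 'the'])
const MALAY_WORDS = new Set(['nama', 'alamat', 'tarikh', 'sila', 'dan', 'yang'])

// Returns every language detected in `text`, or ['unknown'] if nothing matches.
function detectLanguages(text: string): string[] {
  const found: string[] = []
  if (CHINESE_RE.test(text)) found.push('chinese')
  if (TAMIL_RE.test(text)) found.push('tamil')

  // Lower-case, then split on any run of characters that are not ASCII letters.
  const words = text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((w) => w.length > 0)
  if (words.some((w) => ENGLISH_WORDS.has(w))) found.push('english')
  if (words.some((w) => MALAY_WORDS.has(w))) found.push('malay')

  return found.length > 0 ? found : ['unknown']
}

// Buckets a form into the two report categories.
// 'unknown' is treated as "no language detected", so it never counts as non-English.
function categorize(langs: string[]): 'multi' | 'single-non-english' | 'other' {
  if (langs.length > 1) return 'multi'
  if (langs.length === 0) return 'other'
  if (langs[0] !== 'english' && langs[0] !== 'unknown') return 'single-non-english'
  return 'other'
}
```

For example, `detectLanguages('Name / Nama')` returns `['english', 'malay']`, which `categorize` places in the multi-language bucket. Text with no match, such as `'Lorem ipsum'`, yields `['unknown']`. Whole-word matching is used instead of substring matching so that short Malay words like `dan` do not fire inside English words such as `danger`.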