
Fix case handling for various capitalization issues #2478

Open · vikivivi wants to merge 1 commit into main

vikivivi (Contributor) commented Aug 24, 2022

  • Fix capitalization for multiple suggested words, camelCase, proper nouns, and abbreviations
  • Do not lower-case the suggested words in the dictionary during build_dict()
  • Make the capitalization decision in fix_case() instead
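As a rough illustration only, the POSIX shell function below shows one possible rule set for applying a misspelling's case to a single suggestion once fix_case() owns that decision. `fix_case_sketch` and its rules are assumptions for illustration, not the logic in this PR or in codespell: an all-caps misspelling upper-cases the suggestion, a capitalized misspelling capitalizes an all-lower-case suggestion, and any suggestion with deliberate casing in the dictionary (ASCII, eBay, MySQL) is kept as written.

```shell
#!/bin/sh
# Hypothetical sketch, not codespell's fix_case(): pick the casing for one
# suggestion from the casing of the misspelled word, assuming the
# suggestion still has the exact casing written in the dictionary.
fix_case_sketch() {
    word=$1
    suggestion=$2
    upper_word=$(printf '%s\n' "$word" | tr '[:lower:]' '[:upper:]')
    lower_suggestion=$(printf '%s\n' "$suggestion" | tr '[:upper:]' '[:lower:]')

    if [ "${#word}" -gt 1 ] && [ "$word" = "$upper_word" ]; then
        # All-caps misspelling (e.g. AUSTRAILIA): upper-case the suggestion.
        printf '%s\n' "$suggestion" | tr '[:lower:]' '[:upper:]'
    elif [ "$suggestion" = "$lower_suggestion" ]; then
        # Plain lower-case suggestion: copy an initial capital, if any.
        case $word in
            [[:upper:]]*)
                first=$(printf '%s\n' "$suggestion" | cut -c1 | tr '[:lower:]' '[:upper:]')
                rest=$(printf '%s\n' "$suggestion" | cut -c2-)
                printf '%s%s\n' "$first" "$rest"
                ;;
            *)
                printf '%s\n' "$suggestion"
                ;;
        esac
    else
        # Deliberate dictionary casing (ASCII, eBay, MySQL): keep it as is.
        printf '%s\n' "$suggestion"
    fi
}

# Example usage: print "misspelling -> suggestion" for a few pairs taken
# from dictionary_test.txt.
for pair in asscii:ASCII AUSTRAILIA:Australia Uspported:supported \
            Pinapple:pineapple Ebya:eBay lessTiff:LessTif; do
    printf '%s -> %s\n' "${pair%%:*}" "$(fix_case_sketch "${pair%%:*}" "${pair#*:}")"
done
```

Keeping the dictionary casing intact in build_dict() is what makes the last branch possible: once `eBay` has been lower-cased to `ebay`, its intended capitalization can no longer be recovered.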

Below are my test dictionary, dictionary_test.txt, and a basic test script, test.sh:

$ cat dictionary_test.txt 
asscii->ASCII
austrailia->Australia
Micosoft->Microsoft
pinapple->pineapple
uspported->supported, unsupported,
skipt->skip, Skype, skipped,
lesstiff->LessTif
mangodb->MongoDB
ebya->eBay
mariadb->MariaDB
mysql->MySQL, mysql,
$ cat test.sh
#!/bin/sh

TEST_FILE="/tmp/test-input.txt"
rm -f "${TEST_FILE}"

# Test abbreviation, acronym, initialism: Suggested word coded as
#      upper case in dictionary.
test_abbreviation_01() {
    echo "asscii" >> "${TEST_FILE}"
    echo "Asscii" >> "${TEST_FILE}"
    echo "AssCii" >> "${TEST_FILE}"
    echo "ASSCII" >> "${TEST_FILE}"
}

# Test proper nouns: Misspelling coded in lower case in the dictionary.
test_proper_nouns_lower_01() {
    echo "austrailia" >> "${TEST_FILE}"
    echo "Austrailia" >> "${TEST_FILE}"
    echo "AustRailia" >> "${TEST_FILE}"
    echo "AUSTRAILIA" >> "${TEST_FILE}"
}

# Test proper nouns, brand names: Misspelling coded capitalized
#      in the dictionary.
test_proper_nouns_capitalize_01() {
    echo "micosoft" >> "${TEST_FILE}"
    echo "Micosoft" >> "${TEST_FILE}"
    echo "MicoSoft" >> "${TEST_FILE}"
    echo "MICOSOFT" >> "${TEST_FILE}"
}

# Test typical single: Misspelling and suggested word both coded
#      in lower case in the dictionary.
test_typical_single_01() {
    echo "pinapple" >> "${TEST_FILE}"
    echo "Pinapple" >> "${TEST_FILE}"
}

# Test typical multiple: Misspelling and multiple suggested words
#      all coded in lower case in the dictionary.
test_typical_multiple_01() {
    echo "uspported" >> "${TEST_FILE}"
    echo "Uspported" >> "${TEST_FILE}"
    echo "USPPORTED" >> "${TEST_FILE}"
}

# Test typical multiple & mix: Misspelling coded in lower case. Multiple
#      suggested words coded in lower and capitalized case in the dictionary.
test_typical_multiple_mix_01() {
    echo "skipt" >> "${TEST_FILE}"
    echo "Skipt" >> "${TEST_FILE}"
    echo "SKIPT" >> "${TEST_FILE}"
}

# Test CamelCase basic: Suggested word coded as CamelCase in dictionary.
test_camelCase_01() {
    echo "lesstiff" >> "${TEST_FILE}"
    echo "lessTiff" >> "${TEST_FILE}"
    echo "Lesstiff" >> "${TEST_FILE}"
    echo "LessTiff" >> "${TEST_FILE}"
    echo "LESSTIFF" >> "${TEST_FILE}"
}

# Test CamelCase brand names: Suggested word coded as CamelCase
#      in dictionary.
test_camelCase_02() {
    echo "mangodb" >> "${TEST_FILE}"
    echo "mangoDb" >> "${TEST_FILE}"
    echo "mangoDB" >> "${TEST_FILE}"
    echo "Mangodb" >> "${TEST_FILE}"
    echo "MangoDb" >> "${TEST_FILE}"
    echo "MangoDB" >> "${TEST_FILE}"
}

# Test CamelCase brand names: Suggested word coded as CamelCase
#      in dictionary.
test_camelCase_03() {
    echo "ebya" >> "${TEST_FILE}"
    echo "eBya" >> "${TEST_FILE}"
    echo "Ebya" >> "${TEST_FILE}"
    echo "EBya" >> "${TEST_FILE}"
    echo "EBYA" >> "${TEST_FILE}"
}

# Special Test CamelCase, brand names: The misspelling is a correctly
#      spelled word with incorrect case. Suggested word is coded as
#      CamelCase in the dictionary. For custom dictionaries only.
test_valid_word_camelCase_01() {
    echo "mariadb" >> "${TEST_FILE}"
    echo "mariaDb" >> "${TEST_FILE}"
    echo "mariaDB" >> "${TEST_FILE}"
    echo "Mariadb" >> "${TEST_FILE}"
    echo "MariaDb" >> "${TEST_FILE}"
    echo "MariaDB" >> "${TEST_FILE}"
}

# Special Test CamelCase, brand names: The misspelling is a correctly
#      spelled word with incorrect case. Multiple suggested words are coded
#      as CamelCase and lower case in the dictionary. For custom dictionaries only.
test_valid_word_camelCase_02() {
    echo "mysql" >> "${TEST_FILE}"
    echo "mySql" >> "${TEST_FILE}"
    echo "mySQL" >> "${TEST_FILE}"
    echo "Mysql" >> "${TEST_FILE}"
    echo "MySql" >> "${TEST_FILE}"
    echo "MySQL" >> "${TEST_FILE}"
}

run_codespell() {
    codespell -D dictionary_test.txt "${TEST_FILE}"
    echo ""
    codespell -D dictionary_test.txt -w "${TEST_FILE}"
}


test_abbreviation_01
test_proper_nouns_lower_01
test_proper_nouns_capitalize_01
test_typical_single_01
test_typical_multiple_01
test_typical_multiple_mix_01
test_camelCase_01
test_camelCase_02
test_camelCase_03
test_valid_word_camelCase_01
test_valid_word_camelCase_02

run_codespell
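The entries in dictionary_test.txt above use a `misspelling->suggestion` layout, with multiple suggestions separated by a comma and a space and ending in a trailing comma. The hypothetical sketch below illustrates the build_dict() change described at the top of this PR: it splits one such line while leaving every suggestion's casing untouched instead of lower-casing it. `parse_dict_line` is an invented name for illustration and is not codespell's build_dict().

```shell
#!/bin/sh
# Hypothetical sketch (not codespell's build_dict()): split one dictionary
# line of the form "misspelling->sugg1, sugg2," and print each suggestion
# with its casing preserved exactly as written.
parse_dict_line() {
    line=$1
    misspelling=${line%%->*}    # text before the first "->"
    suggestions=${line#*->}     # text after the first "->"
    printf 'misspelling: %s\n' "$misspelling"

    # Split the suggestions on commas, with glob expansion disabled.
    set -f
    old_ifs=$IFS
    IFS=,
    for s in $suggestions; do
        s=${s# }                # drop the single space after each comma
        if [ -n "$s" ]; then    # skip an empty field from a trailing comma
            printf 'suggestion: %s\n' "$s"
        fi
    done
    IFS=$old_ifs
    set +f
}

# Example usage with two lines from dictionary_test.txt.
parse_dict_line 'mysql->MySQL, mysql,'
parse_dict_line 'ebya->eBay'
```

Because `MySQL` survives parsing with its original casing, a later case-fixing step can still tell it apart from the plain `mysql` suggestion on the same line.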

luzpaz (Collaborator) commented Aug 28, 2022

@peternewman please review at your convenience

peternewman (Collaborator) commented:

I've finally merged #2223, which resolves some related issues, so can you merge that in, @vikivivi?

It's also got some test infrastructure we can make use of and extend!

vikivivi force-pushed the patch-case-handling branch from 0535c1f to ba2a4e8 on September 3, 2022 10:32
vikivivi (Contributor, Author) commented Sep 3, 2022

@peternewman I have rebased my PR branch and generalised the #2223 test infrastructure for various case-handling checks. The test case from #2223 has been adapted to this generalised function, and all my test cases have been added as well.

Labels: None yet
Projects: None yet
3 participants