Different suggested fix wrt capitalization #2470
Comments
So there are two things driving this. Firstly, do you want to fix the correction in the dictionary so it's in the correct case? codespell/codespell_lib/data/dictionary.txt Line 20034 in 101498d
Secondly, there's the algorithm that tries to predict what case the correction should be in: codespell/codespell_lib/_codespell.py Lines 496 to 503 in 14af437
Obviously this can't currently deal with camel case: where your first line is capitalised, it offers that; otherwise it gets confused and gives up. I think we've generally gone for the typo being in lower case, but actually if we stored this typo in camel case and then made sure it was
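To make the two different suggestions concrete, here is a minimal sketch of the case-prediction logic as it stood before any change, reconstructed from the removed lines of the patch posted in this thread. The dictionary entry `lesstiff->LessTif` is an assumption for illustration, and the `.lower()` call stands in for `build_dict` lower-casing the correction at load time.

```python
def fix_case(word, fixword):
    """Pre-patch heuristic: guess the correction's case from the typo's case."""
    if word == word.capitalize():
        return fixword.capitalize()
    elif word == word.upper():
        return fixword.upper()
    # both are lower case, or we don't have any idea
    return fixword


# Assumed dictionary correction; build_dict used to lower-case it on load.
correction = "LessTif".lower()

for typo in ("lesstiff", "Lesstiff", "lessTiff", "LESSTIFF"):
    print(typo, "==>", fix_case(typo, correction))
```

Under these assumptions a capitalised typo gets `Lesstif` while a lower-case or camel-case typo gets `lesstif`, which matches the two suggestions reported for the same misspelling.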
@peternewman @luzpaz
diff --git a/codespell_lib/_codespell.py b/codespell_lib/_codespell.py
index 1ed70e89..f0607c1e 100644
--- a/codespell_lib/_codespell.py
+++ b/codespell_lib/_codespell.py
@@ -454,10 +454,10 @@ def build_dict(filename, misspellings, ignore_words):
     with codecs.open(filename, mode='r', encoding='utf-8') as f:
         for line in f:
             [key, data] = line.split('->')
-            # TODO for now, convert both to lower. Someday we can maybe add
-            # support for fixing caps.
+            # Convert key to lower case.
+            # Do not modify data to lower case. Leave it as per dictionary.
             key = key.lower()
-            data = data.lower()
+            # data = data.lower()
             if key in ignore_words:
                 continue
             data = data.strip()
@@ -494,12 +494,16 @@ def is_text_file(filename):
 def fix_case(word, fixword):
-    if word == word.capitalize():
+    if fixword == fixword.upper():
+        # fixword is in all upper case as per dictionary. Eg. ASCII
+        return fixword
+    elif word == word.capitalize() and fixword == fixword.lower():
+        # word is capitalized and fixword in lower. Capitalize fixword. Eg. Pineapple
         return fixword.capitalize()
     elif word == word.upper():
+        # word is in all upper case, change fixword to upper. Eg. MONDAY
         return fixword.upper()
-    # they are both lower case
-    # or we don't have any idea
+    # word is in lower, capitalize, CamelCase or whatever. Use fixword as per dictionary
     return fixword
$ cat test.sh
#!/bin/sh
# Suggested word in all upper case in dictionary
echo "asscii" | codespell -
echo "Asscii" | codespell -
echo "ASSCII" | codespell -
# Misspelling coded in dictionary as lower
echo "tusday" | codespell -
echo "Tusday" | codespell -
echo "TUSDAY" | codespell -
# Misspelling coded in dictionary as Capitalize
echo "micosoft" | codespell -
echo "Micosoft" | codespell -
echo "MICOSOFT" | codespell -
# Misspelling and suggested both in lower case in dictionary
echo "pinapple" | codespell -
echo "Pinapple" | codespell -
# Suggested word in CamelCase in dictionary
echo "lesstiff" | codespell -
echo "lessTiff" | codespell -
echo "Lesstiff" | codespell -
echo "LessTiff" | codespell -
echo "LESSTIFF" | codespell -
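For anyone who wants to check the patched logic without installing a patched codespell, here is a rough standalone sketch that runs the same inputs as test.sh through the new `fix_case`. The function body is transcribed from the added lines of the diff above; `ASSUMED_DICT` and `suggest` are hypothetical helpers, and the dictionary entries are assumptions chosen to match the comments in the script, not a copy of dictionary.txt.

```python
def fix_case(word, fixword):
    """Patched heuristic, transcribed from the added lines of the diff."""
    if fixword == fixword.upper():
        # correction is all upper case in the dictionary, e.g. ASCII
        return fixword
    elif word == word.capitalize() and fixword == fixword.lower():
        # capitalised typo, lower-case correction: capitalise the correction
        return fixword.capitalize()
    elif word == word.upper():
        # all upper-case typo: upper-case the correction
        return fixword.upper()
    # anything else: use the correction exactly as stored
    return fixword


# Assumed entries: keys lower-cased, corrections kept as written.
ASSUMED_DICT = {
    "asscii": "ASCII",
    "tusday": "Tuesday",
    "micosoft": "Microsoft",
    "pinapple": "pineapple",
    "lesstiff": "LessTif",
}


def suggest(word):
    """Hypothetical helper: look the typo up by lower-cased key, then fix case."""
    return fix_case(word, ASSUMED_DICT[word.lower()])


for word in ("asscii", "Asscii", "ASSCII", "tusday", "Tusday", "TUSDAY",
             "micosoft", "Micosoft", "MICOSOFT", "pinapple", "Pinapple",
             "lesstiff", "lessTiff", "Lesstiff", "LessTiff", "LESSTIFF"):
    print(word, "==>", suggest(word))
```

With these assumed entries, every mixed-case spelling of the last typo maps to the stored `LessTif`, while an all-caps typo still upper-cases a mixed-case correction (`LESSTIFF` gives `LESSTIF`, `TUSDAY` gives `TUESDAY`). Whether that last behaviour is wanted is a design choice the patch leaves unchanged.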
Thanks very much @vikivivi. However, would you mind doing it as a Pull Request please? You'll get credit, it's easier to comment on or improve, and the tests will be run automatically. We can also work on getting your test cases added to the code too.
@peternewman I will try to work on a pull request with my additional test cases.
Great, thanks @vikivivi. Although feel free to open the PR with the code as is above, and others could help you with the test cases too.
@peternewman Please see #2478 for latest patch changes.
I'm getting two different suggested fixes for what seems to be the same misspelling with codespell 2.1.0:
See attachment lesstif.txt
BTW, according to Wikipedia the correct capitalization is LessTif