Fix chardet test and add ordering option #11621
Signed-off-by: Andrew Thornton <[email protected]>
It is missing Latvian windows-1257
@lafriks it appears that github.com/gogs/chardet doesn't detect or assign to windows-1257.
This is the right direction, of course! However, I'm concerned that this will need to be done every time. IMHO the best way to address this is to assign the priority inside gogs/chardet (we would need to take over that library). I elaborated on this here: #8474 (comment)
OK, I believe this is a good compromise.
Nice work!!
modules/charset/charset.go
priority, has := setting.Repository.DetectedCharsetScore[strings.ToLower(strings.TrimSpace(topResult.Charset))]
for _, result := range results {
	if result.Confidence == topConfidence {
		resultPriority, resultHas := setting.Repository.DetectedCharsetScore[strings.ToLower(strings.TrimSpace(result.Charset))]
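For readers following the truncated excerpt above, the tie-breaking idea can be sketched in isolation. This is a minimal sketch and not the code from this PR: the `Result` type, `normalize`, and `pickCharset` are hypothetical names, and the rule that a lower score means a higher priority (with unlisted charsets losing to listed ones) is an assumption.

```go
package main

import "strings"

// Result is a hypothetical stand-in for one detector result.
type Result struct {
	Charset    string
	Confidence int
}

// normalize mirrors the lookup key used in the excerpt: trimmed and lowercased.
func normalize(name string) string {
	return strings.ToLower(strings.TrimSpace(name))
}

// pickCharset returns the charset with the highest confidence. When several
// results share that confidence, the one with the lowest score wins
// (assumption: lower score = higher priority). Charsets missing from the
// score map only win if no tied charset is listed.
func pickCharset(results []Result, score map[string]int) string {
	if len(results) == 0 {
		return ""
	}
	// Find the first result carrying the top confidence.
	top := results[0]
	for _, r := range results[1:] {
		if r.Confidence > top.Confidence {
			top = r
		}
	}
	bestPriority, bestHas := score[normalize(top.Charset)]
	// Only results tied at the top confidence take part in the tie-break.
	for _, r := range results {
		if r.Confidence != top.Confidence {
			continue
		}
		p, has := score[normalize(r.Charset)]
		if has && (!bestHas || p < bestPriority) {
			top, bestPriority, bestHas = r, p, true
		}
	}
	return top.Charset
}
```

Because the winner no longer depends on the order in which the detector happens to return tied results, the outcome is deterministic for a given score map.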
In the future we could attempt to normalize our list casing to the lib's casing in order to avoid calling ToLower() in a loop.
Hehehe, have you looked at the lib's casing? I copied the names exactly into the setting, app.ini.sample and the config cheat sheet. There's literally no fixed pattern for them.
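The casing exchange above concerns the lookup side. The configured side can at least be normalized once, when the ordered list is turned into a score map. A minimal sketch, assuming the setting arrives as an ordered slice of names; `buildScore` is a hypothetical helper, not a function from this PR:

```go
package main

import "strings"

// buildScore converts an ordered list of charset names into a lookup from
// the normalized (trimmed, lowercased) name to its position in the list.
// Empty entries are skipped and the first occurrence of a name wins.
func buildScore(order []string) map[string]int {
	score := make(map[string]int, len(order))
	for i, name := range order {
		key := strings.ToLower(strings.TrimSpace(name))
		if key == "" {
			continue
		}
		if _, seen := score[key]; !seen {
			score[key] = i
		}
	}
	return score
}
```

With the keys normalized up front, whatever casing an administrator types in the config does not affect matching; only the detector's result still needs normalizing at lookup time.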
One thing (sorry about the afterthought): if
But not before utf-8
Heh, the ANSI charset simply overrides any detected charset that isn't utf8... It's not a very sensible option. I'm happy to make a breaking change by removing it and adding a default undetected charset or something like that?
@zeripath on second thought you're right, it would be breaking. Charset detection is a sensitive issue. Maybe we should leave the (*) We could deprecate it, but ANSI character sets should already be a thing of the past. I believe most people dealing with them really have no choice, as they need to deal with ages of old code. Deprecating the option so that they can no longer force it doesn't sound like a nice thing to do.
Sorry, bad operation! 😓
Ping LG-TM
* Fix chardet test and add ordering option
* minor fixes
* remove log
* remove log2
* only iterate through top results
* Update docs/content/doc/advanced/config-cheat-sheet.en-us.md
* slight restructure of for loop
Signed-off-by: Andrew Thornton <[email protected]>
Co-authored-by: techknowlogick <[email protected]>
Add DETECTED_CHARSET_ORDER to the repository config so that the tie-breaking order for detected charsets can be set.
Fixes intermittent failure of chardet test
Signed-off-by: Andrew Thornton [email protected]