Test failures when cchardet-2.1.7 and chardet are installed #318
Comments
I ran into the same problem. Here's a snippet that shows the differences between chardet and cchardet:

import glob

import cchardet
import chardet

# Print the encoding each library detects for every test file, side by side.
for path in glob.glob('tests/illformed/chardet/*'):
    with open(path, 'rb') as f:
        data = f.read()
    enc1 = chardet.detect(data)['encoding']
    enc2 = cchardet.detect(data)['encoding']
    print('%-40s %-20s %-20s %s' % (path, enc1, enc2, 'same' if enc1 == enc2 else 'different'))
maksverver added a commit to maksverver/feedparser that referenced this issue Aug 29, 2024
feedparser imports cchardet or chardet depending on what's installed: https://github.com/kurtmckee/feedparser/blob/11990ea1d8791acc76c67781f1d2011daf0c3a99/feedparser/encodings.py#L37-L40
Although these libraries are mostly equivalent, they return slightly different encoding strings, even though both are correct and lead to successful decoding. This change allows the tests to be run with either library by accepting both encoding names as correct.
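For readers unfamiliar with the pattern, an optional-dependency import of this kind can be sketched roughly as follows. This is an illustration only, not the code at the linked lines; the names `chardet_impl` and `detect_encoding` are hypothetical.

```python
# Hypothetical sketch of an optional-dependency import; not feedparser's actual code.
try:
    import cchardet as chardet_impl  # C implementation, used if installed
except ImportError:
    try:
        import chardet as chardet_impl  # pure-Python fallback
    except ImportError:
        chardet_impl = None  # neither library is available


def detect_encoding(data: bytes):
    """Return the detected encoding name, or None if no detector is installed."""
    if chardet_impl is None:
        return None
    return chardet_impl.detect(data)["encoding"]
```

Because either library may end up behind the same name, any test that compares the returned string against one fixed spelling depends on which package happens to be installed.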
When cchardet-2.1.7 and chardet-5.0.0 are both installed, the following tests fail.
From what I can see, two of them fail because of encoding name mismatches (the expected name is mixed-case, the detected value is uppercase), and two because the detected encoding is a superset of the expected one (i.e. EUC-KR is detected as UHC, and GB2312 as GB18030).
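A tolerant comparison covering both failure modes might look something like the sketch below. The helper name `encodings_match` and the `SUPERSETS` table are assumptions made for illustration; the table contains only the two superset pairs mentioned above.

```python
# Assumed superset relationships, limited to the two cases named in this issue.
SUPERSETS = {
    "euc-kr": {"uhc"},
    "gb2312": {"gb18030"},
}


def encodings_match(expected: str, detected: str) -> bool:
    """Return True if `detected` equals `expected` ignoring case,
    or is listed as a superset of it."""
    exp = expected.lower()
    det = detected.lower()
    return det == exp or det in SUPERSETS.get(exp, set())


# Usage: the comparison is deliberately one-directional.
print(encodings_match("euc-kr", "EUC-KR"))  # case differs only
print(encodings_match("EUC-KR", "UHC"))     # superset of the expected encoding
print(encodings_match("UHC", "EUC-KR"))     # a subset is not accepted
```

Keeping the check one-directional means a detector that reports the narrower encoding when the wider one was expected is still flagged.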