How to assign encoding of response content? #26
Comments
Hi, thank you for filing the issue; I will take a look. Normally we read the encoding from the HTTP headers, but that may not work in this case, so we can think about alternatives.
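For readers following along, reading the encoding from the HTTP headers usually means parsing the charset parameter of the Content-Type header. The snippet below is only a minimal sketch of that idea, not the crawler's actual code; the helper name charsetFromContentType is hypothetical.

```javascript
// Hypothetical helper (not part of the crawler): pull the charset parameter
// out of a Content-Type header value such as "text/html; charset=Big5".
function charsetFromContentType(contentType) {
  // A missing header has no charset to report.
  if (typeof contentType !== 'string') return null;
  // Match "; charset=value", allowing optional quotes and extra spaces.
  const match = /;\s*charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetFromContentType('text/html; charset=Big5'));
console.log(charsetFromContentType('text/html'));
```

When the server sends no charset parameter, as reported in this issue, the helper returns null and the caller has to fall back to inspecting the body.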
I checked the response from this URL: it has no encoding value in the response headers, so the current code can't determine the correct encoding. An alternative might be to check the meta values in the response body, such as:
@winglight in this case, you can use the indexOf function (and other string-analysis functions) of Buffer to extract the encoding from the body. Note that Node.js supports only a small set of character encodings by default, and big5 is not among them, so you will probably need a decoder/transcoder before processing big5-encoded content, since your code is most likely working with utf-8.
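The Buffer-based approach suggested above could look roughly like the following. This is a sketch under assumptions: the function names charsetFromBody and decodeBody are hypothetical, the search is case-sensitive and only scans the first bytes of the body, and decoding relies on the global TextDecoder, whose set of supported labels (including big5) depends on how Node.js was built.

```javascript
// Hypothetical helper: look for "charset=" near the start of a raw body Buffer.
function charsetFromBody(body, scanLimit = 1024) {
  const needle = 'charset=';
  // Only scan the head of the document, where meta tags normally live.
  const head = body.subarray(0, scanLimit);
  const start = head.indexOf(needle); // case-sensitive byte search
  if (start === -1) return null;
  const from = start + needle.length;
  const to = Math.min(head.length, from + 40);
  // latin1 maps every byte to one character, so this never fails on raw bytes.
  const tail = head.toString('latin1', from, to);
  const match = /^["']?\s*([A-Za-z0-9._:-]+)/.exec(tail);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode with the detected label, falling back to utf-8
// when the runtime does not recognise the label.
function decodeBody(body, label) {
  try {
    return new TextDecoder(label || 'utf-8').decode(body);
  } catch (err) {
    // TextDecoder throws for labels it does not support.
    return body.toString('utf8');
  }
}

const sample = Buffer.from(
  '<html><head><meta http-equiv="Content-Type" content="text/html; charset=big5"></head></html>',
  'latin1'
);
console.log(charsetFromBody(sample));
```

If TextDecoder does not support the detected label in a given environment, a third-party transcoder would be needed instead, as the comment above points out.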
Same problem here with a page that contains
+1
When you don't set the encoding, the crawler will not do any encoding work for you (actually, Node.js itself does not support any encoding except
I got the wrong charset for the response content of a non-UTF-8 web page. Here is an example URL: http://www.cartoomad.com/comic/276400012051002.html