Decode with utf8 by default for non-text (or all?) content types #175
Comments
Good comments about this in I don't think it's the HTTP clients job to do the decoding. The header is no guarantee that the content will be of that type. |
@zoechi The client has to try to decode using a best effort approach. The http client is already decoding, but using the wrong encoding. There are 2 attributes in the response, body and bodyBytes.
Knowing that the content type is json should be enough for the client to interpret it, or at least to avoid interpreting it with the wrong encoding. This is not an edge case, json is probably the most popular format for api communication at the moment. You are right on the wrapper part, but if the headers are coherent, the client should be able to handle the body properly too. |
So, the situation is a bit murky, so bear with me.. The first layer in any of this is HTTP. JSON, in turn, explicitly does not define the charset parameter. Next, the Dart
So as discussed above, HTTP in absence of a defined charset is assumed to be encoded in ISO-8859-1 (Latin-1). And The problem of course is that there are servers out there that do not set charset for JSON (which is valid), but which is also a bit of a grey area in between the two specs:
A "smart" HTTP client could choose to follow the JSON definition closer than the HTTP definition and simply say any
As for this bug I'm inclined to say that
HttpClientRequest request = await HttpClient().post(_host, 4049, path) /*1*/
  ..headers.contentType = ContentType.json /*2*/
  ..write(jsonEncode(jsonData)); /*3*/
HttpClientResponse response = await request.close(); /*4*/
await response.transform(utf8.decoder /*5*/).forEach(print);
Hope it helps. I'll close this issue assuming all open questions are resolved. |
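For anyone who wants to try the streaming approach from the comment above without a server, here is a minimal sketch. It assumes the response body arrives as a `Stream<List<int>>`; `decodeJsonStream` is a hypothetical helper name (not part of `dart:io` or `package:http`), and the hand-built chunk stream is only a stand-in for a real response.

```dart
import 'dart:convert';

/// Hypothetical helper, for illustration only.
/// Decodes a byte stream as UTF-8 and parses the result as JSON.
Future<Object?> decodeJsonStream(Stream<List<int>> byteStream) async {
  // bind() applies the UTF-8 decoder chunk by chunk, so a multi-byte
  // character split across two chunks is still decoded correctly.
  final text = await utf8.decoder.bind(byteStream).join();
  return jsonDecode(text);
}

Future<void> main() async {
  final bytes = utf8.encode('{"name": "SARA LUCIA OSSA PEÑA"}');

  // Split in the middle of the two-byte UTF-8 sequence for Ñ (0xC3 0x91)
  // to imitate a body that arrives in several network chunks.
  final cut = bytes.indexOf(0xC3) + 1;
  final chunks = Stream<List<int>>.fromIterable(
      [bytes.sublist(0, cut), bytes.sublist(cut)]);

  final data = await decodeJsonStream(chunks) as Map<String, dynamic>;
  print(data['name']); // SARA LUCIA OSSA PEÑA
}
```

Decoding the stream rather than each chunk separately is the point here: calling `utf8.decode` on individual chunks could fail when a character straddles a chunk boundary.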
Got it. Thanks for this. |
Note that applies to "text" media types. JSON is "application/json". Correct clients should treat JSON (or any other non-text media type) responses as bytestrings, rather than text. (ie. use |
@tomchristie, right. I also elaborated a bit on this in #186, but basically saying the same but with more words. :) |
I still believe this is a bad behavior and added comments to #186 |
…o avoid text garbling. (#1700) * fix: force to decode as utf-8 when header contains application/json to avoid text garbling. The original processing uses `response.body` to deserialize as json. However, this is decoded as latin1 if the header contains only "application/json" instead of "application/json; charset=utf-8". Because of this behavior, if the response body is encoded as UTF-8 but the headers don't contain a charset, the body will be garbled. cf: dart-lang/http#175 Since playframework 2.6 returns "Content-Type: application/json" without "charset=utf-8", I changed this parsing algorithm. * fix: force to decode as utf-8 when header contains application/json to avoid text garbling on error.
By default, the Dart HTTP decoding will assume that the charset is ISO-8859-1 (latin-1). This causes emojis, certain apostrophe characters, etc. to be displayed incorrectly. See here for extended discussion on why: dart-lang/http#175 Essentially, this occurs when the content-type header is returned without a charset.
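The fix described in these linked commits (decode as UTF-8 when the header says JSON and names no charset) could look roughly like the sketch below. `encodingFor` is a hypothetical helper, not the code from those commits and not an API of `package:http`; the Latin-1 fallback for everything else is an assumption that simply mirrors the default discussed in this thread.

```dart
import 'dart:convert';

/// Hypothetical helper, for illustration only.
/// Picks an encoding from a Content-Type header value.
Encoding encodingFor(String? contentType) {
  if (contentType != null) {
    // 1. An explicit charset parameter wins if dart:convert knows it.
    final match = RegExp(r'charset=([^;\s]+)', caseSensitive: false)
        .firstMatch(contentType);
    if (match != null) {
      final named = Encoding.getByName(match.group(1)!.replaceAll('"', ''));
      if (named != null) return named;
    }
    // 2. JSON media types without a charset are treated as UTF-8.
    final mime = contentType.split(';').first.trim().toLowerCase();
    if (mime == 'application/json' || mime.endsWith('+json')) return utf8;
  }
  // 3. Assumed fallback: the Latin-1 default described in this thread.
  return latin1;
}

void main() {
  final bytes = utf8.encode('{"name": "SARA LUCIA OSSA PEÑA"}');
  print(encodingFor('application/json').decode(bytes));
  print(encodingFor('application/json; charset=utf-8').name);
  print(encodingFor('text/plain').name);
}
```

With a helper like this, callers would decode `response.bodyBytes` themselves instead of relying on `response.body`.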
This error also happens when the content-type is |
I'm stuck with a server I have no control over and that does not return a header specifying that the json content is using utf8. That is implicitly expected |
@cskau-g the interpretation that HTTP uses ISO for text content is outdated and that requirement has been removed from the HTTP spec:
Furthermore, the relevant part of the current spec does not mention at all a default charset to be applied to textual representations for any media-type: https://tools.ietf.org/html/rfc7231#section-3.1.1.2 The JSON RFC, meanwhile, determines that the
It has also been amended to make UTF-8 mandatory in the case of data transmitted over a network, which is the primary use-case for HTTP: https://tools.ietf.org/html/rfc8259#appendix-A
Section 8.1:
Hopefully, this is enough to show that the currently most widely used data exchange format on the internet is not supported correctly by the Dart HTTP Server. Please consider changing this behavior as keeping it as it is is only going to hurt Dart's standing for no good reason. |
Reopening to track - I do think we should consider changing the defaults since most users are likely to benefit. Note that the expected pattern to use today when you know the result is json is |
Changing the default for all responses, or even for non-text responses, is breaking. At least one internal usage is impacted. Changing the default only when the content type is |
As stated before in #186, this behavior is wrong and should be corrected. It doesn't matter if it breaks bc or not. Release a new major if that is necessary. In 2018 we were talking about this with a lot of effort on explaining HTTP and it was just ignored. If it was taken into consideration at the time, everything would have been adopted today. How many years more do we need to wait? |
I believe both FF/Chrome, for quite a while, treats Some systems have even deprecated Processing JSON as non-UTF8 by default makes no reasonable sense. Please make |
I'd like to raise this issue again -- like @renatoathaydes noted, RFC7231 (circa 2014) supersedes 2616 (circa 1999) to make interpretation of I understand the suggested way to access json data from a response is to use the |
Hi, is there any status update on making utf8 the default for decoding json responses? It's not fun to discover that |
You should be using |
Thanks @0xNF for the correction, I mistyped (wouldn't have made sense to |
RFC 8259, the current RFC reference for
|
@daenney I mentioned this almost 4 years ago: #175 (comment) I suspect Google would have too much work to do if this was changed, hence it will probably stay as it is even when it's clearly failing to follow the specs. |
I am requesting some information from a server, which returns the following (using postman):
Headers:
Allow →GET, HEAD, OPTIONS
CF-RAY →439e4801cf3db955-MIA
Connection →keep-alive
Content-Encoding →gzip
Content-Type →application/json
Body:
{
"name": "SARA LUCIA OSSA PEÑA",
}
But, using http, I am getting the following (using a simple request get('url')):
{
"name": "SARA LUCIA OSSA PE�A",
}
To fix this, I had to do something like:
UTF8.decode(response.bodyBytes)
This works as expected, and the information is retrieved fine. This, however, is a pain to set up (and inconsistent with post, where utf8 is used as the default encoding).
Is there a better way to handle this? An argument to get to force the encoding? Shouldn't application/json assume utf8 by default?
I came up with the solution after reading https://pub.dartlang.org/documentation/http/latest/http/Response-class.html and the body property. It is probably decoding the body with the wrong encoding.
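For reference, a minimal sketch of the same workaround in current Dart, where the `UTF8` constant has been renamed to lowercase `utf8`. The byte list below is a stand-in for `response.bodyBytes`, and `decodeJsonBytes` is a hypothetical helper name, so this runs without any network request.

```dart
import 'dart:convert';

/// Hypothetical helper, for illustration only.
/// Decodes raw response bytes as UTF-8 and parses them as JSON.
Object? decodeJsonBytes(List<int> bodyBytes) =>
    jsonDecode(utf8.decode(bodyBytes));

void main() {
  // Stand-in for response.bodyBytes from a server that sends UTF-8 JSON
  // with "Content-Type: application/json" and no charset.
  final bodyBytes = utf8.encode('{"name": "SARA LUCIA OSSA PEÑA"}');

  // Decoding the same bytes as Latin-1 turns Ñ into two unrelated characters.
  print(latin1.decode(bodyBytes).contains('Ñ')); // false

  final data = decodeJsonBytes(bodyBytes) as Map<String, dynamic>;
  print(data['name']); // SARA LUCIA OSSA PEÑA
}
```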
Anyway, thanks for the hard work. Awesome library.