Issue with json::parse decoding codepoints #3142
According to Wikipedia, the codepoint for the degree sign is U+00B0, and {"property":"Temperature is 22\u00b0C"} roundtrips correctly:

```cpp
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main() {
    // parse with the escape sequence \u00b0
    std::string json_string = R"({"property":"Temperature is 22\u00b0C"})";
    json j = json::parse(json_string);
    std::cout << j << std::endl;  // ° is written back as raw UTF-8 bytes

    // parse raw UTF-8 (this source file is assumed to be saved as UTF-8)
    // and dump with ensure_ascii = true
    std::string json_string2 = R"({"property":"Temperature is 22°C"})";
    json j2 = json::parse(json_string2);
    std::cout << j2.dump(-1, ' ', true) << std::endl;  // ° is written as an escape sequence
}
```
(I think you mixed up the code point
Thank you for the prompt response and proposed fix. Correct, I am using UTF-8 encoding on the embedded system. I read that the library only supports UTF-8, so I thought that was the way to go.
Since I receive the string as raw bytes from the embedded system (via TCP/IP), processing it on the PC cannot rely on any preprocessing of literals, such as … What would you suggest as the best encoding practice for a smooth two-way exchange of content between your library and the bare-metal embedded system using UTF-8 in C?
Thanks for any advice!
The Unicode codepoint
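That said, the library expects UTF-8 encoding. So the string needs to contain the UTF-8 bytes of the character, for example:

```cpp
#include <string>
int main() {
    std::string s;
    s.push_back(static_cast<char>(0xC2));  // first UTF-8 byte of ° (U+00B0)
    s.push_back(static_cast<char>(0xB0));  // second UTF-8 byte
}
```

In any case, you should not mix the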
I'm now feeding flat (non-escaped) UTF-8 bytes to the parser:

```cpp
#include <iostream>
#include <nlohmann/json.hpp>
int main() {
    std::string json_from_embedded_system = "{\"property\":\"Temperature is 22°C\"}";  // raw UTF-8 bytes
    nlohmann::json j = nlohmann::json::parse(json_from_embedded_system);
    std::cout << j.dump() << std::endl;               // prints {"property":"Temperature is 22°C"}
    std::cout << j.dump(-1, ' ', true) << std::endl;  // prints {"property":"Temperature is 22\u00b0C"}
}
```

I think this can be closed on my side. Thank you for clarifying the library's input and output encoding options!
I narrowed down a basic issue with json::parse in decoding escaped non-ASCII characters. The encoding is done in "bare metal C" by embedded software using a standard UTF-8 encoding algorithm, and I need to read that JSON string back on a PC. The original string and the encoder's output are:
"{\"property\":\"Temperature is 22°C\"}"
"{\"property\":\"Temperature is 22\\uc2b0C\"}"
(° is encoded as the UTF-8 bytes 0xc2 + 0xb0.)
When I test-serialize the parsed JSON with:
The content of sb is:
The problem is those two extra ìŠ characters, since json::parse treats the encoded character as 3 bytes instead of 2. This is the code portion where it does this:
Thanks for any hint!