-
-
Notifications
You must be signed in to change notification settings - Fork 21.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt parsing environment variables as UTF-8 #86936
Conversation
Since this applies to all these platforms we need to confirm this also works on all these platforms The POSIX standard specifies the portable character set for values, for portability, unsure if this is a strict standard |
Portable character set is ASCII without control chars, so it's compatible with UTF-8. |
Unsure what happens if it's encoded differently but that'll just be differently garbled, so not a serious issue |
String s; | ||
if (s.parse_utf8(val) == OK) { | ||
return s; | ||
} | ||
return String(val); // Not valid UTF-8, so return as-is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String s; | |
if (s.parse_utf8(val) == OK) { | |
return s; | |
} | |
return String(val); // Not valid UTF-8, so return as-is | |
return String::utf8(val); |
I'm not sure if this check is necessary, String(val)
will work correctly only if string is Latin-1 (ASCII will be handled by UTF-8 parser), and this is an arbitrary assumption.
A proper way to check encoding is likely reading the value of LC_CTYPE
variable. But for all modern systems it should be UTF-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this idea from
godot/drivers/unix/os_unix.cpp
Lines 515 to 519 in c8c483c
if (pipe_out.parse_utf8(buf) == OK) { | |
(*r_pipe) += pipe_out; | |
} else { | |
(*r_pipe) += String(buf); // If not valid UTF-8 try decode as Latin-1 | |
} |
If you want me to change that to the suggested, sure, but this seemed more bullet-proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's OK as is. It both cases, we only guarantee to parse UTF-8 and ASCII.
So... how would I get this merged? I'm currently stuck forward-porting this patch locally :) |
Thanks! |
What?
This PR makes the Unix
get_environment()
implementation attempt to decode values as UTF-8, and if that fails, return the value as-is, as has been the previous behavior. (Since the UNIX specification does not specify an encoding for environment variables, this sounds like the safe bet.)This matches what the Windows
get_environment()
implementation does in spirit; it usesGetEnvironmentVariableW()
andString::utf16
.Why?
I have a project which renders texts in short movie sequences based on inputs read from environment variables (as that seemed the easiest way to get values into the Godot process), and there was an audible groan when I saw the all-too-familiar UTF-8 mojibake for
ä
being interpreted as Latin-1:With vanilla
4.2.1.stable.official.b09f793f5
on macOS:With this PR: