Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt parsing environment variables as UTF-8 #86936

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

akx
Copy link
Contributor

@akx akx commented Jan 7, 2024

What?

This PR makes the Unix get_environment() implementation attempt to decode values as UTF-8, and if that fails, return the value as-is, as has been the previous behavior. (Since the UNIX specification does not specify an encoding for environment variables, this sounds like the safe bet.)

This matches what the Windows get_environment() implementation does in spirit; it uses GetEnvironmentVariableW() and String::utf16.

Why?

I have a project which renders texts in short movie sequences based on inputs read from environment variables (as that seemed the easiest way to get values into the Godot process), and there was an audible groan when I saw the all-too-familiar UTF-8 mojibake for ä being interpreted as Latin-1:

With vanilla 4.2.1.stable.official.b09f793f5 on macOS:

pre

With this PR:

post

@AThousandShips
Copy link
Member

Since this applies to all these platforms we need to confirm this also works on all these platforms

The POSIX standard specifies the portable character set for values, for portability, unsure if this is a strict standard

@bruvzg
Copy link
Member

bruvzg commented Jan 7, 2024

The POSIX standard specifies the portable character set for values, for portability, unsure if this is a strict standard

Portable character set is ASCII without control chars, so it's compatible with UTF-8.

@AThousandShips
Copy link
Member

Unsure what happens if it's encoded differently but that'll just be differently garbled, so not a serious issue

Comment on lines +710 to +714
String s;
if (s.parse_utf8(val) == OK) {
return s;
}
return String(val); // Not valid UTF-8, so return as-is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String s;
if (s.parse_utf8(val) == OK) {
return s;
}
return String(val); // Not valid UTF-8, so return as-is
return String::utf8(val);

I'm not sure if this check is necessary, String(val) will work correctly only if string is Latin-1 (ASCII will be handled by UTF-8 parser), and this is an arbitrary assumption.

A proper way to check encoding is likely reading the value of LC_CTYPE variable. But for all modern systems it should be UTF-8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this idea from

if (pipe_out.parse_utf8(buf) == OK) {
(*r_pipe) += pipe_out;
} else {
(*r_pipe) += String(buf); // If not valid UTF-8 try decode as Latin-1
}
, introduced in #62735.

If you want me to change that to the suggested, sure, but this seemed more bullet-proof.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK as is. It both cases, we only guarantee to parse UTF-8 and ASCII.

@akx
Copy link
Contributor Author

akx commented Feb 9, 2024

So... how would I get this merged? I'm currently stuck forward-porting this patch locally :)

@akx akx requested a review from bruvzg March 11, 2024 11:49
@AThousandShips AThousandShips changed the title Attempt to parse environment variables as UTF-8 Attempt parsing environment variables as UTF-8 Mar 11, 2024
@akien-mga akien-mga modified the milestones: 4.x, 4.3 Mar 11, 2024
@akien-mga akien-mga merged commit b0d07b1 into godotengine:master Mar 11, 2024
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants