Document common options for output encoding of Windows tools #353

Open
gaborcsardi opened this issue Mar 1, 2023 · 2 comments
Labels
documentation tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day

Comments

@gaborcsardi
Member

Create some output with non-ASCII characters, e.g. with this on a German or French Windows:

res <- processx::run("systeminfo", c("/FO", "csv"), encoding = "windows-1252")$stdout
substr(res, 2200, 2300)
#> [1] "5d4\",\"Es wurde ein Hypervisor erkannt. Features, die f\u0081r Hyper-V erforderlich sind, werden nicht ange"

So the \x81 byte is apparently not converted (it comes through as \u0081 instead of "ü"), even though windows-1252 seems to be the default encoding:

❯ [System.Text.Encoding]::Default


IsSingleByte      : True
BodyName          : iso-8859-1
EncodingName      : Westeuropäisch (Windows)
HeaderName        : Windows-1252
WebName           : Windows-1252
WindowsCodePage   : 1252
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 1252

This might be a systeminfo or Windows quirk, because according to https://en.wikipedia.org/wiki/Windows-1252, \x81 is unused in Windows-1252:

According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.[15]

However, code page 850 seems to work well:

res <- processx::run("systeminfo", c("/FO", "csv"), encoding = "850")$stdout
substr(res, 2200, 2300)
#> [1] "5d4\",\"Es wurde ein Hypervisor erkannt. Features, die für Hyper-V erforderlich sind, werden nicht ange"
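A minimal sketch of why the two results differ: byte 0x81 is "ü" in code pages 850 and 437, while under windows-1252 Windows maps it to the C1 control U+0081 (the best-fit behavior quoted above). It assumes R's `iconv()` accepts the name `"CP850"`, which it does with glibc, macOS libiconv and R's own iconv on Windows as far as I know. `has_c1_controls()` is a hypothetical heuristic, not part of processx.

```r
# The byte that came out as "\u0081" in "f\u0081r" above.
byte <- rawToChar(as.raw(0x81))

# In code pages 850 and 437, 0x81 is "ü" (U+00FC).
cp850 <- iconv(byte, from = "CP850", to = "UTF-8")
utf8ToInt(cp850)  # 252

# Hypothetical heuristic, not part of processx: text decoded with the wrong
# single-byte code page often contains C1 control characters
# (U+0080 to U+009F), such as the "\u0081" in the windows-1252 result.
has_c1_controls <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(enc2utf8(s))
    # isTRUE() turns the NA from invalid UTF-8 into FALSE
    isTRUE(any(cp >= 0x80 & cp <= 0x9f))
  }, logical(1), USE.NAMES = FALSE)
}

has_c1_controls(c("f\u0081r", "f\u00fcr"))  # TRUE FALSE
```

A check like this could flag a likely wrong `encoding` guess, but it cannot tell which code page is right: 850 and 437 both decode 0x81 as "ü".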
@gaborcsardi
Member Author

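It seems that processx child processes use the default console code page (437 here), unless the console is inherited (stdout = ""), in which case they use the code page of the R console (65001 here):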

> processx::run("chcp")$stdout
[1] "Aktive Codepage: 437.\r\n"
> processx::run("chcp", stdout = "")$stdout
Aktive Codepage: 65001.
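One possible option, assuming a tool writes its output in the code page that `chcp` reports when run the same way: ask `chcp` first and pass the result as `encoding`. This is a rough sketch; `chcp_to_encoding()` is a hypothetical helper, and mapping 65001 to `"UTF-8"` and everything else to `"CP<n>"` is an assumption about the names iconv accepts.

```r
# Hypothetical helper: turn `chcp` output (in any display language) into an
# encoding name for iconv() / processx::run(encoding = ...).
chcp_to_encoding <- function(out) {
  # The first run of digits is the code page number
  m <- regmatches(out, regexpr("[0-9]+", out))
  if (length(m) == 0) return("")  # unknown: fall back to the locale encoding
  if (m == "65001") "UTF-8" else paste0("CP", m)
}

chcp_to_encoding("Aktive Codepage: 437.\r\n")    # "CP437"
chcp_to_encoding("Active code page: 65001\r\n")  # "UTF-8"

# On Windows (not run here):
# cp <- chcp_to_encoding(processx::run("chcp")$stdout)
# processx::run("systeminfo", c("/FO", "csv"), encoding = cp)
```

Note that `chcp` reported 437 here while 850 was used for systeminfo above; both decode 0x81 as "ü", so this thread does not show which one is right for every character.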

@gaborcsardi
Member Author

gaborcsardi commented Mar 1, 2023

Per https://serverfault.com/questions/80635/how-can-i-manually-determine-the-codepage-and-locale-of-the-current-os/836221#836221 we can get the code pages from the registry, both for command-line apps (OEMCP) and for old GUI apps (ACP). OEMCP could be the default for processx, although that means that we would need to explicitly set encoding = "" when we call (new) R from R.
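A sketch of what such a default could look like, assuming the registry key from the serverfault answer (`HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage`, values `OEMCP` and `ACP`) and using `utils::readRegistry()`, which only exists in R on Windows. `windows_codepages()` and `default_tool_encoding()` are hypothetical names, not processx API.

```r
# Read the code pages from the registry (Windows only); NULL elsewhere.
windows_codepages <- function() {
  if (.Platform$OS.type != "windows") return(NULL)
  utils::readRegistry(
    "SYSTEM\\CurrentControlSet\\Control\\Nls\\CodePage",
    hive = "HLM"
  )
}

# Hypothetical default: the OEM code page for command-line tools, or ""
# (the current locale) if it cannot be determined.
default_tool_encoding <- function(cp = windows_codepages()) {
  oem <- cp[["OEMCP"]]
  if (is.null(oem) || !nzchar(oem)) return("")
  if (oem == "65001") "UTF-8" else paste0("CP", oem)
}

default_tool_encoding(list(ACP = "1252", OEMCP = "850"))  # "CP850"

# On Windows (not run here):
# processx::run("systeminfo", c("/FO", "csv"), encoding = default_tool_encoding())
# Calling (new) R itself would still need encoding = "" explicitly.
```

Taking the registry values as an argument keeps the mapping testable on any platform; on non-Windows systems it simply falls back to `""`.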

gaborcsardi added the tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day label Jul 17, 2024