Document common options for output encoding of Windows tools #353

gaborcsardi · 2023-03-01T09:00:16Z

Create some output with non-ASCII characters, e.g. this on German or French Windows:

res <- processx::run("systeminfo", c("/FO", "csv"), encoding = "windows-1252")$stdout
substr(res, 2200, 2300)
#> [1] "5d4\",\"Es wurde ein Hypervisor erkannt. Features, die f\u0081r Hyper-V erforderlich sind, werden nicht ange"

So the byte \x81 is apparently not converted to ü but passed through as \u0081, even though Windows-1252 seems to be the default encoding:

❯ [System.Text.Encoding]::Default


IsSingleByte      : True
BodyName          : iso-8859-1
EncodingName      : Westeuropäisch (Windows)
HeaderName        : Windows-1252
WebName           : Windows-1252
WindowsCodePage   : 1252
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 1252

This might be some systeminfo or Windows thing, because according to https://en.wikipedia.org/wiki/Windows-1252 \x81 should be unused:

According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.

However, code page 850 seems to work well:

res <- processx::run("systeminfo", c("/FO", "csv"), encoding = "850")$stdout
substr(res, 2200, 2300)
#> [1] "5d4\",\"Es wurde ein Hypervisor erkannt. Features, die für Hyper-V erforderlich sind, werden nicht ange"
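
The difference between the two results comes down to how a single byte is decoded. Here is a minimal sketch using base R's `iconv()`. It assumes that the byte `systeminfo` emitted for "ü" is 0x81 (which is what the \u0081 in the first output suggests) and that the platform's iconv recognizes the names "CP850" and "CP1252":

```r
# 0x81 is the byte that showed up as \u0081 in the windows-1252 result.
x <- rawToChar(as.raw(0x81))

# In the OEM code page 850, 0x81 is "ü" (U+00FC).
as_oem <- iconv(x, from = "CP850", to = "UTF-8")
print(as_oem)

# Windows-1252 leaves 0x81 unassigned, so the result here is
# platform-dependent: some iconv implementations return NA, others map
# the byte to the C1 control character U+0081.
as_ansi <- iconv(x, from = "CP1252", to = "UTF-8")
print(as_ansi)
```

This only illustrates the byte-level mismatch; it does not show which code page `systeminfo` picks on a given system.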


gaborcsardi · 2023-03-01T09:30:19Z

Seems like processx will use the default code page, unless the console is inherited:

> processx::run("chcp")$stdout
[1] "Aktive Codepage: 437.\r\n"
> processx::run("chcp", stdout = "")$stdout
Aktive Codepage: 65001.
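
To use the reported code page programmatically, the number has to be pulled out of the localized `chcp` message. Here is a small base R sketch. `parse_chcp()` is a hypothetical helper, not part of processx, and it assumes that the first run of digits in the message is the code page, as in the German output above:

```r
# Hypothetical helper: extract the code page number from `chcp` output,
# e.g. "Aktive Codepage: 437.\r\n". Assumes the first run of digits in
# the (localized) message is the code page; returns NA if there is none.
parse_chcp <- function(out) {
  m <- regmatches(out, regexpr("[0-9]+", out))
  if (length(m) == 0) NA_integer_ else as.integer(m)
}

cp <- parse_chcp("Aktive Codepage: 437.\r\n")
print(cp)
```

The number could then be passed on as `encoding = as.character(cp)`, given that `encoding = "850"` worked above. Whether every code page number is accepted as an encoding name is not verified here.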

gaborcsardi · 2023-03-01T10:40:46Z

Per https://serverfault.com/questions/80635/how-can-i-manually-determine-the-codepage-and-locale-of-the-current-os/836221#836221 we can get the code page(s) from the registry, both for command-line apps (OEMCP) and for old GUI apps (ACP). OEMCP could be the default for processx, although that would mean we need to explicitly set encoding = "" when we call (new) R from R.
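
A rough sketch of the registry lookup in R. It assumes the values live under `HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage` as `ACP` and `OEMCP`. `windows_codepages()` is a hypothetical helper, not something processx provides, and `utils::readRegistry()` is only available on Windows:

```r
# Hypothetical helper, not part of processx. Reads the ANSI (ACP) and OEM
# (OEMCP) code pages from the registry; returns NULL on non-Windows systems.
windows_codepages <- function() {
  if (.Platform$OS.type != "windows") return(NULL)
  # Assumed registry location of the system code page settings.
  key <- "SYSTEM\\CurrentControlSet\\Control\\Nls\\CodePage"
  vals <- utils::readRegistry(key, hive = "HLM")
  c(ACP = vals[["ACP"]], OEMCP = vals[["OEMCP"]])
}

print(windows_codepages())
```

On a system like the one above, the OEMCP entry would be the candidate default for the `encoding` argument.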

gaborcsardi mentioned this issue Mar 1, 2023

Add a FAQ bullet about system() not marking string output with encoding gaborcsardi/rencfaq#7

Open

gaborcsardi added the documentation label Oct 31, 2023

gaborcsardi added the tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day label Jul 17, 2024
