error or crash when writing a string with escape sequence #105

Closed
cjyetman opened this issue Sep 9, 2021 · 6 comments

@cjyetman

cjyetman commented Sep 9, 2021

# causes error
yaml::write_yaml("\x9b", file = tempfile())
#> Error in as.yaml(x, ...): Emitter error: expected SCALAR, SEQUENCE-START, MAPPING-START, or ALIAS

# causes crash/abort/hang
yaml::write_yaml(list(test = "\x9b"), file = tempfile())
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Monterey 12.0         
#>  system   aarch64, darwin20           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2021-09-09                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
#>  cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.1.0)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  pillar        1.6.2   2021-07-29 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.1)
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.10    2021-08-06 [1] CRAN (R 4.1.1)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi       1.7.4   2021-08-25 [1] CRAN (R 4.1.1)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.5.1   2021-07-13 [1] CRAN (R 4.1.0)
#>  tibble        3.1.4   2021-08-25 [1] CRAN (R 4.1.1)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.25    2021-08-06 [1] CRAN (R 4.1.1)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library
@spgarbet
Member

spgarbet commented Sep 9, 2021

A single byte 0x9b is not valid UTF-8, which is the default file encoding. With the high bit set, UTF-8 expects the byte to be part of a multi-byte sequence. I suspect there's code waiting for the next byte, which will never come. I tried 'latin1' for the fileEncoding and got the same result.
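
The point about the high bit can be checked from base R. This is only a small sketch to illustrate the byte pattern; it does not touch the yaml package, and the variable names are made up for the example.

```r
# 0x9b on its own: inspect the bit pattern, most significant bit first
b <- as.raw(0x9b)
bits <- paste(rev(as.integer(rawToBits(b))), collapse = "")
bits
#> [1] "10011011"

# A leading "10" marks a UTF-8 continuation byte, so it cannot stand alone
validUTF8(rawToChar(b))
#> [1] FALSE

# Preceded by a suitable lead byte (0xc2) the same byte is valid UTF-8
validUTF8(rawToChar(as.raw(c(0xc2, 0x9b))))
#> [1] TRUE
```

`validUTF8()` may be a cheap way to screen strings before they are handed to `write_yaml()`.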

@spgarbet
Member

spgarbet commented Sep 9, 2021

It is properly defined in ISO-8859. yaml::write_yaml(list(test = "\x9b"), file = tempfile(), fileEncoding="ISO-8859-13") hangs. However, this only specifies the encoding of the output file, not the encoding of the input string. This leads me to guess that the string's encoding isn't being declared correctly.

@spgarbet
Member

spgarbet commented Sep 9, 2021

> x <- "\x9b"
> Encoding(x) <- "ISO-8859-13"
> yaml::write_yaml(list(test=x), file=tempfile())

Fails the same way.

I tried latin1 and got a nice core dump about out of bounds memory access.

This works:

> x <- "\x9b"
> Encoding(x) <- "latin1"
> enc2utf8(x)
[1] "›"
> y <- enc2utf8(x)
> cat(y)
›> Encoding(y)
[1] "UTF-8"
> yaml::write_yaml(list(test=y), file=tempfile())
> 

A workaround for now is to make sure the input string is UTF-8 encoded.
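
To apply that workaround to a whole list rather than one string, something like the following might do. It is a rough sketch, not part of the yaml package: `to_utf8` is a hypothetical helper, and it assumes that any string that is not already valid UTF-8 is really latin1, which will not hold for every data set. List names are not converted.

```r
# Hypothetical helper: convert every invalid-UTF-8 string in a (possibly
# nested) list to UTF-8, assuming such strings are latin1.
to_utf8 <- function(x) {
  fix <- function(s) {
    bad <- !is.na(s) & !validUTF8(s)   # only touch strings that need it
    tmp <- s[bad]
    Encoding(tmp) <- "latin1"          # declare the input encoding (assumption)
    s[bad] <- enc2utf8(tmp)            # then convert, as in the transcript above
    s
  }
  if (is.character(x)) return(fix(x))
  rapply(x, fix, classes = "character", how = "replace")
}

x <- list(test = rawToChar(as.raw(0x9b)), ok = "abc")  # same byte as "\x9b"
y <- to_utf8(x)
Encoding(y$test)
#> [1] "UTF-8"
```

The cleaned list can then be passed on, e.g. `yaml::write_yaml(to_utf8(x), file = tempfile())`.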

@spgarbet
Member

spgarbet commented Sep 9, 2021

Out of curiosity: the default encoding mark for a string in R is "unknown".
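
The encoding marks can be inspected directly. The sketch below assumes, as far as I understand base R, that `Encoding<-` only recognises "latin1", "UTF-8" and "bytes" and treats any other value as "unknown"; if so, that would explain why marking the string as "ISO-8859-13" earlier in the thread changed nothing.

```r
x <- rawToChar(as.raw(0x9b))   # same single byte as "\x9b"
Encoding(x)                    # "unknown"

y <- x
Encoding(y) <- "ISO-8859-13"   # not a mark R supports (assumption, see above)
Encoding(y)                    # still "unknown"

z <- x
Encoding(z) <- "latin1"        # a supported mark
Encoding(z)                    # "latin1"
```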

@spgarbet
Member

This is related to #90

spgarbet added a commit to spgarbet/r-yaml that referenced this issue Sep 13, 2021
spgarbet marked this as a duplicate of #90 Feb 14, 2022
spgarbet marked this as not a duplicate of #90 Feb 14, 2022
spgarbet reopened this Feb 14, 2022
@spgarbet
Member

Moving everything into a single issue ticket. #113
