Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with parsing when an option includes escapes Python does not like #51

Closed
mdhall272 opened this issue Mar 28, 2024 · 5 comments · Fixed by #52
Closed

Problems with parsing when an option includes escapes Python does not like #51

mdhall272 opened this issue Mar 28, 2024 · 5 comments · Fixed by #52
Assignees
Labels

Comments

@mdhall272
Copy link

MWE:

parser <- ArgumentParser(description='foo')
parser$add_argument('bar', default = "[\\D]")
parser$parse_args()

The error is:

Error in if (grepl("^\\{|^\\[", output)) { : the condition has length > 1

The reason is apparently that python is adding a warning string <stdin>:19: SyntaxWarning: invalid escape sequence '\\D' to output which causes an error at line 251 of argparse.R

@trevorld
Copy link
Owner

trevorld commented Mar 28, 2024

Can you provide me your sessionInfo() and python3 --version?

This doesn't seem to reproduce for me on Ubuntu 22.04.4 LTS with Python 3.10.12 in the bash shell

~/tmp$ cat argparse_bug.R 
library("argparse")
parser <- ArgumentParser(description='foo')
parser$add_argument('--bar', default = "[\\D]")
a <- parser$parse_args()
cat("bar: ", a$bar, "\n")
~/tmp$ Rscript argparse_bug.R 
bar:  [\D] 
~/tmp$ Rscript argparse_bug.R --bar=[\\D]
bar:  [\D] 

@trevorld trevorld added the reprex Reproducible example needed label Mar 28, 2024
@mdhall272
Copy link
Author

> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] argparse_2.2.2

loaded via a namespace (and not attached):
[1] compiler_4.3.3   R6_2.5.1         tools_4.3.3      findpython_1.0.8 jsonlite_1.8.8  

And presuming you use the vanilla findpython::find_python_cmd:

> find_python_cmd()
[1] "/opt/homebrew/bin/python3.12"

@trevorld trevorld added bug and removed reprex Reproducible example needed labels Apr 2, 2024
@trevorld trevorld self-assigned this Apr 2, 2024
@trevorld
Copy link
Owner

trevorld commented Apr 2, 2024

  • Okay, thanks for the extra info.
  • I've compiled Python 3.12 locally and this issue reproduces with that version of Python.
  • This issue does not seem to reproduce with Python 3.11 and lower.

@trevorld
Copy link
Owner

trevorld commented Apr 2, 2024

Looks like a new warning in Python 3.12:

~/tmp$ python3.11
Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "\D"

~/tmp$ python3.12
Python 3.12.2 (main, Apr  2 2024, 07:10:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "\D"
<stdin>:1: SyntaxWarning: invalid escape sequence '\D'

@trevorld
Copy link
Owner

trevorld commented Apr 2, 2024

Options seem to be:

  1. Send to python as a "raw" string. This will probably escape all escape sequences even ones python recognizes. This may break code that relies on {argparse} passing in a recognized escape sequence to python.

  2. Manually escape any unrecognized escape sequence with an extra \ but don't escape any recognized escape sequence that Python recognizes. This may be non-trivial to do. This seems to be a list of recognized escape characters: https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences

  3. Don't do either one but provide a more informative error message or warning.

trevorld added a commit that referenced this issue Apr 5, 2024
* Character values are now passed as "raw" strings to Python

  + In particular this avoids triggering an error in Python 3.12
    when creating a string with escapes not supported by
    Python e.g. `default = "\\D"` will continue to return a
    default value of `"\\D"` instead of triggering an error.
  + However, this also means that any Python accepted escape
    sequences will no longer be interpreted as an escape sequence.
    If relying on such behaviour you may need to instead use
    the appropriate R escape sequences or Unicode values instead of Python escape sequences:

    - E.g. `default = "\\t"` will now return a default value of
      `"\\t"` (a `t` preceded by an (escaped) backslash)
      instead of `"\t"` (a horizontal tab).
    - https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences
    - https://en.wikipedia.org/wiki/Basic_Latin_(Unicode_block)

  + Thanks Matthew Hall (@mdhall272) for bug report.

closes #51
trevorld added a commit that referenced this issue Apr 5, 2024
* Character values are now passed as "raw" strings to Python

  + In particular this avoids triggering an error in Python 3.12
    when creating a string with escapes not supported by
    Python e.g. `default = "\\D"` will continue to return a
    default value of `"\\D"` instead of triggering an error.
  + However, this also means that any Python accepted escape
    sequences will no longer be interpreted as an escape sequence.
    If relying on such behaviour you may need to instead use
    the appropriate R escape sequences or Unicode values instead of Python escape sequences:

    - E.g. `default = "\\t"` will now return a default value of
      `"\\t"` (a `t` preceded by an (escaped) backslash)
      instead of `"\t"` (a horizontal tab).
    - https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences
    - https://en.wikipedia.org/wiki/Basic_Latin_(Unicode_block)

  + Thanks Matthew Hall (@mdhall272) for bug report.

closes #51
trevorld added a commit that referenced this issue Apr 9, 2024
* Character values are now passed as "raw" strings to Python

  + In particular this avoids triggering an error in Python 3.12
    when creating a string with escapes not supported by
    Python e.g. `default = "\\D"` will continue to return a
    default value of `"\\D"` instead of triggering an error.
  + However, this also means that any Python accepted escape
    sequences will no longer be interpreted as an escape sequence.
    If relying on such behaviour you may need to instead use
    the appropriate R escape sequences or Unicode values instead of Python escape sequences:

    - E.g. `default = "\\t"` will now return a default value of
      `"\\t"` (a `t` preceded by an (escaped) backslash)
      instead of `"\t"` (a horizontal tab).
    - https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences
    - https://en.wikipedia.org/wiki/Basic_Latin_(Unicode_block)

  + Thanks Matthew Hall (@mdhall272) for bug report.

closes #51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants