Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escaping in quoted strings (within interpolations) is too complex #615

Closed
odelalleau opened this issue Mar 17, 2021 · 2 comments · Fixed by #695
Closed

Escaping in quoted strings (within interpolations) is too complex #615

odelalleau opened this issue Mar 17, 2021 · 2 comments · Fixed by #695
Assignees
Labels
bug Something isn't working In progress
Milestone

Comments

@odelalleau
Copy link
Collaborator

Describe the bug

Technically the following is working as intended, but it is hard to understand and error prone. To explain the issue I have to show increasingly more complex scenarios:

(1) Consider the following code (note that I'm using a raw string r'...' to avoid having to double all \ due to Python string processing):

OmegaConf.register_new_resolver("identity", lambda x: x)
cfg = OmegaConf.create({
  "msg": r'${identity:"The root drive is: ${drive}:\\"}',
  "drive": "C",
})
print(cfg.msg)  # The root drive is: C:\

Importantly, the escape at the end of the quoted string is doubled so that it is not mistaken for an escaped quote by the grammar (side note: there is a bug related to this, but I'll discuss it in another issue, we can ignore it here). So far, so good.

(2) Now let's say you want to print instead The root drive is: ${drive}:\, i.e., the interpolation shouldn't be resolved. Here is how to achieve it:

cfg = OmegaConf.create({
  "msg": r'${identity:"The root drive is: \${drive}:\\"}',
  "drive": "C",
})
print(cfg.msg)  # The root drive is: ${drive}:\

Still looks good :)

(3) What if instead you want to print: The root drive is: \C:\? (i.e., the interpolation should be resolved, but with a backslash in front). The "naive" way of adding an extra backslash (to have an escaped backslash instead of an escaped interpolation) won't cut it:

cfg = OmegaConf.create({
  "msg": r'${identity:"The root drive is: \\${drive}:\\"}',
  "drive": "C",
})
print(cfg.msg)  # The root drive is: ${drive}:\

Instead, you need to use four backslashes (or three, it gives the same result):

cfg = OmegaConf.create({
  "msg": r'${identity:"The root drive is: \\\\${drive}:\\"}',
  "drive": "C",
})
print(cfg.msg)  # The root drive is: \C:\

Now this is getting confusing! The reason is essentially that escaped backslashes get un-escaped twice:

  • First, to un-escape \\, that may be used as \\" to avoid confusion with an escaped quoted \". This step happens before we feed the resulting string to quoted_string_callback()
  • Second, to un-escape \\, that may be used as \\${..} to avoid confusion with an escaped interpolation \${..}. This step happens within quoted_string_callback()

Expected behavior

Scenarios (1) and (2) should still work the same. However, in scenario (3) the first example should work, i.e.:

cfg = OmegaConf.create({
  "msg": r'${identity:"The root drive is: \\${drive}:\\"}',
  "drive": "C",
})
print(cfg.msg)  # The root drive is: \C:\

This is doable by skipping the first un-escaping step described above, and rely instead on the second one to un-escape \\.

I don't see any major drawback to it. A minor one is that it means we will need to process the content of all quoted strings as if they contained interpolations, so as to properly un-escape \\, which incurs some extra overhead (but since we are already in the middle of resolving an interpolation it shouldn't make a big difference).

Additional context
Related discussion: #606 (comment)
(in the example discussed there, proper escaping proved very confusing as I was trying to achieve the desired result)

@odelalleau odelalleau added the bug Something isn't working label Mar 17, 2021
@odelalleau odelalleau self-assigned this Mar 17, 2021
@odelalleau odelalleau added this to the OmegaConf 2.1 milestone Mar 17, 2021
odelalleau added a commit to odelalleau/omegaconf that referenced this issue Mar 17, 2021
odelalleau added a commit to odelalleau/omegaconf that referenced this issue Mar 17, 2021
@omry
Copy link
Owner

omry commented Mar 17, 2021

the double un-escaping gave me an uneasy feeling, so yet - I think we should try to fix it.

Playing with it:
In case 1, the double quote at the end is not needed right now, is that the bug you were hinting at?

cfg = OmegaConf.create({
  "msg": r'${identity:"The root drive is: ${drive}:\\"}',
  "msg2": r'${identity:"The root drive is: ${drive}:\"}',
  "drive": "C",
})
print(cfg.msg)  # The root drive is: C:\
print(cfg.msg2)  # The root drive is: C:\

The following cases all seems right to me:

In [50]: cfg = OmegaConf.create({
    ...:   "i0": "${a}",
    ...:   "i1": r'${identity:"${a}:"}',
    ...:   "i2": r'${identity:"\${identity:${a}}:"}',
    ...:   "i3": r'${identity:"\${identity:\${a}}:"}',
    ...:   "a": "43",
    ...: })
    ...: for i in range(0, 4):
    ...:     k = f"i{i}"
    ...:     print(f"{k} {str(cfg._get_node(k)):<35} -> {cfg[k]}")
    ...: 
i0 ${a}                                -> 43
i1 ${identity:"${a}:"}                 -> 43:
i2 ${identity:"\${identity:${a}}:"}    -> ${identity:43}:
i3 ${identity:"\${identity:\${a}}:"}   -> ${identity:${a}}:

@odelalleau
Copy link
Collaborator Author

odelalleau commented Mar 17, 2021

In case 1, the double quote at the end is not needed right now, is that the bug you were hinting at?

Yes :) will post another issue about it shortly
(although it could be argued here that it's a good thing, but the reason why it's behaving this way is causing other problems)

The following cases all seems right to me:
(...)

Yes all good here. The problems really start with complex cases where we want backslashes within quoted strings.

odelalleau added a commit to odelalleau/omegaconf that referenced this issue Mar 18, 2021
* In top-level strings, the only backslashes that can and should be
  escaped are those preceding an interpolation. For instance, if "x"
  is a node evaluating to "X":
        \${x}   -> ${x}
        \\${x}  -> \X
        \\\${x} -> \${x}
        ${x}\\  -> X\\

* In quoted strings, the only backslashes that can and should be escaped
  are those preceding a quote of the same type as the enclosing quotes.
  For instance:
        "abc\"def" -> abc"def
        "abc\\" -> abc\
        "abc\\\"def" -> abc\"def
        "abc\\\'def" -> abc\\\'def

This also fixes a bug with the parsing of quoted strings (see omry#617).

Fixes omry#615
Fixes omry#617
odelalleau added a commit to odelalleau/omegaconf that referenced this issue Mar 18, 2021
* In top-level strings, the only backslashes that can and should be
  escaped are those preceding an interpolation. For instance, if "x"
  is a node evaluating to "X":
        \${x}   -> ${x}
        \\${x}  -> \X
        \\\${x} -> \${x}
        ${x}\\  -> X\\

* In quoted strings, the only backslashes that can and should be escaped
  are those preceding a quote of the same type as the enclosing quotes.
  For instance:
        "abc\"def" -> abc"def
        "abc\\" -> abc\
        "abc\\\"def" -> abc\"def
        "abc\\\'def" -> abc\\\'def

This also fixes a bug with the parsing of quoted strings (see omry#617).

Fixes omry#615
Fixes omry#617
odelalleau added a commit to odelalleau/omegaconf that referenced this issue Apr 25, 2021
Quoted values are now identified through lexer modes, which removes the
need for escaping quotes in nested interpolations.

Fixes omry#615
odelalleau added a commit to odelalleau/omegaconf that referenced this issue Apr 26, 2021
Quoted values are now identified through lexer modes, which removes the
need for escaping quotes in nested interpolations.

Fixes omry#615
odelalleau added a commit that referenced this issue Apr 30, 2021
New syntax for nested quoted values

Quoted values are now identified through lexer modes, which removes the
need for escaping quotes in nested interpolations.

Fixes #615
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment