Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ON HOLD] Simpler escaping in the grammar, and fix to quoted values #621

Closed
wants to merge 2 commits into from

Conversation

odelalleau
Copy link
Collaborator

@odelalleau odelalleau commented Mar 18, 2021

  • In top-level strings, the only backslashes that can and should be
    escaped are those preceding an interpolation. For instance, if "x"
    is a node evaluating to "X":
        \${x}   -> ${x}
        \\${x}  -> \X
        \\\${x} -> \${x}
        ${x}\\  -> X\\
  • In quoted strings, the only backslashes that can and should be escaped
    are those preceding a quote of the same type as the enclosing quotes.
    For instance:
        "abc\"def"   -> abc"def
        "abc\\"      -> abc\
        "abc\\\"def" -> abc\"def
        "abc\\\'def" -> abc\\\'def

This also fixes a bug with the parsing of quoted strings (see #617).

Fixes #615
Fixes #617

* In top-level strings, the only backslashes that can and should be
  escaped are those preceding an interpolation. For instance, if "x"
  is a node evaluating to "X":
        \${x}   -> ${x}
        \\${x}  -> \X
        \\\${x} -> \${x}
        ${x}\\  -> X\\

* In quoted strings, the only backslashes that can and should be escaped
  are those preceding a quote of the same type as the enclosing quotes.
  For instance:
        "abc\"def" -> abc"def
        "abc\\" -> abc\
        "abc\\\"def" -> abc\"def
        "abc\\\'def" -> abc\\\'def

This also fixes a bug with the parsing of quoted strings (see omry#617).

Fixes omry#615
Fixes omry#617
@odelalleau
Copy link
Collaborator Author

Right now we don't really have a documentation on the grammar. I wonder if there should be one?

@omry
Copy link
Owner

omry commented Mar 18, 2021

  • In top-level strings, the only backslashes that can and should be
    escaped are those preceding an interpolation. For instance, if "x"
    is a node evaluating to "X":

I annotated them, is this correct?

        \${x}   -> ${x}  # prevent interpolation resolution
        \\${x}  -> \X    # add a single backslash, resolve interpolation (x is similar to X, I didn't understand initially)
        \\\${x} -> \${x} # prevent resolution and add a backslash before the interpolation
        ${x}\\  -> X\\   # are you saying that \\ is only escaped \ before an interpolation?
  • In quoted strings, the only backslashes that can and should be escaped
    are those preceding a quote of the same type as the enclosing quotes.
    For instance:

I annotated them:

        "abc\"def"   -> abc"def    # add a quote similar to the outer quote.
        "abc\\"      -> abc\          # add a backslash at the end of the string (avoiding escaping the trailing quote)
        "abc\\\"def" -> abc\"def  # add a backslash in the middle of the string and also a double quote.
        "abc\\\'def" -> abc\\\'def  # I guess this is saying that double backslash is only a backslash if it's before a a matching quote?

One thing I am wondering about is quoted string inside nested interpolationn.
Is this possible?

'foo ${env:FOO,'X'} bar'

Do you need to use double quotes on the inner string ('X')?
What happens if you have another nesting level?

Right now we don't really have a documentation on the grammar. I wonder if there should be one?

Yes, I think we need it (can come after 2.1.0rc1 but before 2.1.0).

@odelalleau
Copy link
Collaborator Author

        \${x}   -> ${x}  # prevent interpolation resolution
        \\${x}  -> \X    # add a single backslash, resolve interpolation (x is similar to X, I didn't understand initially)
        \\\${x} -> \${x} # prevent resolution and add a backslash before the interpolation
        ${x}\\  -> X\\   # are you saying that \\ is only escaped \ before an interpolation?

Correct (including the last one -- the motivation is to avoid "escaping hell" when we start getting into quoted strings => I want to escape only the backslashes that must be escaped)

        "abc\"def"   -> abc"def    # add a quote similar to the outer quote.
        "abc\\"      -> abc\          # add a backslash at the end of the string (avoiding escaping the trailing quote)
        "abc\\\"def" -> abc\"def  # add a backslash in the middle of the string and also a double quote.
        "abc\\\'def" -> abc\\\'def  # I guess this is saying that double backslash is only a backslash if it's before a a matching quote?

All correct.

One thing I am wondering about is quoted string inside nested interpolationn.
Is this possible?

'foo ${env:FOO,'X'} bar'

Do you need to use double quotes on the inner string ('X')?
What happens if you have another nesting level?

Assuming that 'foo ${env:FOO,'X'} bar' is the list of characters in a quoted value (similar to the list of examples just above), then your expression is invalid, it should be one of:

'foo ${env:FOO,\'X\'} bar'
'foo ${env:FOO,"X"} bar'

Generally speaking, through proper escaping you should be able to go as deep as you want. The tests have one example: 'AB${test:\'CD${test:\\\'EF\\\'}GH\'}'

Yes, I think we need it (can come after 2.1.0rc1 but before 2.1.0).

Ok, noted

@omry
Copy link
Owner

omry commented Mar 19, 2021

This approach is unconventional.
Usually escaping handled uniformly.

Correct (including the last one -- the motivation is to avoid "escaping hell" when we start getting into quoted strings => I want to escape only the backslashes that must be escaped)

Can you show me some examples of hard cases that would happen if escaping would be handled more uniformly?
Here are some simple cases to start from:

'a' -> a
'a\\' -> a\
'\\a' -> \a # different than your proposed solution
'a\'b' -> a'b
'a"b' -> a"b
'a\"b' -> a"b # not needed, but should be harmless

# Everything above with outer double quote and inner single quote (where there is an inner quote).

@odelalleau
Copy link
Collaborator Author

Can you show me some examples of hard cases that would happen if escaping would be handled more uniformly?

#615 has an example (scenario 3, we need to use the second option instead of the first one). Copying it here:

cfg = OmegaConf.create({
  "msg": r'${identity:"The root drive is: \\\\${drive}:\\"}',
  "drive": "C",
})
print(cfg.msg)  # The root drive is: \C:\

This is because of the double un-escaping that is described in the issue. With this PR, the escaped interpolation is not un-escaped when un-escaping the quotes, so we can get away with simply \\${drive}:\\.
Initially I thought I could get rid of the double un-escaping, but I was wrong (I think). I can explain more if needed.

Another example is when we were discussing #606 (comment)
Here's what I had ended up with in order to be able to obtain the "dynamic" DictConfig discussed there:

cfg = OmegaConf.create({
  "b":  "${oc.decode:'{foo:\"\\\\\\\\\\${a}\"}'}"
})

This gave me some headache and triggered this whole thing. This PR cuts down the number of backslashes in half.

I tend to agree with you that it's a bit unconventional, but I personally feel like the pros outweigh the cons here. It also feels natural: you only escape things that need escaping (instead of having multiple options leading to the same result).

@omry
Copy link
Owner

omry commented Mar 19, 2021

Let's see, with the objective of getting the following in a quoted string

# before resolution
The root drive is: \${drive}:\

This seems to be enough:

# before resolution
"The root drive is: \\\${drive}:\\"

Double backslash at the trailing \ to avoid escaping the quote.
One backslash before the $ to escape the interpolation, and double backslash before that to get an a ,
In this form, any other \ is assumed to be escaping something so free \ that are not matching an escapable characters should be an error.

The above doesn't seem too bad.
Note that the double unescaping is an implementation detail.
I don't think 4 backslashes are correct before the $:
\ should match the $, and you end up with 3. two are matching together and you have another free one.
arguably it's not an error because it's escaping the "new" backslash.

Looking at the second example now.

@omry
Copy link
Owner

omry commented Mar 19, 2021

Can you show what is the desired output for this? I want to work from there.

b: "${oc.decode:'{foo:\"\\\\\\\\\\${a}\"}'}"

Going to assume it's this from my comment:

b:  "${oc.decode:{'foo:\${a}'}"

@omry
Copy link
Owner

omry commented Mar 19, 2021

With this as the target, let's see if what I expect it similar to what you had to do:

b:  "${oc.decode:{'foo:\${a}'}"

The intention I read in my comment is to get this output:

${oc.decode:{'foo:\${a}'}

The first interpolation should be resolved, so we leave it as is.
The ${a} interpolation is escaped so it should not be resolved.

When we process the outer string we will unescape the nested ${a}, so we need it double escape it.

${oc.decode:{'foo:\\${a}'}

This means that after processing the outer string, we end up with

${oc.decode:{'foo:\${a}'}

This is exactly what the objective is, so I guess we have some subsequent un-escaping happening after that?

@odelalleau
Copy link
Collaborator Author

odelalleau commented Mar 19, 2021

Several points below (including answers to your question):

Additional motivation

I forgot to mention another motivation for this change, which is to be more intuitive, in the sense that users wouldn't need to worry about escaping until they run into situations where they need to escape interpolations or quotes (or if they want to use special characters in unquoted strings, but that's a completely different topic).

On current master we have this potentially surprising behavior:

cfg = OmegaConf.create({
    "no_inter": "Two backslashes \\\\ and two slashes //",
    "inter":    "Two backslashes \\\\ and two slashes ${slash}${slash}",
    "slash": "/",
})
print(f"no_inter: {cfg.no_inter}")  # Two backslashes \\ and two slashes //
print(f"inter   : {cfg.inter}")     # Two backslashes \ and two slashes //  <== actually ONE backslash!!

Another similar (i.e., potentially surprising) situation occurs with quoted strings:

print("Two backslashes \\\\ and two slashes //")  # Two backslashes \\ and two slashes //

OmegaConf.clear_resolvers()
OmegaConf.register_new_resolver("print", lambda x: print(x))
cfg = OmegaConf.create({
    "x": '${print:"Two backslashes \\\\ and two slashes //"}'  # <== same quoted string as above
})
cfg.x  #  Two backslashes \ and two slashes //  <== actually ONE backslash!!

With the current PR, in both situations the extra backslash is preserved.

Quoted strings ending with a backslash

One subtle but important change to this PR is that quoted strings ending with a single \ aren't valid anymore, e.g., "C:\" is invalid. This is similar to how x = "C:\" is a SyntaxError in Python. This is a consequence of the new regex I'm using to fix #617 (and I'm not sure if there would be a way to fix #617 while still allowing such strings).

Just pointing it out to be sure you are ok with this as well.

Answers to your comments & questions

Let's see, with the objective of getting the following in a quoted string

# before resolution
The root drive is: \${drive}:\

I think this wasn't clear, the objective in this example is to write The root drive is: \C:\.

Also I believe what you wrote is how you think things should work, not how they work right now on master (just pointing it out to be sure we're on the same page).

This seems to be enough:

# before resolution
"The root drive is: \\\${drive}:\\"

I agree (that's actually how it works in this PR).

In this form, any other \ is assumed to be escaping something so free \ that are not matching an escapable characters should be an error.

Generally speaking I think enforcing \ to escape things would make things worse, for instance The path is ${drive}:\folder wouldn't be a valid config value anymore.

Note that the double unescaping is an implementation detail.

Just to be sure it's clear, it has an important role in how things currently work on master, and one objective of this PR is meant to address the issues it causes.
I'm still unsure how to solve it otherwise. I think I'll need to expand on that, but this comment is already long enough.

I don't think 4 backslashes are correct before the $:
\ should match the $, and you end up with 3. two are matching together and you have another free one.
arguably it's not an error because it's escaping the "new" backslash.

That's something we need to agree on. Personaly I think \\\\${drive} should be interpreted as "two escaped backslashes followed by a regular (non-escaped) interpolation". Similar to how print("\\\\n") in Python prints \\n.

Can you show what is the desired output for this? I want to work from there.

b: "${oc.decode:'{foo:\"\\\\\\\\\\${a}\"}'}"

Currently it behaves on master as I want it to behave, i.e. cfg.b is the DictConfig {'foo': '${a}'}
In other words, the output of the decode() function must be the string '${a}'.

With this as the target, let's see if what I expect it similar to what you had to do:
(...)

I'm having hard time following this example, at least in part because there are things that clearly don't work (braces that don't match, decode taking as input something that looks like a dictionary while it should be a string).
Can you rewrite it in a way that's more correct, with the objective of obtaining the same DictConfig as I just mentioned for cfg.b?

@odelalleau odelalleau marked this pull request as draft March 19, 2021 20:36
@odelalleau
Copy link
Collaborator Author

odelalleau commented Mar 19, 2021

Converting to draft to focus on fixing #617 as a first step (see #630)

@odelalleau odelalleau changed the title Simpler escaping in the grammar, and fix to quoted values [ON HOLD] Simpler escaping in the grammar, and fix to quoted values Mar 22, 2021
@odelalleau
Copy link
Collaborator Author

Superseded by #695 => closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants