[ON HOLD] Simpler escaping in the grammar, and fix to quoted values #621
* In top-level strings, the only backslashes that can and should be escaped are those preceding an interpolation. For instance, if "x" is a node evaluating to "X":

      \${x}    -> ${x}
      \\${x}   -> \X
      \\\${x}  -> \${x}
      ${x}\\   -> X\\

* In quoted strings, the only backslashes that can and should be escaped are those preceding a quote of the same type as the enclosing quotes. For instance:

      "abc\"def"    -> abc"def
      "abc\\"       -> abc\
      "abc\\\"def"  -> abc\"def
      "abc\\\'def"  -> abc\\\'def

This also fixes a bug with the parsing of quoted strings (see omry#617).

Fixes omry#615
Fixes omry#617
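The quoted-string rule described above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not OmegaConf's grammar-based implementation: `unescape_quoted` is a hypothetical helper name, and treating a backslash run at the very end of the body like one that precedes the closing quote is an assumption made to reproduce the `"abc\\"` case. It also does not validate its input the way a real parser would.

```python
import re


def unescape_quoted(quoted):
    """Toy model of the quoted-string rule (hypothetical helper, not OmegaConf API)."""
    quote = quoted[0]
    if quote not in "'\"" or len(quoted) < 2 or quoted[-1] != quote:
        raise ValueError("expected a string enclosed in matching quotes")
    body = quoted[1:-1]

    # Only a run of backslashes directly followed by the enclosing quote type
    # (or, by assumption, sitting at the very end of the body) is un-escaped.
    pattern = re.compile(r"(\\+)(" + re.escape(quote) + r"|\Z)")

    def collapse(match):
        slashes, tail = match.group(1), match.group(2)
        # Each pair of backslashes yields one literal backslash; an odd
        # leftover backslash is consumed by escaping the quote.
        return "\\" * (len(slashes) // 2) + tail

    return pattern.sub(collapse, body)


# The four cases from the description, written as raw strings so that Python
# does not add its own layer of escaping.
print(unescape_quoted(r'"abc\"def"'))    # abc"def
print(unescape_quoted(r'"abc\\"'))       # abc\
print(unescape_quoted(r'"abc\\\"def"'))  # abc\"def
print(unescape_quoted(r'"abc\\\'def"'))  # abc\\\'def
```

In the last case the backslashes precede a single quote inside a double-quoted string, so under this model they are left alone.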
Right now we don't really have documentation on the grammar. I wonder if there should be some?
I annotated them, is this correct?

    \${x}    -> ${x}   # prevent interpolation resolution
    \\${x}   -> \X     # add a single backslash, resolve interpolation (x is similar to X, I didn't understand initially)
    \\\${x}  -> \${x}  # prevent resolution and add a backslash before the interpolation
    ${x}\\   -> X\\    # are you saying that \\ is only escaped \ before an interpolation?
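For reference, the top-level rule being annotated here can be sketched as a small Python function. This is a toy model under stated assumptions, not the actual OmegaConf parser: `resolve_top_level` is a hypothetical name, interpolations are limited to the simple `${name}` form with no nesting, and node values come from a plain dictionary.

```python
import re

# Toy interpolation syntax: ${name}, optionally preceded by backslashes.
_INTERPOLATION = re.compile(r"(\\*)\$\{(\w+)\}")


def resolve_top_level(text, nodes):
    """Toy model of the top-level rule (hypothetical helper, not OmegaConf API)."""

    def replace(match):
        slashes, name = match.group(1), match.group(2)
        kept = "\\" * (len(slashes) // 2)  # each escaped pair -> one backslash
        if len(slashes) % 2 == 1:
            # An odd leftover backslash escapes the interpolation itself.
            return kept + "${" + name + "}"
        return kept + str(nodes[name])

    # Backslashes that do not precede an interpolation never match the
    # pattern, so they are left untouched.
    return _INTERPOLATION.sub(replace, text)


nodes = {"x": "X"}
for source in (r"\${x}", r"\\${x}", r"\\\${x}", r"${x}\\"):
    print(f"{source:10} -> {resolve_top_level(source, nodes)}")
```

The four inputs are the four annotated cases; the trailing `\\` in the last one is not followed by an interpolation, so the model leaves both backslashes in place.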
I annotated them:
One thing I am wondering about is quoted strings inside nested interpolations.
Do you need to use double quotes on the inner string ('X')?
Yes, I think we need it (can come after 2.1.0rc1 but before 2.1.0).
Correct (including the last one -- the motivation is to avoid "escaping hell" when we start getting into quoted strings => I want to escape only the backslashes that must be escaped)
All correct.
Assuming that
Generally speaking, through proper escaping you should be able to go as deep as you want. The tests have one example:
Ok, noted
This approach is unconventional.
Can you show me some examples of hard cases that would come up if escaping were handled more uniformly?
#615 has an example (scenario 3, we need to use the second option instead of the first one). Copying it here:

    cfg = OmegaConf.create({
        "msg": r'${identity:"The root drive is: \\\\${drive}:\\"}',
        "drive": "C",
    })
    print(cfg.msg)  # The root drive is: \C:\

This is because of the double un-escaping that is described in the issue. With this PR, the escaped interpolation is not un-escaped when un-escaping the quotes, so we can get away with simply

Another example is when we were discussing #606 (comment)

    cfg = OmegaConf.create({
        "b": "${oc.decode:'{foo:\"\\\\\\\\\\${a}\"}'}"
    })

This gave me some headache and triggered this whole thing. This PR cuts down the number of backslashes in half. I tend to agree with you that it's a bit unconventional, but I personally feel like the pros outweigh the cons here. It also feels natural: you only escape things that need escaping (instead of having multiple options leading to the same result).
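Part of the backslash count in examples like these comes from Python's own string-literal layer, which is applied before OmegaConf ever sees the text. The snippet below is a small illustration using made-up strings (not taken from the PR): a regular literal halves each backslash pair, while a raw literal passes backslashes through unchanged.

```python
# Two ways of writing the same four-character text: a, backslash, backslash, b.
plain = "a\\\\b"  # regular literal: each \\ pair becomes one backslash
raw = r"a\\b"     # raw literal: backslashes are kept verbatim

print(plain == raw)       # True
print(plain)              # a\\b
print(plain.count("\\"))  # 2
```

Writing configuration strings as raw literals removes this one layer, so only the escaping rules of the interpolation grammar itself remain to be counted.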
Let's see, with the objective of getting the following in a quoted string
This seems to be enough:
Double backslash at the trailing \ to avoid escaping the quote. The above doesn't seem too bad. Looking at the second example now.
Can you show what is the desired output for this? I want to work from there.
Going to assume it's this from my comment:
With this as the target, let's see if what I expect is similar to what you had to do:
The intention I read in my comment is to get this output:
The first interpolation should be resolved, so we leave it as is. When we process the outer string we will unescape the nested ${a}, so we need to double-escape it.
This means that after processing the outer string, we end up with
This is exactly what the objective is, so I guess we have some subsequent un-escaping happening after that?
Several points below (including answers to your question):

Additional motivation

I forgot to mention another motivation for this change, which is to be more intuitive, in the sense that users wouldn't need to worry about escaping until they run into situations where they need to escape interpolations or quotes (or if they want to use special characters in unquoted strings, but that's a completely different topic). On current master we have this potentially surprising behavior:

    cfg = OmegaConf.create({
        "no_inter": "Two backslashes \\\\ and two slashes //",
        "inter": "Two backslashes \\\\ and two slashes ${slash}${slash}",
        "slash": "/",
    })
    print(f"no_inter: {cfg.no_inter}")  # Two backslashes \\ and two slashes //
    print(f"inter : {cfg.inter}")  # Two backslashes \ and two slashes // <== actually ONE backslash!!

Another similar (i.e., potentially surprising) situation occurs with quoted strings:

    print("Two backslashes \\\\ and two slashes //")  # Two backslashes \\ and two slashes //
    OmegaConf.clear_resolvers()
    OmegaConf.register_new_resolver("print", lambda x: print(x))
    cfg = OmegaConf.create({
        "x": '${print:"Two backslashes \\\\ and two slashes //"}'  # <== same quoted string as above
    })
    cfg.x  # Two backslashes \ and two slashes // <== actually ONE backslash!!

With the current PR, in both situations the extra backslash is preserved.

Quoted strings ending with a backslash

One subtle but important change to this PR is that quoted strings ending with a single \ aren't valid anymore, e.g.,

Just pointing it out to be sure you are ok with this as well.

Answers to your comments & questions
I think this wasn't clear, the objective in this example is to write Also I believe what you wrote is how you think things should work, not how they work right now on master (just pointing it out to be sure we're on the same page).
I agree (that's actually how it works in this PR).
Generally speaking I think enforcing \ to escape things would make things worse, for instance
Just to be sure it's clear, it has an important role in how things currently work on master, and one objective of this PR is meant to address the issues it causes.
That's something we need to agree on. Personally I think
Currently it behaves on master as I want it to behave, i.e.
I'm having a hard time following this example, at least in part because there are things that clearly don't work (braces that don't match, decode taking as input something that looks like a dictionary while it should be a string).
Superseded by #695 => closing