-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escaping in quoted strings (within interpolations) is too complex #615
Comments
the double un-escaping gave me an uneasy feeling, so yet - I think we should try to fix it. Playing with it: cfg = OmegaConf.create({
"msg": r'${identity:"The root drive is: ${drive}:\\"}',
"msg2": r'${identity:"The root drive is: ${drive}:\"}',
"drive": "C",
})
print(cfg.msg) # The root drive is: C:\
print(cfg.msg2) # The root drive is: C:\ The following cases all seems right to me: In [50]: cfg = OmegaConf.create({
...: "i0": "${a}",
...: "i1": r'${identity:"${a}:"}',
...: "i2": r'${identity:"\${identity:${a}}:"}',
...: "i3": r'${identity:"\${identity:\${a}}:"}',
...: "a": "43",
...: })
...: for i in range(0, 4):
...: k = f"i{i}"
...: print(f"{k} {str(cfg._get_node(k)):<35} -> {cfg[k]}")
...:
i0 ${a} -> 43
i1 ${identity:"${a}:"} -> 43:
i2 ${identity:"\${identity:${a}}:"} -> ${identity:43}:
i3 ${identity:"\${identity:\${a}}:"} -> ${identity:${a}}: |
Yes :) will post another issue about it shortly
Yes all good here. The problems really start with complex cases where we want backslashes within quoted strings. |
* In top-level strings, the only backslashes that can and should be escaped are those preceding an interpolation. For instance, if "x" is a node evaluating to "X": \${x} -> ${x} \\${x} -> \X \\\${x} -> \${x} ${x}\\ -> X\\ * In quoted strings, the only backslashes that can and should be escaped are those preceding a quote of the same type as the enclosing quotes. For instance: "abc\"def" -> abc"def "abc\\" -> abc\ "abc\\\"def" -> abc\"def "abc\\\'def" -> abc\\\'def This also fixes a bug with the parsing of quoted strings (see omry#617). Fixes omry#615 Fixes omry#617
* In top-level strings, the only backslashes that can and should be escaped are those preceding an interpolation. For instance, if "x" is a node evaluating to "X": \${x} -> ${x} \\${x} -> \X \\\${x} -> \${x} ${x}\\ -> X\\ * In quoted strings, the only backslashes that can and should be escaped are those preceding a quote of the same type as the enclosing quotes. For instance: "abc\"def" -> abc"def "abc\\" -> abc\ "abc\\\"def" -> abc\"def "abc\\\'def" -> abc\\\'def This also fixes a bug with the parsing of quoted strings (see omry#617). Fixes omry#615 Fixes omry#617
Quoted values are now identified through lexer modes, which removes the need for escaping quotes in nested interpolations. Fixes omry#615
Quoted values are now identified through lexer modes, which removes the need for escaping quotes in nested interpolations. Fixes omry#615
New syntax for nested quoted values Quoted values are now identified through lexer modes, which removes the need for escaping quotes in nested interpolations. Fixes #615
Describe the bug
Technically the following is working as intended, but it is hard to understand and error prone. To explain the issue I have to show increasingly more complex scenarios:
(1) Consider the following code (note that I'm using a raw string
r'...'
to avoid having to double all\
due to Python string processing):Importantly, the escape at the end of the quoted string is doubled so that it is not mistaken for an escaped quote by the grammar (side note: there is a bug related to this, but I'll discuss it in another issue, we can ignore it here). So far, so good.
(2) Now let's say you want to print instead
The root drive is: ${drive}:\
, i.e., the interpolation shouldn't be resolved. Here is how to achieve it:Still looks good :)
(3) What if instead you want to print:
The root drive is: \C:\
? (i.e., the interpolation should be resolved, but with a backslash in front). The "naive" way of adding an extra backslash (to have an escaped backslash instead of an escaped interpolation) won't cut it:Instead, you need to use four backslashes (or three, it gives the same result):
Now this is getting confusing! The reason is essentially that escaped backslashes get un-escaped twice:
\\
, that may be used as\\"
to avoid confusion with an escaped quoted\"
. This step happens before we feed the resulting string toquoted_string_callback()
\\
, that may be used as\\${..}
to avoid confusion with an escaped interpolation\${..}
. This step happens withinquoted_string_callback()
Expected behavior
Scenarios (1) and (2) should still work the same. However, in scenario (3) the first example should work, i.e.:
This is doable by skipping the first un-escaping step described above, and rely instead on the second one to un-escape
\\
.I don't see any major drawback to it. A minor one is that it means we will need to process the content of all quoted strings as if they contained interpolations, so as to properly un-escape
\\
, which incurs some extra overhead (but since we are already in the middle of resolving an interpolation it shouldn't make a big difference).Additional context
Related discussion: #606 (comment)
(in the example discussed there, proper escaping proved very confusing as I was trying to achieve the desired result)
The text was updated successfully, but these errors were encountered: