Quoted strings are not always properly parsed by the grammar #617
Comments
Looking at this: I really have no idea what you are going for here. |
On master, the output is ' hello" } ':
So, the intention is for However, the current code instead considers that the interpolation block is |
No, your code shows there is a backslash too (which isn't the problem -- the problem is that there should be no space after the double quote) |
Like the following failed attempts? # not working:
OmegaConf.create({"a": r"${identity: hello\}"}).a
...
GrammarParseError: missing BRACE_CLOSE at '<EOF>'
full_key: a
object_type=dict
# not working:
OmegaConf.create({"a": r"${identity: hello\\}"}).a
'hello\\'
OmegaConf.create({"a": r"${identity: 'hello\'}"}).a
'hello\\'
OmegaConf.create({"a": r"${identity: 'hello\\'}"}).a
'hello\\' |
Can you show me how you get Python to print ' |
Your last 3 examples actually worked, but:
OmegaConf.create({"a": r"${identity: hello\\}"}).a => only works without a quoted string because
OmegaConf.create({"a": r"${identity: 'hello\'}"}).a => works because the regex extracting the quoted string actually identifies
OmegaConf.create({"a": r"${identity: 'hello\\'}"}).a => same thing, the quoted string is
Easy: What happened is you haven't been using print(repr(r"There is a single backslash \ in this string"))
'There is a single backslash \\ in this string' => always use print() when there are \ in the string, otherwise it's quite confusing :) |
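To make the print() vs repr() point concrete, here is a minimal sketch in plain Python (no OmegaConf involved; the variable names are only illustrative). It shows that the doubled backslash is an artifact of how the interpreter echoes a string, not an extra character in it:

```python
# A raw string containing exactly one backslash character.
s = r"There is a single backslash \ in this string"

# repr() is what the interactive prompt shows for a bare expression:
# it escapes the backslash, so it appears doubled.
shown_by_repr = repr(s)
print(shown_by_repr)

# print() writes the actual characters, with a single backslash.
print(s)

# Counting characters confirms the string holds only one backslash.
print(s.count("\\"), shown_by_repr.count("\\"))
```

So when comparing outputs in this thread, values echoed at the prompt (like 'hello\\') contain half as many backslashes as they appear to.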
* In top-level strings, the only backslashes that can and should be escaped are those preceding an interpolation. For instance, if "x" is a node evaluating to "X":
    \${x} -> ${x}
    \\${x} -> \X
    \\\${x} -> \${x}
    ${x}\\ -> X\\
* In quoted strings, the only backslashes that can and should be escaped are those preceding a quote of the same type as the enclosing quotes. For instance:
    "abc\"def" -> abc"def
    "abc\\" -> abc\
    "abc\\\"def" -> abc\"def
    "abc\\\'def" -> abc\\\'def
This also fixes a bug with the parsing of quoted strings (see omry#617).
Fixes omry#615
Fixes omry#617
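The two escaping rules in that commit message can be sketched in a few lines of plain Python. This is not the OmegaConf implementation: `resolve_top_level` and `unescape_quoted` are hypothetical helpers written only to reproduce the examples listed above, interpolations are simplified to `${name}`, and no validation of malformed input is attempted.

```python
import re


def resolve_top_level(text: str, values: dict) -> str:
    """Hypothetical helper: backslashes only matter right before '${'."""

    def repl(match: re.Match) -> str:
        n = len(match.group(1))      # number of backslashes before '${'
        prefix = "\\" * (n // 2)     # each pair collapses to one backslash
        if n % 2:                    # odd count: the interpolation is escaped
            return prefix + "${" + match.group(2) + "}"
        return prefix + str(values[match.group(2)])

    # Simplifying assumption: an interpolation is just ${name}.
    return re.sub(r"(\\*)\$\{(\w+)\}", repl, text)


def unescape_quoted(token: str) -> str:
    """Hypothetical helper: backslashes only matter right before a quote
    of the same type as the enclosing quotes (including the closing one)."""
    quote = token[0]
    # Keep the closing quote so trailing backslashes are processed too.
    body = token[1:]
    pattern = re.compile(r"(\\+)" + re.escape(quote))
    halved = pattern.sub(lambda m: "\\" * (len(m.group(1)) // 2) + quote, body)
    return halved[:-1]               # drop the closing quote


values = {"x": "X"}
for src in [r"\${x}", r"\\${x}", r"\\\${x}", r"${x}\\"]:
    print(src, "->", resolve_top_level(src, values))

for src in [r'"abc\"def"', r'"abc\\"', r'"abc\\\"def"', r'"abc\\\'def"']:
    print(src, "->", unescape_quoted(src))
```

In both helpers the count of consecutive backslashes decides the outcome: pairs collapse to a single backslash, and a leftover odd backslash escapes what follows. Backslashes anywhere else are left untouched, which is why "abc\\\'def" comes out unchanged inside double quotes.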
This time I paid special attention to how GitHub is rendering this crap, hopefully it will be clearer than my past attempts: Going back to the original example:
It's not clear to me what you are after with r""" ${identity: "hello\\" }"} """
This seems ambiguous to me:
Interpretation 1:
Interpretation 2:
To get your desired output, this seems to work and makes sense to me:
In [45]: cfg = OmegaConf.create({"a": r'${identity:"hello\\\"} "}'}); print(f"'{cfg.a}'");cfg.a == r"hello\"} "
'hello\"} '
Out[45]: True |
That's (almost) the correct interpretation: since there are two
OmegaConf.create({
"a": "Hello ${world}!", # nothing special
"b": "Hello '${world}'!", # kind of a quoted interpolation, though nothing special is done with the quotes
"c": "Hello ${world}'!", # quote mismatch but we don't care
"d": "Hello ${world}}'!", # quote and brace mismatch but we still don't care
})
Just to be clear, the purpose of this issue isn't that we can't get a specific output, but to show a situation where the current code isn't outputting what it should. |
But we are not outside of an interpolation.
the ... should be a sequence of valid elements. we have ended an element (the string). the next thing should be a comma or closing the custom resolver. (ignoring whitespace).
I agree, and that's actually the case in |
Okay, I FINALLY understand what you did there. I will take another look at this later. |
Sorry, I know it's a confusing example, but that's the whole point: it is confusing to the current parser.
So did the grammar :) |
So your intention is that the identity is getting Shouldn't the trailing double quote be an error then? It should be the beginning of a new quotes string that is never terminating. At a high level, I agree that when we see |
Correct.
No because what's outside of interpolations is free-form, you can use quotes however you want. But you can add an extra quote if you prefer -- it shouldn't change the underlying issue.
Ok good, we're on the same page then :) |
Fix edge cases in the parsing of quoted values Fixes #617
Describe the bug
There exist tricky edge cases where quoted strings are not parsed as expected. Here is an example:
The output is
hello\" }
but the expected output is
hello\"}
(the difference is no space between the " and })
The reason for this lies in the regex for quoted strings:
Because of the \\" in the parsed string, it believes that this quote is an escaped quote, and thus keeps parsing until the next quote is found. However, the intent here is for \\" to represent a backslash followed by a quote, i.e., the string should end with this backslash.
(If you wonder why the test case looks so weird, it is so that it can run error-free with and without a potential fix.)
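The failure mode described above can be reproduced outside the grammar with a small sketch. `NAIVE_QUOTED` below is an assumed pattern, not the regex from the OmegaConf grammar: it is simply a pattern of the same flavor, one that treats any backslash followed by a quote as an escaped quote. `quoted_end` is a hypothetical alternative that counts the backslashes before each quote.

```python
import re

# Assumed, illustrative pattern: a quote preceded by a backslash is
# always taken to be an escaped quote, whatever precedes the backslash.
NAIVE_QUOTED = re.compile(r'"(?:\\"|[^"])*"')


def quoted_end(text: str, start: int) -> int:
    """Hypothetical helper: return the index just past the closing quote.

    A quote closes the string only if it is preceded by an even number
    of backslashes (zero included).
    """
    quote = text[start]
    run = 0  # length of the current run of consecutive backslashes
    for i in range(start + 1, len(text)):
        ch = text[i]
        if ch == "\\":
            run += 1
            continue
        if ch == quote and run % 2 == 0:
            return i + 1
        run = 0
    raise ValueError("unterminated quoted string")


# The resolver argument from the example, followed by the rest of the input.
text = r'"hello\\" }"}'

naive = NAIVE_QUOTED.match(text).group(0)
counted = text[: quoted_end(text, 0)]

print(naive)    # swallows the space, the brace and the next quote
print(counted)  # stops right after the two backslashes
```

With the naive pattern the quoted string extends to the second closing quote, which is where the unwanted space in the output comes from. Counting backslashes ends the string after the escaped backslash, leaving ` }` to close the interpolation and `"}` as free-form top-level text.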
Expected behavior
See expected output.
Additional context
There is a link to #615: if we didn't care about quoted strings ending with a backslash, we wouldn't need to escape backslashes, which would solve both issues at the same time. But I think we do care :)