Escaped interpolations trigger a GrammarParseError if they don't follow the expected interpolation grammar #666
Comments
The bandaid fix as you call it is also a performance concern. The idea with having a fast is_interpolation that can return false positives was to improve performance by avoiding full grammar parse to detect interpolations.
The first solution should be acceptable if the performance is good in some real world scenarios. The second solution sounds terrible. The last two are going in a direction of caching the parsing results.
Right now the logic in get_value_kind is:

    # We identify potential interpolations by the presence of "${" in the string.
    # Note that escaped interpolations (ex: "esc: \${bar}") are identified as
    # interpolations: this is intended, since they must be processed as interpolations
    # for the string to be properly un-escaped.
    # Keep in mind that invalid interpolations will only be detected when
    # `strict_interpolation_validation` is True.
    if isinstance(value, str) and "${" in value:
        if strict_interpolation_validation:
            # First try the cheap regex matching that detects common interpolations.
            if SIMPLE_INTERPOLATION_PATTERN.match(value) is None:
                # If no match, do the more expensive grammar parsing to detect errors.
                parse(value)
        return ValueKind.INTERPOLATION

One thing that looks weird to me is: what is the purpose of calling parse(value)? Second thing that jumps out: |
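To make the flow quoted above easier to follow, here is a minimal, self-contained sketch of the "substring check, then cheap regex, then full parse" logic. It is not OmegaConf's code: the `parse` function, the two regular expressions, and the `ValueKind` enum below are simplified stand-ins written for illustration, and the toy grammar only accepts `${key}` and escaped `\${`.

```python
import re
from enum import Enum


class ValueKind(Enum):
    VALUE = 0
    INTERPOLATION = 1


class GrammarParseError(Exception):
    """Stand-in for the error raised by the real grammar parser."""


# Hypothetical cheap pattern: the whole string is a single "${a.b}" reference.
SIMPLE_INTERPOLATION_PATTERN = re.compile(r"\$\{[A-Za-z0-9_.]+\}$")

# Toy grammar: an escaped "\${" or a well-formed "${key}" (no nesting, no resolvers).
_WELL_FORMED = re.compile(r"\\\$\{|\$\{[A-Za-z0-9_.]+\}")


def parse(value: str) -> None:
    """Toy stand-in for the expensive parse: raise on any malformed "${"."""
    # Strip everything the toy grammar accepts; a leftover "${" is malformed.
    if "${" in _WELL_FORMED.sub("", value):
        raise GrammarParseError(f"malformed interpolation: {value!r}")


def get_value_kind(value, strict_interpolation_validation=False):
    # Any string containing "${" is treated as a potential interpolation,
    # including escaped ones, so that they can be un-escaped later.
    if isinstance(value, str) and "${" in value:
        if strict_interpolation_validation:
            # Cheap regex first; fall back to the full parse to detect errors.
            if SIMPLE_INTERPOLATION_PATTERN.match(value) is None:
                parse(value)
        return ValueKind.INTERPOLATION
    return ValueKind.VALUE
```

Under these assumptions, the escaped string `r"\${not an interpolation}"` passes strict validation, while its un-escaped form `"${not an interpolation}"` raises `GrammarParseError`, which mirrors the crash reported in this issue.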
It's actually improving performance (
Agreed.
Yes. If I remember correctly personally I was fine with having
I think that would cause problems because badly formed interpolations wouldn't be detected anymore.
Same reason as above: to detect errors early.
Running the regex is pointless if we don't do anything when the regex doesn't return a match: the regex is only here to avoid the costly grammar parsing in common situations. |
You misunderstood me.

    if isinstance(value, str) and "${" in value:
        # First try the cheap regex matching that detects common interpolations.
        if SIMPLE_INTERPOLATION_PATTERN.match(value) is None:
            # If no match, do the more expensive grammar parsing to detect errors.
            parse(value)
        return ValueKind.INTERPOLATION

|
Now I am confused. If this is not an interpolation, why does the strictest _get_value_kind say that it is? |
What's your objective here? If we remove the
It is to be less strict so that we don't notice that the string is a badly formed interpolation.
I'm not sure I understand your last sentence, but I think the confusion comes from me not explaining well enough what is happening. Let me write in more detail the steps that occur when we run this code:

    cfg = OmegaConf.create({"x": r"\${not an interpolation}"})
    cfg.x

|
The objective here was to always be strict, which would solve the wrong problem if this is failing by not being strict enough. Going to look at the second part of your answer now. |
Just to be sure it's clear, the performance cost would come from the various calls to Also all strings that contain |
This could be one way to attack this issue. Went through the rest and ran it in the debugger. This is a change from the behavior of OmegaConf 2.0, which allows malformed interpolations as values:

    In [1]: from omegaconf import *
    In [2]: __version__
    Out[2]: '2.0.6'
    In [3]: cfg = OmegaConf.create({"x": r"${n"})
    In [4]: cfg.x
    Out[4]: '${n'

We have discussed it before, and I suggested that get_value_kind() does not raise exceptions. Changing _get_value_kind interpolation detection to this:

    if isinstance(value, str) and "${" in value:
        if strict_interpolation_validation:
            # First try the cheap regex matching that detects common interpolations.
            if SIMPLE_INTERPOLATION_PATTERN.match(value) is None:
                # If no match, do the more expensive grammar parsing to detect errors.
                try:
                    parse(value)
                except GrammarParseError:
                    return ValueKind.VALUE

Fixes this particular issue and is breaking a handful of tests because as expected some things are not considered errors now. My preference is actually to retain the old behavior and consider broken interpolations as values. |
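The non-raising variant proposed above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `parse`, `GrammarParseError`, and `ValueKind` are toy stand-ins rather than OmegaConf's real objects, the function name `get_value_kind_lenient` is made up, and the toy grammar accepts only `${key}` and escaped `\${`.

```python
import re
from enum import Enum


class ValueKind(Enum):
    VALUE = 0
    INTERPOLATION = 1


class GrammarParseError(Exception):
    """Stand-in for the error raised by the real grammar parser."""


# Toy grammar: an escaped "\${" or a well-formed "${key}" (no nesting, no resolvers).
_WELL_FORMED = re.compile(r"\\\$\{|\$\{[A-Za-z0-9_.]+\}")


def parse(value: str) -> None:
    """Toy stand-in for the full parse: raise on any malformed "${"."""
    if "${" in _WELL_FORMED.sub("", value):
        raise GrammarParseError(f"malformed interpolation: {value!r}")


def get_value_kind_lenient(value):
    """Never raises: a malformed interpolation is classified as a plain value."""
    if isinstance(value, str) and "${" in value:
        try:
            parse(value)
        except GrammarParseError:
            # Same outcome as the OmegaConf 2.0 session above: "${n" stays a value.
            return ValueKind.VALUE
        return ValueKind.INTERPOLATION
    return ValueKind.VALUE
```

With this behavior `"${n"` is classified as `ValueKind.VALUE` instead of raising, at the cost of no longer reporting badly formed interpolations early, which is the trade-off debated in this thread.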
"that now would all trigger the regex" if the string contains |
It buys us the un-escaping of escaped interpolations. If we said "it's just a regular string" then the following code would return

    cfg = OmegaConf.create({"x": r"\${y}"})
    cfg.x

Note also that even if we make it so that escaped interpolations aren't considered interpolations anymore, it may help in the specific example I gave here, but the same problem would occur if for instance a resolver returns a string that looks like an interpolation:

    OmegaConf.register_new_resolver("oops", lambda: "${trigger crash}")
    cfg = OmegaConf.create({"x": "${oops:}"})
    cfg.x  # GrammarParseError

I guess what might work is:
I'm not 100% sure I like it because it makes the nodes' internals a bit more complex and adds extra processing on each access, but at first glance it seems doable. What do you think? (edit: actually the "extra processing on each access" isn't a problem I think: it's cheaper than resolving interpolations, the extra cost for regular strings should be minor, and we could cache the result => I'm starting to like this direction, though there remain design questions to be resolved) (edit2: having a second look at it, one potential implementation concern is that currently
Just to understand, are you saying that it should be the case only when If we do it also when
If your point is that we only pay a price for interpolations, then yes I agree. But if we don't care, why did we bother with the regex version in the first place? (it's only there to speed up the processing of interpolations). |
@omry just realized we hadn't converged on this one, can you please have a look when you get a chance? |
Yes, I postponed this because having multiple parallel design discussions was too time consuming and I felt it's best to focus on one at a time. |
What is the desired behavior here? That the interpolation is not resolved?
It seems like a reasonable solution.
We can resolve it by providing an API to access the raw value to be used by lower level functions like
As we have discussed several times, the computation result can potentially be cached to minimize the damage.
Yes, this is a bigger problem now that the grammar is significantly more complicated.
I am not too worried about this one. Going to check your PR now. |
Correct,
Yeah, whether it's a new flag, a new method, or just doing a direct access, all are pretty much equivalent (IMO), i.e., it should work but it looks a bit clumsy. |
Fix crash with "interpolation-like" strings from interpolations This commit introduces a new node type `InterpolationResultNode` that systematically wraps interpolation results that either (a) are not already nodes, or (b) need to be converted. Fixes #666
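As a rough illustration of the idea in this commit message, the sketch below wraps an interpolation result in a node type that never re-interprets its value. Only the name `InterpolationResultNode` comes from the commit message; the `Node` base class, the `wrap_result` helper, and all method bodies are simplified assumptions, not the actual OmegaConf implementation.

```python
class Node:
    """Toy node: treats any string containing "${" as an interpolation."""

    def __init__(self, value):
        self._value = value

    def _is_interpolation(self) -> bool:
        return isinstance(self._value, str) and "${" in self._value


class InterpolationResultNode(Node):
    """Toy wrapper for a resolved result: its value is final and never re-parsed."""

    def _is_interpolation(self) -> bool:
        return False


def wrap_result(result):
    # Hypothetical helper: results that are already nodes are returned unchanged,
    # anything else is wrapped so it cannot be mistaken for an interpolation.
    return result if isinstance(result, Node) else InterpolationResultNode(result)
```

In this sketch, `wrap_result("${trigger crash}")` yields a node whose `_is_interpolation()` is False, so the "interpolation-like" string is returned as-is.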
Describe the bug
The following code triggers a GrammarParseError:
Expected behavior
There should be no grammar syntax constraint after a `\${` since it is not a real interpolation.
Discussion
What happens is that when we dereference `cfg.x`, the resulting string (`"${not an interpolation}"`) is wrapped within a node (`AnyNode` because here `x` is untyped), whose constructor calls `ValueNode._set_value()`, which checks if the value is an interpolation... and this check raises an exception because it's a badly formed interpolation.
The bandaid fix is to set `strict_interpolation_validation=False` at omegaconf/omegaconf/nodes.py, line 37 in a03f681.
It would break the tests that specifically test for this check, but overall things would still work fine.
This bandaid fix, however, wouldn't change the fact that `cfg._get_node("x")._dereference_node()._is_interpolation()` is True, i.e., when we dereference `cfg.x` we obtain a node that believes it is an interpolation (!). Right now this isn't really a problem because we don't do anything with this node except getting its value, but it may bite us later.
Right now I don't have a solution that I really like in mind, so posting this issue for discussion.
Here are options I've thought about so far:
- [...] `StringNode` specifically when an interpolation result is a string with `${`. This subclass would override what it takes to avoid a crash. One issue with this is that if `cfg.x` is actually typed, we'd like to return another type of node (ex: `IntegerNode`), so there should be an exception instead (it seems doable, just a bit clumsy)
- [...] `StringNode` that says "I am not an interpolation" even if the value looks like one. This would be similar to the previous idea, but without adding an extra class.
- [...] `is_interpolation={True,False,None}`, where `None` would be the current behavior (check if the value is a string looking like an interpolation), `False` would make it possible to "declare" the value as not being an interpolation (even if it looks like one), and `True` would declare it as being an interpolation (regardless of the value). Besides fixing this issue, this could enable more things like (i) caching the interpolation status in that flag for faster checks, or (ii) enabling new kinds of interpolations that aren't tied to the string content (e.g. to point to config nodes that aren't accessible by a path from the root config, as I once did in a draft PR). I kinda like this option but I expect it to be non trivial and uncover issues I haven't thought about yet, so I'd rather not go there in 2.1
Thoughts?
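The tri-state `is_interpolation` flag described in the last option could be sketched as below. This is a hypothetical toy class, not OmegaConf's `StringNode`: the constructor signature, attribute names, and the substring-based fallback check are all assumptions made for illustration.

```python
from typing import Any, Optional


class StringNode:
    """Toy node with a tri-state flag: True, False, or None (decide from the value)."""

    def __init__(self, value: Any, is_interpolation: Optional[bool] = None):
        self._value = value
        self._interp_flag = is_interpolation

    def _is_interpolation(self) -> bool:
        if self._interp_flag is None:
            # None: fall back to inspecting the value, then cache the answer
            # in the flag so later checks are a simple attribute read.
            self._interp_flag = isinstance(self._value, str) and "${" in self._value
        return self._interp_flag


# A resolved result that merely looks like an interpolation is declared as not one.
result = StringNode("${not an interpolation}", is_interpolation=False)
# A regular node keeps the value-based behavior.
regular = StringNode("${a.b}")
```

One design question this sketch leaves open is invalidation: a cached flag would need to be reset whenever the node's value is reassigned.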