-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve handling of interpolations pointing to missing nodes #545
Conversation
tests/test_interpolation.py
Outdated
@pytest.mark.parametrize("ref", ["missing", "invalid"]) | ||
def test_invalid_intermediate_result_when_not_throwing( | ||
ref: str, restore_resolvers: Any | ||
) -> None: | ||
""" | ||
Check that the resolution of nested interpolations stops on missing / resolution failure. | ||
""" | ||
OmegaConf.register_new_resolver("check_none", lambda x: x is None) | ||
cfg = OmegaConf.create({"x": f"${{check_none:${{{ref}}}}}", "missing": "???"}) | ||
x_node = cfg._get_node("x") | ||
assert isinstance(x_node, Node) | ||
assert ( | ||
x_node._dereference_node( | ||
throw_on_missing=False, throw_on_resolution_failure=False | ||
) | ||
is None | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the purpose of the change this is testing.
Why do ywe want to ever get None from _dereference_node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a hopefully better explanation in dfea0a0
Before this PR, what would happen is that for instance ${invalid}
would be resolved to None
(because throw_on_resolution_failure
is False), and the resolver would thus be called with None
as input (which in general could result in an arbitrary result).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do ywe want to ever get None from _dereference_node?
One example is ValueNode._is_none()
3dc5c20
to
dfea0a0
Compare
omegaconf/base.py
Outdated
key=key, | ||
parent=parent, | ||
) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This catch is very broad.
Is this to handle userland exceptions from custom resolvers?
Worth considering to never suppress those and have a narrow catch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This catch is very broad.
Is this to handle userland exceptions from custom resolvers?
Yes.
Worth considering to never suppress those and have a narrow catch here.
Maybe, but that's how it was working before as well => best left for another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the exceptions we are expecting here after the recent conclusions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, can now be restricted to InterpolationResolutionError, done in 2fa1d7f
omegaconf/basecontainer.py
Outdated
|
||
if has_default and val is None: | ||
return default_value | ||
|
||
if is_mandatory_missing(val): | ||
if has_default: | ||
return default_value | ||
raise MissingMandatoryValue("Missing mandatory value: $FULL_KEY") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we simplify by combining the has_default?
if has_default:
if val is None:
return default_value
if is_mandatory_missing(val):
raise ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing it -- the problem is to make sure that we call is_mandatory_missing()
only once (and ideally that we don't call it at all if we don't need to). Can you provide a full snippet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I looked at the existing code and I am a bit confused by this change.
The current code is checking if a value is mandatory missing, if it is it returns the default if provided.
it then resolves it, and check again on the resolved value. if it's missing - it returns the default otherwise it raises.
not sure how you can consolidate the two checks into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think I understand the confusion. The reason why I could move up the check (that was after the resolution) is because it's not possible anymore for _maybe_resolve_interpolation()
to return a node that is missing unless val
itself is missing (since an interpolation pointing to a missing node isn't considered as missing now).
But yes, this change is a bit tricky and it's probably easier to understand by considering two scenarios to see how they play out (before / after this change):
val
is missing ("???"
)val
is an interpolation pointing to a missing node ("${foo}"
withfoo
set to"???"
)
In scenario 1:
- If there is a default value it is obvious that both before/after return this default value
- If there is no default value, before we would resolve
val
, which would return"???"
, and since it is missing theMissingMandatoryValue
exception would be raised. After this PR, this is now caught before resolving, and the same exception is raised.
In scenario 2:
- If there is a default value, before we would still resolve
val
(because it's an interpolation and not the missing string), which would result in a missing value, and the default value would be returned. After this PR,resolved_node
will beNone
because the resolution will fail (due tofoo
being missing), and as a result the default value will also be returned. - If there is no default value, before we would resolve
val
, resulting in a missing value and theMissingMandatoryValue
exception being thrown here. After this PR, sincethrow_on_resolution_failure
is True, the resolution will fail (also with aMissingMandatoryValue
exception)
==> in the end the behavior is the same in all situations
omegaconf/basecontainer.py
Outdated
|
||
if has_default and val is None: | ||
return default_value | ||
|
||
if is_mandatory_missing(val): | ||
if has_default: | ||
return default_value | ||
raise MissingMandatoryValue("Missing mandatory value: $FULL_KEY") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I looked at the existing code and I am a bit confused by this change.
The current code is checking if a value is mandatory missing, if it is it returns the default if provided.
it then resolves it, and check again on the resolved value. if it's missing - it returns the default otherwise it raises.
not sure how you can consolidate the two checks into one.
I feel that the There are three possible error states when calling
In this branch, it seems that if Would it make sense to introduce another argument to the OmegaConf.to_container(cfg, resolve=True, throw_on_missing=False) (#503 is a work in progress, introducing a @odelalleau This is related to your comments the other day (#516 (comment)), where we discussed interpolations that might contain typos. |
Actually no, in (3) it raises an
I'm not familiar with all use cases for |
One use-case is motivated by the work-in-progress PR #502, which will introduce an In any case, the spirit of
My sense is that OmegaConf.to_container(cfg, resolve=True, throw_on_missing=False) should never raise OmegaConf.to_container(cfg, resolve=True, throw_on_missing=True) should raise (
Oh, I see. In this case the |
Sounds good. I think that "simpler" is often good design. For the record, here is the current throw_on_resolution_failure=True
throw_on_missing=True
case1: MissingMandatoryValue
case2: MissingMandatoryValue
case3: InterpolationResolutionError
throw_on_missing=False
case1: '???'
case2: MissingMandatoryValue
case3: InterpolationResolutionError
throw_on_resolution_failure=False
throw_on_missing=True
case1: MissingMandatoryValue
case2: None
case3: None
throw_on_missing=False
case1: '???'
case2: None
case3: None Here is the script that I used to generate the above output: |
I agree it may be weird to get a For Maybe one way to reduce the potential confusion could be to replace the |
Are you saying that
Hmm... My personal opinion is that the two error modes (2) and (3), which mean "interpolated node is missing" and "interpolated node does not exist" are fundamentally different, and it is useful for the end user if they are distinguished. |
a missing node cannot be dereferenced, just like a node with invalid interpolation key cannot be dereferenced. foo: ??? # mandatory missing exception
bar: ${zzz} # interpolation resolution exception If an interpolation pointing to a missing node: foo : ${bar}
bar: ??? To me it's clear that this is not a resolution error. One case that can confuse things is string interpolation: foo: abc_${bar}
bar: ??? My intuition is that if throw_on_missing is true, this raises, otherwise is returns abc_???. |
If my understanding is correct, this is described by the following table for throw_on_resolution_failure=True
throw_on_missing=True
case1: MissingMandatoryValue
case2: MissingMandatoryValue
case3: InterpolationResolutionError
throw_on_missing=False
case1: '???'
case2: '???'
case3: InterpolationResolutionError
throw_on_resolution_failure=False
throw_on_missing=True
case1: MissingMandatoryValue
case2: MissingMandatoryValue
case3: None
throw_on_missing=False
case1: '???'
case2: '???'
case3: None where
|
Updated your script. from omegaconf import OmegaConf
cases = dict(
direct_missing={"x": "???"},
indirect_missing={"x": "${missing}", "missing": "???"},
str_indirect_missing={"x": "foo_${missing}", "missing": "???"},
inter_not_found={"x": "${does_not_exist}"},
str_inter_not_found={"x": "foo_${does_not_exist}"},
)
print("--")
for throw_on_resolution_failure in [True, False]:
for throw_on_missing in [True, False]:
print(f"{throw_on_resolution_failure = }")
print(f"{throw_on_missing = }")
for case_name, case in cases.items():
case = OmegaConf.create(case)
try:
node = case._get_node("x")
result = repr(
node._dereference_node(
throw_on_missing=throw_on_missing,
throw_on_resolution_failure=throw_on_resolution_failure,
)
)
except Exception as excinfo:
result = "ERR: " + type(excinfo).__name__
print(f" {case_name}: {result}")
print("--") Output:
|
I think |
Can we articulate what is the value of those two flags right now? |
Looking at current use of |
Sounds reasonable. If the caller needs to distinguish between |
But should we consider that by "traversal" we also mean the operations performed during interpolation resolution? I suggest that we shouldn't. This has the advantage that
Ok, I was thinking that maybe we could extend it to any kind of exception raised during interpolation resolution (including also resolver failures by the way, which right now can be anything since we don't control this custom code). But it was just a suggestion, I don't feel particularly strongly about it.
If we decide to return "???" then I think this PR should be changed so that
That's something I feel more strongly against. It might seem intuitive for string interpolations (although it did surprise you), but what about accessing foo: ${x.${bar}}
foo2: ${resolver:${bar}}
foo3: ${resolver:abc_${bar}}
bar: ???
x:
y: 0 What I believe makes more sense is one of these two options:
|
I think we should articulate what they mean :) So far I've interpreted We can change it to say that it's only meant to silence exceptions raised when an interpolated node (or resolver?) doesn't exist, but I'm not entirely sure of the potential consequences. I feel like it'd be better to do it in a different PR too, to avoid mixing too many changes together. |
It's becoming hard to follow what's going on in the discussions because it's changing many things at the same time. Updated script producing results. (Added line numbers for easier discussion). I ran the script on master and with this PR to compare the results.
wdiff shows word difference with a {} block on the right:
Let's try to get a consensus on each of the changing lines (6,7,11,12,16,17,19). |
To make this easier, here is the same output with all the unchanged lines removed:
|
Let me try to slice this by use case. line 6: seems wrong to me, should not throw MissingMandatoryValue if throw_on_missing = False. line 11: Line 16: That's it for indirect missing. let's get a consensus on it before trying to get it on the other ones. |
My main comment is that what you suggest goes against what I proposed in #543 (section "Expected behavior" at the bottom, point 1): if "a node is never missing if it is an interpolation", then line 6 should not return "???". We can go back to the previous "an interpolation pointing to a missing node is also missing": I think it can work too, I just personally believe it makes things a bit more complex and I still don't see what we gain from it (#462 is an example where things would have been simpler if interpolations were never missing). But I don't really feel strongly about it, so I'm happy to implement either version. |
I see what you're saying about #462: For the purposes of OmegaConf.merge, it makes sense for cfg1 = OmegaConf.create({"a" : 1, "b" : "???"})
cfg2 = OmegaConf.create({"a" : "???"})
assert OmegaConf.merge(cfg1, cfg2) == {"a": 1, "b" : "???"} But cfg1 = OmegaConf.create({"a" : 1, "b" : "???"})
cfg2 = OmegaConf.create({"a" : "${b}"})
assert OmegaConf.merge(cfg1, cfg2) == {"a": "${b}", "b": "???"} This is a strong argument that
I still feel that if a user calls I feel like OmegaConf's API can be divided into two broad categories:
|
One idea that came to my mind is to change the def _dereference_node(
self,
throw_on_missing: bool = False,
throw_on_resolution_failure: bool = True,
) -> Optional["Node"]: to something like def _dereference_node(
self,
throw_on_str_indirect_missing: bool = False,
throw_on_resolution_failure: bool = True,
) -> Optional["Node"]: The motivation for this change is:
So the new behavior would be that _dereference_node never throws a MandatoryMissingValue (except maybe in the |
That would actually be the case with what I'm suggesting, i.e., in short: interpolations can never be missing (and encoutering a missing value while trying to resolve an interpolations should be considered as an error, that may or may not be silenced with My suggestion is based on previous experience with OmegaConf internals (#462 that I mentioned), but also trying to think of use cases for if OmegaConf.is_missing(cfg, "foo"):
logging.error("You forgot to set the `foo` variable: edit `config.yaml` and edit the line `foo: ???`")
sys.exit(1) => In that case, if if OmegaConf.is_missing(cfg, "n_processes"):
logging.info("`n_processes` is not set, defaulting to 4 processes")
cfg.n_processes = 4
if OmegaConf.is_missing(cfg, "n_cpus"):
logging.info("`n_cpus` is not set, defaulting to twice the number of processes")
cfg.n_cpus = cfg.n_processes * 2 => in that case, if If we agree that "being missing" only means "being set to ???", then things become simpler IMO, both in these examples and for internal operations like #462.
Stepping back for a minute, can you think of a use case where we would want |
8dcef05
to
ef1d38f
Compare
This pull request introduces 2 alerts when merging ef1d38f into d46d058 - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging 5718bf0 into d46d058 - view on LGTM.com new alerts:
|
…polations to non-existing nodes
8e5d774
to
c87648e
Compare
tests/test_interpolation.py
Outdated
c = OmegaConf.create(dict(my_key="${env:my_key}", other_key=123)) | ||
with pytest.raises(InterpolationResolutionError): | ||
monkeypatch.setenv("MYKEY", "${other_key}") | ||
c = OmegaConf.create(dict(my_key="${env:MYKEY}", other_key=123)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you touch dict()
in tests, can you convert to {} style?
c = OmegaConf.create(dict(my_key="${env:MYKEY}", other_key=123)) | |
c = OmegaConf.create({"my_key": "${env:MYKEY}", "other_key": 123}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for the whole file in 85478e4
tests/test_select.py
Outdated
if exc_no_default is None: | ||
# Not providing a default value is expected to work and return `None`. | ||
assert OmegaConf.select(cfg, key) is None | ||
else: | ||
# Not providing a default value is expected to raise an exception. | ||
with exc_no_default: | ||
OmegaConf.select(cfg, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer splitting tests into positive and negative cases in scenarios like this one.
It makes the tests significantly simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in two steps:
First, c396527 addresses this specific issue. It's easier to see the diff against master after applying this commit, which looks like this:
@pytest.mark.parametrize("struct", [True, False, None])
@@ -60,11 +61,11 @@ def test_select(
@pytest.mark.parametrize("struct", [False, True])
@pytest.mark.parametrize("default", [10, None])
@pytest.mark.parametrize(
- "cfg, key",
+ ("cfg", "key"),
[
pytest.param({}, "not_found", id="empty"),
pytest.param({"missing": "???"}, "missing", id="missing"),
- pytest.param({"inter": "${bad_key}"}, "inter", id="inter_bad_key"),
+ pytest.param({"int": 0}, "int.y", id="non_container"),
],
)
def test_select_default(
@@ -78,6 +79,31 @@ def test_select_default(
assert OmegaConf.select(cfg, key, default=default) == default
+@pytest.mark.parametrize("struct", [False, True])
+@pytest.mark.parametrize(
+ ("cfg", "key", "exc"),
+ [
+ pytest.param(
+ {"int": 0},
+ "int.y",
+ pytest.raises(
+ ConfigKeyError,
+ match=re.escape(
+ "Error trying to access int.y: node `int` is not a container "
+ "and thus cannot contain `y`"
+ ),
+ ),
+ id="non_container",
+ ),
+ ],
+)
+def test_select_error(cfg: Any, key: Any, exc: Any, struct: bool) -> None:
+ cfg = _ensure_container(cfg)
+ OmegaConf.set_struct(cfg, struct)
+ with exc:
+ OmegaConf.select(cfg, key)
+
Second, in ef61b9a I refactored test_select.py
so that all tests are within a common class parameterized on the struct
flag (otherwise it seemed a bit random)
Co-authored-by: Omry Yadan <[email protected]>
* Remove test_select_key_from_empty() which is already tested in test_select() * Move all tests inside a common class that is parameterized on the "struct" flag
re-request review when ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. a couple of nits about the news fragments.
otherwise we can merge this.
Co-authored-by: Omry Yadan <[email protected]>
* Interpolations are never considered to be missing anymore, even if they point to a missing node * When resolving an expression containing an interpolation pointing to a missing node, an `InterpolationToMissingValueError` exception is raised * When resolving an expression containing an interpolation pointing to a node that does not exist, an `InterpolationKeyError` exception is raised * `key in cfg` returns True whenever `key` is an interpolation, even if it cannot be resolved (including interpolations to missing nodes) * `get()` and `pop()` no longer return the default value in case of interpolation resolution failure (same thing for `OmegaConf.select()`) * If `throw_on_resolution_failure` is False, then resolving an interpolation resulting in a resolution failure always leads to the result being `None` (instead of potentially being an expression computed from `None`) Fixes omry#543 Fixes omry#561 Fixes omry#562 Fixes omry#565
Interpolations are never considered to be missing anymore, even if
they point to a missing node
When resolving an expression containing an interpolation pointing to a
missing node, an
InterpolationToMissingValueError
exception is raisedWhen resolving an expression containing an interpolation pointing to a
node that does not exist, an
InterpolationKeyError
exception is raisedkey in cfg
returns True wheneverkey
is an interpolation, even if it cannot be resolved (including interpolations to missing nodes)get()
andpop()
no longer return the default value in case of interpolation resolution failure (same thing forOmegaConf.select()
)If
throw_on_resolution_failure
is False, then resolving aninterpolation resulting in a resolution failure always leads to the
result being
None
(instead of potentially being an expression computedfrom
None
)Fixes #543
Fixes #561
Fixes #562
Fixes #565