-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add force_add to OmegaConf.update(), effectively using open_dict for all nodes along the path #665
Conversation
def test_update_force_add( | ||
large_dict_config: Any, force_add: bool, benchmark: Any | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a benchmark to verify the excessive flag overrides are not causing performance regression.
The concern is that the recursive flags cache invalidation when a flag is set will slow things down.
It does not seem to cause a significant slowdown for a large dict_config here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's too much to worry here because in the worst case it would only invalidate the whole config cache once: after it's cleared, subsequent cache invalidation wouldn't trickle down anymore.
That being said, I'm not sure if large_dict_config
actually comes from a cache set on all config nodes, which would be needed to evaluate this worst case scenario. If that's not the case then the benchmark isn't very relevant (at least to evaluate the impact of cache invalidation on performance).
I also suggested an optimization that should reduce the amount of cache invalidation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't notice the gating in _invalidate_flags_cache.
Good point about the benchmark not doing much.
I fixed the benchmark to populate the flags cache on all nodes. We don't have a baseline though.
Overall I am not too concerned about relative performance here though, at 400us per call on the big config this is plenty fast enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only had time for a quick look, I'll need to re-review it more carefully, but already sharing a few comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, just a few comments & suggestions
struct_override = False if force_add else root._get_node_flag("struct") | ||
with flag_override(root, "struct", struct_override): | ||
k = split[i] | ||
# if next_root is a primitive (string, int etc) replace it with an empty map | ||
next_root, key_ = _select_one(root, k, throw_on_missing=False) | ||
if not isinstance(next_root, Container): | ||
root[key_] = {} | ||
root = root[key_] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimization by only calling flag_override
when needed:
struct_override = False if force_add else root._get_node_flag("struct") | |
with flag_override(root, "struct", struct_override): | |
k = split[i] | |
# if next_root is a primitive (string, int etc) replace it with an empty map | |
next_root, key_ = _select_one(root, k, throw_on_missing=False) | |
if not isinstance(next_root, Container): | |
root[key_] = {} | |
root = root[key_] | |
k = split[i] | |
# if next_root is a primitive (string, int etc) replace it with an empty map | |
next_root, key_ = _select_one(root, k, throw_on_missing=False) | |
if not isinstance(next_root, Container): | |
if force_add: | |
with flag_override(root, "struct", False): | |
root[key_] = {} | |
else: | |
root[key_] = {} | |
root = root[key_] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here will not apply cleanly, (see syntax error in 775).
In fact we do have a does_not_raise
context manager similar to nullcontext
in the tests. we can promote it to utils.
Feel free to followup with a PR optimizing both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, oops, looks like my mouse select skills need improving. I just added the missing bracket.
def test_update_force_add( | ||
large_dict_config: Any, force_add: bool, benchmark: Any | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's too much to worry here because in the worst case it would only invalidate the whole config cache once: after it's cleared, subsequent cache invalidation wouldn't trickle down anymore.
That being said, I'm not sure if large_dict_config
actually comes from a cache set on all config nodes, which would be needed to evaluate this worst case scenario. If that's not the case then the benchmark isn't very relevant (at least to evaluate the impact of cache invalidation on performance).
I also suggested an optimization that should reduce the amount of cache invalidation.
struct_override = False if force_add else root._get_node_flag("struct") | ||
with flag_override(root, "struct", struct_override): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar optimization could be done to avoid overriding for no reason. Something like:
struct_override = False if force_add else root._get_node_flag("struct") | |
with flag_override(root, "struct", struct_override): | |
ctx = flag_override(root, "struct", False) if force_add else nullcontext() | |
with ctx: |
However nullcontext
was only aldded in 3.7 so it would require writing our own implementation for 3.6 in _utils.py
.
@@ -739,7 +744,8 @@ def update( | |||
:param merge: If value is a dict or a list, True for merge, False for set. | |||
True to merge | |||
False to set | |||
None (default) : deprecation warning and default to False | |||
None (default): deprecation warning and default to False | |||
:param force_add: insert the entire path regardless of Struct flag or Structured Config nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be surprising for force_add
not to allow adding to readonly configs, is that intended?
(if it's changed, it may be easier to rename it into force
so that it can force both adding new values and overriding existing ones, as otherwise you'd need to make the difference between both situations for readonly configs, which seems tricky)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reported problem is specifically about adding a field to a Structured Config.
The original implementation in Hydra was using open_dict
on the root of the config and this was not sufficient because of Structured Configs are not inheriting the struct flag from their parent.
read-only is used as a mechanism to protect against changes to nodes here.
In contrast to the reported problem, using with read_write(cfg):
will work fine because it's behavior us "normal" and in fact frozen support for dataclasses is implemented by setting the recursive read-only flag on the DictConfig node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I feel like it may be possible to have a version of open_dict()
(or, more generally, flag_override()
) that behaves as if all children had their flag set to the desired value (which would avoid the issue of intermediate nodes causing trouble because of their local flags). It'd be a bit tricky to get it to work efficiently though (you don't want to actually propagate the desired flag to all children -- only those whose flags are being accessed within the context)...
Closes #664.