Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add force_add to OmegaConf.update(), effectively using open_dict for all nodes along the path #665

Merged
merged 5 commits into from
Apr 7, 2021

Conversation

omry
Copy link
Owner

@omry omry commented Apr 7, 2021

Closes #664.

@omry omry requested review from odelalleau and Jasha10 April 7, 2021 01:06
Comment on lines +155 to +157
def test_update_force_add(
large_dict_config: Any, force_add: bool, benchmark: Any
) -> None:
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a benchmark to verify the excessive flag overrides are not causing performance regression.
The concern is that the recursive flags cache invalidation when a flag is set will slow things down.
It does not seem to cause a significant slowdown for a large dict_config here though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's too much to worry here because in the worst case it would only invalidate the whole config cache once: after it's cleared, subsequent cache invalidation wouldn't trickle down anymore.
That being said, I'm not sure if large_dict_config actually comes from a cache set on all config nodes, which would be needed to evaluate this worst case scenario. If that's not the case then the benchmark isn't very relevant (at least to evaluate the impact of cache invalidation on performance).

I also suggested an optimization that should reduce the amount of cache invalidation.

Copy link
Owner Author

@omry omry Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't notice the gating in _invalidate_flags_cache.
Good point about the benchmark not doing much.

I fixed the benchmark to populate the flags cache on all nodes. We don't have a baseline though.
Overall I am not too concerned about relative performance here though, at 400us per call on the big config this is plenty fast enough.

docs/source/usage.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only had time for a quick look, I'll need to re-review it more carefully, but already sharing a few comments

docs/source/usage.rst Outdated Show resolved Hide resolved
news/664.feature Outdated Show resolved Hide resolved
omegaconf/omegaconf.py Show resolved Hide resolved
tests/test_update.py Show resolved Hide resolved
Copy link
Owner Author

@omry omry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks.

omegaconf/omegaconf.py Show resolved Hide resolved
tests/test_update.py Show resolved Hide resolved
Copy link
Collaborator

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just a few comments & suggestions

Comment on lines +766 to +773
struct_override = False if force_add else root._get_node_flag("struct")
with flag_override(root, "struct", struct_override):
k = split[i]
# if next_root is a primitive (string, int etc) replace it with an empty map
next_root, key_ = _select_one(root, k, throw_on_missing=False)
if not isinstance(next_root, Container):
root[key_] = {}
root = root[key_]
Copy link
Collaborator

@odelalleau odelalleau Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization by only calling flag_override when needed:

Suggested change
struct_override = False if force_add else root._get_node_flag("struct")
with flag_override(root, "struct", struct_override):
k = split[i]
# if next_root is a primitive (string, int etc) replace it with an empty map
next_root, key_ = _select_one(root, k, throw_on_missing=False)
if not isinstance(next_root, Container):
root[key_] = {}
root = root[key_]
k = split[i]
# if next_root is a primitive (string, int etc) replace it with an empty map
next_root, key_ = _select_one(root, k, throw_on_missing=False)
if not isinstance(next_root, Container):
if force_add:
with flag_override(root, "struct", False):
root[key_] = {}
else:
root[key_] = {}
root = root[key_]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here will not apply cleanly, (see syntax error in 775).
In fact we do have a does_not_raise context manager similar to nullcontext in the tests. we can promote it to utils.
Feel free to followup with a PR optimizing both places.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, oops, looks like my mouse select skills need improving. I just added the missing bracket.

Comment on lines +155 to +157
def test_update_force_add(
large_dict_config: Any, force_add: bool, benchmark: Any
) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's too much to worry here because in the worst case it would only invalidate the whole config cache once: after it's cleared, subsequent cache invalidation wouldn't trickle down anymore.
That being said, I'm not sure if large_dict_config actually comes from a cache set on all config nodes, which would be needed to evaluate this worst case scenario. If that's not the case then the benchmark isn't very relevant (at least to evaluate the impact of cache invalidation on performance).

I also suggested an optimization that should reduce the amount of cache invalidation.

Comment on lines +785 to +786
struct_override = False if force_add else root._get_node_flag("struct")
with flag_override(root, "struct", struct_override):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar optimization could be done to avoid overriding for no reason. Something like:

Suggested change
struct_override = False if force_add else root._get_node_flag("struct")
with flag_override(root, "struct", struct_override):
ctx = flag_override(root, "struct", False) if force_add else nullcontext()
with ctx:

However nullcontext was only aldded in 3.7 so it would require writing our own implementation for 3.6 in _utils.py.

tests/test_update.py Show resolved Hide resolved
@@ -739,7 +744,8 @@ def update(
:param merge: If value is a dict or a list, True for merge, False for set.
True to merge
False to set
None (default) : deprecation warning and default to False
None (default): deprecation warning and default to False
:param force_add: insert the entire path regardless of Struct flag or Structured Config nodes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be surprising for force_add not to allow adding to readonly configs, is that intended?

(if it's changed, it may be easier to rename it into force so that it can force both adding new values and overriding existing ones, as otherwise you'd need to make the difference between both situations for readonly configs, which seems tricky)

Copy link
Owner Author

@omry omry Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reported problem is specifically about adding a field to a Structured Config.
The original implementation in Hydra was using open_dict on the root of the config and this was not sufficient because of Structured Configs are not inheriting the struct flag from their parent.

read-only is used as a mechanism to protect against changes to nodes here.
In contrast to the reported problem, using with read_write(cfg): will work fine because it's behavior us "normal" and in fact frozen support for dataclasses is implemented by setting the recursive read-only flag on the DictConfig node.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I feel like it may be possible to have a version of open_dict() (or, more generally, flag_override()) that behaves as if all children had their flag set to the desired value (which would avoid the issue of intermediate nodes causing trouble because of their local flags). It'd be a bit tricky to get it to work efficiently though (you don't want to actually propagate the desired flag to all children -- only those whose flags are being accessed within the context)...

@omry omry merged commit 9e7df56 into master Apr 7, 2021
@omry omry deleted the update_force_add branch April 7, 2021 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow OmegaConf.update to ignore struct flag and structured configs and add anyway
2 participants