Deprecating auto-expand of nested dataclasses without default value in OmegaConf 2.0 #412
Comments
This makes sense. In my current use case: pytorch/hydra-torch#21, |
Sorry for digging this up, but I spent my day struggling with the updated behavior of nesting dataclasses. First, the difference between not assigning anything and assigning an actual object instance is not documented in the manual. It is only implicitly hinted at by the use of factories in the examples. Second, the situation that led to my confusion:
I find this result highly confusing and would very much prefer the old behavior because:
Here is an example:
from dataclasses import dataclass
from omegaconf import OmegaConf, errors
from contextlib import suppress
@dataclass
class SubConfig:
    x: int
    y: str = "hi"
@dataclass
class Config:
    s: SubConfig
c = OmegaConf.structured(Config)
print(c)
# {'s': '???'}
# merging c first works as expected...
print(OmegaConf.merge(c, dict(s=dict(x=3))))
# {'s': {'x': 3, 'y': 'hi'}}
# ... but I can't add new keys:
with suppress(errors.ConfigKeyError):
    OmegaConf.merge(c, dict(s=dict(x=3), extra=5))
# when trying to merge the other way round, the nested field 'y' is
# carried into the resulting config, but the value is lost:
print(OmegaConf.merge(dict(s=dict(x=3)), c))
# {'s': {'x': 3, 'y': '???'}}
What is the advantage of the updated behavior? The first post mentions that it should create fewer surprises, but I think the opposite is actually true. |
Pinging @Jasha10 since he seems more active recently in the repo. If you find time, please tell me your thoughts or what I did wrong. |
Hi @raphCode, Let me start here:
Being unable to add new keys is a feature of using structured configs; this feature is active whether there are missing values present or not.
c_non_missing = OmegaConf.structured(Config(SubConfig(123)))
print(c_non_missing)  # {'s': {'x': 123, 'y': 'hi'}}
with suppress(errors.ConfigKeyError):
    OmegaConf.merge(c_non_missing, dict(s=dict(x=3), extra=5))
The same
|
Hi, thank you for your explanation. I agree that the example about adding new keys was a bit out of context. As a hydra user, I value structured configs because I can check the types and presence of values in a user-specified config. For this validation to happen, hydra merges the structured config with the user config at some point.
OmegaConf.merge(structured_config, user_config)
OmegaConf.merge(user_config, structured_config)
I can control which one happens with the hydra defaults list.
I played around and noticed that this correctly merges nested default values, in other words fulfills requirement 2:
OmegaConf.merge(structured_config, user_config)
Please note this works even without default assignments for the nested dataclasses, see my first example here. For requirement 1 I need to swap the merge partners.
I now suspect that with the old pre-2.0 behavior, I would get the same results. |
To make sure I understand correctly: the point is that the following two print statements give different results, right?
c = OmegaConf.structured(Config)
print(OmegaConf.merge(dict(s=dict(x=3)), c))  # {'s': {'x': 3, 'y': '???'}}
print(OmegaConf.merge(c, dict(s=dict(x=3))))  # {'s': {'x': 3, 'y': 'hi'}}
You might be right about this. We could also potentially regard this as an
In my own work I tend to guard against missing values being passed to my app by using one of the following OmegaConf routines:
|
Yes, I don't understand why they differ. My actual point is that I am complaining about the behavior change and don't see a benefit in the new way of handling nested dataclasses:
Merge order discrepancy
While I was experimenting, I discovered that omegaconf still seems to auto-expand structured configs sometimes, as you correctly summarized.
Bad migration options
The recommended options suggest a dataclass instance to be created and assign
Useless default assignments
It forces me to add default assignments to nested dataclasses, which I do not like because:
The dependency and instancing concerns annoy me especially because I want to use the dataclasses as a container for the config at runtime and for typechecking the code which accesses it.
Structured configs are all about nesting
This is implicitly acknowledged because even without default assignments, the types of nested dataclasses are validated - or should I say auto-expanded?
Default values in ListConfig
Auto-expand lets me carry default values into a ListConfig with varying length:
from attrs import define
from omegaconf import OmegaConf
@define
class SubConfig:
    x: int
    y: str = "hi"
@define
class Config:
    s: list[SubConfig]
c = OmegaConf.structured(Config)
print(OmegaConf.merge(c, dict(s=[dict(x=3)]*2)))
# {'s': [{'x': 3, 'y': 'hi'}, {'x': 3, 'y': 'hi'}]}
I found no way to perform this operation with the new behavior, since the list length must be given in the default assignment of the dataclass:
s: list[SubConfig] = [SubConfig]  # actual list instance with fixed length, does not adapt to user configs of differing lengths
s: list[SubConfig] = list[SubConfig]  # Invalid value assigned: GenericAlias is not a ListConfig, list or tuple. |
This does feel unexpected. |
cc @odelalleau |
Yes I agree, I feel like this should probably be considered as a bug. The merge() code is somewhat complex and has a bunch of ad-hoc logic to handle the various scenarios it can encounter, and it's possible that this one is just not handled correctly right now. |
One motivation for keeping the current behavior is the special treatment of MISSING with respect to overwriting other arguments: if the second argument to
OmegaConf.merge({'y': 'bye'}, {'x': 3, 'y': '???'})  # {'y': 'bye', 'x': 3}
OmegaConf.merge({'y': 'bye'}, {'x': 3, 'y': 'hi'})  # {'y': 'hi', 'x': 3}
This means aggressively auto-expanding default arguments would have implications for multi-step merges.
# current behavior:
>>> merge({'s': {'y': 'bye'}}, merge(dict(s=dict(x=3)), c))
{'s': {'y': 'bye', 'x': 3}}
# behavior after proposed change:
>>> merge({'s': {'y': 'bye'}}, merge(dict(s=dict(x=3)), c))
{'s': {'y': 'hi', 'x': 3}}
Currently, |
That's a good point, but personally I would favor intuitive and useful behavior on single-step merges before worrying about multi-step merges (which, anyway, are just a sequence of single-step merges, so if we agree on what a single-step merge should do, hopefully that makes it clear what the multi-step should do as well). Also, I haven't double checked, but normally if |
Okay, I get that the current behavior will probably stay. Nonetheless I would really like to see auto-expand functionality during merges, not only because it feels nicer, but also because there are some use cases where there is just no alternative: How about using dataclass field metadata to opt in to auto-expand during merges? |
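To make the metadata idea above concrete, here is a rough standard-library sketch of how an opt-in marker could be declared and discovered. The metadata key `auto_expand` and the helper `auto_expand_fields` are hypothetical names invented for illustration; OmegaConf does not read such a key, and nothing here changes how `OmegaConf.merge` behaves.

```python
from dataclasses import dataclass, field, fields, is_dataclass

# Hypothetical metadata key, NOT an OmegaConf feature.
AUTO_EXPAND = "auto_expand"


@dataclass
class SubConfig:
    x: int
    y: str = "hi"


@dataclass
class Config:
    # Opted in: no default value, but marked for auto-expansion.
    s: SubConfig = field(metadata={AUTO_EXPAND: True})
    # Not opted in: would stay missing until assigned.
    t: SubConfig = field()


def auto_expand_fields(cls):
    """Return the names of nested-dataclass fields that opted in."""
    return [
        f.name
        for f in fields(cls)
        if f.metadata.get(AUTO_EXPAND, False) and is_dataclass(f.type)
    ]


print(auto_expand_fields(Config))
```

A merge implementation could consult such a list to decide which nested fields to expand from their class defaults, which would keep the expansion explicit per field.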
This does the trick for me now:
from typing import Any
import omegaconf
from omegaconf import DictConfig, ListConfig, OmegaConf
def merge_structured_config_defaults(cfg: Any) -> None:
    """
    This function takes an OmegaConf Config and recursively merges the non-optional
    default values of the underlying structured config classes in-place.
    This is necessary because the user config may override important keys in the schema,
    like _partial_ that control instantiation. Merging the schema defaults ensures the
    correct values of these keys.
    Keys set to None in the schema are Optional and may be overridden by the user, so
    these are not replaced.
    This manual implementation is necessary because OmegaConf disabled auto-expanding of
    nested structured configs, otherwise merging the schema on top of the user config
    would do the trick:
    https://github.com/omry/omegaconf/issues/412
    The proposed solutions are unnecessarily verbose (default assignments) and worse, they
    don't allow for default values to propagate into variable-length lists.
    """
    if isinstance(cfg, DictConfig):
        # Recurse into children first, skipping values that are still missing.
        for key in cfg:
            if not OmegaConf.is_missing(cfg, key):
                merge_structured_config_defaults(cfg[key])
        t = OmegaConf.get_type(cfg)
        # Note: is_structured_config lives in a private module and may change.
        if omegaconf._utils.is_structured_config(t):
            defaults = OmegaConf.structured(t)
            for key in cfg:
                if key in defaults and defaults[key] is not None:
                    OmegaConf.update(cfg, key, defaults[key])  # type: ignore [arg-type]
    elif isinstance(cfg, ListConfig):
        for item in cfg:
            merge_structured_config_defaults(item)
I wouldn't mind if this function were added to OmegaConf itself. |
Good catch. |
TLDR
You have a config with a nested dataclass that has no default value.
This behavior will change in OmegaConf 2.1, and you should give the field a default value (either MISSING or an object of the appropriate type).
Read on to see why this is changing and what the migration options are.
Primitive fields in OmegaConf 2.0
Consider the following:
The MISSING values are required in OmegaConf 2.0. Creating a Structured Config from User1 with fields that do not have a default value will result in this error:
Primitive fields in OmegaConf 2.1
OmegaConf 2.1 will allow that and will automatically treat age and name as MISSING, making this equivalent to the definition above:
Nested dataclasses in OmegaConf 2.0
In contrast to primitive values like the ones above, OmegaConf 2.0 will not complain about nested data classes without a default value:
Instead, it will auto-expand using the fields from the class User.
This behavior is surprising and conflicts with the behavior of primitive fields in 2.1.
Nested dataclasses in OmegaConf 2.1
OmegaConf 2.1 will change this behavior to be in line with that for primitive fields:
The above will be equivalent to:
Migration options
If you are relying on the deprecated behavior you will need to assign an explicit value to create the same config.
Recommended option
Assign an actual object of the appropriate type:
Note that if you omit the default values for age and name like in User2, you will have to manually provide default values for Python to instantiate the object. You can still use MISSING:
More compact but not recommended option:
A more compact alternative is to assign the type itself to the field:
Python type checkers will not be happy about it. Use # type: ignore to allow it anyway.
For this reason it's recommended to assign an actual object here.