Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Structured Config back to dataclass instance #472

Closed
indigoviolet opened this issue Jan 16, 2021 · 5 comments · Fixed by #502
Closed

Convert Structured Config back to dataclass instance #472

indigoviolet opened this issue Jan 16, 2021 · 5 comments · Fixed by #502
Labels
enhancement New feature or request
Milestone

Comments

@indigoviolet
Copy link

Is your feature request related to a problem? Please describe.

OmegaConf.structured() returns a DictConfig, but it is recommended to duck type that as the original structured config type to get a static type checker to catch errors. But this seems like a leaky abstraction:

import attr
from omegaconf import OmegaConf
from typing import List

@attr.s(auto_attribs=True)
class Widget:
   foo: str
   bar: int

@attr.s(auto_attribs=True)
class Quux:
   widget_list: List[Widget]

In [19]:
qq: Quux = OmegaConf.structured(Quux(widget_list=[Widget(foo='a', bar=1)]))

In [27]:
type(qq.widget_list[0])
Out [27]:
omegaconf.dictconfig.DictConfig

def do_something(conf: Quux):
   print(conf.widget_list[0].foo)

For example, in the above do_something function, we are annotating conf: Quux, and this is ok as long as we don't do anything expecting conf.widget_list[0] to be an actual Widget instance. If I wanted to convert a conf.widget_list to JSON, I would do something different for a "real" Quux instance and for a duck-typed instance.

My naive understanding is that this is so you can add runtime validation for merging etc.

Describe the solution you'd like

I think it would be nice to have a way to re-instantiate the original type after merging etc:

qq = OmegaConf.instantiate(
   OmegaConf.merge(OmegaConf.structured(Quux), OmegaConf.load(...))
)
assert type(qq) is Quux

The benefit I'm imagining is that code that is annotated with Quux would not have any "leaks". Code that is concerned with manipulating configs can continue to use DictConfigs and their runtime features, but code that is using configs can work with the original types.

What stops us from doing so?

Describe alternatives you've considered

  • Another nice-to-have would be for OmegaConf.structured to return a DictConfig[Quux] -- ie. a generic so that this can be annotated clearly.

  • I think that if I were implementing this library from scratch, I would start with attrs/structured configs only, and then add the runtime validation via attrs's hooks like on_setattr etc. - we would never have to enter DictConfig-land. What are the problems with this approach (setting side the fact that dictconfigs are widely used already)?

Additional context
Add any other context or screenshots about the feature request here.

@omry
Copy link
Owner

omry commented Jan 16, 2021

Thanks for the suggestion.

OmegaConf.structured() returns a DictConfig, but it is recommended to duck type that as the original structured config type to get a static type checker to catch errors. But this seems like a leaky abstraction

Duck-typing is by definition imperfect. If you look closely enough you will discover that it's not really a duck.

My naive understanding is that this is so you can add runtime validation for merging etc.

Yes, and a handful of other features.

  • Missing field support
  • Interpolation
  • Custom resolvers
  • Merging
  • Runtime type safety for mutations
  • Recursive read-only support
    And probably a few other things I am not remembering now.

All of this comes at a cost:

  1. It makes it harder to do instance checks (you can do it using OmegaConf.get_type(cfg)
  2. There is some performance regression compares to the native objects. (negligible in most cases).

I think it would be nice to have a way to re-instantiate the original type after merging etc

This could be a nice addition, but one that has a significant surface area.
For example:

  1. OmegaConf supports an arbitrary nesting of untyped DictConfig with Structured Configs.
    You will want something recursive like OmegaConf.to_container().
  2. OmegaConf merging can sometimes result in an illegal object (with fields that are not present int he underlying class).
  3. Handling of missing fields will need to be defined, it will probably fail the the instantiation
  4. Handling of interpolations will need to be defined

While this comes up from time to time, most people are content with using DictConfig as a replacement to their dataclass instances.

What stops us from doing so?

This is not a high priority.
If you would like to add support for it (including tests and documentation updates), and can also commit to handling the long tail of issues I am expecting to see from supporting it - I am open to accepting a pull request adding it.
We should have in-depth discussions over some doc first to be sure we got a good understanding of the surface area.

Describe alternatives you've considered
Another nice-to-have would be for OmegaConf.structured to return a DictConfig[Quux] -- ie. a generic so that this can be annotated clearly.

OmegaConf.stuctured() is exactly the same as OmegaConf.create(), with the exception that the returned type is Any.

    @staticmethod
    def structured(
        obj: Any,
        parent: Optional[BaseContainer] = None,
        flags: Optional[Dict[str, bool]] = None,
    ) -> Any:
        return OmegaConf.create(obj, parent, flags)

OmegaConf.create() supports many kinds of inputs (lists, dicts, strings, dataclasses, dataclass instnaces etc).
I don't think what you are proposing can work without restricting OmegaConf.structured().

I think that if I were implementing this library from scratch, I would start with attrs/structured configs only, and then add the runtime validation via attrs's hooks like on_setattr etc. - we would never have to enter DictConfig-land. What are the problems with this approach (setting side the fact that dictconfigs are widely used already)?

OmegaConf evolved as a low level config library for yaml, then it gained some runtime features (interpolation, missing support, custom resolvers), and finally it gained type safety support in the form of Structured Configs.
I don't know what are the possible problems / limitation with an alternative evolution. feel free to experiment and share experience :).

@omry omry changed the title Structured configs and duck typing Convert Structured Config back to dataclass instance Jan 19, 2021
@indigoviolet
Copy link
Author

Cool thanks for the detailed explanation. There's a lot to explore and I'm sure I'll understand more as I gain experience with OmegaConf, I'm a fairly new user and haven't used it in all the ways it supports.

You said:

OmegaConf supports an arbitrary nesting of untyped DictConfig with Structured Configs.
You will want something recursive like OmegaConf.to_container().
OmegaConf merging can sometimes result in an illegal object (with fields that are not present int he underlying class).
Handling of missing fields will need to be defined, it will probably fail the the instantiation
Handling of interpolations will need to be defined

I wonder if most of these can stay as currently implemented, in "DictConfig-land", which might then allow a relatively simple conversion back to the original type - which would then give back the type safety. Here's a prototype I made and am experimenting with:

from typing import List, Optional, Type, TypeVar, Any, Dict, Union

import attr
from omegaconf import OmegaConf, DictConfig, ListConfig
import dataclasses

T = TypeVar("T")


from functools import singledispatch


@singledispatch
def _instantiate_dispatcher(obj):
    return obj


@_instantiate_dispatcher.register
def _dictconfig(obj: DictConfig) -> Union[DictConfig, T]:
    type_: Optional[Type[T]] = OmegaConf.get_type(obj)
    if type_ is not None:
        return instantiate(obj, type_)
    else:
        return obj


@_instantiate_dispatcher.register
def _listconfig(obj: ListConfig) -> List:
    return [_instantiate_dispatcher(v) for v in obj]


def instantiate(obj: DictConfig, type_: Optional[Type[T]] = None) -> T:
    """
    Usage:

    x: Retinanet = instantiate(OmegaConf.structured(Retinanet(...)))

    # See https://github.com/omry/omegaconf/issues/472

    """

    if type_ is None:
        type_ = OmegaConf.get_type(obj)
    else:
        assert type_ == OmegaConf.get_type(obj), f"Type mismatch {type_=} {obj=}"

    assert type_ is not None and (
        attr.has(type_) or dataclasses.is_dataclass(type_)
    ), f"Expected structured DictConfig, found {obj=} {type_=}"
    kwargs: Dict[str, Any] = {k: _instantiate_dispatcher(v) for k, v in obj.items()}
    return type_(**kwargs)  # type: ignore[call-arg]

Here's how I could use it:

@attr.s(auto_attribs=True)
class Quux:
     ...

qd: DictConfig = ... # this could be OmegaConf.structured, or OmegaConf.merge etc.

q1: Quux = instantiate(qd) # a "real" Quux object
q2 = instantiate(qd, Quux) # mypy knows this is a Quux  object

I will experiment with this and see what issues I discover and report back.

@omry
Copy link
Owner

omry commented Jan 22, 2021

Thanks for trying.

A few things:
For this to be accepted, it needs:

  1. A lot of tests covering all sorts of scenarios.
  2. Work for attr classes and dataclasses.

You are of course welcome to build your own version that works for your use case, but if you are serious about adding it to OmegaConf - I would start by creating some tests and then adding more and more tests as you go.
The tests should cover both dataclasses and attr classes.
Feel free to send a PR with partial work if you want feedback.

@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 30, 2021

I have been working on an implementation of this feature.

@indigoviolet feel free to collaborate on the pull request (#502), which is currently draft status.

@omry omry added enhancement New feature or request help wanted Extra attention is needed labels Feb 3, 2021
@omry omry added this to the OmegaConf 2.2 milestone Feb 22, 2021
@omry
Copy link
Owner

omry commented Feb 22, 2021

Bouncing this to 2.2, we are wrapping up 2.1.

@omry omry removed the help wanted Extra attention is needed label Mar 30, 2021
@omry omry modified the milestones: OmegaConf 2.2, OmegaConf 2.1 Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants