Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Instantiate to support instantiating dataclasses #1236

Conversation

guillaumegenthial
Copy link

@guillaumegenthial guillaumegenthial commented Dec 24, 2020

Motivation

Address #1183

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

Related Issues and PRs

#1183

(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)

@facebook-github-bot
Copy link
Contributor

Hi @guillaumegenthial!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@guillaumegenthial guillaumegenthial force-pushed the refactor/instantiate branch 2 times, most recently from 8762d56 to e7ad10f Compare December 24, 2020 16:24
@omry
Copy link
Collaborator

omry commented Dec 24, 2020

Thanks @guillaumegenthial!
It will take a few weeks before I can really review it (It looks good at a quick glance but I will have a lot of feedback).
we can discuss some of my feedback on the chat to make things more efficient.

Don't forget to sign the CLA (see message from the bot above, I can't accept it otherwise).

@guillaumegenthial guillaumegenthial force-pushed the refactor/instantiate branch 2 times, most recently from cbaecfb to a1a8bae Compare December 24, 2020 16:38
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 24, 2020
@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@guillaumegenthial guillaumegenthial force-pushed the refactor/instantiate branch 2 times, most recently from 103d599 to 3c7453d Compare December 24, 2020 17:15
@guillaumegenthial
Copy link
Author

guillaumegenthial commented Dec 24, 2020

@omry sure thing, don't hesitate to ping me when you have some time.

To summarize the changes

  1. remove _get_kwargs, put all the logic in instantiate
  2. handle edge cases carefully (depending on convert, config type, etc.)
  3. Add mode inheritance (parent modes are passed to the children when recursively instantiating)
  4. Add ParseMode (not necessary for this PR), that allows the user to avoid the conversion to OmegaConf container and operate on simple python containers (dict + list).

I updated the dataclass test, and I think it now behaves as we would expect. Some cases are counter intuitive (convert = NONE and nesting dict+dataclass or list+dataclass), as dataclasses instances have no choice but to be absorbed by the top-level OmegaConf container.

@omry
Copy link
Collaborator

omry commented Dec 24, 2020

Some high level feedback before I do a proper review based on your summary:

  1. remove _get_kwargs, put all the logic in instantiate
    Sure, it becomes very long though so it would still be good to break some parts into smaller functions to make it more readable.
    (Once the rest of the feedback is incorporated)
  1. Add mode inheritance (parent modes are passed to the children when recursively instantiating)

I am actually not sure about this one but I am guessing you can still override the mode in the child node?

  1. Add ParseMode (not necessary for this PR), that allows the user to avoid the conversion to OmegaConf container and operate on simple python containers (dict + list).

While I definitely want better support for simple containers. Few things:

  • This opens up a surface area that will need proper testing (and proper definition of what happens with interpolation and type safety in dict mode).
  • It's a user facing change that should be documented.
  • I think the parse mode can be determined automatically based on the type of the input config object. (dict or DictConfig).

We should probably split it to a followup PR once the initial changes settles in.

Finally, some of the CI tests are tailing (particularly plugins, which are being instantiated with instantiate(). be sure to take a look and fix the issues.

@guillaumegenthial
Copy link
Author

I am actually not sure about this one but I am guessing you can still override the mode in the child node?

Yes, children modes override the parent's modes if specified

We should probably split it to a followup PR once the initial changes settles in.

Definitely agree, I included it here because it's easy to do with the new implementation (it kind of demonstrates its flexibility)

Finally, some of the CI tests are tailing (particularly plugins, which are being instantiated with instantiate(). be sure to take a look and fix the issues.

I see some "version mismatch" errors, I will have a look at it to understand how instantiate has an impact there.

@guillaumegenthial
Copy link
Author

Ok, I'm pretty sure the failing ray launcher plugin tests are not caused by this PR.
A new version of ray was released 16 hours ago.
The ci instance seems to have both versions co-existing somehow (1.1.0, the new one and 1.0.1.post1, the previous one). Not sure why, maybe just some cache that's not been cleared? I see that the wheels paths are set explicitely in the code to /tmp/wheels/, maybe that's the reason...

@omry
Copy link
Collaborator

omry commented Dec 24, 2020

Yup, unrelated. I filed #1238 to track it.

@guillaumegenthial
Copy link
Author

Simple comment : ParseMode is one way to address #1200

@omry
Copy link
Collaborator

omry commented Dec 24, 2020

Simple comment : ParseMode is one way to address #1200

Certainly (assuming that use case is really not depending on type validation, interpolation or any other OmegaConf feature).

@omry omry changed the title Refactor Instantiate to support DataClasses Refactor Instantiate to support instantiating dataclasses Dec 25, 2020
@guillaumegenthial guillaumegenthial force-pushed the refactor/instantiate branch 6 times, most recently from 4f96230 to c1c5bfc Compare January 2, 2021 16:48
Copy link
Collaborator

@omry omry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review pass.
see my comments.

meta:
At this point, switch to piling additional changes as new commits. it will make this easier to review.
We can squash everything before we merge.

hydra/utils.py Outdated
Comment on lines 149 to 147
_convert_container_targets_to_strings(kwargs)
_convert_container_targets_to_strings(config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a good opportunity to simplify this.
Now that OmegaConf supports objects, we should be able to just convert forward.
instead of converting types to strings (and then locate them again), we can convert everything to types.

Copy link
Author

@guillaumegenthial guillaumegenthial Jan 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

EDIT: this works, what is below is incorrect

However it seems we can't really store types in OmegaConf containers.

from omegaconf import OmegaConf

class MyClass:
    pass

OmegaConf.create({"_target_": MyClass})

raises

UnsupportedValueType: Value 'MyClass' is not a supported primitive type
	full_key: _target_
	reference_type=Optional[Dict[Union[str, Enum], Any]]
	object_type=dict

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually my bad, I forgot the flag {"allow_objects"} so it works as expected

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some ambiguity with dataclasses though

@dataclass
class MyClass:
    x: int


cfg = OmegaConf.create({"_target_": MyClass}, flags={"allow_objects": True})
print(cfg._target_)  # {'x': '???'}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That again, sounds like another case fox boxing the object.

config = _merge(config, kwargs) if kwargs else config

# Override parent modes from config if specified
# Cannot use pop with default because of OmegaConf typed config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the statement.
if DictConfig.pop() behaves differently than native dict pop(), please file an issue with a minimal repro against OmegaConf.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment on an opened issue omry/omegaconf#425.
I'm not sure this is really a bug, more like some type checking functionality of OmegaConf that forces us to "look before trying"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a separate issue at your request here omry/omegaconf#471

Comment on lines +245 to +253
if convert == ConvertMode.ALL or (
convert == ConvertMode.PARTIAL and config._metadata.object_type is None
):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like how PARTIAL is leaking into this area. somehow this feels off.

@guillaumegenthial
Copy link
Author

Thank you for reviewing @omry .
I pushed some changes addressing some of your comments.
Before addressing the others, we need to decide how we should treat kwargs, since it has a potentially big impact on the implementation (see my comment above).

@omry
Copy link
Collaborator

omry commented Jan 8, 2021

Thank you for reviewing @omry .
I pushed some changes addressing some of your comments.
Before addressing the others, we need to decide how we should treat kwargs, since it has a potentially big impact on the implementation (see my comment above).

Please repeat the comment.
I don't know which comment you are thinking about.
Also, at this point we can have another chat session.

@omry
Copy link
Collaborator

omry commented Jan 8, 2021

looks like there are some conflicts, can you rebase and fix?

@guillaumegenthial
Copy link
Author

I was referring to your comment

I think now that we support instantiating dictconfigs, we should treat passthrough dictconfigs as passthrough objects and not as configs.
I guess we can make an exception based on the presence of target but I am not sure it's worth the extra complexity.

My answer was

This is the main friction point.
I'd argue that this would be confusing to treat dataclass instances in overrides as objects and not configs.

  1. Instantiate should operate on config-like objects, why would the overrides already have some instantiated objects?
  2. DataClass become sources of confusion. If config is {"obj": {"x": 1, "y": 2}} and kwargs is obj=SomeDataClass(x=2), what do we expect? {"obj": {"x": 2, "y": 2}} (current behavior), or {"obj": SomeDataClass(x=2)
    Also, (not necessarily a good argument), it would make the implementation more complex.

The current implementation does

  1. merge kwargs fully into config
  2. forget about kwargs, only look at config

If we want to treat kwargs as objects, we can't do that because dataclasses instances merged into OmegaConf containers would be converted back to OmegaConf containers. This means that we should keep the kwargs alonside the config during the whole instantiation path.

And I agree, discussing live on the chat would be more efficient

@omry
Copy link
Collaborator

omry commented Jan 29, 2021

@guillaumegenthial,
I would like to get some progress on this, would you mind jumping to the chat at some point so we can discuss the remaining issues?

@omry
Copy link
Collaborator

omry commented Feb 24, 2021

Taking over. Thanks for working on this.

@omry omry closed this Feb 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants