Introduce new oc.env and oc.decode resolvers #606

Merged
merged 32 commits into from
Mar 18, 2021

Conversation

odelalleau
Collaborator

@odelalleau odelalleau commented Mar 15, 2021

  • Restore and deprecate the old env resolver for backward
    compatibility with OmegaConf 2.0

  • The new oc.env resolver keeps the string representation of
    environment variables, and does not use the cache

  • The new oc.decode resolver can be used to parse and evaluate strings
    according to the OmegaConf grammar

Fixes #383
Fixes #573
Fixes #574

Notes:

@odelalleau odelalleau marked this pull request as draft March 15, 2021 23:26
@odelalleau
Collaborator Author

odelalleau commented Mar 15, 2021

Marked as draft as I want to change oc.decode to work with interpolations too. Initially I thought I needed something new (the key), but actually the _parent_ keyword seems enough => I will update the PR later and request review when ready.

@omry
Owner

omry commented Mar 16, 2021

Didn't look at the code yet, but a question:
If you decode a list or a dict, are they converted to ListConfig and DictConfig now?
If so, is this the intention?

@odelalleau
Collaborator Author

Didn't look at the code yet, but a question:
If you decode a list or a dict, are they converted to ListConfig and DictConfig now?
If so, is this the intention?

Oh, yes, that's a good point. As far as I am concerned this is intended, however this is a bit tricky to explain in the documentation because currently the fact that dict/list outputs are converted to DictConfig/ListConfig is documented after the default resolvers.

Any objection to moving things around? (i.e., first explain what resolvers are and how they behave, then talk about oc.env and oc.decode)

@odelalleau odelalleau requested a review from omry March 16, 2021 00:55
@omry
Owner

omry commented Mar 16, 2021

Two points:

  1. I can imagine scenarios where certain resolvers would need to opt out of the automatic conversion. In this case it's fine, but imagine a resolver that wanted to return an object that is incompatible with OmegaConf inside a list?

  2. I think we should split the documentation of the built-in resolvers from the documentation of what you need to know when you write new resolvers. There are two audiences here: people using resolvers are a superset of people writing them. I think we can document each resolver based on its behavior without linking to how this is achieved.
    In this particular case, we should just say that oc.decode returns a transient config node when it sees a list or a dict. You can also demonstrate the usefulness by showing a use case with interpolation in the decoded object.

@odelalleau
Collaborator Author

odelalleau commented Mar 16, 2021

I requested review but that's for the code only -- doc still needs updating according to ongoing discussion

Edit: doc has been updated (just the notebook will need to be synced to the doc once more)

@omry
Owner

omry commented Mar 16, 2021

Sure. I didn't review the code yet either. Will look tomorrow.

@odelalleau
Collaborator Author

  1. I can imagine scenarios where certain resolvers would need to opt out of the automatic conversion. In this case it's fine, but imagine a resolver that wanted to return an object that is incompatible with OmegaConf inside a list?

If I'm not mistaken, right now the only way for someone to have an object in the config that isn't compatible with OmegaConf is by setting the allow_objects flag.
If this is correct and we are ok to keep it that way, then it can be used for this purpose, for instance:

from omegaconf import OmegaConf

class A:
    ...

OmegaConf.clear_resolvers()
OmegaConf.register_resolver("test", lambda: [A()])
c = OmegaConf.create({"x": "${test:}"})
c._set_flag("allow_objects", True)
assert isinstance(c.x[0], A)

It is true that in OmegaConf 2.0 a resolver could output anything, which would get wrapped in a ValueNode. I'm not sure to what extent this use case was meant to be supported though. It seems a bit dangerous since it lets you do things that aren't meant to be possible without interpolations.

In this particular case, we should just say that oc.decode returns a transient config node when it sees a list or a dict. You can also demonstrate the usefulness by showing a use case with interpolation in the decoded object.

Ok, though technically it's hiding what's really happening in practice: oc.decode returns a list or dict, which then gets converted to a ListConfig / DictConfig at the last step of the interpolation resolution. The distinction may matter in more advanced use cases where oc.decode would be used as an intermediate resolver in nested interpolations, e.g. ${my_resolver:${oc.decode:${oc.env:VAR}}}.
It's probably fine (and better) not to go into these details though, so I'll go with the simpler explanation you suggested.

@omry
Owner

omry commented Mar 16, 2021

If I'm not mistaken, right now the only way for someone to have an object in the config that isn't compatible with OmegaConf is by setting the allow_objects flag.

allow_objects is not something I want to push everywhere.
One limitation is that dataclass/attr classes and instances are considered config, not objects.
If someone returns an instance of a dataclass, the conversion would turn it into a DictConfig even if that is not the intention.
Custom resolvers already enable people to do things they can't do otherwise.

What I have in mind here is to add a flag when registering the resolver that would qualify its behavior as converting or not converting the resulting containers to corresponding OmegaConf containers.

The distinction you are drawing about an intermediate resolver is something I want some clarification about:
I think you are combining two features here:

  1. Validate against the underlying annotated type (which potentially means creating a new object).
  2. Convert a returned dict/list/tuple to a corresponding OmegaConf container. This is not connected to the annotation.

In my view, the second should be controlled by a flag when registering the resolver.

@odelalleau
Collaborator Author

It's probably fine (and better) not to go into these details though, so I'll go with the simpler explanation you suggested.

I gave it a shot in d1229d6 (doc update). The notebook will also need to be updated but I'll wait until we settle on the documentation first.

@odelalleau odelalleau marked this pull request as ready for review March 16, 2021 02:03
@odelalleau
Collaborator Author

odelalleau commented Mar 16, 2021

What I have in mind here is to add a flag when registering the resolver that would qualify its behavior as converting or not converting the resulting containers to corresponding OmegaConf containers.

Just to be clear, in that case, would the resulting node (wrapping the output) be a ValueNode or an AnyNode with allow_objects set to True? (both can work, but personally I was seeing ValueNode as an abstract class -- actually made it so recently, but this can be undone easily if needed)

The distinction you are drawing about an intermediate resolver is something I want some clarification about:
I think you are combining two features here:

  1. Validate against the underlying annotated type (which potentially means creating a new object).
  2. Convert a returned dict/list/tuple to a corresponding OmegaConf container. This is not connected to the annotation.

In my view, the second should be controlled by a flag when registering the resolver.

What I mean is that (2) (and actually also (1), but let's ignore it) happens only at the very last step of interpolation resolution: we look at the final result, and if it's a dict/list, then we convert it into an OmegaConf container. But the output of an intermediate resolver is fed unchanged (*) as input to the next resolver in the chain. So in my example ${my_resolver:${oc.decode:${oc.env:VAR}}} => if VAR is set to "[1, 2, 3]" then my_resolver will get a plain list as input (not a ListConfig).

(*) Actually this is incorrect (mentioning it for completeness, even if it doesn't matter in my example): we call _get_value() on any input of a resolver, so if a resolver somehow outputs a non-container Node that is fed to another resolver, the other resolver will see its value and not the Node object.

@odelalleau
Collaborator Author

the output of an intermediate resolver is fed unchanged (*) as input to the next resolver in the chain. So in my example ${my_resolver:${oc.decode:${oc.env:VAR}}} => if VAR is set to "[1, 2, 3]" then my_resolver will get a plain list as input (not a ListConfig).

Thinking more about it, maybe it's better to change this behavior and systematically convert the output of a resolver from dict/list to DictConfig/ListConfig, even for intermediate computations. The main advantage I see is that it is simpler to understand. In that case having a flag to control whether or not this conversion occurs is even more important.

So here is a suggestion, @omry please let me know if you'd prefer something different:

  • Add a flag convert_container_output: bool = True to register_new_resolver() to control this behavior
  • oc.decode would use this flag set to True (same thing for the upcoming oc.dict.values)
  • If the final result of an interpolation is a dict/list (which can only happen if it is ${foo:...} where foo was registered with convert_container_output=False), then wrap it within an AnyNode with allow_objects set to True (and also do this for any other unsupported type of resolver output)
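
To make the proposal concrete, here is a library-free sketch of what such a flag could look like. Everything in it is hypothetical: `convert_container_output` is only the name suggested in this comment, and `FakeContainer`, `register`, and `call` are stand-ins invented for the sketch, not OmegaConf APIs.

```python
from typing import Any, Callable, Dict, Tuple


class FakeContainer:
    """Stand-in for DictConfig / ListConfig; not an OmegaConf class."""

    def __init__(self, content: Any) -> None:
        self.content = content


# Hypothetical registry: resolver name -> (function, convert_container_output).
_RESOLVERS: Dict[str, Tuple[Callable[..., Any], bool]] = {}


def register(
    name: str, fn: Callable[..., Any], convert_container_output: bool = True
) -> None:
    _RESOLVERS[name] = (fn, convert_container_output)


def call(name: str, *args: Any) -> Any:
    fn, convert = _RESOLVERS[name]
    result = fn(*args)
    # Only dict/list outputs are wrapped, and only when the flag is set.
    if convert and isinstance(result, (dict, list)):
        return FakeContainer(result)
    return result


register("as_config", lambda: {"a": 10})
register("as_plain", lambda: {"a": 10}, convert_container_output=False)

converted = call("as_config")
plain = call("as_plain")
print(type(converted).__name__, type(plain).__name__)
```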

@omry
Owner

omry commented Mar 16, 2021

Hold off on any changes related to this; I have some other ideas that might change the direction of this discussion. Will comment later.

@omry
Owner

omry commented Mar 16, 2021

I realized that now that we support passing the _parent_ node, the function itself can do any conversions explicitly.
If a function wants to return a DictConfig, it can do it:

from omegaconf import DictConfig, OmegaConf

OmegaConf.register_resolver("dict", lambda: {"a": 10})
OmegaConf.register_resolver("dictconfig", lambda _parent_: DictConfig({"a": 10}, parent=_parent_))

cfg = OmegaConf.create({
  "d" : "${dict:}",
  "dc" : "${dictconfig:}"
})
assert type(cfg.d) is dict
assert type(cfg.dc) is DictConfig

I did not try to use _parent_ like that, but if we do that things become significantly simpler as there are no more automatic container conversions.

We still need to think what this means for type based validation/conversion:

@dataclass
class Foo:
   a : str = "10"
   b : int = "${a}"

oc.env can do the conversion based on the ref_type of parent, but this might be useful as a generic behavior for all resolvers.

@odelalleau
Collaborator Author

I realized that now that we support passing the _parent_ node, the function itself can do any conversions explicitly.

True (I kinda liked the automatic conversion though, it seemed natural and more straightforward -- but maybe explicit is still better)

Assuming we go with this option, we still need to decide what type of node should wrap a plain dict/list (or any other non-standard object). Ok with AnyNode with allow_objects set to True, or do you prefer to get back to the old ValueNode?

We still need to think what this means for type based validation/conversion:

@dataclass
class Foo:
   a : str = "10"
   b : int = "${a}"

oc.env can do the conversion based on the ref_type of parent, but this might be useful as a generic behavior for all resolvers.

I don't think we should change the current type-based validation/conversion: this mechanism has to happen as the very last step (it doesn't make sense to convert the output of an intermediate resolver based on the node type), and it applies both to resolver interpolations and node interpolations (so I don't think resolvers should worry about it).

@omry
Owner

omry commented Mar 16, 2021

I realized that now that we support passing the _parent_ node, the function itself can do any conversions explicitly.

True (I kinda liked the automatic conversion though, it seemed natural and more straightforward -- but maybe explicit is still better)

We can introduce a flag when registering the resolver that would do it automatically on demand if this turns out to be a common pattern. I think being explicit is better because it reduces surprises.

Assuming we go with this option, we still need to decide what type of node should wrap a plain dict/list (or any other non-standard object). Ok with AnyNode with allow_objects set to True, or do you prefer to get back to the old ValueNode?

Why do you need to wrap it at all?

We still need to think what this means for type based validation/conversion:

@dataclass
class Foo:
   a : str = "10"
   b : int = "${a}"

oc.env can do the conversion based on the ref_type of parent, but this might be useful as a generic behavior for all resolvers.

I don't think we should change the current type-based validation/conversion: this mechanism has to happen as the very last step (it doesn't make sense to convert the output of an intermediate resolver based on the node type), and it applies both to resolver interpolations and node interpolations (so I don't think resolvers should worry about it).

Yes, I tend to agree.
I just wanted to mention that option. Those two topics are related.

@omry
Owner

omry commented Mar 16, 2021

Why do you need to wrap it at all?

I guess for compatibility with regular interpolations that are returning a Node.
We can introduce a new DummyNode(ValueNode) for that purpose.
This would be more obvious if someone gets it when they think they got a real thing.

@odelalleau
Collaborator Author

Why do you need to wrap it at all?

I guess for compatibility with regular interpolations that are returning a Node.
We can introduce a new DummyNode(ValueNode) for that purpose.
This would be more obvious if someone gets it when they think they got a real thing.

Generally speaking, _dereference_node() must return a Node.
I'd rather avoid creating a new type of ValueNode if we can re-use one...

@omry
Owner

omry commented Mar 16, 2021

AnyNode with allow_objects=True works fine then.

@odelalleau
Collaborator Author

AnyNode with allow_objects=True works fine then.

Ok sounds good. I'm planning to do this in a follow-up PR that will:

  1. Roll back the automatic conversion of dict / list outputs into DictConfig / ListConfig
  2. Add a new resolver oc.to_config (let me know if you prefer another name) that takes a dict/list as input and converts it into a DictConfig / ListConfig

@omry
Owner

omry commented Mar 16, 2021

AnyNode with allow_objects=True works fine then.

Ok sounds good. I'm planning to do this in a follow-up PR that will:

  1. Roll back the automatic conversion of dict / list outputs into DictConfig / ListConfig
  2. Add a new resolver oc.to_config (let me know if you prefer another name) that takes a dict/list as input and converts it into a DictConfig / ListConfig

I think oc.create could make sense. this is a parallel to OmegaConf.create().
In fact, we can use it directly there, this will also enable support for creating from a yaml string.

@odelalleau
Collaborator Author

I think oc.create could make sense. this is a parallel to OmegaConf.create().

Sounds like a plan!

@@ -6,6 +6,7 @@
import tempfile
import pickle
os.environ['USER'] = 'omry'
os.environ['USERID'] = '123456'
Owner

I think it's enough to just say that environment variables are always returned as strings without showing an actual example.

Collaborator Author

Removed in c6a7c86

docs/source/usage.rst (outdated)
Comment on lines 414 to 416
  >>> cfg = OmegaConf.create({
- ...     'database': {'password': '${env:DB_PASSWORD,abc123}'}
+ ...     'database': {'password': '${oc.env:DB_PASSWORD,abc123}'}
  ... })
Owner

You can use this to drive the point about quoting:

cfg = OmegaConf.create(
    {
        "database": {
            "password1": "${oc.env:DB_PASSWORD,abc123}",  # the string 'abc123'
            "password2": "${oc.env:DB_PASSWORD,'12345'}",  # the string '12345'
        },
    }
)

Collaborator Author

Done in e63e669

news/573.api_change (outdated)
omegaconf/_utils.py
cfg["env_func"] = env_func # allows choosing which env resolver to use
cfg = _ensure_container(cfg)

# The legacy env resolver triggers a deprecation warning.
Owner

ditto. add one test for directly testing the deprecation warning and ignore the warnings everywhere else.

Owner

side note, we should probably split test_interpolation into something like test_simple_interpolations.py and test_custom_resolvers.py.

Collaborator Author

ditto. add one test for directly testing the deprecation warning and ignore the warnings everywhere else.

Done in a973d21 (I didn't add a new test since there are a couple of tests specific to the legacy env that still explicitly catch that warning, doesn't seem worth doing more refactoring since they will all go away in 2.2)
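
The pattern requested in this thread (check the deprecation warning once, silence it elsewhere) can be sketched with the standard library alone. `legacy_env` is a hypothetical stand-in for a deprecated resolver, and the warning category is an assumption; the project's real tests may use pytest helpers and a different category.

```python
import warnings


def legacy_env(key: str) -> str:
    """Hypothetical stand-in for a deprecated resolver function."""
    warnings.warn(
        "legacy_env is deprecated, use the new resolver instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return key.lower()


def call_and_collect(key: str):
    # The one dedicated test: record warnings so they can be asserted on.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = legacy_env(key)
    return value, caught


def call_quietly(key: str) -> str:
    # Everywhere else: ignore the deprecation warning.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return legacy_env(key)


value, caught = call_and_collect("USER")
quiet_value = call_quietly("USER")
print(value, len(caught), quiet_value)
```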

Collaborator Author

side note, we should probably split test_interpolation into something like test_simple_interpolations.py and test_custom_resolvers.py.

Noted, will do in a follow-up PR

docs/source/usage.rst (outdated)
omegaconf/omegaconf.py
docs/source/usage.rst
Comment on lines 445 to 446
>>> def show(x):
...     print(f"type: {type(x).__name__}, value: {x}")
Owner

We can probably add this function to the top of the testsetup or somewhere else early on.

Collaborator Author

I moved it up and used it in more places in e090055
I didn't put it in testsetup because it wouldn't appear in the doc and people may wonder what exactly this function is.

@omry
Owner

omry commented Mar 16, 2021

I added a bunch of comments on intermediate diffs. be sure to check them as well.

@omry
Owner

omry commented Mar 16, 2021

request review when ready.

@odelalleau
Collaborator Author

I added a bunch of comments on intermediate diffs. be sure to check them as well.

They appeared in the review as far as I can tell.

@odelalleau odelalleau requested a review from omry March 16, 2021 23:10
@odelalleau
Collaborator Author

I think this is a go, except the merge conflict :)

Funny, I just rebased on top of master without conflict... Just force-pushed.

Owner

@omry omry left a comment

Another one bites the dust!

@odelalleau odelalleau merged commit 074e8dc into omry:master Mar 18, 2021
odelalleau added a commit to odelalleau/omegaconf that referenced this pull request Mar 31, 2021
Some doc updates from omry#606 were not fully ported to the notebook.
odelalleau added a commit that referenced this pull request Mar 31, 2021
Some doc updates from #606 were not fully ported to the notebook.