Introduce new `oc.env` and `oc.decode` resolvers #606

odelalleau · 2021-03-15T23:26:03Z

- Restore and deprecate the old env resolver for backward compatibility with OmegaConf 2.0
- The new oc.env resolver keeps the string representation of environment variables, and does not use the cache
- The new oc.decode resolver can be used to parse and evaluate strings according to the OmegaConf grammar

Fixes #383
Fixes #573
Fixes #574

Notes:

- The diff for the notebook can be seen at https://odelalleau.github.io/diff/diff_notebook_decode.html
- The diff of test_interpolation.py is hard to read because of an indentation change, so I created a dummy diff at https://github.com/odelalleau/omegaconf/pull/2/files that omits the indentation change for better readability

odelalleau · 2021-03-15T23:35:19Z

Marked as draft as I want to change oc.decode to work with interpolations too. Initially I thought I needed something new (the key), but actually the _parent_ keyword seems enough => I will update the PR later and request review when ready.

omry · 2021-03-16T00:34:00Z

Didn't look at the code yet, but a question:
If you decode a list or a dict, are they converted to ListConfig and DictConfig now?
If so, is this the intention?

odelalleau · 2021-03-16T00:45:50Z

> Didn't look at the code yet, but a question:
> If you decode a list or a dict, are they converted to ListConfig and DictConfig now?
> If so, is this the intention?

Oh, yes, that's a good point. As far as I am concerned this is intended. However, it is a bit tricky to explain in the documentation, because the conversion of dict/list outputs to DictConfig/ListConfig is currently documented after the default resolvers.

Any objection to moving things around? (i.e., first explain what resolvers are and how they behave, then talk about oc.env and oc.decode)

omry · 2021-03-16T00:55:32Z

Two points:

1. I can imagine scenarios where certain resolvers would need to opt out of the automatic conversion. In this case it's fine, but imagine a resolver that wants to return an object that is incompatible with OmegaConf inside a list?
2. I think we should split the documentation of the built-in resolvers from the documentation of what you need to know when you write new resolvers. There are two audiences here: people using resolvers are a superset of people writing resolvers. I think we can document each resolver based on its behavior without linking to how this is achieved.
   In this particular case, we should just say that oc.decode returns a transient config node when it sees a list or dict. You can also demonstrate the usefulness by showing a use case with interpolation in the decoded object.

odelalleau · 2021-03-16T00:56:16Z

I requested review but that's for the code only -- doc still needs updating according to ongoing discussion

Edit: doc has been updated (just the notebook will need to be synced to the doc once more)

omry · 2021-03-16T01:00:30Z

Sure. I didn't review the code yet either. Will look tomorrow.

odelalleau · 2021-03-16T01:17:37Z

> I can imagine scenarios where certain resolvers would need to opt out of the automatic conversion. In this case it's fine, but imagine a resolver that wants to return an object that is incompatible with OmegaConf inside a list?

If I'm not mistaken, right now the only way for someone to have an object in the config that isn't compatible with OmegaConf is by setting the allow_objects flag.
If this is correct and we are ok to keep it that way, then it can be used for this purpose, for instance:

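from omegaconf import OmegaConf

class A:
    ...

OmegaConf.clear_resolvers()
OmegaConf.register_resolver("test", lambda: [A()])  # resolver returning a list that holds a non-OmegaConf object
c = OmegaConf.create({"x": "${test:}"})
c._set_flag("allow_objects", True)  # allow arbitrary objects in this config
assert isinstance(c.x[0], A)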

It is true that in OmegaConf 2.0 a resolver could output anything, which would get wrapped in a ValueNode. I'm not sure to what extent this use case was meant to be supported, though. It seems a bit dangerous, since it lets you do things that aren't meant to be possible without interpolations.

> In this particular case, we should just say that oc.decode returns a transient config node when it sees a list or dict. You can also demonstrate the usefulness by showing a use case with interpolation in the decoded object.

Ok, though technically it's hiding what's really happening in practice: oc.decode returns a list or dict, which then gets converted to a ListConfig / DictConfig at the last step of the interpolation resolution. The distinction may matter in more advanced use cases where oc.decode would be used as an intermediate resolver in nested interpolations, e.g. ${my_resolver:${oc.decode:${oc.env:VAR}}}.
It's probably fine (and better) not to go into these details though, so I'll go with the simpler explanation you suggested.

omry · 2021-03-16T01:53:30Z

> If I'm not mistaken, right now the only way for someone to have an object in the config that isn't compatible with OmegaConf is by setting the allow_objects flag.

allow_objects is not something I want to push everywhere.
One limitation is that dataclass/attr classes and instances are considered config, not objects.
If someone returns an instance of a dataclass, the conversion would turn it into a DictConfig even if that's not the intention.
Custom resolvers already enable people to do things they can't do otherwise.

What I have in mind here is to add a flag when registering the resolver that would qualify its behavior as converting or not converting the resulting containers to the corresponding OmegaConf containers.

The distinction you are drawing about an intermediate resolver is something I want some clarification about.
I think you are combining two features here:

1. Validate against the underlying annotated type (which potentially means creating a new object).
2. Convert a returned dict/list/tuple to the corresponding OmegaConf container. This is not connected to the annotation.

In my view, the second should be controlled by a flag when registering the resolver.

odelalleau · 2021-03-16T02:03:40Z

> It's probably fine (and better) not to go into these details though, so I'll go with the simpler explanation you suggested.

I gave it a shot in d1229d6 (doc update). The notebook will also need to be updated but I'll wait until we settle on the documentation first.

odelalleau · 2021-03-16T02:13:55Z

> What I have in mind here is to add a flag when registering the resolver that would qualify its behavior as converting or not converting the resulting containers to the corresponding OmegaConf containers.

Just to be clear, in that case, would the resulting node (wrapping the output) be a ValueNode or an AnyNode with allow_objects set to True? (both can work, but personally I was seeing ValueNode as an abstract class -- actually made it so recently, but this can be undone easily if needed)

> The distinction you are drawing about an intermediate resolver is something I want some clarification about.
> I think you are combining two features here:
>
> 1. Validate against the underlying annotated type (which potentially means creating a new object).
> 2. Convert a returned dict/list/tuple to the corresponding OmegaConf container. This is not connected to the annotation.
>
> In my view, the second should be controlled by a flag when registering the resolver.

What I mean is that (2) (and actually also (1), but let's ignore it) happens only at the very last step of interpolation resolution: we look at the final result, and if it's a dict/list, then we convert it into an OmegaConf container. But the output of an intermediate resolver is fed unchanged (*) as input to the next resolver in the chain. So in my example ${my_resolver:${oc.decode:${oc.env:VAR}}} => if VAR is set to "[1, 2, 3]" then my_resolver will get a plain list as input (not a ListConfig).

(*) Actually this is not quite correct (mentioning it for completeness, even if it doesn't matter in my example): we call _get_value() on every input of a resolver, so if a resolver somehow outputs a non-container Node that is fed to another resolver, the other resolver will see its value and not the Node object.

odelalleau · 2021-03-16T17:02:35Z

> the output of an intermediate resolver is fed unchanged (*) as input to the next resolver in the chain. So in my example ${my_resolver:${oc.decode:${oc.env:VAR}}} => if VAR is set to "[1, 2, 3]" then my_resolver will get a plain list as input (not a ListConfig).

Thinking more about it, maybe it's better to change this behavior and systematically convert the output of a resolver from dict/list to DictConfig/ListConfig, even for intermediate computations. The main advantage I see is that it is simpler to understand. In that case having a flag to control whether or not this conversion occurs is even more important.

So here is a suggestion, @omry please let me know if you'd prefer something different:

- Add a flag convert_container_output: bool = True to register_new_resolver() to control this behavior
- oc.decode would use this flag set to True (same thing for the upcoming oc.dict.values)
- If the final result of an interpolation is a dict/list (which can only happen if it is ${foo:...} where foo was registered with convert_container_output=False), then wrap it within an AnyNode with allow_objects set to True (and also do this for any other unsupported type of resolver output)
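
To make the suggestion concrete, here is a toy model in plain Python. It is not OmegaConf code: `register`, `resolve`, and `WrappedContainer` are hypothetical stand-ins, and `convert_container_output` is only the flag name proposed above, not an existing parameter of register_new_resolver(). The sketch covers the conversion switch only, not the AnyNode wrapping from the last point.

```python
from typing import Any, Callable, Dict, Tuple


class WrappedContainer:
    """Toy stand-in for DictConfig / ListConfig."""

    def __init__(self, content: Any) -> None:
        self.content = content


# Maps a resolver name to (function, convert_container_output flag).
_REGISTRY: Dict[str, Tuple[Callable[..., Any], bool]] = {}


def register(
    name: str, fn: Callable[..., Any], convert_container_output: bool = True
) -> None:
    """Register a resolver together with its conversion flag."""
    _REGISTRY[name] = (fn, convert_container_output)


def resolve(name: str, *args: Any) -> Any:
    """Call a resolver and convert dict/list outputs only if its flag says so."""
    fn, convert = _REGISTRY[name]
    output = fn(*args)
    if convert and isinstance(output, (dict, list)):
        return WrappedContainer(output)
    return output


register("decode_like", lambda: [1, 2, 3])  # default: convert
register("raw", lambda: [1, 2, 3], convert_container_output=False)

converted = resolve("decode_like")
untouched = resolve("raw")
print(type(converted).__name__, type(untouched).__name__)
```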

omry · 2021-03-16T17:25:38Z

Hold off on any changes related to this. I have some other ideas that might change the direction of this discussion. Will comment later.

omry · 2021-03-16T18:43:08Z

I realized that now that we support passing the _parent_ node, the function itself can do any conversions explicitly.
If a function wants to return a DictConfig, it can do it:

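from omegaconf import DictConfig, OmegaConf

# Resolver returning a plain dict: left as-is under this proposal
OmegaConf.register_resolver("dict", lambda: {"a": 10})
# Resolver that converts explicitly, attaching the result to its parent
OmegaConf.register_resolver("dictconfig", lambda _parent_: DictConfig({"a": 10}, parent=_parent_))

cfg = OmegaConf.create({
    "d": "${dict:}",
    "dc": "${dictconfig:}",
})
assert type(cfg.d) is dict
assert type(cfg.dc) is DictConfig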

I did not try to use _parent_ like that, but if we do that things become significantly simpler, as there are no more automatic container conversions.

We still need to think about what this means for type-based validation/conversion:

from dataclasses import dataclass

@dataclass
class Foo:
    a: str = "10"
    b: int = "${a}"  # interpolation to a str field, annotated as int

oc.env can do the conversion based on the ref_type of parent, but this might be useful as a generic behavior for all resolvers.

odelalleau · 2021-03-16T18:58:17Z

> I realized that now that we support passing the _parent_ node, the function itself can do any conversions explicitly.

True (I kinda liked the automatic conversion though, it seemed natural and more straightforward -- but maybe explicit is still better)

Assuming we go with this option, we still need to decide what type of node should wrap a plain dict/list (or any other non-standard object). Are you ok with AnyNode with allow_objects set to True, or do you prefer to go back to the old ValueNode?

> We still need to think what this means for type based validation/conversion:
> @dataclass
> class Foo:
>    a : str = "10"
>    b : int = "${a}"
> oc.env can do the conversion based on the ref_type of parent, but this might be useful as a generic behavior for all resolvers.

I don't think we should change the current type-based validation/conversion: this mechanism has to happen as the very last step (it doesn't make sense to convert the output of an intermediate resolver based on the node type), and it applies both to resolver interpolations and node interpolations (so I don't think resolvers should worry about it).

omry · 2021-03-16T19:02:37Z

> > I realized that now that we support passing the _parent_ node, the function itself can do any conversions explicitly.
>
> True (I kinda liked the automatic conversion though, it seemed natural and more straightforward -- but maybe explicit is still better)

We can introduce a flag when registering the resolver that would do it automatically on demand if this turns out to be a common pattern. I think being explicit is better because it reduces surprises.

> Assuming we go with this option, we still need to decide what type of node should wrap a plain dict/list (or any other non-standard object). Ok with AnyNode with allow_objects set to True, or do you prefer to get back to the old ValueNode?

Why do you need to wrap it at all?

> > We still need to think what this means for type based validation/conversion:
> > @dataclass
> > class Foo:
> >    a : str = "10"
> >    b : int = "${a}"
> > oc.env can do the conversion based on the ref_type of parent, but this might be useful as a generic behavior for all resolvers.
>
> I don't think we should change the current type based validation/conversion: this mechanism has to happen as the very last step (it doesn't make sense to convert the output of an intermediate resolver based on the node type), and it applies both to resolver interpolations and node interpolations (so I don't think resolver should worry about it).

Yes, I tend to agree.
I just wanted to mention that option. Those two topics are related.

omry · 2021-03-16T19:20:09Z

> Why do you need to wrap it at all?

I guess for compatibility with regular interpolations that are returning a Node.
We can introduce a new DummyNode(ValueNode) for that purpose.
This would be more obvious if someone gets it when they think they got a real thing.

odelalleau · 2021-03-16T20:02:38Z

> > Why do you need to wrap it at all?
>
> I guess for compatibility with regular interpolations that are returning a Node.
> We can introduce a new DummyNode(ValueNode) for that purpose.
> This would be more obvious if someone gets it when they think they got a real thing.

Generally speaking, _dereference_node() must return a Node.
I'd rather avoid creating a new type of ValueNode if we can re-use one...

omry · 2021-03-16T20:08:08Z

AnyNode with allow_objects=True works fine then.

odelalleau · 2021-03-16T20:37:23Z

> AnyNode with allow_objects=True works fine then.

Ok sounds good. I'm planning to do this in a follow-up PR that will:

- Roll back the automatic conversion of dict / list outputs into DictConfig / ListConfig
- Add a new resolver oc.to_config (let me know if you prefer another name) that takes a dict/list as input and converts it into a DictConfig / ListConfig

omry · 2021-03-16T20:40:27Z

> > AnyNode with allow_objects=True works fine then.
>
> Ok sounds good. I'm planning to do this in a follow-up PR that will:
>
> - Roll back the automatic conversion of dict / list outputs into DictConfig / ListConfig
> - Add a new resolver oc.to_config (let me know if you prefer another name) that takes a dict/list as input and converts it into a DictConfig / ListConfig

I think oc.create could make sense: it parallels OmegaConf.create().
In fact, we can use OmegaConf.create() directly there, which would also enable support for creating from a YAML string.

odelalleau · 2021-03-16T20:42:23Z

> I think oc.create could make sense. this is a parallel to OmegaConf.create().

Sounds like a plan!

omry · 2021-03-16T20:46:16Z

docs/source/usage.rst

@@ -6,6 +6,7 @@
    import tempfile
    import pickle
    os.environ['USER'] = 'omry'
+    os.environ['USERID'] = '123456'


I think it's enough to just say that environment variables are always returned as strings without showing an actual example.

Removed in c6a7c86

omry · 2021-03-16T20:59:34Z

docs/source/usage.rst

    >>> cfg = OmegaConf.create({
-    ...       'database': {'password': '${env:DB_PASSWORD,abc123}'}
+    ...       'database': {'password': '${oc.env:DB_PASSWORD,abc123}'}
    ... })


You can use this to drive the point about quoting:

cfg = OmegaConf.create(
    {
        "database": {
            "password1": "${oc.env:DB_PASSWORD,abc123}",  # the string 'abc123'
            "password2": "${oc.env:DB_PASSWORD,'12345'}",  # the string '12345'
        },
    }
)

Done in e63e669

omry · 2021-03-16T21:24:45Z

tests/test_interpolation.py

+        cfg["env_func"] = env_func  # allows choosing which env resolver to use
+        cfg = _ensure_container(cfg)
+
+        # The legacy env resolver triggers a deprecation warning.


ditto. add one test for directly testing the deprecation warning and ignore the warnings everywhere else.

side note, we should probably split test_interpolation into something like test_simple_interpolations.py and test_custom_resolvers.py.

> ditto. add one test for directly testing the deprecation warning and ignore the warnings everywhere else.

Done in a973d21 (I didn't add a new test since there are a couple of tests specific to the legacy env that still explicitly catch that warning; it doesn't seem worth doing more refactoring since they will all go away in 2.2)

> side note, we should probably split test_interpolation into something like test_simple_interpolations.py and test_custom_resolvers.py.

Noted, will do in a follow-up PR

omry · 2021-03-16T21:33:22Z

docs/source/usage.rst

+    >>> def show(x):
+    ...     print(f"type: {type(x).__name__}, value: {x}")


We can probably add this function to the top of the testsetup or somewhere else early on.

I moved it up and used it in more places in e090055
I didn't put it in testsetup because testsetup doesn't appear in the doc, and people may wonder what exactly this function is.

omry · 2021-03-16T21:34:08Z

I added a bunch of comments on intermediate diffs. be sure to check them as well.

omry · 2021-03-16T22:53:47Z

request review when ready.

odelalleau · 2021-03-16T23:10:35Z

> I added a bunch of comments on intermediate diffs. be sure to check them as well.

They appeared in the review as far as I can tell.

odelalleau · 2021-03-18T02:39:53Z

> I think this is a go, except the merge conflict :)

Funny, I just rebased on top of master without conflict... Just force-pushed.

omry

Another one bites the dust!

odelalleau marked this pull request as draft March 15, 2021 23:26

odelalleau requested a review from omry March 16, 2021 00:55

odelalleau marked this pull request as ready for review March 16, 2021 02:03

omry reviewed Mar 16, 2021

odelalleau mentioned this pull request Mar 16, 2021

Automatic validation of resolver arguments #612

Open

odelalleau requested a review from omry March 16, 2021 23:10

odelalleau and others added 19 commits March 17, 2021 22:37

Update doc on string interpolations

b4690e0

More readable test formatting

00245cf

Improve comment formatting

a68c595

Restore interpolation examples

a0a6ec7

Update docs/notebook/Tutorial.ipynb

2a7a756

Co-authored-by: Omry Yadan <[email protected]>

Update docs/source/usage.rst

ef9b254

Co-authored-by: Omry Yadan <[email protected]>

Update docs/source/usage.rst

a2401b3

Co-authored-by: Omry Yadan <[email protected]>

Rephrasing in doc

aa0170f

Use show() function in doc

c1100dc

Raise a KeyError instead of ValidationError for missing env variables

0a2725f

Remove handling of "null" as default in legacy env resolver

518f430

Update news

91b775e

Explicit typing for the default value of the oc.env resolver

5429c05

Use a more appropriate exception type

1f29fd9

Update tests/test_interpolation.py

48df088

Co-authored-by: Omry Yadan <[email protected]>

Safer markers for default values

3b7b4ec

Fix coverage

3107001

Use more appropriate TypeError

5dcda42

Refactor: consistent use of _DEFAULT_MARKER_

4800df9

odelalleau force-pushed the new_env_and_decode branch from 8862343 to 4800df9 Compare March 18, 2021 02:39

omry approved these changes Mar 18, 2021

odelalleau merged commit 074e8dc into omry:master Mar 18, 2021

This was referenced Mar 19, 2021

[ON HOLD] Simpler escaping in the grammar, and fix to quoted values #621

Closed

Follow-up plan regarding changes to resolvers #535

Closed

Add new resolvers oc.dict.keys and oc.dict.values #644

Merged

Add new resolver oc.create #645

Closed

odelalleau added a commit to odelalleau/omegaconf that referenced this pull request Mar 31, 2021

Synch notebook with doc

ffbcb00

Some doc updates from omry#606 were not fully ported to the notebook.

odelalleau mentioned this pull request Mar 31, 2021

Synch notebook with doc #655

Merged

odelalleau added a commit that referenced this pull request Mar 31, 2021

Synch notebook with doc (#655)

ac0288d

Some doc updates from #606 were not fully ported to the notebook.
