Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major changes to interpolations and resolvers #445

Merged
merged 77 commits into from
Feb 12, 2021

Conversation

odelalleau
Copy link
Collaborator

@odelalleau odelalleau commented Nov 26, 2020

This is a squashed version of #321.

  • Add a grammar to parse interpolations
  • Add support for nested interpolations
  • Deprecate register_resolver() and introduce register_new_resolver() (that allows resolvers to use non-string arguments, and to decide whether or not to use the cache)
  • The env resolver now parses environment variables in a way that is consistent with the new interpolation grammar

Fixes #100 #230 #266 #318

@lgtm-com
Copy link

lgtm-com bot commented Nov 26, 2020

This pull request introduces 1 alert when merging ddb57bb into 800e8a7 - view on LGTM.com

new alerts:

  • 1 for Unused import

omegaconf/base.py Outdated Show resolved Hide resolved
omegaconf/base.py Outdated Show resolved Hide resolved
@odelalleau
Copy link
Collaborator Author

This pull request introduces 1 alert when merging ddb57bb into 800e8a7 - view on LGTM.com

new alerts:

  • 1 for Unused import

Just a note that I fixed that in a force push (it was making flake8 tests fail as well) -- this was caused by my rebase on top of master

@omry
Copy link
Owner

omry commented Nov 28, 2020

looks like you have conflicts with master. can you fix?

@odelalleau odelalleau force-pushed the interpolation_grammar branch from ee83314 to ad826e7 Compare November 28, 2020 20:08
Copy link
Owner

@omry omry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, made a review pass.
lots of comments and questions inline. overall looks good.
(Must say this was easier to review now, in part because of the previous review passes in the other PR but also due to the fact that reviewing it as one unit is easier.

docs/notebook/Tutorial.ipynb Outdated Show resolved Hide resolved
docs/source/usage.rst Outdated Show resolved Hide resolved
Comment on lines +377 to +391
Environment variables are parsed when they are recognized as valid quantities that
may be evaluated (e.g., int, float, dict, list):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you go over why we think (sometimes) parsing the environment variables as opposed to always passing them as is in a string is the the best choice?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until yesterday I thought that otherwise there would no way to write something like:

learning_rate: ${env:LR}

and have learning_rate be a float.

That being said, I just realized that {Integer,Float,Boolean}Node actually convert from string, so maybe this would work with structured configs?
(although I'm not entirely sure why this is allowed in the first place, my first instinct would be that assigning a string to an int/float/bool node should raise an exception)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you can check and see for yourself :).
I think the answer is no, because the result if the interpolation is returned directly and is not assigned to the config.

Converting on assignment was introduced really early and may have been a mistake.
The motivation is to support converting during merge: in particular merging from cli, where everything is a string.

We should consider deprecating support for assignments that mypy would frowns upon if it knew the type.

@dataclass
class Foo:
  x: int = 10

cfg: Foo = OmegaConf.structured(Foo)
cfg.x = "20"  # mypy would bitch here even though it's legal in OmegaConf.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider deprecating support for assignments that mypy would frowns upon if it knew the type.

I created #459 to keep track of this

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until yesterday I thought that otherwise there would no way to write something like:

learning_rate: ${env:LR}

and have learning_rate be a float.

Issue 488 is a recent report about this.
I think we should consider validating the return value of custom interpolations against the specified type.
This is a breaking change, so switching to new resolvers is a good opportunity (with a potential opt-out flag - maybe).
As a POC, I gave it a shot in #506 (based on the current interpolation logic). It adds validation for nodes that are annotated (ValueNodes only, and only if they are not AnyNode).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that this actually doesn't work, I'll look into fixing it. Just to be sure, do we want the following to work:

@dataclass
class Cfg:
    s: str = "7"
    i: int = II("s")

?
(the question is, since s can be cast to int, is it ok to do it implicitly here as well?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, I reverted the changes that were (attempting at) addressing #488. I will do that in a follow-up PR instead, to avoid adding too much stuff here (since it's already way too big).

As a result, I also believe that it's best to also revisit the behavior of the env resolver in another follow-up PR, so that when env is changed to only return strings, then at least typed nodes keep working as expected (even without a new decode resolver).

@omry if you're ok with that plan then this conversation may be resolved.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolving.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify:
There are multiple things here that should probably be changed.

  1. Support for type annotation for resolvers (with runtime validation).
  2. Env resolver should no longer parse input environment variables.
  3. Potentially new built in resolver oc.decode() that can parse an input with the grammar parser.

488 is about 1.
2 and 3 are related.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, they will be addressed in that order in future PRs

docs/source/usage.rst Outdated Show resolved Hide resolved
news/445.feature.2 Outdated Show resolved Hide resolved
omegaconf/listconfig.py Outdated Show resolved Hide resolved
omegaconf/omegaconf.py Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
tests/test_interpolation.py Outdated Show resolved Hide resolved
# same as the definition (or `OmegaConf.create()` called on it for lists
# and dictionaries).
# Order matters! (each entry should only depend on those above)
TEST_CONFIG_DATA: List[Tuple[str, Any, Any]] = [
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting to see some parser specific unit tests (testing the parser directly without going through the higher layers).
Did I miss it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I didn't realize it was possible until later, at which point I had already written all tests this way.
Do you want me to refactor them?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this one.
Yes - I think having lower level tests will make the tests more robust and easier to work with.
Once you have lower level tests, this entire function can probably be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored these tests in b520815

Essentially I split the big list into 3:

  • Stuff that can be parsed without interpolations (to test the basic constructs of the grammar, like int, float, str, dict, etc.). Those can be parsed without using any config object.
  • Expressions containing interpolations for the parser rule singleElement (this is the biggest chunk)
  • Finally, expressions for the highest-level parser rule configValue

In the end it still looks pretty similar overall, but at least it's faster and avoids using dummy resolvers everywhere just to test the correct parser rule. Let me know if that looks ok to you.

NB: in a first attempt I tried to go even lower in the rules (e.g., using the dictContainer parser rule for dicts), but the problem is that without an EOF in the parser rule, ANTLR fails to detect some errors (ex: {0: 1}} is not supposed to be a valid dict). That's why I stuck to the singleElement and configValue rules (that contain the EOF).

@odelalleau odelalleau force-pushed the interpolation_grammar branch from 3ada218 to a8ff6f9 Compare November 29, 2020 15:21
@odelalleau
Copy link
Collaborator Author

Regarding #448, currently it doesn't work because @ is not a valid character in IDs. I can easily fix it but I'd like to clarify a few things first:

  1. @ should be only allowed in key names, not resolver names, correct?
  2. @ should be allowed at any position in the key name, correct?
  3. Is there any plan to support @ in keys in Hydra's override grammar? Currently this seems to conflict with the group@pkg syntax

@omry
Copy link
Owner

omry commented Dec 7, 2020

Regarding #448, currently it doesn't work because @ is not a valid character in IDs. I can easily fix it but I'd like to clarify a few things first:

  1. @ should be only allowed in key names, not resolver names, correct?
  2. @ should be allowed at any position in the key name, correct?
  3. Is there any plan to support @ in keys in Hydra's override grammar? Currently this seems to conflict with the group@pkg syntax

This is an extension of group@pkg, allowing an empty package in Hydra.
It's not landed yet, but you can see it here.

group and group@pkg are now going to be supported as interpolation keys in the defaults list (this is non-standard interpolation support). to enable that OmegaConf need to accept @ as a legal character in config keys.
This can also be useful for things like emails. from the perspective of OmegaConf it's just another valid character.
Let me know if you have any more questions.

  1. Yes
  2. Yes
  3. No conflict, this is the same, just allowing for an empty package.

@odelalleau
Copy link
Collaborator Author

odelalleau commented Dec 7, 2020

  1. Yes
  2. Yes
  3. No conflict, this is the same, just allowing for an empty package.

Ok thanks, I added this in 3af2f85 (edited: initially it was d959a13 but I realized I could make the grammar more readable) (also added another commit, af3870d, to add a missing test for @ character in unquoted strings, which was already supported)

Regarding Hydra, I tried synching facebookresearch/hydra#1170 with a config.yaml looking like:

foo:
  x@y: youpla

and run the command line

python my_app.py foo.x@y=boom

and Hydra complains with Override foo.x@y=boom looks like a config group override, but config group 'foo.x' does not exist.
From what you said I suspect this is behaving as expected, but I still thought I'd mention it just in case.

@odelalleau odelalleau force-pushed the interpolation_grammar branch from d959a13 to 3af2f85 Compare December 7, 2020 04:44
@omry
Copy link
Owner

omry commented Dec 7, 2020

Commented in one of the commits, this seems a bit more complicated than I would have wanted.

About Hydra:
Yes, this is the expected behavior. you can't override keys with @ in them from the command line because the syntax is reserved. if it becomes a real problem we can support escaping of the @.

#1170 is still work in progress, the new logic is not integrated there yet.
Interpolation in the defaults list will be special. You can see a bit about it here.

Here is a concrete test that has an interpolation with @ in the key.

@odelalleau
Copy link
Collaborator Author

Commented in one of the commits, this seems a bit more complicated than I would have wanted.

Link to the discussion for reference: d959a13#r44887061

Yes, this is the expected behavior. you can't override keys with @ in them from the command line because the syntax is reserved. if it becomes a real problem we can support escaping of the @.

Ok, just wanted to be sure!

Here is a concrete test that has an interpolation with @ in the key.

I see, thanks!

@odelalleau odelalleau force-pushed the interpolation_grammar branch from 200fef2 to 9e1bc41 Compare December 9, 2020 02:49
@odelalleau
Copy link
Collaborator Author

I did a force push to replace my old commit adding support for @ by another one (9e1bc41) that adds support for almost any character, as discussed in 200fef2#r44900426 (compared to this comment, I also added \ to the list of forbidden characters, in case we might want later to add some kind of escaping).

One open question is whether we really want to allow more varied dictionary keys: see discussion at 200fef2#commitcomment-44939891

Copy link
Owner

@omry omry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recent batch of changes looks good. I still need to have a second pass on your responses to everything else.

omegaconf/_utils.py Show resolved Hide resolved
omegaconf/grammar/OmegaConfGrammarParser.g4 Show resolved Hide resolved
Copy link
Owner

@omry omry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial pass. Got about a million lines left to review :)

docs/notebook/Tutorial.ipynb Outdated Show resolved Hide resolved
docs/notebook/Tutorial.ipynb Outdated Show resolved Hide resolved
docs/notebook/Tutorial.ipynb Outdated Show resolved Hide resolved
docs/notebook/Tutorial.ipynb Outdated Show resolved Hide resolved
docs/source/usage.rst Show resolved Hide resolved
news/445.feature.2 Outdated Show resolved Hide resolved
omegaconf/_utils.py Outdated Show resolved Hide resolved
omegaconf/_utils.py Outdated Show resolved Hide resolved
omegaconf/_utils.py Show resolved Hide resolved
omegaconf/base.py Show resolved Hide resolved
Copy link
Owner

@omry omry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got most of it, but ran out of energy for now.

docs/source/usage.rst Show resolved Hide resolved
docs/source/usage.rst Outdated Show resolved Hide resolved
Comment on lines +377 to +391
Environment variables are parsed when they are recognized as valid quantities that
may be evaluated (e.g., int, float, dict, list):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until yesterday I thought that otherwise there would no way to write something like:

learning_rate: ${env:LR}

and have learning_rate be a float.

Issue 488 is a recent report about this.
I think we should consider validating the return value of custom interpolations against the specified type.
This is a breaking change, so switching to new resolvers is a good opportunity (with a potential opt-out flag - maybe).
As a POC, I gave it a shot in #506 (based on the current interpolation logic). It adds validation for nodes that are annotated (ValueNodes only, and only if they are not AnyNode).

docs/source/usage.rst Show resolved Hide resolved
docs/source/usage.rst Show resolved Hide resolved
omegaconf/base.py Show resolved Hide resolved
omegaconf/base.py Outdated Show resolved Hide resolved
omegaconf/base.py Show resolved Hide resolved
omegaconf/base.py Outdated Show resolved Hide resolved
omegaconf/base.py Show resolved Hide resolved
@omry
Copy link
Owner

omry commented Feb 3, 2021

This is also closing #266, right?

@odelalleau odelalleau force-pushed the interpolation_grammar branch from 5c03808 to 5cb75f6 Compare February 11, 2021 23:53
@omry omry merged commit 499de6b into omry:master Feb 12, 2021
@odelalleau odelalleau deleted the interpolation_grammar branch February 12, 2021 03:23
@omry
Copy link
Owner

omry commented Feb 17, 2021

Your fixed string:

Fixes #100 #230 #266 #318

Only closed 100.
I think it works with a comma.

@odelalleau
Copy link
Collaborator Author

Your fixed string:

Fixes #100 #230 #266 #318

Only closed 100.
I think it works with a comma.

Oops, thanks for taking care of it! Looks like we actually need to repeat the "Fixes / Closes" keyword (https://stackoverflow.com/questions/3547445/closing-multiple-issues-in-github-with-a-commit-message)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] resolve individual resolver arguments
2 participants