[OmegaConf 2.1] Grammar for parsing of interpolations #321
Conversation
A few comments about the main differences wrt Hydra's grammar (https://github.com/facebookresearch/hydra/blob/master/hydra/grammar/Override.g4):
A couple more points not related to differences with Hydra:
Cool. I am a bit torn about +3 because it's not really useful.
OK, although I am not sure this is too useful: those numbers are generally too small to justify
I basically wanted to whitelist special characters there to ensure there is no ambiguity with the rest of the grammar.
Yup, I believe I mentioned it in one of the messages when we discussed this.
You are specifically the reason why I even have interpolation parsing in the Hydra grammar, right?
Can you explain more how those are different from the current interpolations? I think I know what you mean, but from those examples alone I can't be sure.
grammar/examples.py (Outdated)

```python
"${foo:{'a': 0}}",
"{${foo}: ${bar}}",
"${foo:{${bar}: 0, ${baz}: 1}, 2}_abc",
"{ab_${foo}: c_${baz}}",
```
I think this is too much at this point; I am not expecting people to register so many functions that this is needed.
Noted
I'm doing a pass through all unresolved conversations (I'll let you resolve them when you are good).
The syntax `${ab_${foo}:...}` is no longer allowed by the grammar, so this should be OK now.
https://gist.github.com/a76e9c4cd076bf10804d8287814883ef (`overrides_file` parses this input):
The core of the change is switching from the lexer token to a parser rule:
Probably not (same for
Ok I think I see what you mean, though I'd need to see specific examples of where it's useful to be sure.
Not just interpolations. Why even parse dictionaries in the first place? Shouldn't it be handled by OmegaConf?
Currently there are two main types of interpolations, based on your previous explanation in #308 (comment):
The new interpolations I'm proposing don't fit in either of these categories, which is why I'm suggesting to create a new category for them. They would be any list or dict containing an interpolation (possibly deeply nested). I'm expecting that these new interpolations should come "for free" once the grammar parsing code is complete (since we already need to be able to parse lists / dicts as arguments of custom resolvers, for instance).
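For illustration only, here is a toy regex sketch (a hypothetical `classify` helper, not OmegaConf code) distinguishing the two existing categories: node interpolations like `${foo.bar}` and resolver interpolations like `${env:PASSWORD}`.

```python
import re

def classify(s: str) -> str:
    """Toy classifier for a full interpolation string (illustration only)."""
    if re.fullmatch(r"\$\{[\w.]+\}", s):
        return "node"       # e.g. ${foo.bar}: reference to another config node
    if re.fullmatch(r"\$\{[\w.]+:.*\}", s):
        return "resolver"   # e.g. ${env:PASSWORD}: call to a registered resolver
    return "other"          # e.g. a dict or list that contains interpolations
```

The proposed third category (a container such as `{${foo}: 0}` or `[${bar}]`) falls through to `"other"` in this sketch.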
I ran it against my test strings and ran into some failure cases. Note that I consider it a failure when it goes through the
Also, among your strings there are actually some failures (related to the above-mentioned warnings):
On the other hand, all seem to run fine through my current grammar. At this point, since what I have now seems to provide better coverage of potential use cases, I'm going to go ahead with it for now so as to have something working; then once we have a set of tests it'll be easier to make things simpler (I'll keep #321 (comment) in mind though, as dropping support for this could help simplify parts of my current grammar).
It makes it inconsistent with Hydra, which is much more important to me than Python inconsistency.
For example, the comma `,` is treated very differently by Hydra in the context of a simple sweep.
In Hydra 0.11, Hydra was delegating to OmegaConf to perform the merge with the command line. Another answer is that OmegaConf currently uses YAML to parse the dicts and lists, and it's not nearly good enough.
The motivation to add a new grammar for Hydra started with these use cases.
Well, there is also `${function:p1,p2,...}`, which is a custom resolver interpolation.
This fits perfectly fine if you consider that both interpolation types you outlined are just like any other primitives, and they can be elements of a dict or list.
I did not run it through actual Python code yet, so I am not surprised. I am not sure what those mean actually; I was just being extra defensive.
What you have now may work better than the grammar I wrote in 20 minutes, but it's also much more complicated and hard to reason about. Unfortunately I am in the middle of another diff working on the grammar, so it will have to wait. In the meantime I think you can make progress on the testing and build integration.
Thanks for the additional context, it's much clearer now!
I had counted this one as a "simple interpolation" ("anything that starts with
OK, so actually after looking at some tests it appears that these may already be supported, though as you said YAML doesn't like them. I thought it wasn't possible because the doc doesn't show an example of this. I guess this point is moot then.
No problem, that's what I've been doing. I just reached the point where all current
YAML is a complex thing; it actually has support for JSON as well! It's called flow style in YAML lingo.

```yaml
# this is json parsed by yaml
[1,2,3]
# this is also fine, even though it's not legal json (no quotes around strings):
[a,b,c]
# But this is not legal flow style:
[${foo}]
# even though this is fine for regular YAML:
bar: ${foo}
```
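The "not legal JSON" point can be checked with the stdlib `json` module (a quick sketch, not tied to OmegaConf):

```python
import json

# Flow-style YAML that is also valid JSON parses fine as JSON:
print(json.loads("[1,2,3]"))  # [1, 2, 3]

# Unquoted strings are fine in YAML flow style but rejected by JSON:
try:
    json.loads("[a,b,c]")
except json.JSONDecodeError as e:
    print("not JSON:", e.msg)
```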
(force-pushed from 469a2ce to 0804230)
Just pushed a major update -- not quite ready for review yet, I need to add more tests and I'll have a few questions as well (will post them later). Also going to make a few more updates to try and get the CircleCI build working.
I suggest that you don't look closely at the interpolation parsing code (= the grammar, the …).
In the meantime, here are already some questions (in no particular order):
(force-pushed from 26d5a60 to 5374722)
Just pushed more tests, still not complete, but this shows most special cases I've come across. Currently these tests are made to pass even if the behavior is debatable (or clearly wrong), just to show how the code currently behaves. Next steps (next week):
Sounds good.
No, most of build_helpers from Hydra is copied from Antelope anyway.
It's a good question. Since we are not really distributing it to end users, I think the answer is no.
I don't think so. For now I think each project should keep a copy of those to allow for maximum flexibility.
Yeah, just a historical implementation detail. Feel free to change that; I don't think anything is depending on it.
Did you look at the function that raises the exception?
Well, I'd prefer that it didn't break the caching.
The env resolver is used in the config:

```yaml
foo: ${env:PASSWORD}
```

I can see value in allowing parsed values here (in fact, it should be parsed using the grammar, and decode_primitive should be deleted).

```python
@dataclass
class Config:
    foo: str = II("${env:password}")
```

Generally, try to keep things as simple as possible.
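For context, here is a minimal sketch of what a `decode_primitive`-style helper does. This is an assumption about its behavior for illustration, not the actual OmegaConf implementation:

```python
def decode_primitive(s: str):
    """Best-effort string-to-primitive conversion (sketch; actual rules may differ)."""
    lowered = s.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return s  # not a known primitive: keep the raw string
```

The point under discussion is that the grammar would subsume this kind of ad-hoc decoding.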
```toml
[tool.pytest.ini_options]
addopts = "--import-mode=append"
```
I have never seen this. How come it's not needed for Hydra?
That surprised me as well. Just looked into it more closely, and this is because of this line in setup.py:

```python
entry_points={"pytest11": ["hydra_pytest = hydra.extra.pytest_plugin"]},
```

Because of this, the `hydra` module is imported from its pip install location before pytest prepends the hydra root folder to `sys.path` (this is pytest's default behavior, see https://docs.pytest.org/en/stable/pythonpath.html). So the tests run against the installed version as intended, but essentially "by luck".
Oh, nasty. I was wondering if the difference in behavior had anything to do with it.
That pytest plugin is fairly recent (maybe a couple of months).
How did you initially notice it was running against the local version and not the installed version? Shouldn't the local version also have the generated parser code?
> How did you initially notice it was running against the local version and not the installed version? Shouldn't the local version also have the generated parser code?

No: when you run `pip install .`, the parser code is not generated locally, only in the installed version.
When running `pip install .`, the parser is generated while making the sdist/wheel, which in turn is installed.
Why do you need the generated code in this scenario? It's installed and available.
> When running `pip install .`, the parser is generated while making the sdist/wheel, which in turn is installed. Why do you need the generated code in this scenario? It's installed and available.

So, starting from a clean repo, running `pip install .` generates the grammar code and installs it somewhere in a path that looks like `/Users/odelalleau/opt/miniconda3/envs/py38-omega/lib/python3.8/site-packages/omegaconf`. However, if you look at the content of `./omegaconf/grammar/gen`, this folder is still empty.
Without this change to the pytest config, pytest would use `./omegaconf` instead of `/Users/odelalleau/opt/miniconda3/envs/py38-omega/lib/python3.8/site-packages/omegaconf` and thus wouldn't be able to find the generated files.
> So, starting from a clean repo, running `pip install .` generates the grammar code and installs it somewhere in a path that looks like `/Users/odelalleau/opt/miniconda3/envs/py38-omega/lib/python3.8/site-packages/omegaconf`. However, if you look at the content of `./omegaconf/grammar/gen`, this folder is still empty.

This is what I am expecting.
But for starters, the current logic in nox (master) is just running pytest in the dir. In Hydra it also does `pip install .` (or `pip install -e .`).
Can you try that and see if it eliminates the need for this hack? (Again, trying to control the creeping complexity here.)
Let's continue this thread in #321 (comment)
Actually it's not; this is a different issue.
> Actually it's not; this is a different issue.

I'll follow up about this comment in the new upcoming PR.
Thanks for all the answers! A few follow-ups:
OK, I'll see how to do both (keep caching + maintain backward compatibility).
OK, will do that.
Tests are still WIP, but one thing I'd like to clarify ASAP is the handling of top-level strings, because it has a significant impact on the design of the grammar; for now I hacked around something, but it definitely needs fixing. By "top-level strings" I mean strings that are not within an interpolation, as in:
Initially I was parsing all strings the same way, regardless of whether they were top-level or within an interpolation. However, this meant only a subset of characters was allowed, and one had to escape commas (so for instance ex. 3-4 didn't work unless adding quotes as in ex. 5). My main question is whether we should keep letting people use pretty much any kind of top-level string in string interpolations, or force them to use quotes (as in ex. 5) so as to have the same parsing logic both within and outside of interpolations. The former is more convenient for the user, but the latter makes parsing easier (since top-level is not a special case anymore).
Once you have some ideas about possible backward compatibility paths let's discuss them.
We can have a few followups for the new grammar.

```yaml
foo:
  bar:
    - zonk: [a,b,c]
```

There could also be a few other things that can benefit from the grammar.
The original grammar you based this on is the command line grammar in Hydra, which has some constraints that are not present in YAML files (or strings in Python). There should generally be no limitations on anything outside of the interpolation itself.
Quoting interpolations is an interesting question. Thinking of it, I don't think it should be possible to bulk-escape an interpolation by quoting; quoting is used in many contexts.
I think escaping the $ should be enough to prevent interpolation expansion:

```yaml
HOME: moonbase_alpha
cmd: python foo.py --home=/\${HOME}
```

should be interpreted as `python foo.py --home=/${HOME}` and not as `python foo.py --home=/moonbase_alpha`.
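A toy sketch of this escaping behavior, using plain `re` (a hypothetical `expand` helper; OmegaConf's real resolution logic is more involved):

```python
import re

def expand(text: str, env: dict) -> str:
    """Expand ${name} from env, but treat \\${name} as an escaped literal."""
    def repl(m: "re.Match") -> str:
        if m.group(0).startswith("\\"):
            return m.group(0)[1:]  # drop the backslash, keep ${name} as-is
        return str(env[m.group(1)])
    return re.sub(r"\\?\$\{(\w+)\}", repl, text)
```

With `env = {"HOME": "moonbase_alpha"}`, the escaped form stays literal while the unescaped form expands.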
Sounds like a good plan! Regarding the cache mechanism, I intend to convert lists and dicts to tuples to compute the key. I'm not planning to sort the dicts unless you think that'd be better (since some functions' output may depend on the ordering of keys).
Marking that for the future, but not planning to add it in the first iteration.
Halfway through [resolver] Add options to give resolvers access to the config.
This matches Hydra's override grammar.
```diff
 def deps(session, local_install):
     session.install("--upgrade", "setuptools", "pip")
-    session.install("-r", "requirements/dev.txt", ".", silent=True)
+    extra_flags = ["-e"] if local_install else []
+    session.install("-r", "requirements/dev.txt", *extra_flags, ".", silent=True)
```
I would rather not make the noxfile more complicated if we can get away with it. This kind of creeping complexity is not good.
Is this also a problem with Hydra?
Generally I don't care about the coverage of the generated code at all, and I think I am excluding it from the coverage reporting in Hydra.
```diff
@@ -43,7 +47,7 @@ def coverage(session):
 @nox.session(python=PYTHON_VERSIONS)
 def lint(session):
-    deps(session)
+    deps(session, local_install=True)
```
See if you can eliminate the local_mode.
```python
def is_bool(st: str) -> bool:
    st = str.lower(st)
    return st == "true" or st == "false"
```

```python
    return ret(ValueKind.INTERPOLATION)
```
Okay, once I am done with the step-by-step review I will make a final review on the combined diff.
The change I am seeing here changes the meaning of the function, which is why I brought it up.
```diff
@@ -362,19 +362,23 @@ def resolve_simple_interpolation(
     self,
     key: Any,
     inter_type: str,
-    inter_key: str,
+    inter_key: Tuple[Any, ...],
```
Yeah, this is probably an existing confusion there.
See if you can clear it up. Can you describe how you would split the function, and what would be the responsibility of each of the two functions?
omegaconf/omegaconf.py (Outdated)
```python
# The `args_as_strings` warning is triggered when the resolver is
# called instead of when it is defined, so as to limit the amount of
# warnings (by skipping warnings when all inputs are strings).
if args_as_strings and any(not isinstance(k, str) for k in key):
    non_str_arg = [k for k in key if not isinstance(k, str)][0]
    warnings.warn(
        f"Resolvers that take non-string arguments should now be registered "
        f"with `args_as_strings=False`, and their code should be updated to "
        f"ensure it works as expected with non-string arguments. This "
        f"warning is raised because resolver '{name}' was registered with "
        f"the current default `args_as_strings=True` and received at least "
        f"one non-string argument (`{non_str_arg}`). Although we converted "
        f"such non-string arguments to strings to preserve backward "
        f"compatibility, this behavior is deprecated => please update "
        f"resolver '{name}' as described above. Alternatively, you may "
        f"ensure that all its arguments are strings, e.g., by enclosing "
        f"them within quotes.",
        category=UserWarning,
    )
```
For deprecation warnings I often create dedicated GitHub issues and link to them.
A recent example is issue 1089 in the Hydra repo; take a look.
omegaconf/omegaconf.py (Outdated)
```python
new_key: List[Any] = []  # will store the new key elements
hashable_item: Any
for idx, item in enumerate(key):
    if item is None or isinstance(item, (int, float, bool, str)):
        hashable_item = item
    elif isinstance(item, list):
        hashable_item = _make_hashable(tuple(item))
    elif isinstance(item, tuple):
        hashable_item = _make_hashable(item)
    elif isinstance(item, dict):
        # We sort the dictionary so that the order of keys does not matter.
        hashable_item = _make_hashable(tuple(sorted(item.items())))
    else:
        raise NotImplementedError(f"type {type(item)} cannot be made hashable")
    new_key.append(hashable_item)
```
This function is complex and bugs in it will result in very subtle behavior changes that will be very hard to identify.
Do you have dedicated tests for it?
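To make the discussion concrete, here is a standalone recursive sketch of such a helper (a hypothetical `make_hashable`, not the code under review), together with the kind of dedicated test being asked for:

```python
def make_hashable(value):
    """Recursively turn lists/dicts into nested tuples so they can be dict keys."""
    if value is None or isinstance(value, (int, float, bool, str)):
        return value
    if isinstance(value, (list, tuple)):
        return tuple(make_hashable(v) for v in value)
    if isinstance(value, dict):
        # Sort items so that key order does not affect the cache key.
        return tuple(sorted((k, make_hashable(v)) for k, v in value.items()))
    raise NotImplementedError(f"type {type(value)} cannot be made hashable")

# A dedicated test: nested structures collapse to hashable tuples.
key = make_hashable([1, {"b": 2, "a": [3]}])
assert key == (1, (("a", (3,)), ("b", 2)))
hash(key)  # does not raise
```

Note that this sketch sorts dict items, which differs from the "no sorting" option mentioned earlier in the thread; that trade-off is exactly what the test needs to pin down.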
Looks like some new CI checks were added but fail with this branch (https://results.pre-commit.ci/run/github/147219819/1603908169.w-VpUbDkREuPfGTp3sgZ-g); any hint as to where I should look to get it fixed?
I disabled pre-commit.ci; it should not appear in the next push.
As per discussions in omry#321
Great, thanks! I pushed a new version centered around our discussions related to …
I think it might be easier for you to continue the review if I integrate these changes into older commits: either starting at d375552 (the commit you stopped half-way through), its parent, or its child. Any preference? (Or I can leave it like that if you don't mind.)
I missed your commit and your question, sorry. If you can reorganize the changes such that the updates to register_resolver are on top of (or in place of) the old register_resolver commit, it would be easier. If it proves difficult, let me know. Since this will require a force push, you might as well also rebase on top of master (there have been numerous changes, hopefully nothing that will conflict with your changes).
Okay, I reviewed the newest 4 diffs. No need to shift them around.
Thanks! As a result I didn't do any rebase, but let me know when it's a good time to rebase on top of master (keeping in mind that it'll invalidate the commit hashes I've been referring to, so better to do it once everything else is resolved, IMO).
Okay, the ball is back in my court to resume reviewing where I stopped.
As a result, resolvers are no longer able to access the parent node, and the env resolver cannot parse node interpolations anymore.
Okay, there have been enough stacked changes that I feel like reviewing in order no longer makes sense. Can you create a new PR rebased on master with all the changes in this one squashed?
Will do, closing this one then!
New PR with squashed commits is up in #445. (I believe that the commits you hadn't reviewed in order yet were either related to doc, the build system, or trivial, so it was indeed a good time for squashing.) I'll do a pass over current discussions in this PR to move them over to the new PR if they still seem relevant.
Sounds good, thanks!
This is heavily WIP. Just putting it out there for visibility and to centralize discussions.