Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support merge keys (YAML >> tag), also with interpolation #469

Closed
o-smirnov opened this issue Dec 30, 2020 · 16 comments
Closed

support merge keys (YAML >> tag), also with interpolation #469

o-smirnov opened this issue Dec 30, 2020 · 16 comments

Comments

@o-smirnov
Copy link

Is your feature request related to a problem? Please describe.

I would like to be able to merge one config section into another, and augment the result with new keys. In vanilla YAML, this is straightforwardly accomplished via anchors and the merge key tag. The following piece of code illustrates this:

import yaml
print(yaml.safe_load("""
        a: &A
            x: 1
            y: 2

        b:
            <<: *A
            x: 3
            z: 1
"""))

results in

{'a': {'x': 1, 'y': 2}, 'b': {'x': 1, 'y': 2, 'z': 1}}

Loading the same YAML into omegaconf results in an exception:

from omegaconf import *
conf = OmegaConf.create("""
        a: &A
            x: 1
            y: 2

        b:
            <<: *A
            x: 3
            z: 1
""")
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:merge'

Describe the solution you'd like

Well it would be nice if the vanilla YAML merge key worked. However, this whole anchor/alias business is rather clumsy. Top prize would be if this could work with the interpolation feature, like so:

        a:
            x: 1
            y: 2

        b:
            <<: ${a}
            x: 3
            z: 1

Even better would be if this worked across a merge of configs (similar to what's discussed in #442). I.e.

from omegaconf import *
conf1 = OmegaConf.create("""
        a: 
            x: 1
            y: 2
""")
conf2 = OmegaConf.create("""
        b:
            <<: ${a}
            x: 3
            z: 1
""")
conf3 = OmegaConf.merge(conf1, conf2)

Describe alternatives you've considered
I could use lists with nested mappings in them, and merge the keys in the code. Seems ugly though. I don't see any other way around it.

Additional context
Current onegaconf master, python 3.6.9.

@omry
Copy link
Owner

omry commented Dec 30, 2020

Loading the same YAML into omegaconf results in an exception:

This looks like a bug. we will have to take a closer look.

Well it would be nice if the vanilla YAML merge key worked. However, this whole anchor/alias business is rather clumsy. Top prize would be if this could work with the interpolation feature, like so:

        a:
            x: 1
            y: 2

        b:
            <<: ${a}
            x: 3
            z: 1

This is not happening. pyyaml is handling tag merging at load time. it will not know how to handle with the OmegaConf interpolation.

because pyyaml is handling tags at load time, it cannot support merging across different config files. Same way it cannot support accessing anchors from different config files (this is actually the primary reason OmegaConf have interpolation).

@omry
Copy link
Owner

omry commented Dec 30, 2020

I could use lists with nested mappings in them, and merge the keys in the code. Seems ugly though. I don't see any other way around it.

Your entire bug report does not talk about lists. then you pop them up in the alternative.

@o-smirnov
Copy link
Author

pyyaml is handling tag merging at load time. it will not know how to handle with the OmegaConf interpolation.

OK yeah that should've been obvious if I'd only thought about it for another minute... :)

Let me rephrase the feature request then: is there any conceivable way of combining interpolation with a key merge in OmegeConf? I would like to compose a config section by merging in previously defined sections, then adding or overriding some keys. Exactly what the << tag offers, but with the ability to refer to stuff across files.

Your entire bug report does not talk about lists. then you pop them up in the alternative.

Yeah sorry, that was inspired by your suggestion in #442, and then I elided the explanation... let me backtrack. What I'd really like is to be able to merge two configs:

file1.yml

a:
  x: 1
  y: 2

and file2.yml:

b:
  # include all the keys from "a" somehow, since <<: ${a} can't work
  y: 3
  z: 4

The only workaround I can think of at present is:

b:
 - ${a}
 -
   y: 3
   z: 4

...and then have explicit code in the application itself which loops over the elements of b and merges them into one dict. Which is the part that feels ugly and non-generic.

@omry
Copy link
Owner

omry commented Dec 30, 2020

Having looked a bit, it looks like the yaml loaded OmegaConf is using is messing up with the merge loader. that's not intentional but after spending 30 minutes trying to fix it I am officially parking it. I don't have time for it at the moment.
(This is related to a feature request of OmegaConf failing on duplicate keys. the person suggesting it copy an arcane piece of code from somewhere and it's breaking the merge tag).

Let me rephrase the feature request then: is there any conceivable way of combining interpolation with a key merge in OmegeConf? I would like to compose a config section by merging in previously defined sections, then adding or overriding some keys. Exactly what the << tag offers, but with the ability to refer to stuff across files.

pyyaml doing everything in load times means it's hostile to mixing with things that happens afterwards.
I do not know if there is "any conceivable way", and it's an area I don't really want to spend cycles thinking about. I haven't thought much about yaml merge tag so far.

OmegaConf supports that directly via an API.

cfg = OmegaConf.create({"foo": {"bar" : 10}})
OmegaConf.merge(cfg, {"foo": {"X" :  20}})
# {'foo': {'bar': 10, 'X': 20}}

Or via the somewhat internal merge_with:

cfg = OmegaConf.create({"foo": {"bar" : 10}})
cfg.foo.merge_with({"x":  20})
# {'foo': {'bar': 10, 'x': 20}}

@omry
Copy link
Owner

omry commented Dec 30, 2020

@jgehring , your fix for duplicate keys in #257 is causing an error when using yaml merge tags.
Can you take a look?

@o-smirnov
Copy link
Author

OmegaConf supports that directly via an API

Yeah I was playing with that too... but then it's the application code then that has to decide which sections to merge in. If I go back to the single-file YML example using anchors:

a1: &FOO
  x: 1
  y: 2

a2: &BAR
  x: 'a'
  y: 'b'

b:
  <<: *FOO 
  y: 3

...whether the "base section" for b is FOO or BAR is a config-side option, completely transparent to the application.

I suppose I could have the application scan the whole config for a magic key in each section, and trigger a merge based on its value. Something like:

b:
   _merge: a1  

...would work. Anyway, I just wanted to check if you thought it was simple or worthwhile adding this functionality to the core omegaconf... which from the sound of it it isn't.

@omry
Copy link
Owner

omry commented Dec 30, 2020

If you aren't using it, I suggest you take a look at Hydra.
It supports driving the config composition via the config itself.

Can you please file a separate clean bug report for the tag issue? I think we can close this feature request once you do.

@o-smirnov
Copy link
Author

I did start with Hydra actually, but it wasn't quite right for the job, but that led me to omegaconf, which is elegant, and already practically perfect. Thanks for a very useful library, by the way (I should have started with this)!

It was actually a quick job to add a function to the app code to recursively scan the config and act on "_merge" keys. So this now works for me and I'm very happy with it. Even the syntax feels similar to <<.

a1:
  x: 1

a2:
  y: 2

b:
  _merge: a1, a2
  z: 3 

There's only one wrinkle: iterating over ListConfig and DictConfig.items() causes normal interpolations to be resolved, which is not necessarily desirable at this stage. I found that iterating over _iter_ex(False) and items_ex(False) avoids this. I am, however, a little nervous about relying on such seemingly "internal" functions. Any chance they can be promoted to the official API?

@o-smirnov
Copy link
Author

Bug is #470.

@omry
Copy link
Owner

omry commented Dec 31, 2020

Once you get to do config composition like that you are going to hit some internal functions. I do not plan to promote the special iterators to be a public API at this time but you should feel comfortable relaying on them. (Hydra is also using them).
I do recommend that you use the argument name though (items_ex(resolve=False)).

I would be happy to hear what you think about Hydra 1.1 upcoming support for nested default lists.
It's getting close to being merged to master, in the mean time the docs are mostly ready.
take a look at these and tell me if you feel they would cover your needs:

Keep in mind that the support in Hydra need to meet a very high quality bar (specifically, it needs to play out properly with everything else Hydra is doing and should be very well tested).
The pull request introducing it is probably twice as big as all of OmegaConf.

This is significantly more powerful than your current _merge_ approach in that the choices can be overridden from the outside while composing the config object.
You are of course welcome to use your own solution if you would like (I like the fact that you are exploring in that direction).

@omry omry closed this as completed Dec 31, 2020
@o-smirnov
Copy link
Author

I would be happy to hear what you think about Hydra 1.1 upcoming support for nested default lists.

Looks very powerful... still trying to wrap my head around all of it, so I'll just have to ponder it into next year... If I understand correctly though, the composition in Hydra only happens at the defaults level, right? Subsequent sections can only augment whatever was set up in defaults. If this is correct, then it's a bit of an awkward fit to one of our use cases, where we use YAML to describe a pipeline workflow. The YAML file then is becoming more of a (long) imperative program than a configuration. For this reason, it is preferable to have the composition of each section specified locally within that section, rather than separately up front. If that makes sense.

I think I'm converging on a nice solution with just omegaconf though. And now I see you've even implemented relative interpolation targets, wonderful!

Anyway, thanks for all the help, and down with 2020!

@omry
Copy link
Owner

omry commented Dec 31, 2020

Looks very powerful... still trying to wrap my head around all of it, so I'll just have to ponder it into next year... If I understand correctly though, the composition in Hydra only happens at the defaults level, right? Subsequent sections can only augment whatever was set up in defaults.

The definition of the composition can only happen in the defaults. this is the alternative to having scattered merge magic in config file.

In terms of expressive power, this is similar because it let's you compose into a specific package (the terminology in Hydra for the path in the config).
All the defaults lists (in the primary, and then in included configs and so on) and first converted into a tree structure and then into a single list.
Each item in that final defaults list contains the logical config path of the config (e.g /db/mysql or /server/db/mysql) and the final package to place the config content into.

That single list is used by starting with an empty config and then calling cfg = OmegaConf.merge(cfg, cfg_from_list) for each item. (with the extra complication of embedding the cfg from the list into the final package first).

Another advantage of Hydra is that the config discovery is abstracted from the process. Hydra supports a config search path and configs can be in the file system, in Python packages and also in the config store (which is an in-memory store typically used to store Structured Configs acting as schema).

I think I'm converging on a nice solution with just omegaconf though. And now I see you've even implemented relative interpolation targets, wonderful!

Yup, this makes interpolation more practical in the presence of dynamic composition :).
Keep in mind that 2.1 is not officially out yet and there are a handful of small breaking changes there.
You can see them in the news directory.
You can install the dev release though (dev releases are not pretending to have a stable API across their versions).

Anyway, thanks for all the help, and down with 2020!

Happy new year!
I don't want to jinx 2021 by saying 2020 was a pesky pesky year :).

@gchhablani
Copy link

gchhablani commented Mar 7, 2021

Hi @omry,

Is there a way to use the interpolation in the same config file, with merge? I am unable to find the correct syntax. What I want is the omegaconf syntax for regular YAML merge. I am using interpolations in the same file.

Example:

main_config: &main
  seed: 42
  device: cuda
train_config:
  <<: *main
  epochs: 2
  optimizer: adam

I'll appreciate any help with this.

Thanks,
Gunjan

@omry
Copy link
Owner

omry commented Mar 7, 2021

yaml merge is a load time feature and interpolation is a runtime feature.
Those two features are generally incompatible.

The only equivalent to yaml merge is OmegaConf.merge(), which is an API you have to call in your code.
If you are looking to automate config composition look at Hydra (https://hydra.cc).

@gchhablani
Copy link

Hi @omry,

Thanks a lot. I built omegaconf from source, and now I am facing an issue with variable interpolation (it doesn't work), even when there is no merge syntax used. Is this expected?

@omry
Copy link
Owner

omry commented Mar 8, 2021

Since it has nothing at all to do with this issue, can you not hijack it?
If you think you found a bug submit as new issue with a minimal repro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants