Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make config glob paths configurable by the user #1363

Closed
datajoely opened this issue Mar 23, 2022 · 9 comments
Closed

Make config glob paths configurable by the user #1363

datajoely opened this issue Mar 23, 2022 · 9 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@datajoely
Copy link
Contributor

datajoely commented Mar 23, 2022

Description

Is your feature request related to a problem? A clear and concise description of what the problem is: "I'm always frustrated when ..."

Fix #1210

Today catalog config is loaded via the following paths (this also applies the same to parameters):

conf_catalog = conf_loader.get('catalog*', 'catalog*/**')

conf_catalog = self.config_loader.get("catalog*", "catalog*/**", "**/catalog*")

These are hardcoded in the library and the user of a project cannot change without getting creative. I think there would be a lot of user value (1) making these configurable (2) being transparent with how this works.

Context

Why is this change important to you? How would you use it? How can it benefit other users?

  • It feels restrictive to stop users controlling this when so much of the framework is configurable
  • We have evidence of users want to do _catalog suffixes (as opposed to prefixes) which aren't supported by the default pattern

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.

  • In 0.18.x it may make sense to expose in settings.py
@datajoely datajoely added the Issue: Feature Request New feature or improvement to existing feature label Mar 23, 2022
@merelcht
Copy link
Member

Wouldn't this be possible by using a custom ConfigLoader, which the users can do in settings.py from 0.18.0?

@idanov
Copy link
Member

idanov commented Mar 23, 2022

@datajoely As @MerelTheisenQB pointed out, this is already configurable through a custom ConfigLoader in settings.py, so closing the issue.

@idanov idanov closed this as completed Mar 23, 2022
@datajoely
Copy link
Contributor Author

So I actually disagree here - I think it's highly likely that a novice user would like to control this without the added complexity of the creating a custom class.

The smallest version of this I can imagine involves inheriting from the config loader you were already using, and overriding _get_config_credentials(self) and the params(self) which is type @property. I really think this is prohibitively complex for novices.

What about adding the defaults as constructor arguments?

@yetudada
Copy link
Contributor

I have a query, why do users want _catalog suffixes and how many users does this affect?

@antonymilne
Copy link
Contributor

antonymilne commented Mar 23, 2022

I think there's a misunderstanding here that this is do with a custom ConfigLoader - as far as I can tell, this requires a custom KedroContext. Since the calls like config_loader.get("catalog*", "catalog*/**", "**/catalog*") are buried inside context.py, you would actually need to make a custom KedroContext to change these patterns, and as @datajoely points out even then they're not easy to change.

As a first step I think it's a very good idea to make these methods easier to override by a user writing a custom KedroContext. When we refactored the ConfigLoader we deliberately exposed some methods like _build_conf_paths so that users can easily modify them. I think we should do the same for _get_catalog_path, _get_params_path, etc. in KedroContext (or make the patterns into class attributes). These are easy changes to make and would make kedro easier to understand and more extensible.

@DebanjanBanerjeeQB
Copy link

DebanjanBanerjeeQB commented Mar 23, 2022

Was chatting with @datajoely on our slack , we have client folks who are building pipelines on kedro for the first time (many of them) . There was a guy who built a very small pipeline on first go and the only error he was getting was because he named the catalogs like raw_catalog.yml instead of catalog_raw.yml. Issue persists with parameters* too

If you think of it ,both names make right amount of sense and i could not find an argument on why the latter is right and the former is incorrect. Ideally we can make it configurable with only *catalog* to be present in the YAML filename that tells Kedro where to look.

Also , agree with Joel , we are talking about client DEs who before 2 months ago were building drag and drop pipelines on informatica and people who used to maintain codebases on shared folders :) so not sure how much we can ask them to look into a custom ConfigLoader for a very small naming convention challenge.

WDYT ?

@DebanjanBanerjeeQB
Copy link

Basis the above discussion , do we see a benefit in reopening and working on this issue ? Let me know thx!

@antonymilne
Copy link
Contributor

antonymilne commented Mar 24, 2022

I'm going to reopen this because I think it was closed on the misunderstanding that this is already possible through a custom ConfigLoader, which as far as I can tell it's not. It's really about a custom KedroContext.

This is a request I've heard several times before (or at least the question of "what are the patterns kedro looks for in finding catalog/params"), and I think it makes a lot of sense to make this more transparent and configurable.

I'm not sure exactly how far we should go into making these patterns exposed to the user (e.g. whether we should make them configurable in settings.py, whether CONTEXT_ARGS should be configurable), but given that CONTEXT_CLASS is already exposed in settings.py it would be very easy to make these patterns configurable simply by making them class properties/methods than can be directly overridden rather than buried inside other methods.

This won't immediately meet @DebanjanBanerjeeQB's ask that we shouldn't need to make a custom class just for a small change to naming convention, since you would still need to make a custom KedroContext class to change the patterns. But at least it would make configuring these patterns possible. After that we can consider whether users should be able to change them without needing to make a custom KedroContext.

@merelcht
Copy link
Member

This was done in #1870 and #1864. Docs will be added in #1462

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

No branches or pull requests

6 participants