Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make core config accessible in dict get way #1870

Merged
merged 16 commits into from
Oct 11, 2022
Merged

Conversation

merelcht
Copy link
Member

@merelcht merelcht commented Sep 23, 2022

Description

#1863

Development notes

  • Changed AbstractConfigLoader to inherit from UserDict
  • Moved regex patterns for core config locations from KedroContext to ConfigLoader and TemplatedConfigLoader
  • Core config patterns are stored in a dictionary so they can be accessed with dict syntax
  • config_patterns is added as an argument to the ConfigLoader and TemplatedConfigLoader

To discuss

Is allowing users to access config in this way something we want for the future 0.19.0 version as well? How does this allow for using non file based config, e.g stored in a database?

Discussion summary

  • This change opens up to ability for users to change the config file patterns.
  • The pattern paths are added to the ConfigLoader and TemplatedConfigLoader and not to the AbstractConfigLoader to not make add assumptions about file based config in the base abstract class.
  • This also means that (in the future) users will be able to point to a database. The implementation of reading from databases would not happen on the Kedro side: it's not possible for us to build functionality for all possible databases that users have. However, the improvements to ConfigLoaderwill enable users to implement their own DB access so they can load their config.
  • In 0.19.0 we will remove the custom .get() method and rely on OmegaConf for loading the actual config.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@merelcht merelcht marked this pull request as ready for review September 23, 2022 15:36
Signed-off-by: Merel Theisen <[email protected]>
@merelcht merelcht self-assigned this Sep 26, 2022
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction of this change and it should make it easier towards customisable path for all configurations ⭐️.

This is just something always in my mind since this question almost come up every week and I always has to point people to make a copy & paste CustomTemplateConfigLoader. Would be great if we can do something about it while we are making changes to these classes. This is definitely out of scope of this PR, but I just want bring it out so we can see if there are any opportunity to improve it.
#1527

Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr: I think this is a good start, definitely an improvement or where things are now and probably fine to merge after thinking about the inheritance a bit more, but I'm not sure exactly how the next step will work. I'd feel more confident about it if @idanov could weight in with his thoughts!

I think there were a few major requirements here:

  • enable access through getitem syntax:config_loader["params"] etc. ✅
  • change all our .get calls to __getitem__
  • maintain backwards compatibility so that 3rd party .get calls don't need to change ✅

So I think that in basic terms we've achieved what we wanted to here.

But I'm left with several questions about exactly how it should be done:

  1. what should AbstractConfigLoader inherit from? IMO at the moment it should just be ABC still. I would personally leave the dictionary inheritance for the future if and when it becomes a purely dict-like object
  2. where should the __getitem__ definition go - abstract config loader or the implementations? At the moment it's just copied and pasted between the implementations which seems a bit unDRY.
  3. likewise do core_config_patterns belong in the abstract class or the implementations?
  4. is the intention to remove .get entirely in the future?

tbh these points are all kind of moot though given that it's not at all clear what the abstract config loader is actually for currently...

More practically, what is actually the expected user journey following this change if I want to (a) add spark config and (b) add *parameters as a glob pattern? This is #1864 but it feels a bit hard to judge the proposed change here without understanding how that would work. Currently I think it would be something like this:

class MyConfigLoader(ConfigLoader):
    def __init__(...):
         super().__init__(...)
         self.core_config_patterns.append("*parameters")
         self.core_config_patterns["spark"] = ["spark*", ...]
         # really we should have another object that's not called "core" to add spark to

I believe @idanov's plan was instead of doing class inheritance to make this patterns dictionary directly accessible in settings.py. But I don't quite see how that works. Would we then want to pass the patterns into the constructor of ConfigLoader? And how would that eventually enable different sort of config, e.g. database vs. file? We don't need to have it all worked out now, but I'd personally feel more comfortable with this change if I could see better what the ultimate consequence will look like.

@merelcht
Copy link
Member Author

As always, thank you for your thorough comments and feedback @AntonyMilneQB ⭐

But I'm left with several questions about exactly how it should be done:

  1. what should AbstractConfigLoader inherit from? IMO at the moment it should just be ABC still. I would personally leave the dictionary inheritance for the future if and when it becomes a purely dict-like object

Yes I think that would make sense, see my earlier comment: #1870 (comment)

  1. where should the __getitem__ definition go - abstract config loader or the implementations? At the moment it's just copied and pasted between the implementations which seems a bit unDRY.
  2. likewise do core_config_patterns belong in the abstract class or the implementations?

These are good questions, my feeling is it should be in the loader implementations because it's too specific to be in the abstract class. As in the mapping from "params" to the parameter regexes isn't something the abstract implementation should care about or anyone inheriting from it. But I also agree that it's not clear what the role of the AbstractConfigLoader is anymore..

  1. is the intention to remove .get entirely in the future?

Yes as far as I understand it, and instead just leverage the dict get methods.

More practically, what is actually the expected user journey following this change if I want to (a) add spark config and (b) add *parameters as a glob pattern? This is #1864 but it feels a bit hard to judge the proposed change here without understanding how that would work. Currently I think it would be something like this:

class MyConfigLoader(ConfigLoader):
    def __init__(...):
         super().__init__(...)
         self.core_config_patterns.append("*parameters")
         self.core_config_patterns["spark"] = ["spark*", ...]
         # really we should have another object that's not called "core" to add spark to

I believe @idanov's plan was instead of doing class inheritance to make this patterns dictionary directly accessible in settings.py. But I don't quite see how that works. Would we then want to pass the patterns into the constructor of ConfigLoader? And how would that eventually enable different sort of config, e.g. database vs. file? We don't need to have it all worked out now, but I'd personally feel more comfortable with this change if I could see better what the ultimate consequence will look like.

I coded this up in an older (less clean) branch: https://github.com/kedro-org/kedro/compare/abstract-config-loader, but essentially you would have something like this in settings.py:

CONFIG_LOADER_CLASS = ConfigLoader
CONFIG_LOADER_ARGS = {
   "custom_patterns":{"spark" : ["spark*/"]}
}

and inside the ConfigLoader

 if custom_patterns:
    core_patterns.update(custom_patterns)
 self.config_patterns = core_patterns

So you can add whatever custom config you have as well as overwrite any of the "core" config locations.

@noklam
Copy link
Contributor

noklam commented Sep 28, 2022

  1. what should AbstractConfigLoader inherit from? IMO at the moment, it should just be ABC still. I would personally leave the dictionary inheritance for the future if and when it becomes a purely dict-like object

I just want to point out that with this PR, we already assume config_loader is a dict-like object. If we are going with ABC, then we should also add __getitem__ as the class method.

@MerelTheisenQB @AntonyMilneQB
Example:

# context.py
 params = self.config_loader["parameters"]

This won't work if a user inherit AbstractConfigLoader and implement get but not __getitem__. With that said, i may prefer ABC + get + __getitem__ to define clearly what exactly is need. dict or UserDict is assuming more than we need.

@merelcht
Copy link
Member Author

  1. what should AbstractConfigLoader inherit from? IMO at the moment, it should just be ABC still. I would personally leave the dictionary inheritance for the future if and when it becomes a purely dict-like object

I just want to point out that with this PR, we already assume config_loader is a dict-like object. If we are going with ABC, then we should also add __getitem__ as the class method.

@MerelTheisenQB @AntonyMilneQB Example:

# context.py
 params = self.config_loader["parameters"]

This won't work if a user inherit AbstractConfigLoader and implement get but not __getitem__. With that said, i may prefer ABC + get + __getitem__ to define clearly what exactly is need. dict or UserDict is assuming more than we need.

Yes you're right. I think for the time being though we should not get too stuck on "if users inherit from AbstractConfigLoader", because from the interviews it became clear people aren't using the AbstractConfigLoader. It won't give you a usable configloader if you inherit from it currently.

Copy link
Member

@idanov idanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, such a small code change enabling so much extra functionality!

Could you add a very short description of this to the RELEASE.md?

@merelcht merelcht requested a review from idanov October 10, 2022 10:20
@merelcht merelcht requested a review from noklam October 10, 2022 10:20
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR! I think the development notes needed to be updated. I would add that the config_pattern is now an argument, although it's not configurable in settings.py yet. Maybe we need to update the RELEASE.md too?

Signed-off-by: Merel Theisen <[email protected]>
Copy link
Member

@idanov idanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Should we add a small section in the docs to show how to change file patterns through settings.py?

@merelcht
Copy link
Member Author

Awesome! Should we add a small section in the docs to show how to change file patterns through settings.py?

I was planning on doing this in the follow up tasks:
#1864
#1462

@merelcht merelcht merged commit 997692a into main Oct 11, 2022
@merelcht merelcht deleted the feat/access-config-dict-way branch October 11, 2022 12:41
AhdraMeraliQB pushed a commit that referenced this pull request Oct 21, 2022
Signed-off-by: Merel Theisen <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>
nickolasrm pushed a commit to ProjetaAi/kedro that referenced this pull request Oct 26, 2022
AhdraMeraliQB added a commit that referenced this pull request Nov 9, 2022
* Release/0.18.3 (#1856)

* Update release version and release notes

Signed-off-by: Nok Chan <[email protected]>

* Update missing release notes

Signed-off-by: Nok Chan <[email protected]>

* update vresion

Signed-off-by: Nok Chan <[email protected]>

* update release notes

Signed-off-by: Nok Chan <[email protected]>

Signed-off-by: Nok Chan <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Remove comment from code example

Signed-off-by: Ahdra Merali <[email protected]>

* Remove more comments

Signed-off-by: Ahdra Merali <[email protected]>

* Add YAML formatting

Signed-off-by: Ahdra Merali <[email protected]>

* Add missing import

Signed-off-by: Ahdra Merali <[email protected]>

* Remove even more comments

Signed-off-by: Ahdra Merali <[email protected]>

* Remove more even more comments

Signed-off-by: Ahdra Merali <[email protected]>

* Add pickle requirement to extras_require

Signed-off-by: Ahdra Merali <[email protected]>

* Try fix YAML docs

Signed-off-by: Ahdra Merali <[email protected]>

* Try fix YAML docs pt 2

Signed-off-by: Ahdra Merali <[email protected]>

* Fix code snippets in docs (#1876)

* Fix code snippets

Signed-off-by: Ahdra Merali <[email protected]>

* Separate code blocks

Signed-off-by: Ahdra Merali <[email protected]>

* Lint

Signed-off-by: Ahdra Merali <[email protected]>

Signed-off-by: Ahdra Merali <[email protected]>

* Fix issue with specifying format for SparkHiveDataSet (#1857)

Signed-off-by: jstammers <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Update RELEASE.md (#1883)

* Update RELEASE.md

* fix broken link

* Update RELEASE.md

Co-authored-by: Merel Theisen <[email protected]>

Co-authored-by: Merel Theisen <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Deprecate `kedro test` and `kedro lint` (#1873)

* Deprecating `kedro test` and `kedro lint`

Signed-off-by: Nok Chan <[email protected]>

* Deprecate commands

Signed-off-by: Nok Chan <[email protected]>

* Make kedro looks prettier

* Update Linting

Signed-off-by: Nok <[email protected]>

Signed-off-by: Nok Chan <[email protected]>
Signed-off-by: Nok <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Fix micro package pull from PyPI (#1848)

Signed-off-by: Florian Gaudin-Delrieu <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Update Error message for `VersionNotFoundError` to handle Permission related issues better (#1881)

* Update message for VersionNotFoundError

Signed-off-by: Ankita Katiyar <[email protected]>

* Add test for VersionNotFoundError for cloud protocols

* Update test_data_catalog.py

Update NoVersionFoundError test

* minor linting update

* update docs link + styling changes

* Revert "update docs link + styling changes"

This reverts commit 6088e00.

* Update test with styling changes

* Update RELEASE.md

Signed-off-by: ankatiyar <[email protected]>

Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: ankatiyar <[email protected]>
Co-authored-by: Ahdra Merali <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Update experiment tracking documentation with working examples (#1893)

Signed-off-by: Merel Theisen <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Add NHS AI Lab and ReSpo.Vision to companies list (#1878)

Signed-off-by: Ahdra Merali <[email protected]>

* Document how users can use pytest instead of kedro test (#1879)

* Add best_practices.md with introductory sections

Signed-off-by: Jannic Holzer <[email protected]>

* Add pytest and pytest-cov sections

Signed-off-by: Jannic Holzer <[email protected]>

* Add pytest-cov coverage report

Signed-off-by: Jannic Holzer <[email protected]>

* Add sections on pytest-cov

Signed-off-by: Jannic Holzer <[email protected]>

* Add automated_testing to index.rst

Signed-off-by: Jannic Holzer <[email protected]>

* Reformat third-party library names and clean grammar.

Signed-off-by: Jannic Holzer <[email protected]>

* Add link to virtual environment docs

Signed-off-by: Jannic Holzer <[email protected]>

* Add example of good test naming

Signed-off-by: Jannic Holzer <[email protected]>

* Improve link accessibility

Signed-off-by: Jannic Holzer <[email protected]>

* Improve pytest docs link accessibility

Signed-off-by: Jannic Holzer <[email protected]>

* Add reminder link to virtual environment docs

Signed-off-by: Jannic Holzer <[email protected]>

* Fix formatting in link to coverage docs

Signed-off-by: Jannic Holzer <[email protected]>

* Remove reference to /src under 'Run your tests'

Signed-off-by: Jannic Holzer <[email protected]>

* Modify references to <project_name> to <package_name>

Signed-off-by: Jannic Holzer <[email protected]>

* Fix sentence structure

Signed-off-by: Jannic Holzer <[email protected]>

* Fix broken databricks doc link

Signed-off-by: Jannic Holzer <[email protected]>

Signed-off-by: Jannic Holzer <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Capitalise Kedro-Viz in the "Visualize layers" section (#1899)

* Capitalised kedro-viz

Signed-off-by: yash6318 <[email protected]>

* capitalised Kedro viz

Signed-off-by: yash6318 <[email protected]>

* Updated set_up_experiment_tracking.md

Co-authored-by: Deepyaman Datta <[email protected]>
Signed-off-by: yash6318 <[email protected]>

Signed-off-by: yash6318 <[email protected]>
Co-authored-by: Deepyaman Datta <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Fix linting on autmated test page (#1906)

Signed-off-by: Merel Theisen <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Add _SINGLE_PROCESS property to CachedDataSet (#1905)

Signed-off-by: Carla Vieira <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Update the tutorial of "Visualise pipelines" (#1913)

* Change a file extention to match the previous article

Signed-off-by: dinotuku <[email protected]>

* Add a missing import

Signed-off-by: dinotuku <[email protected]>

* Change both preprocessed datasets to parquet files

Signed-off-by: dinotuku <[email protected]>

* Change data type to ParquetDataSet for parquet files

Signed-off-by: dinotuku <[email protected]>

* Add a note for installing seaborn if it is not installed

Signed-off-by: dinotuku <[email protected]>

Signed-off-by: dinotuku <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Document how users can use linting tools instead of `kedro lint` (#1904)

* Add documentation for linting tools

Signed-off-by: Ankita Katiyar <[email protected]>

* Revert changes to commands_reference.md

Signed-off-by: Ankita Katiyar <[email protected]>

* Update linting docs with suggestions

Signed-off-by: Ankita Katiyar <[email protected]>

* Update linting doc

Signed-off-by: Ankita Katiyar <[email protected]>

Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Make core config accessible in dict get way  (#1870)

Signed-off-by: Merel Theisen <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Create dependabot.yml configuration file for version updates (#1862)

* Create dependabot.yml configuration file

* Update dependabot.yml

Signed-off-by: SajidAlamQB <[email protected]>

* add target-branch

Signed-off-by: SajidAlamQB <[email protected]>

* Update dependabot.yml

Signed-off-by: SajidAlamQB <[email protected]>

* limit dependabot to just dependency folder

Signed-off-by: SajidAlamQB <[email protected]>

* Update test_requirements.txt

Signed-off-by: SajidAlamQB <[email protected]>

* Update MANIFEST.in

Signed-off-by: SajidAlamQB <[email protected]>

* fix e2e

Signed-off-by: SajidAlamQB <[email protected]>

* Update continue_config.yml

Signed-off-by: SajidAlamQB <[email protected]>

* Update requirements.txt

Signed-off-by: SajidAlamQB <[email protected]>

* Update requirements.txt

Signed-off-by: SajidAlamQB <[email protected]>

* fix link

Signed-off-by: SajidAlamQB <[email protected]>

* revert

Signed-off-by: SajidAlamQB <[email protected]>

* Delete requirements.txt

Signed-off-by: SajidAlamQB <[email protected]>

Signed-off-by: SajidAlamQB <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Update dependabot config (#1928)

Signed-off-by: Ahdra Merali <[email protected]>

* Update robots.txt (#1929)

Signed-off-by: Ahdra Merali <[email protected]>

* fix broken link (#1950)

Signed-off-by: Ahdra Merali <[email protected]>

* Update dependabot.yml config  (#1938)

* Update dependabot.yml

Signed-off-by: SajidAlamQB <[email protected]>

* pin jupyterlab_services to requirments

Signed-off-by: SajidAlamQB <[email protected]>

* lint

Signed-off-by: SajidAlamQB <[email protected]>

Signed-off-by: SajidAlamQB <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Update setup.py Jinja2 dependencies (#1954)

Signed-off-by: Ahdra Merali <[email protected]>

* Update pip-tools requirement from ~=6.5 to ~=6.9 in /dependency (#1957)

Updates the requirements on [pip-tools](https://github.com/jazzband/pip-tools) to permit the latest version.
- [Release notes](https://github.com/jazzband/pip-tools/releases)
- [Changelog](https://github.com/jazzband/pip-tools/blob/master/CHANGELOG.md)
- [Commits](jazzband/pip-tools@6.5.0...6.9.0)

---
updated-dependencies:
- dependency-name: pip-tools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Ahdra Merali <[email protected]>

* Update toposort requirement from ~=1.5 to ~=1.7 in /dependency (#1956)

Updates the requirements on [toposort]() to permit the latest version.

---
updated-dependencies:
- dependency-name: toposort
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sajid Alam <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Add deprecation warning to package_name argument in session create() (#1953)

Signed-off-by: Merel Theisen <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Remove redundant `resolve_load_version` call (#1911)

* remove a redundant function call

Signed-off-by: Nok Chan <[email protected]>

* Remove redundant resolove_load_version & fix test

Signed-off-by: Nok Chan <[email protected]>

* Fix HoloviewWriter tests with more specific error message pattern & Lint

Signed-off-by: Nok Chan <[email protected]>

* Rename tests

Signed-off-by: Nok Chan <[email protected]>

Signed-off-by: Nok Chan <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>

* Make docstring in test starter match real starters (#1916)

Signed-off-by: Ahdra Merali <[email protected]>

* Try to fix formatting error

Signed-off-by: Merel Theisen <[email protected]>

* Specify pickle import

Signed-off-by: Nok Chan <[email protected]>
Signed-off-by: Ahdra Merali <[email protected]>
Signed-off-by: jstammers <[email protected]>
Signed-off-by: Nok <[email protected]>
Signed-off-by: Florian Gaudin-Delrieu <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: ankatiyar <[email protected]>
Signed-off-by: Merel Theisen <[email protected]>
Signed-off-by: Jannic Holzer <[email protected]>
Signed-off-by: yash6318 <[email protected]>
Signed-off-by: Carla Vieira <[email protected]>
Signed-off-by: dinotuku <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: SajidAlamQB <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Nok <[email protected]>
Co-authored-by: Jimmy Stammers <[email protected]>
Co-authored-by: Merel Theisen <[email protected]>
Co-authored-by: Florian Gaudin-Delrieu <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
Co-authored-by: Yetunde Dada <[email protected]>
Co-authored-by: Jannic <[email protected]>
Co-authored-by: Yash Agrawal <[email protected]>
Co-authored-by: Deepyaman Datta <[email protected]>
Co-authored-by: Carla Vieira <[email protected]>
Co-authored-by: Kuan Tung <[email protected]>
Co-authored-by: Sajid Alam <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Merel Theisen <[email protected]>
Co-authored-by: Merel Theisen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make core config accessible in dict get way (e.g. config_loader["catalog"])
4 participants