Allow configuration to be contributed by providers #32604
Some tests will still fail, but I wanted to put this one up for review. And if you think it is huge, then yes, it is. Worry not: if we agree it is too big, I have a plan to split it into a few smaller PRs. I wanted to show the target we are heading to, because some of the smaller PRs are easier to explain once you can see the target.

I think I have attempted this three times already, and I think this time I succeeded: refactoring how configuration is read and generated, and making our config.yml the single source of truth, all while making it possible for providers to contribute their own configuration. I hope that once merged, this PR will remove a LOT of confusion about how our configuration is read in "production" and "test" mode.

The main change is to allow providers to contribute configuration (via a config: section in provider.yaml), but there are many side effects. Among the side effects of this change:

* The code is, I think, much more readable and understandable. It is more type-hinted, the intentions of the code are documented with docstrings, and any "special cases" are documented with comments explaining why we do things in some strange ways.
* I removed or replaced quite a bit of "strange" code that seemed to be implemented as a band-aid for a band-aid.
* I think I got 100% backwards compatibility.
Very much that. We are hit in a rather small-ish way (pymssql on ARM), but it seems this is mayhem for the whole Python ecosystem.
The changes implemented:

* provider.yaml files for providers can optionally contribute extra configuration. The configuration is exposed via the "get_provider_info" entrypoint, which allows Airflow to discover it both from sources (in Breeze and local development) and from installed packages.
* Provider configurations are lazily loaded, only for commands that actually need them.
* Documentation for configuration contributed by providers is generated as part of the Provider documentation. It is also discoverable through a "core-extension" page that lists all community providers contributing their own configuration.
* Celery configuration (and in the future Kubernetes configuration) is linked directly from the Airflow documentation. The providers are preinstalled, which means that Celery (and in the future Kubernetes) configuration is considered important enough to be mentioned and linked directly from the core. Similarly, the Celery and Kubernetes executor documentation remains in the core documentation (though configuration options are detailed only in the provider documentation and only linked from the core).
* Configuration writing happens in "main", not in the configuration initialization, and we always execute provider configuration initialization. This makes sure that the generated configuration contains the configuration for the providers as well.
* Related documentation about custom and community providers has been updated and somewhat refactored. I realized that some of it was quite out of date and some of it was really "developer" rather than user documentation. The docs are restructured a bit and cleaned up, missing information is added, and old or irrelevant parts are removed.

Co-authored-by: Jed Cunningham <[email protected]>
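To illustrate the first two points, here is a minimal sketch of how a provider might expose a config section through its "get_provider_info" entrypoint and how a consumer could load it lazily. It is not the code from this PR. The package name, the `example` section, the `worker_count` option, and the `LazyProviderConfig` class are all hypothetical, and the shape of the `config` dictionary is an assumption modelled on config.yml.

```python
from functools import cached_property


def get_provider_info() -> dict:
    """Hypothetical entrypoint; assumed to mirror the provider.yaml content."""
    return {
        "package-name": "example-provider",  # hypothetical package name
        "config": {  # assumed layout: section -> options -> option spec
            "example": {
                "description": "Settings for the hypothetical example provider.",
                "options": {
                    "worker_count": {
                        "description": "Number of workers to start.",
                        "type": "integer",
                        "default": "4",
                    },
                },
            },
        },
    }


class LazyProviderConfig:
    """Collects provider-contributed defaults only on first access."""

    def __init__(self, entrypoints):
        self._entrypoints = list(entrypoints)
        self.load_count = 0  # how many times providers were actually scanned

    @cached_property
    def defaults(self) -> dict:
        # Runs once; later accesses reuse the cached dictionary.
        self.load_count += 1
        merged = {}
        for entrypoint in self._entrypoints:
            for section, body in entrypoint().get("config", {}).items():
                for option, spec in body.get("options", {}).items():
                    merged[(section, option)] = spec["default"]
        return merged

    def get(self, section: str, option: str) -> str:
        return self.defaults[(section, option)]


provider_config = LazyProviderConfig([get_provider_info])
```

Nothing is scanned when `provider_config` is created. The first `provider_config.get("example", "worker_count")` call triggers the scan, which is the behaviour that lets commands that never read provider options skip the cost.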
apache#32604 moved the writing of the Airflow config out of config initialization, but the webserver config is still written during initialization. Previously, when the AIRFLOW_HOME folder was missing, it was created during config writing. Now it needs to be created before the webserver config is written.
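The ordering problem can be sketched as follows. This is only an illustration under assumptions, not the actual fix: `write_webserver_config` is a hypothetical helper, and the file name is assumed to be `webserver_config.py`.

```python
from pathlib import Path


def write_webserver_config(airflow_home: str, content: str) -> Path:
    """Hypothetical helper: write the webserver config if it does not exist yet."""
    home = Path(airflow_home)
    # Create the home folder first; without this, writing fails when the
    # folder is missing, because nothing else has created it yet.
    home.mkdir(parents=True, exist_ok=True)
    target = home / "webserver_config.py"  # assumed file name
    if not target.exists():
        target.write_text(content)
    return target
```

Using `exist_ok=True` keeps the call safe when the folder already exists, so the helper does not depend on whether config writing has run before it.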
The Celery Executor tests in Helm started to fail after the configuration migration was merged (apache#32604). The PR did not have the "full tests needed" label, so it skipped the K8S tests because there was no change related to Kubernetes. However, some fundamental changes in how configuration is retrieved caused the Celery Executor to fail on a missing default configuration value. This change adds ProvidersManager configuration initialization when executors are started, in order to fix the problem temporarily. There is an ongoing effort to optimise the path of retrieving provider configuration without having to initialize every provider's configuration, and these lines will be removed when that happens.
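The failure mode and the temporary fix can be sketched roughly like this. All names here (`SketchConfig`, `load_providers_configuration`, `start_executor`) are hypothetical stand-ins and not the ProvidersManager API. The `celery` / `worker_concurrency` option and its value are used only as an illustrative provider default.

```python
class SketchConfig:
    """Hypothetical config holder: core defaults plus lazily added provider defaults."""

    def __init__(self, core_defaults: dict):
        self._defaults = dict(core_defaults)
        self.providers_loaded = False

    def load_providers_configuration(self, provider_defaults: dict) -> None:
        # Idempotent: provider defaults are merged only once.
        if self.providers_loaded:
            return
        for key, value in provider_defaults.items():
            # Core values win if both define the same option.
            self._defaults.setdefault(key, value)
        self.providers_loaded = True

    def get(self, section: str, option: str) -> str:
        # Raises KeyError when no default is known for the option.
        return self._defaults[(section, option)]


def start_executor(config: SketchConfig, provider_defaults: dict) -> int:
    """Initialize provider configuration before reading executor options."""
    config.load_providers_configuration(provider_defaults)
    return int(config.get("celery", "worker_concurrency"))


CORE_DEFAULTS = {("core", "executor"): "CeleryExecutor"}
PROVIDER_DEFAULTS = {("celery", "worker_concurrency"): "16"}  # illustrative value
```

Reading `("celery", "worker_concurrency")` from a fresh `SketchConfig(CORE_DEFAULTS)` raises `KeyError`, which corresponds to the missing default described above. Calling `start_executor` first loads the provider defaults, so the lookup succeeds.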