Add NeptuneHook and operators #32738

Conversation
* Add Executors discovery and documentation. Executors can now be added via providers. This PR adds a mechanism for discovering executors via the Providers Manager, exposing them via the CLI, and documenting them in core-extensions. * Update scripts/in_container/verify_providers.py
We would like to use the config.yml approach as our main source of truth for Airflow configuration. So far it has been split among multiple files:

* config.yml -> description
* default_airflow.cfg -> JINJA template keeping configuration, examples and description (used to generate airflow.cfg when Airflow starts)
* default_test.cfg -> storing test-only configuration used in some test cases - used to generate unittest.cfg
* scripts/in_container/airflow_ci.cfg, where CI-specific configuration overwrote unittest.cfg

This change consolidates it all into a unified approach where all configuration information is retrieved from the .yml files stored in config_templates. No more additional template files processed by JINJA, no more special CI versions of them, no more unittest.cfg file where such configuration would be generated, no more separate unittest db used for tests.

* The default_*.cfg files were not real configuration files, because they were really JINJA templates, and this confused people who copied the files. This change leaves the file empty with a comment that tells the user how to get the default configuration.
* default_airflow.cfg is gone; instead, the defaults can be shown via the `airflow config list --defaults` command.
* The unittest config is generated on the fly using the defaults stored in config_templates, constituting a single place where they need to be changed for the tests to use them.
* Internally, the description of the configuration is stored in AirflowConfigParser and used by the parser to generate the default configuration when needed.
* We got rid of `{{{{` in templated config defaults by marking the templates with ``is_template`` and no longer processing those entries with regular formatting when generating the default values. This only concerns defaults from config.yml; reading those configuration entries from a file is unaffected.

This change aims to be 100% backwards compatible with the previous implementation when it comes to functionality, even if the internals changed. It also does not add provider-specific changes, which are coming separately. The only changes visible to the user are:

* The generated airflow.cfg is slightly more readable and displays the names of the variables that can be used to override each configuration option (which is very useful for copy&pasting).
* Instead of copying default_airflow.cfg, users are advised to use `airflow config list --defaults` to generate a production config. This configuration has all entries commented out, so they can selectively uncomment and change the values they want. This is now promoted as a "best practice" in the documentation.
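As a small illustration of how the consolidated defaults surface at runtime (the lookup below uses the standard `conf` API; the printed value is just an example):

from airflow.configuration import conf

# With no airflow.cfg entry and no AIRFLOW__CORE__EXECUTOR environment
# variable set, this falls back to the default now sourced from config.yml.
executor = conf.get("core", "executor")
print(executor)  # e.g. "SequentialExecutor" on a fresh install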
…pache#32514) * build(pre-commit): add list of supported deferrable operators to doc * docs(providers): move providers list to apache-airflow-providers core-extension
* Allow configuration to be contributed by providers. The changes implemented:

* provider.yaml files can optionally contribute extra configuration; the configuration is exposed via the "get_provider_info" entrypoint, allowing Airflow to discover it both from sources (in Breeze and local development) and from installed packages.
* Provider configurations are lazily loaded - only for commands that actually need them.
* Documentation for configuration contributed by providers is generated as part of the provider documentation. It is also discoverable via a "core-extension" page displaying all community providers that contribute their own configuration.
* Celery configuration (and in the future Kubernetes configuration) is linked directly from the Airflow documentation - these providers are preinstalled, which means that Celery (and, in the future, Kubernetes) configuration is considered important enough to be directly mentioned and linked from the core. Similarly, the Celery and Kubernetes executor documentation remains in the core documentation (still, configuration options are detailed only in the provider documentation and only linked from the core).
* Configuration writing happens in "main", not in the configuration initialization, and we will always execute provider configuration initialization. This makes sure that the generated configuration contains configuration for the providers as well.
* Related documentation about custom and community providers has been updated and somewhat refactored - I realized that some of it was quite out-of-date and some of it was really "developer" rather than user docs. The docs are restructured a bit, cleaned, missing information is added, and old/irrelevant parts removed.

Co-authored-by: Jed Cunningham <[email protected]> * Update airflow/configuration.py Co-authored-by: Jed Cunningham <[email protected]> --------- Co-authored-by: Jed Cunningham <[email protected]>
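As a rough sketch of that mechanism (the provider name, section, and option below are invented; the field layout follows the provider.yaml config schema):

def get_provider_info():
    # Returned via the "apache_airflow_provider" entrypoint of a provider
    # package; the optional "config" key is how a provider contributes
    # configuration that Airflow then discovers and documents.
    return {
        "package-name": "apache-airflow-providers-example",  # hypothetical
        "config": {
            "example_section": {
                "description": "Options contributed by the example provider.",
                "options": {
                    "some_option": {
                        "description": "An example option.",
                        "type": "string",
                        "version_added": "1.0.0",
                        "default": "some-default",
                    },
                },
            },
        },
    }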
Publishing the documentation
--------------------------
This is at least one of the reasons the build-docs is failing; the line needs to be the same length as the heading.
Suggested change:
  Publishing the documentation
- --------------------------
+ ----------------------------
@@ -462,6 +462,51 @@ Those are all available flags of ``build-docs`` command:
:width: 100%
The changes in this file don't feel related to me. Am I missing something, or should this be a separate PR?
breeze release-management publish-docs
The publishing documentation consists steps:
Suggested change:
- The publishing documentation consists steps:
+ Publishing documentation consists of the following steps:
""" | ||
Interact with AWS Neptune using proper client from the boto3 library. | ||
|
||
Hook attribute `conn` has all methods that listed in documentation |
Suggested change:
- Hook attribute `conn` has all methods that listed in documentation
+ Hook attribute `conn` has all methods listed in the documentation
:rtype: str
:raises AirflowNotFoundException: If the DB cluster does not exist.
Suggested change:
- :rtype: str
- :raises AirflowNotFoundException: If the DB cluster does not exist.
We don't list rtypes anymore, and I haven't seen anywhere else that lists :raises:, though I'm not necessarily against the idea on that one.
stop_db_response = self._stop_db()
if self.wait_for_completion:
    self._wait_until_db_stopped()
return json.dumps(stop_db_response, default=str)
Here and above: Please verify this manually and let me know if this is working as you expect. I've seen json.dumps fail in the past when the json includes datetimes, and these responses do.
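A self-contained illustration of that failure mode, and how `default=str` avoids it:

import json
from datetime import datetime, timezone

# Neptune API responses include datetimes such as ClusterCreateTime.
response = {"DBCluster": {"ClusterCreateTime": datetime.now(timezone.utc)}}

# json.dumps(response) would raise:
#   TypeError: Object of type datetime is not JSON serializable
print(json.dumps(response, default=str))  # default=str stringifies the datetime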
self.hook.wait_for_db_cluster_state(self.db_identifier, target_state="stopped")

__all__ = ["NeptuneStartDbOperator", "NeptuneStopDbOperator"]
Pretty sure you can drop this?
class NeptuneDbType(Enum):
    """Only available types for the AWS Neptune DB"""
D205 nitpick
"""Only available types for the AWS Neptune DB""" | |
"""Only available types for the AWS Neptune DB.""" |
deferrable=True,
waiter_delay=30,
wait_for_completion=399,
This doesn't appear related to the rest of the PR.
Also: System tests don't currently support deferrable operators, and wait_for_completion is a bool.
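For reference, a corrected sketch of the call (the task_id, db_identifier parameter, and context key names are assumptions based on the operator code quoted earlier):

stop_db = NeptuneStopDbOperator(
    task_id="stop_db",  # hypothetical task id
    db_identifier=test_context["DB_IDENTIFIER"],  # assumed parameter/key names
    wait_for_completion=True,  # a bool, not 399
    waiter_delay=30,
    # deferrable=True left out: system tests don't support deferrable operators yet
)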
catchup=False,
) as dag:
    test_context = sys_test_context_task()
Unless there is a good reason otherwise, system tests should be self-contained, so the DB should be created and destroyed within the test. I know there aren't operators for that yet, but you can use the boto API in a @task to do it, as seen in other tests. Something like:
@task
def create_db():
    client = NeptuneHook().conn
    client.create_db_cluster(foo)  # foo/bar stand in for the required kwargs,
    client.create_db_instance(bar)  # e.g. DBClusterIdentifier and Engine
Then a teardown task at the end to undo it.
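A minimal sketch of what that teardown could look like (delete_db_instance and delete_db_cluster exist on the boto3 Neptune client; the "foo"/"bar" identifiers just mirror the placeholders above):

from airflow.decorators import task

@task(trigger_rule="all_done")  # run even if upstream tasks failed
def delete_db():
    client = NeptuneHook().conn
    # Tear down in reverse order of creation: instance first, then cluster.
    client.delete_db_instance(DBInstanceIdentifier="bar", SkipFinalSnapshot=True)
    client.delete_db_cluster(DBClusterIdentifier="foo", SkipFinalSnapshot=True)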
Sorry, just noticed this was still a draft.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
closes: #28289
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.