
Add NeptuneHook and operators #32738

Closed
wants to merge 9 commits

Conversation

pateash
Contributor

@pateash pateash commented Jul 21, 2023

closes: #28289


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@pateash pateash marked this pull request as draft July 21, 2023 08:04
@eladkal eladkal changed the title AWS Neptune: Provider Add NeptuneHook and operators Jul 21, 2023
@eladkal eladkal requested review from ferruzzi and vincbeck July 21, 2023 08:50
pateash and others added 9 commits July 23, 2023 20:20
* Add Executors discovery and documentation

The Executors can now be added via providers. This PR adds
mechanism of discovering the executors via Providers Manager,
exposing them via CLI and documenting in core-extensions.

* Update scripts/in_container/verify_providers.py
We would like to use the config.yml approach as our main source of truth
for airflow configuration. So far it has been split amongst multiple
files:

  * config.yml -> description
  * default_airflow.cfg -> JINJA template to keep configuration, examples
    and description (used to generate airflow.cfg when airflow starts)
  * default_test.cfg -> storing test-only configuration used in some
    test cases - used to generate unittest.cfg
  * scripts/in_container/airflow_ci.cfg where CI-specific configuration
    overwrote the unittest.cfg

This change consolidates it all into a unified approach, where all
configuration information is retrieved from .yml files stored in
config_templates. No more additional template files processed by
JINJA, no more special CI versions of it, no more unittestdb.cfg file
where such configuration would be generated, no more unittestdb to
be used separately for tests.

* The default_*.cfg files were not real configuration files, because
  they were really JINJA templates, which confused people when
  copying the files. This change leaves the file empty with the
  comment that instructs the user how they can get the default
  configuration.
* The default_airflow.cfg is gone and instead, we have a way to
  show defaults via `airflow config list --defaults` command
* Unittest config is generated on the fly using defaults stored
  in config_templates, constituting a single place where they need
  to be changed for the tests to use them
* internally, description of the configuration is stored in
  AirflowConfigurationParser and used by the parser to generate
  the default configuration when needed.
* we got rid of `{{{{` in templated config defaults by marking
  the templates with ``is_template`` and getting rid of processing
  those entries with regular formatting when generating the default
  values. This only concerns defaults from config.yml. Reading
  those configuration entries from file is unaffected.
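The quadruple-brace problem comes from Python's `str.format` escaping rules: a literal `{` must be written `{{`, so a default that should end up containing a Jinja expression such as `{{ ds }}` previously had to store each brace doubled. A minimal illustration (the template value here is made up, not an actual Airflow default):

```python
# Hypothetical templated default that should ultimately read
# "{{ ds }}/{try_number}.log". Because str.format treats "{{" as an
# escaped literal "{", the Jinja braces must be quadrupled to survive
# one formatting pass.
template = "{{{{ ds }}}}/{try_number}.log"
rendered = template.format(try_number=1)
print(rendered)  # {{ ds }}/1.log
```

Marking such entries with ``is_template`` skips that formatting pass, so the braces can be written once.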

This change aims to be 100% backwards compatible with the previous
implementation when it comes to functionality, even if internals
changed. It also does not add provider-specific changes that are
coming separately.

The only changes visible to the user are:

* generated airflow.cfg is slightly more
  readable and displays names of variables that can be used to override
  each configuration (which is very useful for copy&pasting)

* users are advised, instead of copying the default_airflow.cfg, to use
  `airflow config list --defaults` to generate a production config. This
  configuration has all the entries commented out, so that they can
  selectively uncomment and change the values they want. This is now
  promoted as "best practice" in the documentation.
…pache#32514)

* build(pre-commit): add list of supported deferrable operators to doc

* docs(providers): move providers list to apache-airflow-providers core-extension
* Allow configuration to be contributed by providers

The changes implemented:

* provider.yaml files for providers can optionally contribute extra
  configuration, the configuration is exposed via "get_provider_info"
  entrypoint, thus allowing Airflow to discover the configuration
  from both sources (in Breeze and local development) and from
  installed packages

* Provider configurations are lazily loaded - only for commands that
  actually need them

* Documentation for configuration contributed by providers is
  generated as part of Provider documentation. It is also discoverable
  by having a "core-extension" page displaying all community providers
  that contribute their own configuration.

* Celery configuration (and in the future Kubernetes configuration) is
  linked directly from the airflow documentation - the providers are
  preinstalled, which means that celery (and Kubernetes in the future)
  configuration is considered as important to be directly mentioned
  and linked from the core. Similarly Celery and Kubernetes executor
  documentation remains in the core documentation (still, configuration
  options are detailed only in the provider documentation and only
  linked from the core).

* configuration writing happens in "main" not in the configuration
  initialization and we will always execute provider configuration
  initialization. This will make sure that the generated configuration
  will contain configuration for the providers as well.

* Related documentation about custom and community providers has been
  updated and somewhat refactored - I realized that some of it was quite
  out-of-date and some of it was really "developer" not user docs.
  The docs are restructured a bit, cleaned, missing information is
  added and old/irrelevant parts removed.
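As a sketch of the discovery mechanism described above (all names and the exact dict layout here are hypothetical stand-ins, not the real provider.yaml schema), a provider's ``get_provider_info`` entrypoint might expose contributed configuration roughly like this:

```python
# Hypothetical sketch: a provider package returns its contributed
# configuration from the "get_provider_info" entrypoint as a dict
# mirroring the "config" section of its provider.yaml.
def get_provider_info():
    return {
        "package-name": "apache-airflow-providers-example",
        "config": {
            "example": {
                "description": "Settings contributed by the example provider.",
                "options": {
                    "endpoint_url": {
                        "description": "Service endpoint to talk to.",
                        "type": "string",
                        "default": "https://example.invalid",
                    },
                },
            },
        },
    }

# The Providers Manager would read this lazily, only for commands
# that actually need provider configuration.
info = get_provider_info()
print(sorted(info["config"]["example"]["options"]))  # ['endpoint_url']
```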

Co-authored-by: Jed Cunningham <[email protected]>

* Update airflow/configuration.py

Co-authored-by: Jed Cunningham <[email protected]>

---------

Co-authored-by: Jed Cunningham <[email protected]>
Comment on lines +465 to +466
Publishing the documentation
--------------------------

This is at least one of the reasons the build-docs is failing; the line needs to be the same length as the heading.

Suggested change
Publishing the documentation
--------------------------
Publishing the documentation
----------------------------

@@ -462,6 +462,51 @@ Those are all available flags of ``build-docs`` command:
:width: 100%

The changes in this file don't feel related to me. Am I missing something, or should this be a separate PR?


breeze release-management publish-docs

The publishing documentation consists steps:

Suggested change
The publishing documentation consists steps:
Publishing documentation consists of the following steps:

"""
Interact with AWS Neptune using proper client from the boto3 library.

Hook attribute `conn` has all methods that listed in documentation

Suggested change
Hook attribute `conn` has all methods that listed in documentation
Hook attribute `conn` has all methods listed in the documentation

Comment on lines +58 to +59
:rtype: str
:raises AirflowNotFoundException: If the DB cluster does not exist.

Suggested change
:rtype: str
:raises AirflowNotFoundException: If the DB cluster does not exist.

We don't list rtypes anymore and I haven't seen anywhere else that lists the :raises:, though I'm not necessarily against the idea on that one.

stop_db_response = self._stop_db()
if self.wait_for_completion:
self._wait_until_db_stopped()
return json.dumps(stop_db_response, default=str)

Here and above: Please verify this manually and let me know if this is working as you expect. I've seen json.dumps fail in the past when the json includes datetimes, and these responses do.
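The failure mode mentioned here is easy to reproduce with a minimal sketch (the response dict below is made up, not a real boto3 payload): `json.dumps` raises `TypeError` on `datetime` values unless a fallback such as `default=str` is supplied.

```python
import json
from datetime import datetime

# Made-up stand-in for a boto3 Neptune response containing a datetime.
response = {
    "DBClusterIdentifier": "demo",
    "ClusterCreateTime": datetime(2023, 7, 21, 8, 4),
}

try:
    json.dumps(response)
except TypeError as exc:
    print("plain dumps fails:", exc)

# default=str is applied to any value json.dumps cannot serialize,
# turning the datetime into its string form.
print(json.dumps(response, default=str))
```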

self.hook.wait_for_db_cluster_state(self.db_identifier, target_state="stopped")


__all__ = ["NeptuneStartDbOperator", "NeptuneStopDbOperator"]

Pretty sure you can drop this?



class NeptuneDbType(Enum):
"""Only available types for the AWS Neptune DB"""

D205 nitpick

Suggested change
"""Only available types for the AWS Neptune DB"""
"""Only available types for the AWS Neptune DB."""

Comment on lines +84 to +86
deferrable=True,
waiter_delay=30,
wait_for_completion=399,

This doesn't appear related to the rest of the PR.

Also: System tests don't currently support deferrable operators, and wait_for_completion is a bool.

catchup=False,
) as dag:
test_context = sys_test_context_task()


Unless there is a good reason otherwise, system tests should be self-contained, so the DB should be created and destroyed within the test. I know there aren't operators for that yet, but you can use the boto API in a @task to do it, as seen in other tests. Something like:

@task
def create_db():
    client = NeptuneHook().conn
    
    client.create_db_cluster(foo)
    client.create_db_instance(bar)

Then a teardown task at the end to undo it.
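Fleshing that suggestion out, a hedged sketch of matching setup and teardown bodies might look like the following. The identifiers, instance class, and the stub client are hypothetical; in the real system test the client would be `NeptuneHook().conn` and the two functions would be `@task`-decorated, with the teardown scheduled last.

```python
CLUSTER_ID = "example-neptune-cluster"    # hypothetical identifier
INSTANCE_ID = "example-neptune-instance"  # hypothetical identifier


class StubNeptuneClient:
    """Records boto3-style Neptune API calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
        return method


def create_db(client):
    # Setup: create the cluster, then an instance inside it.
    client.create_db_cluster(DBClusterIdentifier=CLUSTER_ID, Engine="neptune")
    client.create_db_instance(
        DBInstanceIdentifier=INSTANCE_ID,
        DBInstanceClass="db.t3.medium",
        Engine="neptune",
        DBClusterIdentifier=CLUSTER_ID,
    )


def delete_db(client):
    # Teardown: destroy in reverse order, skipping final snapshots.
    client.delete_db_instance(DBInstanceIdentifier=INSTANCE_ID, SkipFinalSnapshot=True)
    client.delete_db_cluster(DBClusterIdentifier=CLUSTER_ID, SkipFinalSnapshot=True)


client = StubNeptuneClient()
create_db(client)
delete_db(client)
print([name for name, _ in client.calls])
# ['create_db_cluster', 'create_db_instance', 'delete_db_instance', 'delete_db_cluster']
```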

@ferruzzi
Contributor

Sorry, just noticed this was still a draft.

@github-actions

github-actions bot commented Sep 8, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 8, 2023
@github-actions github-actions bot closed this Sep 13, 2023
Labels
area:providers area:system-tests provider:amazon-aws AWS/Amazon - related issues stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add AWS Neptune hook and operators
6 participants