
AWS Neptune: Operators for StartDB and StopDB cluster #29168

Closed · wants to merge 2 commits

Conversation

@swapz-z commented Jan 25, 2023:

closes: #28289

Introduces a hook for the AWS Neptune database, along with Start and Stop DB operators for Neptune clusters:

NeptuneHook
NeptuneStartDbOperator
NeptuneStopDbOperator

The current implementation is heavily inspired by the RDS hooks and operators.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

db_identifier: str,
db_type: NeptuneDbType | str = NeptuneDbType.CLUSTER,
aws_conn_id: str = "aws_default",
region_name: str = "us-east-1",
@Taragolis (Contributor) commented Jan 25, 2023:

We should not define a default `region_name`.

Suggested change
region_name: str = "us-east-1",
region_name: str | None = None,

*,
db_identifier: str,
db_type: NeptuneDbType | str = NeptuneDbType.CLUSTER,
aws_conn_id: str = "aws_default",
Contributor commented:

`aws_conn_id` could be None; in that case the default boto3 strategy would be used.

Suggested change
aws_conn_id: str = "aws_default",
aws_conn_id: str | None = "aws_default",
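Both suggestions point the operator's signature at the same idea: let boto3 resolve whatever the user does not specify. A minimal stand-alone sketch of such a signature (the class and field names here are illustrative, not the PR's actual code):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NeptuneArgs:
    """Toy stand-in for the operator's argument defaults (illustrative only)."""

    db_identifier: str
    # None would fall back to the plain boto3 default credential chain.
    aws_conn_id: str | None = "aws_default"
    # No hard-coded region: None lets the connection/environment decide.
    region_name: str | None = None


args = NeptuneArgs(db_identifier="my-neptune-cluster")
```

With these defaults, a DAG author who sets neither parameter gets the standard Airflow connection with the region resolved from it, instead of silently running in us-east-1.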

):
super().__init__(**kwargs)
self.db_identifier = db_identifier
self.hook = NeptuneHook(aws_conn_id=aws_conn_id, region_name=region_name)
Contributor commented:

You should move the hook definition to a `@cached_property`, e.g.:

@cached_property
def hook(self) -> AthenaHook:
"""Create and return an AthenaHook."""
return AthenaHook(self.aws_conn_id, sleep_time=self.sleep_time, log_query=self.log_query)
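Adapted for Neptune, the pattern the reviewer suggests might look like the following sketch (the hook class here is a stub so the example runs standalone; in the provider it would be the PR's NeptuneHook):

```python
from functools import cached_property


class FakeNeptuneHook:
    """Stub standing in for the PR's NeptuneHook."""

    def __init__(self, aws_conn_id, region_name):
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name


class NeptuneStartDbOperator:
    def __init__(self, db_identifier, aws_conn_id="aws_default", region_name=None):
        # Store only plain attributes in __init__; no hook is created yet.
        self.db_identifier = db_identifier
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name

    @cached_property
    def hook(self) -> FakeNeptuneHook:
        """Create the hook lazily on first access and cache the instance."""
        return FakeNeptuneHook(self.aws_conn_id, self.region_name)


op = NeptuneStartDbOperator("my-cluster")
```

The point of the pattern: instantiating the operator (e.g. during DAG parsing) never touches AWS; the hook is built only when a task actually accesses `op.hook`, and repeated accesses return the same instance.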

Comment on lines +118 to +119
aws_conn_id: str = "aws_default",
region_name: str = "us-east-1",
Contributor commented:

Same as above

Comment on lines +151 to +152

__all__ = ["NeptuneStartDbOperator", "NeptuneStopDbOperator"]
Contributor commented:

If you want to include `__all__`, include it at the top of the module.

Comment on lines +29 to +46
class NeptuneHook(AwsBaseHook):
"""
Interact with AWS Neptune using proper client from the boto3 library.

Hook attribute `conn` has all methods that listed in documentation

.. seealso::
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptune.html
- https://docs.aws.amazon.com/neptune/index.html

Additional arguments (such as ``aws_conn_id`` or ``region_name``) may be specified and
are passed down to the underlying AwsBaseHook.

.. seealso::
:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook`

:param aws_conn_id: The Airflow connection used for AWS credentials.
"""
Contributor commented:

Please define the hook docstring the same way it is implemented in other boto3 hooks. See this example:

class EcsHook(AwsGenericHook):
"""
Interact with Amazon Elastic Container Service (ECS).
Provide thin wrapper around :external+boto3:py:class:`boto3.client("ecs") <ECS.Client>`.
Additional arguments (such as ``aws_conn_id``) may be specified and
are passed down to the underlying AwsBaseHook.
.. seealso::
- :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
- `Amazon Elastic Container Service \
<https://docs.aws.amazon.com/AmazonECS/latest/APIReference/Welcome.html>`__
"""

You can check the main-branch documentation to see how it looks.
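Following the EcsHook pattern above, the Neptune hook docstring might look like this sketch (the base class is stubbed out here so the snippet runs standalone; in the provider it would subclass AwsBaseHook):

```python
class NeptuneHook:  # in the provider: class NeptuneHook(AwsBaseHook)
    """
    Interact with Amazon Neptune.

    Provide thin wrapper around :external+boto3:py:class:`boto3.client("neptune") <Neptune.Client>`.

    Additional arguments (such as ``aws_conn_id``) may be specified and
    are passed down to the underlying AwsBaseHook.

    .. seealso::
        - :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
        - `Amazon Neptune <https://docs.aws.amazon.com/neptune/index.html>`__
    """
```

The key pieces the convention asks for: a one-line summary, the `:external+boto3:` cross-reference to the wrapped client, and a `.. seealso::` block linking the base hook and the service documentation.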

@swapz-z (Author) replied:

How can I check what my current docstring looks like in the web UI? Is there any way?

Contributor replied:

You could build it in Breeze. Be aware that building the entire documentation takes a lot of time, so make sure you only build the docs for Amazon.

Contributor replied:

The command for that is `breeze build-docs --package-filter apache-airflow-providers-amazon`, and it does save a lot of build time. 👍

Comment on lines +83 to +88
def poke():
return self.get_db_cluster_state(db_cluster_id)

target_state = target_state.lower()
self._wait_for_state(poke, target_state, check_interval, max_attempts)
self.log.info("DB cluster snapshot '%s' reached the '%s' state", db_cluster_id, target_state)
Contributor commented:

I think we currently have a different method for waiting operations in new hooks?

@vincbeck @ferruzzi @vandonr-amz Am I right?

Contributor replied:

Correct! Please use the function waiter defined here

Contributor replied:

Sorry to contradict Vincent, but we should be standardizing on this new Waiter implementation, which offloads a lot of the work to the boto API instead.

Contributor replied:

Nice catch! No worries at all, it is good to be contradicted :) I forgot we implemented this. Side question: should we then deprecate the `waiter` function, or is there any use case not covered by the custom waiters that the `waiter` function satisfies?

@ferruzzi (Contributor) commented Jan 25, 2023:

I think @syedahsn and I were working on them in parallel without noticing, but whether there are use cases where his (the one you linked first) is the better answer... I don't think there are, but I could be mistaken.

And yeah, I guess we should flag one as deprecated, or at least leave a comment to that effect so folks don't add to the mess, and set some time aside to do the conversions. Batch is another that has its own unique way of re-implementing the boto waiter and needs to be moved over to a standardized approach at some point.

I think it would be scope creep to have any of that done here, but the new waiters should definitely be done the "right" way at the very least.
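Whichever waiter mechanism the provider settles on, the behavior being discussed is fundamentally a poll-until-state loop. A self-contained sketch of that loop (names are illustrative; the real implementations delegate to hook helpers or boto waiters):

```python
import time


def wait_for_state(poke, target_state, check_interval=0.0, max_attempts=5):
    """Poll poke() until it returns target_state; raise if attempts run out."""
    for _ in range(max_attempts):
        if poke() == target_state:
            return True
        time.sleep(check_interval)
    raise RuntimeError(f"state never reached {target_state!r}")


# Simulate a cluster that becomes available on the third poll.
states = iter(["starting", "starting", "available"])
reached = wait_for_state(lambda: next(states), "available")
```

The standardized approach the reviewers prefer moves this loop out of hand-written hook code and into boto-style waiter configuration, so retries, backoff, and failure states are declared rather than coded.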


self.wait_for_completion = wait_for_completion

def execute(self, context: Context) -> str:
self.db_type = NeptuneDbType(self.db_type)
@vincbeck (Contributor) commented Jan 25, 2023:

You're overriding the value you already set on line 66? I'm not sure I understand what you are trying to achieve here.


if self.wait_for_completion:
self._wait_until_db_available()
return json.dumps(start_db_response, default=str)
Contributor commented:

start_db_response is always None?
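The review point is that `start_db_response` is returned without ever being assigned. A sketch of the intended flow, with the boto3 client stubbed so it runs standalone (the stub's response shape loosely mirrors boto3's `start_db_cluster` output, which is an assumption here):

```python
import json


class FakeNeptuneConn:
    """Stub for hook.conn; the real call is boto3's start_db_cluster."""

    def start_db_cluster(self, DBClusterIdentifier):
        return {"DBCluster": {"DBClusterIdentifier": DBClusterIdentifier,
                              "Status": "starting"}}


conn = FakeNeptuneConn()
# The fix the reviewer is pointing at: actually assign the response
# before serializing and returning it.
start_db_response = conn.start_db_cluster(DBClusterIdentifier="my-cluster")
result = json.dumps(start_db_response, default=str)
```

Without the assignment, `json.dumps(start_db_response, default=str)` would serialize `None` (or raise a NameError), so the operator's return value would carry no information about the started cluster.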

self.wait_for_completion = wait_for_completion

def execute(self, context: Context) -> str:
self.db_type = NeptuneDbType(self.db_type)
Contributor commented:

Same as above

Comment on lines +144 to +145
response = self.hook.conn.stop_db_cluster(DBClusterIdentifier=self.db_identifier)
return response
Contributor commented:

Suggested change
response = self.hook.conn.stop_db_cluster(DBClusterIdentifier=self.db_identifier)
return response
return self.hook.conn.stop_db_cluster(DBClusterIdentifier=self.db_identifier)

Comment on lines +18 to +20
======================================================
Amazon Neptune Documentation
======================================================
Contributor commented:

Please make sure the lines of `===` are the same length as the title.

Comment on lines +22 to +28
`Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run
applications that work with highly connected datasets. The core of Neptune is a purpose-built,
high-performance graph database engine that is optimized for storing billions of relationships and
querying the graph with milliseconds latency. Neptune supports the popular graph query languages
Apache TinkerPop Gremlin and W3C's SPARQL, allowing you to build queries that efficiently navigate highly connected
datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs,
drug discovery, and network security.`
Contributor commented:

Suggested change
`Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run
applications that work with highly connected datasets. The core of Neptune is a purpose-built,
high-performance graph database engine that is optimized for storing billions of relationships and
querying the graph with milliseconds latency. Neptune supports the popular graph query languages
Apache TinkerPop Gremlin and W3C's SPARQL, allowing you to build queries that efficiently navigate highly connected
datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs,
drug discovery, and network security.`
Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run
applications that work with highly connected datasets. The core of Neptune is a purpose-built,
high-performance graph database engine that is optimized for storing billions of relationships and
querying the graph with milliseconds latency. Neptune supports the popular graph query languages
Apache TinkerPop Gremlin and W3C's SPARQL, allowing you to build queries that efficiently navigate highly connected
datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs,
drug discovery, and network security.


# Assuming Neptune DB is already created, its identifier is provided to test NeptuneStartDbOperator
# and NeptuneStopDbOperator
neptune_db_identifier = f"{test_context[ENV_ID_KEY]}-neptune-database"
Contributor commented:

Ideally we try to make system tests as self-contained as possible, which means it would be great if you could create the different resources needed to start the database here. That does not mean creating operators for these actions; you can perform them in custom tasks using the TaskFlow API. A good example is example_batch.py.
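The reviewer's suggestion, sketched: provision and tear down the cluster inside the system test with TaskFlow tasks. The decorator is stubbed below so the sketch runs standalone; in a real system test you would use `from airflow.decorators import task`, and the function bodies would call boto3 (the `create_db_cluster`/`delete_db_cluster` calls named in the comments are real Neptune client operations, but their full parameters are omitted here):

```python
def task(fn):
    """Stand-in for airflow.decorators.task so the sketch runs standalone."""
    return fn


@task
def create_cluster(db_identifier: str) -> str:
    # Real test would call: boto3.client("neptune").create_db_cluster(
    #     DBClusterIdentifier=db_identifier, Engine="neptune")
    return db_identifier


@task
def delete_cluster(db_identifier: str) -> str:
    # Real test would call: boto3.client("neptune").delete_db_cluster(
    #     DBClusterIdentifier=db_identifier, SkipFinalSnapshot=True)
    return db_identifier


ident = create_cluster("env-id-neptune-database")
```

Structured this way, the test owns its resources end to end instead of assuming a pre-existing cluster, which is what makes it runnable in a clean account.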

@eladkal (Contributor) commented Feb 26, 2023:

@swapz-z are you still working on this PR?

@github-actions (bot) commented:

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 13, 2023
@github-actions github-actions bot closed this Apr 18, 2023
Labels: area:providers · area:system-tests · kind:documentation · provider:amazon-aws (AWS/Amazon related issues) · stale (Stale PRs per the .github/workflows/stale.yml policy file)

Successfully merging this pull request may close these issues: Add AWS Neptune hook and operators

5 participants