-
Just got back from a vacation, sorry for the delay. I have read through this and want to take a minute to digest it. My initial thought is that this is the time to make a change I've been wanting to make. It doesn't necessarily need to be done at the same time, but if we are designing a new abstraction layer/interface then it might be a good time to think about how we could implement it. If we are making breaking changes, then some of the existing hooks can be converted to what we've been calling "thin hooks": hooks which just take values and pass them through to boto. As one example, see here:
In these cases the method basically just converts Python's snake_case to boto's CamelCase and adds IDE completion, but I always wonder if there is a better way to get those benefits using some interface like you are suggesting. The best I have come up with so far is to replace those with a
For example, without any changes to the existing Athena hook right now you can do something like AthenaHook().list_engine_versions() without having to add it to the hook. IFF we go through with a change like that, then another thing that would pair really well with it in the new interface would be a way to add a "wait for completion" block (new decorator/wrapper??) to any arbitrary boto call. We have been going through and adding wait_for_completion to hooks and operators left and right, and I think if there were a convenient way to optionally wrap any hook method in a wait block like that, it would allow us to clean out a ton of boilerplate code in the hooks. I don't have a proposal on what that would look like, but thought I'd throw it out there in case that turns out to be something easy to implement in the new layer.
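One way the thin-hook and wait-block ideas could fit together is sketched below in Python. This is a minimal sketch under stated assumptions. PassThroughHook and wait_for_completion are hypothetical names, not existing Airflow APIs. The client is assumed to be a boto3 client, although any object works for the pass-through. Because boto3 method names are already snake_case, a pass-through only leaves the keyword arguments in CamelCase.

```python
import time
from typing import Any, Callable, Collection


class PassThroughHook:
    """Hypothetical "thin hook": anything not defined here is looked up on the client."""

    def __init__(self, client: Any) -> None:
        # Assumed to be a boto3 client in practice, e.g. boto3.client("athena").
        self.client = client

    def __getattr__(self, name: str) -> Any:
        # Python only calls __getattr__ when normal lookup fails, so methods
        # written on the hook itself still take precedence over the raw call.
        if name == "client":  # avoid infinite recursion if __init__ never ran
            raise AttributeError(name)
        return getattr(self.client, name)


def wait_for_completion(
    get_state: Callable[[], str],
    success_states: Collection[str],
    failure_states: Collection[str] = (),
    poll_interval: float = 5.0,
    max_attempts: int = 60,
) -> str:
    """Hypothetical helper: poll get_state() until it returns a terminal state."""
    for attempt in range(1, max_attempts + 1):
        state = get_state()
        if state in success_states:
            return state
        if state in failure_states:
            raise RuntimeError(f"operation ended in failure state {state!r}")
        if attempt < max_attempts:
            time.sleep(poll_interval)
    raise TimeoutError(f"no terminal state after {max_attempts} attempts")
```

With hook = PassThroughHook(boto3.client("athena")), a call like hook.list_engine_versions() works without a dedicated method. A query could then be awaited with wait_for_completion(lambda: hook.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"], {"SUCCEEDED"}, {"FAILED", "CANCELLED"}), where query_id is a placeholder. Where boto3 already ships a waiter (client.get_waiter(...)), that is probably preferable. A generic poller like this would only cover the calls that have no waiter.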
-
Also, I just wanted to take a sec to say thanks. You have been putting a ton of effort into improving the AWS provider package and it's absolutely appreciated. 👍
-
Looks really cool @Taragolis, love the idea!
Do we really need the
Is part of this proposal creating hooks for all services? I'd love to see this and we've had some internal discussions about it as well. Having at least an (autogenerated) hook for all services would be great. Then any custom methods can be added to the hook as operators are written for each service.
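Autogenerating one hook class per service could be as small as the rough Python sketch below. It assumes a base class that only needs a service_name attribute, such as the hypothetical Boto3ClientHook from the proposal. generate_hook_classes and hook_class_name are illustrative names, and the naming rule is deliberately naive.

```python
from typing import Dict, Iterable, Type


def hook_class_name(service_name: str) -> str:
    """Naive naming rule: 'athena' -> 'AthenaHook', 'cognito-idp' -> 'CognitoIdpHook'."""
    return "".join(part.capitalize() for part in service_name.split("-")) + "Hook"


def generate_hook_classes(service_names: Iterable[str], base: Type) -> Dict[str, Type]:
    """Create one hypothetical hook subclass per service, differing only in service_name."""
    hooks: Dict[str, Type] = {}
    for name in service_names:
        class_name = hook_class_name(name)
        # type(name, bases, namespace) builds the subclass dynamically.
        hooks[class_name] = type(class_name, (base,), {"service_name": name})
    return hooks
```

In practice the service names could come from boto3.session.Session().get_available_services(). Custom methods would then be added by subclassing a generated class when an operator needs them. The naming rule would likely need overrides for existing hook names such as SnsHook or S3Hook.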
-
Love the video! Thanks for the effort. Very minor technical question on my side, should not
-
Hello Airflow folks, I hope you are doing well and your Airflow pipelines work as expected.
I have a proposal/idea about the AWS hooks, which interact with the AWS API through boto3 (and botocore as the éminence grise).
Goals
Let's start with what I expect to have in the Amazon provider hooks: boto3.session.Session, boto3.client (the low-level client, which usually refers to a specific botocore client generated from JSON data definitions) and boto3.resource (the high-level client), without defining method-specific parameters in the hook constructor. Some examples of hooks which break this idea: api_type
Demo
boto3-hooks.mp4
Suggested Implementation
This is only an idea; if I had a solution I would have created a PR rather than a discussion. I have some thoughts about how it should look.
All new names are just for reference; you can find a PoC implementation in a Gist.
Current Inheritance model
Proposed inheritance model
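The proposed hierarchy could look roughly like the minimal Python sketch below. It includes the cached session/client/resource properties and the legacy_mode/conn shim described in this proposal. The class names follow the proposal, but the bodies are assumptions, not the Gist PoC. legacy_conn_type and _create_session are hypothetical helpers. legacy_conn_type lets each existing hook's conn keep returning what it returned before (a client for S3Hook, a resource for DynamoDBHook).

```python
import warnings
from functools import cached_property
from typing import Any, Optional


class Boto3BaseHook:
    """Hypothetical common base: one service_name replaces client_type/resource_type."""

    service_name: str = ""  # set by subclasses, e.g. "athena" or "s3"
    legacy_mode: bool = True  # implicitly True for existing hooks during the transition
    legacy_conn_type: str = "client"  # what the deprecated conn used to return

    def __init__(self, region_name: Optional[str] = None, **session_kwargs: Any) -> None:
        self.region_name = region_name
        self.session_kwargs = session_kwargs  # e.g. profile_name="..."

    def _create_session(self) -> Any:
        # Imported lazily so the module can be imported without boto3 installed.
        import boto3

        return boto3.session.Session(region_name=self.region_name, **self.session_kwargs)

    @cached_property
    def session(self) -> Any:
        return self._create_session()

    @cached_property
    def client(self) -> Any:
        return self.session.client(self.service_name)

    @property
    def conn(self) -> Any:
        # Deprecated accessor, only available while legacy_mode is on.
        if not self.legacy_mode:
            raise AttributeError("conn is unavailable with legacy_mode=False; use client/resource")
        warnings.warn("conn is deprecated, use client or resource", DeprecationWarning, stacklevel=2)
        return getattr(self, self.legacy_conn_type)

    def get_conn(self) -> Any:
        return self.conn


class Boto3ClientHook(Boto3BaseHook):
    """For services that only have a low-level boto3 client."""


class Boto3ResourceHook(Boto3BaseHook):
    """For services where boto3 also offers a high-level resource."""

    @cached_property
    def resource(self) -> Any:
        return self.session.resource(self.service_name)
```

A new hook would then only declare service_name, for example class SqsHook(Boto3ResourceHook): service_name = "sqs". Overriding _create_session is also a convenient seam for tests that should not touch AWS.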
Changes in attributes
New attributes (cached):
- service_name: will replace the client_type and resource_type attributes
New properties (cached):
- session: returns the cached session associated with the hook
- client: both Boto3ResourceHook and Boto3ClientHook will provide boto3.client
- resource: only Boto3ResourceHook will provide boto3.resource
Deprecate:
- conn: this property, as well as get_conn, depends on client_type and resource_type and is no longer required once we have session, client and resource.
Backward compatibility
Without backward compatibility it could turn into
The idea is to add a legacy_mode attribute and implicitly set it to True for all existing hooks. The conn property and get_conn will provide compatibility with the previous version of the hooks for a while.
Client vs Resource
Services / resources / hooks statistics
According to internals, AWS has 336 different services, 10 of them have a high-level client, and in total we have 31 hooks in the Amazon provider.
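The statistic above can be recomputed with two existing boto3.session.Session methods, get_available_services() and get_available_resources(). Those methods also apply the client-vs-resource rule from this proposal. split_services below is a hypothetical helper. The counts depend on the installed boto3/botocore version, so they may not match the numbers quoted.

```python
from typing import Any, Dict, List


def split_services(session: Any) -> Dict[str, List[str]]:
    """Group services by the (proposed) base hook they would use.

    session is assumed to be a boto3.session.Session, or any object
    exposing the same two methods.
    """
    services = session.get_available_services()  # everything session.client() can build
    with_resource = set(session.get_available_resources())  # what session.resource() can build
    return {
        "Boto3ResourceHook": sorted(s for s in services if s in with_resource),
        "Boto3ClientHook": sorted(s for s in services if s not in with_resource),
    }
```

For example, groups = split_services(boto3.session.Session()) followed by len(groups["Boto3ResourceHook"]) gives the number of services with a high-level client for the installed version.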
How do we decide whether a new hook should be based on Boto3ResourceHook or Boto3ClientHook? boto3.resource is a high-level wrapper around boto3.client, which means that if boto3 supports a resource for a specific service we should use Boto3ResourceHook, otherwise Boto3ClientHook.
Also, I suggest migrating the existing hooks to resource-based ones:
- cloudformation: current implementation airflow.providers.amazon.aws.hooks.cloud_formation.CloudFormationHook(client_type="cloudformation")
- cloudwatch: does not exist (not to be confused with CloudWatch Logs, which is provided by logs)
- dynamodb: current implementation airflow.providers.amazon.aws.hooks.dynamodb.DynamoDBHook(resource_type="dynamodb")
- ec2: current implementation airflow.providers.amazon.aws.hooks.ec2.EC2Hook(api_type="resource_type" or api_type="client_type")
- glacier: current implementation airflow.providers.amazon.aws.hooks.glacier.GlacierHook(client_type="glacier")
- iam: does not exist
- opsworks: does not exist
- s3: current implementation airflow.providers.amazon.aws.hooks.s3.S3Hook(client_type="s3"), which also manually creates a resource for some methods
- sns: current implementation airflow.providers.amazon.aws.hooks.sns.SnsHook(client_type="sns")
- sqs: current implementation airflow.providers.amazon.aws.hooks.sqs.SqsHook(client_type="sqs")
Conclusion
I'd like to hear any feedback/ideas/objections before creating issues/tasks/PRs. Thanks for your time!
cc: @o-nikolas @ferruzzi @eladkal @shubham22 @potiuk