-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix AWS EmrStepSensor ignoring the specified aws_conn_id in deferred mode #33952
Fix AWS EmrStepSensor ignoring the specified aws_conn_id in deferred mode #33952
Conversation
def hook(self) -> AwsGenericHook: | ||
return EmrContainerHook(self.aws_conn_id) | ||
return EmrContainerHook(aws_conn_id=self.aws_conn_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, hooks are mixed with positional/keyword arguments, and this might lead a problem when we create Thick wrapper hook
airflow/airflow/providers/amazon/aws/hooks/emr.py
Lines 53 to 58 in 633217c
def __init__(self, emr_conn_id: str | None = default_conn_name, *args, **kwargs) -> None: | |
self.emr_conn_id = emr_conn_id | |
kwargs["client_type"] = "emr" | |
super().__init__(*args, **kwargs) | |
def hook(self) -> AwsGenericHook: | ||
return EmrContainerHook(self.aws_conn_id) | ||
return EmrContainerHook(aws_conn_id=self.aws_conn_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vandonr-amz Not related to this PR, but why we decide to use hook
as method instead of property here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering. Maybe some serialization issues or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember exactly, maybe it is because it actually builds the hook every time, so it shouldn't be used "like a property".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case we usually decorate it as @cached_property
. In general it uses in a lot of places over the different providers, especially when you need to move something heavy from constructor and keep compatible attribute.
Anyway it was just a question, no offence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sure we can, but it shouldn't be necessary as we use it only in once place.
airflow 2.7.0
amazon-provider 8.6.0
Problem
AWS EmrStepSensor in deferred mode falls back to
aws_default
connection after deferral and ignores the connection id specified.Root-cause
Internally EmrStepSensorTrigger initializes EmrHook without specifying a value for the
aws_conn_id
and instead sets theemr_conn_id
which is not used by the trigger. Thus after serialization in the Trigerrer the default value for theaws_conn_id
is used (which isaws_default
).See the example log in the bottom (Airflow there doesn't have the
aws_default
connection configured). Note that before deferral it uses theaws
connection, but afterwards switches toaws_default.
Scope
This PR
aws_conn_id
value^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.