-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated logic to allow AWS Batch Hook get_job_description
retries to be more effective
#37552
Conversation
get_job_description
retries to be more effective
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the test case for the change? test cases are failing. Can you take look at them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I aggree with @dirrao the test cases should be fixed, and new tests should be added.
There are 3 different places in Amazon Provider which are rely on current behaviour, and unknown number of usage into th user code.
job = self.hook.get_job_description(self.job_id) |
job_desc = self.hook.get_job_description(self.job_id) |
job_description = self.hook.get_job_description(self.job_id) |
At least this places should be covered by the new behaiviour. We do not want to to break something when we tried to improve/fix something other
else: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seem like this continue redundant
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Update logic in
Amazon/aws
providers forbatch_client
hook
'sparse_job_description
function. This function is ONLY called within theget_job_description
function that allows for retries; however, there is only handling ofbotocore.exceptions.ClientError
error in this function.This means, that if AWS simply fails to provide the job description - if there is a delay in AWS to register job information, this function will immediate fail instead of trying. This may happen under certain conditions, especially when using the
batch
operators for submitting jobs which immediately use returnedjob_id
to query for the job information - but again, if AWS fails to register the job description in time - this will cause an unnecessary failure in Airflow side of things.The proposed change allows
parse_job_description
to returnNone
instead of raising if zero jobs are returned by AWS. Similarly, if more than one matching job is returned, the function will provide the description of the first job.