-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deferred Glue Operators seem to always be verbose #43620
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. |
I had a quick look at other serialize functions and they just seem to make |
There is a test for verbose=True but not verbose=False, as it is always True then this will pass fine. airflow/providers/tests/amazon/aws/operators/test_glue.py Lines 158 to 183 in ff6038b
|
Feel free to raise PR with your code suggestions (and improving the tests). It will be easier to review and look into it with a PR |
I didn’t quite understand your question. In your configuration, the deferrable parameter is set to False in the re-produce step (step 2). This means the operator won’t use the deferrable mode; instead, it will execute normally on the worker. |
Sorry, I made a mistake typing it up. I'll fix it. |
Fixed in description.
|
Heya, I've added a test to this PR which shows the bug. |
Should be fixed now by my reference PR. |
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.28.0
Apache Airflow version
2.10.1
Operating System
Amazon Linux 2023
Deployment
Amazon (AWS) MWAA
Deployment details
MWAA in production and mwaa-local-runner for local testing. Bug exists on both.
What happened
Deferred GlueJobOperator tasks are getting job failures due to rate limiting on fetching logs from Cloudwatch, we have verbose set to False. The traceback shows a function
print_job_logs
that should only be called when verbose is true. Additionally when I watch the task logs in real time they are definitely pulling the Glue logs from Cloudwatch into Airflow.A colleague and I read through the operator code and it all looks correct, so we suspect that the issues is occurring when the task is serialized as that specifically converts the bool to a string. The string "False" would then be truthy and cause this code to run. I'm guessing that the tasks are serialized when they are deferred.
Serialization:
airflow/providers/src/airflow/providers/amazon/aws/triggers/glue.py
Lines 58 to 69 in ff6038b
Code which is getting called when verbose is false:
airflow/providers/src/airflow/providers/amazon/aws/hooks/glue.py
Lines 333 to 338 in ff6038b
Example python showing bool serialization issues:
What you think should happen instead
Cloudwatch logs should not be read into Airflow when verbose is False.
If the serialization process is the issue, then it should either change to serialize differently or the deserialization should be changed to understand
"True"
and"False"
.How to reproduce
Anything else
Seems to be happening everytime if I watch the tasks. Completed tasks (success or failed) do not show the logs but deferred tasks do, which to me backs up some kind of serialization bug in the Operator.
The traceback shows that code is being called when running even though in the log it is not shown.
Would be willing to submit a PR if someone could assist.
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: