-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQLSensor does not time out #39219
Comments
This won't solve the problem, but might help surface more logs and lead to finding the solution. You'll have a complete set of logs if you increase the celery visibility timeout (details here). |
Unfortunately it looks like it is a locked down parameter :-( Amazon MWAA reserves a set of critical environment variables. If you overwrite a reserved variable, Amazon MWAA restores it to its default. The following lists the reserved variables: |
If this is MWAA specific problem then you need to contact MWAA support |
I assume you have If so please read https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#timeouts |
This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author. |
This issue has been closed because it has not received response from the issue author. |
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.8.1
What happened?
Some of the SQL Sensors from time to time does not respect the timeout interval and run the entire DAG for multiple days. The sensors we use have waiting timeout 12 hours and usually starts after midnight or early morning. Normally all sensors finish - either skips (soft fail = True) or succeeds - by afternoon. However from time to time it happen that the sensor has one or more retries, and it is running for more than 24 or 36 hours. In the last case I thought maybe the counting is from the last retry attempt, but no, because the log says it started to poke 17 hours ago.
What you think should happen instead?
The sensor should quit (time out) and skip reliably.
How to reproduce
I dont have a way to reproduce, it happens cca twice per month with 500 sensors running every day.
Operating System
mwaa
Versions of Apache Airflow Providers
No response
Deployment
Amazon (AWS) MWAA
Deployment details
No response
Anything else?
This is a log of the sensor what started 23.4.2024 and should time out after 12 hours, but was running until manually marked as failed on 24.4.2024.
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: