Apache Airflow version
2.2.5
What happened
The airflow-redis-master pod was deployed after the Celery worker pods, and the worker pods could not process any tasks until they were restarted manually. I also killed the airflow-redis-master pod: the worker pods lost their connection to it and stopped processing tasks until they were restarted manually.
In the logs we can see a missed heartbeat from another worker pod. We are facing this issue on version 2.2.5 and did not face it on version 1.10.12.
The missed heartbeat is the last message in the logs.
[2022-05-30 17:53:18,213: INFO/MainProcess] Connected to redis://:**@airflow-redis-master.auto1.svc.cluster.local:6379/1
[2022-05-30 17:53:18,228: INFO/MainProcess] mingle: searching for neighbors
[2022-05-30 17:53:19,239: INFO/MainProcess] mingle: all alone
[2022-05-30 17:53:24,246: INFO/MainProcess] missed heartbeat from celery@airflow-worker-0
We updated the config below, but it did not help.
AIRFLOW__CELERY_BROKER_TRANSPORT_OPTIONS__MAX_RETRIES=6
AIRFLOW__CELERY_BROKER_CONNECTION_TIMEOUT=60
AIRFLOW_CELERY_BROKER_HEARTBEAT=360
broker_connection_timeout=
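One thing that may be worth checking with settings like these is the variable name itself: Airflow reads configuration overrides from environment variables shaped like `AIRFLOW__{SECTION}__{KEY}`, with a double underscore on each side of the section. The sketch below only tests the shape of a name. It does not check whether the option exists or whether Celery honors it, and the helper name `parse_airflow_env` is hypothetical.

```python
import re

# Airflow reads overrides from variables named AIRFLOW__{SECTION}__{KEY}.
AIRFLOW_ENV_PATTERN = re.compile(
    r"^AIRFLOW__(?P<section>[A-Z0-9_]+?)__(?P<key>[A-Z0-9_]+)$"
)


def parse_airflow_env(name):
    """Return (section, key) in lowercase, or None if the name does not fit the pattern."""
    match = AIRFLOW_ENV_PATTERN.match(name)
    if match is None:
        return None
    return match.group("section").lower(), match.group("key").lower()


# The three variable names listed above, checked for shape only.
for name in (
    "AIRFLOW__CELERY_BROKER_TRANSPORT_OPTIONS__MAX_RETRIES",
    "AIRFLOW__CELERY_BROKER_CONNECTION_TIMEOUT",
    "AIRFLOW_CELERY_BROKER_HEARTBEAT",
):
    print(name, "->", parse_airflow_env(name))
```

Only the first name splits into a section and a key. The other two have no double underscore separating a section from a key, so they may not have been picked up as written.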
As a workaround, we configured a liveness check for the worker pods with the command below.
celery --app airflow.executors.celery_executor.app inspect ping
However, the pods are restarted only if the health checks fail on all worker nodes. If the health check fails for just one worker, the liveness probe still treats that pod as healthy, because the ping gets a response from another, healthy worker pod.
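A possible refinement is to aim the ping at the local worker only, using the `-d` (destination) option of `celery inspect`, so that a reply from a different pod no longer satisfies the probe. The sketch below is untested against this deployment. It assumes the worker runs under Celery's default node name `celery@<hostname>` (which matches `celery@airflow-worker-0` in the logs above), and the function names are hypothetical.

```python
import socket
import subprocess

# App path taken from the liveness command above.
CELERY_APP = "airflow.executors.celery_executor.app"


def build_ping_command(hostname=None):
    """Build an `inspect ping` command aimed at a single worker node."""
    # Assumption: the worker uses Celery's default node name, celery@<hostname>.
    node = f"celery@{hostname or socket.gethostname()}"
    return ["celery", "--app", CELERY_APP, "inspect", "ping", "-d", node]


def ping_this_worker(run=subprocess.run):
    """Run the targeted ping and return its exit code."""
    # `run` is injectable so the function can be exercised without Celery installed.
    return run(build_ping_command()).returncode
```

Called from a small script that exits with `ping_this_worker()`, this should fail the probe when the local worker itself stops replying, on the expectation that `inspect ping` exits non-zero when the targeted node does not answer.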
What you think should happen instead
The worker pods should re-establish their connection to the airflow-redis-master node once the Redis pod is back up.
How to reproduce
Delete the airflow-redis-master pod and monitor the worker logs. After some time you will see a missed heartbeat in the logs, and the worker pods will no longer process any tasks.
Operating System
"Debian GNU/Linux 10 (buster)"
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==3.2.0
apache-airflow-providers-celery==2.1.3
apache-airflow-providers-cncf-kubernetes==3.0.0
apache-airflow-providers-docker==2.5.2
apache-airflow-providers-elasticsearch==2.2.0
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-google==6.7.0
apache-airflow-providers-grpc==2.0.4
apache-airflow-providers-hashicorp==2.1.4
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-microsoft-azure==3.7.2
apache-airflow-providers-mysql==2.2.3
apache-airflow-providers-odbc==2.0.4
apache-airflow-providers-postgres==4.1.0
apache-airflow-providers-redis==2.0.4
apache-airflow-providers-sendgrid==2.0.4
apache-airflow-providers-sftp==2.5.2
apache-airflow-providers-slack==4.2.3
apache-airflow-providers-sqlite==2.1.3
apache-airflow-providers-ssh==2.4.3
Deployment
Official Apache Airflow Helm Chart
Deployment details
https://github.com/apache/airflow
Anything else
No response