Add liveness probe to Celery workers #25561
This probe can only work when worker_enable_remote_control is true.
@pingzh, very good call. Do you know of a better probe to use when it's disabled? I'm tempted to just add an
I am not aware of other better probe methods. For us, we turn off I like the idea of adding an
Is it worth adding a note somewhere about not enabling this with SQS?
SQS is not officially supported by Airflow. We discussed it, but the Amazon team's experience is that it has many more quirks, and the level of support in Celery is definitely not on par with Redis/RabbitMQ, so we should refrain from even stating that SQS can be used in Airflow (#24019).
@jedcunningham, I have enabled health checks for the workers because they stop processing messages when communication between Redis and the workers breaks. The liveness checks are causing a memory leak.
I believe this is an issue with the K8S livenessprobe (kubernetes-sigs/vsphere-csi-driver#778). You can update K8S to the latest version and check that the CSI livenessprobe is the right version (kubernetes-csi/livenessprobe#94). Generally, upgrading whatever K8S you are using to the latest version is highly recommended. Please double-check that, @anu251989, and if you observe the same issue with the latest version of K8S, please report it as a new issue.
This adds a liveness probe to our workers, to help guard against the worker being "up" but not communicating with Celery.
Might help with #24731, though it'll be a pretty blunt solution.
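
For readers who want to see roughly what such a check amounts to, here is a minimal sketch in Python. It is not the code from this PR. It assumes the probe boils down to running `celery inspect ping` against the local worker and treating a non-zero exit code as "not alive". The helper names `build_ping_command` and `worker_is_alive`, the default timeout, and the app path in the usage note are all assumptions for illustration. As noted in the review above, a ping like this depends on worker_enable_remote_control being true, since a worker with remote control disabled will not reply.

```python
import socket
import subprocess
from typing import Callable, List, Optional


def build_ping_command(app: str, hostname: Optional[str] = None) -> List[str]:
    """Build a `celery inspect ping` command aimed at a single worker.

    `app` is the import path of the Celery app (an assumption here; use
    whatever path your deployment exposes).
    """
    host = hostname or socket.gethostname()
    # Celery names workers "celery@<hostname>" unless told otherwise.
    return [
        "celery", "--app", app,
        "inspect", "ping",
        "--destination", f"celery@{host}",
    ]


def worker_is_alive(
    app: str,
    hostname: Optional[str] = None,
    timeout: float = 20.0,  # hypothetical default, tune for your broker
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if the worker answered the ping, False otherwise.

    `run` is injectable so the logic can be exercised without a broker.
    """
    cmd = build_ping_command(app, hostname)
    try:
        result = run(cmd, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # No reply in time, or the celery CLI is not available.
        return False
    # The CLI exits non-zero when no worker replied to the ping.
    return result.returncode == 0
```

A probe script could call `worker_is_alive("airflow.executors.celery_executor.app")` (the app path is assumed, check it against your Airflow version) and exit with status 1 when it returns False, so that Kubernetes restarts the container.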