Pod failed with error "Pod was active on the node longer than the specified deadline" but remains in status Running
#9934
Comments
This is a fatal error. It prevents us from upgrading Argo to the latest version.
@shiraOvadia it looks like the k8s API is taking time to update pods when activeDeadlineSeconds is reached. You can try the
I also tried "timeout", and the behavior was the same as with "activeDeadlineSeconds".
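For context, the two fields being compared in this thread both live on the Argo template spec. A minimal sketch of the difference (the template names, alpine tag, and sleep duration are illustrative, not taken from the comments above):

# Two templates that should both stop the step after 10 seconds.
- name: sleep-with-active-deadline
  activeDeadlineSeconds: 10   # set on the pod spec, enforced by Kubernetes
  container:
    image: alpine:3.19
    command: [sh, -c, "sleep 100"]
- name: sleep-with-timeout
  timeout: 10s                # enforced by the workflow controller
  container:
    image: alpine:3.19
    command: [sh, -c, "sleep 100"]

The comments below suggest that when the pod-level deadline fires, the node does not always leave the Running phase, whereas the controller-enforced timeout behaves as expected.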
I'm also experiencing this issue.

Example Workflow:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: active-deadline-test
spec:
  entrypoint: active-deadline-test
  templates:
    - name: active-deadline-test
      parallelism: 10
      steps:
        - - name: active-deadline-test-timeout
            inline:
              activeDeadlineSeconds: '5'
              script:
                image: alpine:{{.Chart.AppVersion}}
                command: [bin/bash]
                source: |
                  sleep 100s

My suspicion is that the deadlineExceeded node isn't having its phase updated correctly here: https://github.com/argoproj/argo-workflows/blob/master/workflow/controller/steps.go#L249-L258. Using timeout instead of activeDeadlineSeconds did, however, work.

Using timeout instead:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: active-deadline-test
spec:
  entrypoint: active-deadline-test
  templates:
    - name: active-deadline-test
      parallelism: 10
      dag:
        tasks:
          - name: test-timeout-set
            template: test-timeout
            arguments:
              parameters:
                - name: timeout
                  value: '5s'
          - name: test-timeout-unset
            template: test-timeout
          - name: test-timeout-set-empty
            template: test-timeout
            arguments:
              parameters:
                - name: timeout
                  value: ''
          - name: test-timeout-set-zero
            template: test-timeout
            arguments:
              parameters:
                - name: timeout
                  value: '0s'
    - name: test-timeout
      inputs:
        parameters:
          - name: timeout
            default: ''
      timeout: '{{`{{inputs.parameters.timeout}}`}}'
      script:
        image: alpine:{{.Chart.AppVersion}}
        command: [bin/bash]
        source: |
          sleep 100s

--- EDIT ---
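For anyone who wants to reproduce this outside of the Helm chart above, a self-contained sketch of the first example (alpine does not ship bash, so sh is used; the image tag and names are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: active-deadline-test-
spec:
  entrypoint: main
  templates:
    - name: main
      activeDeadlineSeconds: 5   # pod should be killed after 5 seconds
      script:
        image: alpine:3.19
        command: [sh]
        source: |
          sleep 100

With the behavior reported in this issue, the pod is terminated with "Pod was active on the node longer than the specified deadline", but the workflow node stays in Running instead of moving to Failed.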
@sarabala1979 Hi, I got the same issue with
I ran into the same issue with
@sarabala1979 We're still experiencing this issue.
We are also facing the same issue with v3.4.7. Is there any ETA for a fix?
Is anyone working on this issue? The latest version has fixes for all the known vulnerabilities, but because of this workflow failure issue we are not able to upgrade to it.
I tested it and think it has been solved by #12761.
Pod was active on the node longer than the specified deadline; pod remains in status Running
Pre-requisites
Tested with the :latest image tag and the issue still exists.
What happened/what you expected to happen?
I set a deadline of 10 seconds in a template:
activeDeadlineSeconds: 10
After 10 seconds the pod failed with the error "Pod was active on the node longer than the specified deadline".
After a few minutes the pod is deleted in Kubernetes, but it remains in status Pending or Running.
The workflow is blocked and does not continue to the next template.
I expected the pod to get a Failed status.
In previous versions the same workflow worked fine.
Version
v3.4.2
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
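The reproduction manifest was not attached here, but based on the description above, a workflow of roughly this shape triggers the behavior (the names, image, and sleep duration are assumptions, not from the original report):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: deadline-repro-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: long-step        # hits the 10-second deadline
            template: long-step
        - - name: next-step        # never starts while long-step is stuck in Running
            template: next-step
    - name: long-step
      activeDeadlineSeconds: 10
      container:
        image: alpine:3.19
        command: [sh, -c, "sleep 300"]
    - name: next-step
      container:
        image: alpine:3.19
        command: [sh, -c, "echo done"]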
Logs from the workflow controller
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
workflow.log
Logs from your workflow's wait container
kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
wait.log