TaskRun with InvalidImageName runs forever #6105
Comments
/assign @vdemeester
@vdemeester to check if we have addressed this already!
Pipelines WG - @jerop please move to the next milestone if we haven't heard back on this before releasing 0.48
I left a comment here as a start: #4846 (comment). I will check if there is a way we can make the …
@chmouel I would argue that this is not critical/urgent. A faster fail would be nice to have; however, if I understood correctly, the k8s Pod controller will keep trying until a timeout, so Tekton honours that behaviour.
Testing the current behaviour. Pod events:
The …
Is there a way other than deletion to terminate the pod? Once the pod is signalled for deletion or stopped: …
@chmouel @tektoncd/core-maintainers it seems to me that a good solution is more complex than the value it brings, but I'd be happy to continue working on this if you think it's worth it. We need to decide what the behaviour of the …
@afrittoli thank you for looking into this! I think deleting the pod and failing the TaskRun makes the most sense. I don't see a problem with deletion if the image pull error isn't retryable (hopefully we have a way of determining if the error is retryable or not). Cancelation might be a bit confusing but I think it would also be an OK solution. I just want to note that if we choose cancelation for invalid image pulls and end up implementing support for canceling the taskrun via the entrypoint (#3238), we may need to replace the user-provided image with a noop image or something like that, since the entrypoint can't start before the image is pulled.
@afrittoli I do also think deleting the pod and failing the taskrun makes the most sense. We just need to make sure we provide enough information to the users in the TaskRun for them not to be confused that there is not …
The Kubernetes Pod controller treats invalid image failures as potentially ephemeral errors: even if the format of the image reference is not syntactically correct, users may update the image without recreating the Pod. Tekton, however, uses Pods to provide workloads that run to completion, and users are not allowed to change the specification of steps during execution. This commit changes the handling of the InvalidImageName pod reason so that the TaskRun is marked as failed and the Pod is deleted.
Fixes: #6105
Signed-off-by: Andrea Frittoli <[email protected]>
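For illustration, here is a minimal Go sketch of the detection step the commit message describes. This is not the actual Tekton reconciler code; the helper `hasInvalidImageName` is invented for this example. It relies only on the standard Kubernetes API types: the kubelet sets the container waiting reason `InvalidImageName` when the image reference cannot be parsed at all, which is what makes the error non-retryable.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// hasInvalidImageName reports whether any container in the Pod is waiting
// with reason "InvalidImageName". Unlike ErrImagePull/ImagePullBackOff,
// this reason means the reference is malformed, so retrying cannot help.
func hasInvalidImageName(pod *corev1.Pod) bool {
	for _, cs := range pod.Status.ContainerStatuses {
		if w := cs.State.Waiting; w != nil && w.Reason == "InvalidImageName" {
			return true
		}
	}
	return false
}

func main() {
	// Minimal fake Pod status for demonstration purposes.
	pod := &corev1.Pod{
		Status: corev1.PodStatus{
			ContainerStatuses: []corev1.ContainerStatus{
				{State: corev1.ContainerState{
					Waiting: &corev1.ContainerStateWaiting{Reason: "InvalidImageName"},
				}},
			},
		},
	}
	if hasInvalidImageName(pod) {
		// In a reconciler, this is the point where the TaskRun would be
		// marked failed and the Pod deleted, instead of waiting for timeout.
		fmt.Println("InvalidImageName detected: fail the TaskRun and delete the Pod")
	}
}
```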
Expected Behavior
When the image name of a step is invalid, the TaskRun/PipelineRun should fail quickly rather than wait for the timeout.
Actual Behavior
The TaskRun/PipelineRun doesn't get canceled; it waits forever (or for as long as the timeout is effective) and only fails once the timeout has expired.
Steps to Reproduce the Problem
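A minimal TaskRun along the following lines should reproduce the behaviour (the resource name is a placeholder, and the image reference is deliberately malformed so the kubelet reports InvalidImageName):

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: invalid-image-name
spec:
  taskSpec:
    steps:
      - name: step
        # Spaces make this reference unparseable, so the kubelet sets the
        # waiting reason InvalidImageName instead of ErrImagePull.
        image: "not a valid image"
        script: |
          echo "never runs"
```

Applying this with `kubectl apply` leaves the TaskRun running until its timeout expires, rather than failing immediately.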
Additional Info
Latest Tekton Pipelines release on the latest kind cluster.
@lbernick mentioned the right approach to handle this here:
#4846 (comment)