ImagePullBackOff should immediately flag a deployment as failed #863

Closed
edvald opened this issue Jun 20, 2019 · 0 comments · Fixed by #865


edvald commented Jun 20, 2019

Bug

Current Behavior

Currently, deployments hang for a while and eventually time out when the cluster is unable to fetch the image for a pod.

Expected behavior

I'd expect Garden to show an error as soon as an ImagePullBackOff event comes up for any of the pods I'm deploying.
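
To illustrate the difference between the current and expected behavior, here is a minimal sketch of a fail-fast decision step for a status polling loop. All names (`PollResult`, `WaitDecision`, `decideNextStep`) are hypothetical and not taken from Garden's codebase; the sketch assumes the caller polls the workload status periodically and passes in the elapsed time.

```typescript
// Hypothetical sketch, not Garden's actual implementation.
type WorkloadState = "ready" | "pending" | "failed"

interface PollResult {
  state: WorkloadState
  // Optional reason reported by the cluster, e.g. "ImagePullBackOff"
  reason?: string
}

type WaitDecision =
  | { action: "done" }
  | { action: "retry" }
  | { action: "fail"; message: string }

// Decide what a polling loop should do after each status check.
function decideNextStep(result: PollResult, elapsedMs: number, timeoutMs: number): WaitDecision {
  if (result.state === "ready") {
    return { action: "done" }
  }
  // Fail fast: an unrecoverable state ends the wait immediately,
  // instead of letting the loop run until the timeout.
  if (result.state === "failed") {
    return { action: "fail", message: "Deployment failed: " + (result.reason || "unknown reason") }
  }
  // Only a still-pending workload can hit the timeout.
  if (elapsedMs >= timeoutMs) {
    return { action: "fail", message: "Timed out after " + timeoutMs + " ms waiting for deployment" }
  }
  return { action: "retry" }
}
```

With a check like this, a status reporting ImagePullBackOff would end the wait on the first poll that sees it, while a genuinely slow rollout would still get the full timeout.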

Reproducible example

Use any Helm chart that has a values parameter for the image name and/or tag. Configure it with a nonexistent name or tag, then run garden deploy.

Workaround

Make sure your image names/tags are correct, I guess.

Suggested solution(s)

We need to rework how we collect information about workload deployment status. We're missing or ignoring some relevant events.
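
One possible source of that information is the container status on each pod: when an image cannot be pulled, Kubernetes reports the container as waiting with a reason such as ImagePullBackOff. The sketch below shows that check in isolation. The types are trimmed-down stand-ins for the Kubernetes Pod object, and the function name and the list of reasons (including ErrImagePull and CrashLoopBackOff) are assumptions for illustration, not Garden's actual code.

```typescript
// Hypothetical sketch. PodLike mirrors only the Pod fields used here.
interface ContainerStatusLike {
  name: string
  state?: { waiting?: { reason?: string; message?: string } }
}

interface PodLike {
  metadata: { name: string }
  status?: {
    containerStatuses?: ContainerStatusLike[]
    initContainerStatuses?: ContainerStatusLike[]
  }
}

// Waiting reasons treated as fatal in this sketch (an assumed list).
const FATAL_WAITING_REASONS = ["ImagePullBackOff", "ErrImagePull", "CrashLoopBackOff"]

// Return one message per container that is stuck in a fatal waiting state.
function findFatalContainerErrors(pods: PodLike[]): string[] {
  const errors: string[] = []
  for (const pod of pods) {
    const status = pod.status
    // Check init containers as well as regular containers.
    const statuses: ContainerStatusLike[] = status
      ? (status.initContainerStatuses || []).concat(status.containerStatuses || [])
      : []
    for (const container of statuses) {
      const waiting = container.state && container.state.waiting
      if (waiting && waiting.reason && FATAL_WAITING_REASONS.indexOf(waiting.reason) !== -1) {
        errors.push(pod.metadata.name + "/" + container.name + ": " + waiting.reason)
      }
    }
  }
  return errors
}
```

A status check could call something like this for the pods belonging to a workload and report the deployment as failed as soon as the returned list is non-empty.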

Your environment

Latest master (any recent version really).

edvald added a commit that referenced this issue Jun 21, 2019
This combines a general refactor of our workload status checks, more
robust checking of related resource statuses (in order to fail fast on
common errors like ImagePullBackOff and CrashLoopBackOff), as well as an
improvement on the level of information given when errors do come up.

This should aid considerably in debugging deployment issues.

Closes #863
edvald added a commit that referenced this issue Jun 24, 2019
thsig pushed a commit that referenced this issue Jun 24, 2019