"gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 137 #1669
Comments
Additionally - it builds fine with caching disabled and when a heavy 8 CPU machine type is used. However, I think it's strange that Kaniko caching requires more resources than the build itself.
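For reference, here is a minimal sketch of the larger-machine-type workaround mentioned above, assuming a cloudbuild.yaml setup; the destination image is a hypothetical placeholder, not the reporter's actual configuration:

```yaml
# cloudbuild.yaml (sketch): run the kaniko step on a larger Cloud Build worker
# so the cache snapshot has more memory to work with.
steps:
  - name: 'gcr.io/kaniko-project/executor:latest'
    args:
      - '--destination=gcr.io/$PROJECT_ID/my-image:$SHORT_SHA'  # hypothetical destination
      - '--cache=true'
options:
  machineType: 'E2_HIGHCPU_8'  # one of Cloud Build's larger machine types
```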
I've been trying to work around this issue for the past several days. Kaniko consistently tries to use more memory than our Kubernetes cluster has available. It only happens with our large images.
Any workaround available? My base image is
Any update on this topic? I have this issue with every ML-related Dockerfile where we need to use PyTorch and other libraries.
The :latest image is quite old, pointing to :v1.6.0 due to issues with :v1.7.0. It's possible the bug is fixed at head, and while we wait for a v1.8.0 release (#1871) you can try out the latest commit-tagged release and see if that helps. If it's not fixed, it sounds like we need to figure out where layer contents are being buffered into memory while being cached, which sounds like it was introduced some time between v1.3 and now. If anybody investigates and finds anything useful, please add it here.
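For anyone unsure how to try that, a short sketch of pinning the build step to a commit-tagged executor image; the `<commit-sha>` placeholder is deliberately left unfilled, and the destination is hypothetical:

```yaml
# Sketch: swap :latest for a commit-tagged kaniko executor image.
steps:
  - name: 'gcr.io/kaniko-project/executor:<commit-sha>'  # pick a recent commit tag from the registry
    args: ['--destination=gcr.io/$PROJECT_ID/my-image:$SHORT_SHA', '--cache=true']  # hypothetical destination
```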
Looks like it worked, but I only tried it with the cache disabled. On v1.6 it was failing even with the cache disabled, so that's a good sign.
Any update on this issue? I am facing the same problem when deploying an ML image with sentence-transformers and torch>=1.6.0; the image size is more than 3 GB.
It sounds like #1669 (comment) says this works with a newer commit-tagged image and with caching disabled. Caching seems to cause filesystem contents to be buffered in memory, which causes problems with large images.
This happened to me too with a large image, and the referenced commit solved it. Any update on why it's still not fixed in v1.8.1? @imjasonh
#2115 is the issue tracking the next release. I don't have any more information than what's in that issue.
Does this issue still happen with the latest commit-tagged image? With and without caching enabled?
Any news on this? It's still happening on v1.9.0.
If you add `--compressed-caching=false` to the kaniko executor arguments, the build works even for large images with caching enabled.
This is the note from our build configuration:
Disable cache compression to allow large images, like images depending on `tensorflow` or `torch`. For more information, see: GoogleContainerTools/kaniko#1669
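To make that concrete, a sketch of the flag in a Cloud Build kaniko step, assuming caching stays enabled; the destination and cache repo are hypothetical placeholders:

```yaml
# cloudbuild.yaml (sketch): keep layer caching on, but turn off cache compression.
steps:
  - name: 'gcr.io/kaniko-project/executor:latest'
    args:
      - '--destination=gcr.io/$PROJECT_ID/my-image:$SHORT_SHA'  # hypothetical destination
      - '--cache=true'
      - '--cache-repo=gcr.io/$PROJECT_ID/my-image/cache'        # hypothetical cache repo
      # Disable cache compression to allow large images, like images depending on
      # tensorflow or torch. See GoogleContainerTools/kaniko#1669.
      - '--compressed-caching=false'
```

The likely trade-off is larger cached layers in the cache repo, since they are no longer compressed, in exchange for the snapshot step not holding compressed layer contents in memory.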
I confirm I was having the same issue in Cloud Build, and the `--compressed-caching=false` flag fixed it.
Actual behavior
I am running a build on Cloud Build. The build itself succeeds, but the caching snapshot at the end fails with the error in the title: the executor step exits with non-zero status 137.
Expected behavior
I would like the whole build to succeed - including caching.
To Reproduce
Steps to reproduce the behavior:
Additional Information
I cannot provide the Dockerfile, but it is based on `continuumio/miniconda3` and also installs tensorflow in a conda environment. I think it started failing after tensorflow was added to the list of dependencies.