
Prune prev images on build/run/{dev iteration start} #4792

Merged

Conversation

ilya-zuyev (Contributor) commented Sep 16, 2020

Prune local Docker images at the beginning of build / dev loop / run actions.

This PR removes previous versions of Docker images from the local registry when the local Docker builder is used.
Fixes #4693

The same flag rules apply as for 'post' pruning: to activate pruning, the --cache-artifacts=false command-line flag must be passed explicitly (its default value is true) and --no-prune must be false (the default).

When pruning is activated and the local Docker builder is used, we query the local registry for all image versions with the artifact's image ref and try to remove all but the latest before we build. We keep the most recent version because, if the build fails, we still want to have the last runnable artifact. Pruning can be pretty slow, so it is started in a separate goroutine.

When the build is complete, we repeat the pruning (we may have two images in the registry, since we might skip one in pre-pruning, and need to keep only one). In run and dev modes we again run this asynchronously to avoid delays. In build mode this pruning is the last step of the pipeline, so we run it synchronously and wait for it to complete before we exit.
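For illustration, a minimal sketch of the pre-prune step described above, written against the Docker Go client. The function and variable names are illustrative, not the exact code in this PR:

package local

import (
	"context"
	"sort"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
	"github.com/sirupsen/logrus"
)

// prunePreviousImages removes all but the most recent local image carrying the
// given image ref. The newest version is kept so that a failing build still
// leaves the last runnable artifact behind. Pruning runs in a goroutine so the
// build itself is not delayed. Illustrative sketch only.
func prunePreviousImages(ctx context.Context, cli client.ImageAPIClient, ref string) {
	go func() {
		imgs, err := cli.ImageList(ctx, types.ImageListOptions{
			Filters: filters.NewArgs(filters.Arg("reference", ref)),
		})
		if err != nil {
			logrus.Debugf("listing images for %s: %v", ref, err)
			return
		}
		if len(imgs) <= 1 {
			return // nothing older than the latest version
		}
		// Sort newest first and skip index 0 to keep the latest version.
		sort.Slice(imgs, func(i, j int) bool { return imgs[i].Created > imgs[j].Created })
		for _, img := range imgs[1:] {
			opts := types.ImageRemoveOptions{Force: true, PruneChildren: true}
			if _, err := cli.ImageRemove(ctx, img.ID, opts); err != nil {
				logrus.Debugf("removing image %s: %v", img.ID, err)
			}
		}
	}()
}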

This PR adds a dependency on github.com/dustin/go-humanize to show numbers to the user in a human-readable format (3M bytes instead of 3145728 bytes).
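For reference, a small example of how the new dependency formats byte counts; this is a usage sketch, and the exact call sites in the PR may differ:

package main

import (
	"fmt"

	"github.com/dustin/go-humanize"
)

func main() {
	// Print reclaimed disk space in a human-readable form,
	// e.g. "3.1 MB" instead of "3145728 bytes".
	var reclaimed uint64 = 3145728
	fmt.Printf("%s of disk space freed\n", humanize.Bytes(reclaimed))
}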

codecov bot commented Sep 16, 2020

Codecov Report

Merging #4792 into master will increase coverage by 0.11%.
The diff coverage is 71.92%.


@@            Coverage Diff             @@
##           master    #4792      +/-   ##
==========================================
+ Coverage   71.73%   71.85%   +0.11%     
==========================================
  Files         353      354       +1     
  Lines       12108    12214     +106     
==========================================
+ Hits         8686     8776      +90     
- Misses       2776     2786      +10     
- Partials      646      652       +6     
Impacted Files                     | Coverage Δ
pkg/skaffold/build/local/types.go  | 76.31% <12.50%> (-14.01%) ⬇️
pkg/skaffold/build/local/local.go  | 60.41% <66.66%> (+0.41%) ⬆️
pkg/skaffold/build/local/prune.go  | 76.25% <76.25%> (ø)
pkg/skaffold/docker/image.go       | 79.34% <82.35%> (+7.06%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4dce04...611eba4.

tejal29 (Contributor) left a comment

Please add more details to the PR description explaining why certain choices were made.

@tejal29 tejal29 self-assigned this Sep 21, 2020
@ilya-zuyev ilya-zuyev marked this pull request as ready for review September 21, 2020 23:05
tejal29 (Contributor) commented Sep 24, 2020

@ilya-zuyev I tried your PR locally,

../../out/skaffold dev -d gcr.io/tejal-test --cache-artifacts=false 
Listing files to watch...
 - leeroy-web
 - leeroy-app
Generating tags...
 - leeroy-web -> gcr.io/tejal-test/leeroy-web:v1.14.0-42-g08b1127ec
 - leeroy-app -> gcr.io/tejal-test/leeroy-app:v1.14.0-42-g08b1127ec
Found [minikube] context, using local docker daemon.
pruner started
Building [leeroy-web]...
Sending build context to Docker daemon  3.072kB
Step 1/7 : FROM golang:1.12.9-alpine3.10 as builder
 ---> e0d646523991
Step 2/7 : COPY web.go .
 ---> Using cache
 ---> be01011bf3ce
Step 3/7 : RUN go build -o /web .
 ---> Using cache
 ---> 3df073bc07b2
Step 4/7 : FROM alpine:3.10
 ---> be4e4bea2c2e
Step 5/7 : ENV GOTRACEBACK=single
 ---> Using cache
 ---> 086338dabf14
Step 6/7 : CMD ["./web"]
 ---> Using cache
 ---> 78628644cb2e
Step 7/7 : COPY --from=builder /web .
 ---> Using cache
 ---> 39830b2cb645
Successfully built 39830b2cb645
Successfully tagged gcr.io/tejal-test/leeroy-web:v1.14.0-42-g08b1127ec
Building [leeroy-app]...

Sending build context to Docker daemon  3.072kB
Step 1/7 : FROM golang:1.12.9-alpine3.10 as builder
 ---> e0d646523991
Step 2/7 : COPY app.go .
 ---> Using cache
 ---> f00288cffe47
Step 3/7 : RUN go build -o /app .
 ---> Using cache
 ---> c47916ce0192
Step 4/7 : FROM alpine:3.10
 ---> be4e4bea2c2e
Step 5/7 : ENV GOTRACEBACK=single
 ---> Using cache
 ---> 086338dabf14
Step 6/7 : CMD ["./app"]
 ---> Using cache
 ---> e6aed69fc5d5
Step 7/7 : COPY --from=builder /app .
 ---> Using cache
 ---> afe0b21289c8
Successfully built afe0b21289c8
Successfully tagged gcr.io/tejal-test/leeroy-app:v1.14.0-42-g08b1127ec
Tags used in deployment:
 - leeroy-web -> gcr.io/tejal-test/leeroy-web:39830b2cb645921195ed51c1981af5b2eb289f9d05f13a69c7f22d8e3d05cc13
 - leeroy-app -> gcr.io/tejal-test/leeroy-app:afe0b21289c83b7edbbd140c88c6cd99f9e5621241ea34ebb35a3a06fa18af95
Starting deploy...
 - deployment.apps/leeroy-web created
 - service/leeroy-app created
 - deployment.apps/leeroy-app created
Waiting for deployments to stabilize...
 - deployment/leeroy-web is ready. [1/2 deployment(s) still pending]
 - deployment/leeroy-app is ready.
Deployments stabilized in 2.546520353s
Press Ctrl+C to exit
Watching for changes...
[leeroy-app] 2020/09/24 05:27:37 leeroy app server ready
[leeroy-web] 2020/09/24 05:27:37 leeroy web server ready
^CCleaning up...
 - deployment.apps "leeroy-web" deleted
 - service "leeroy-app" deleted
 - deployment.apps "leeroy-app" deleted
Pruning images...
untagged image gcr.io/tejal-test/leeroy-web:39830b2cb645921195ed51c1981af5b2eb289f9d05f13a69c7f22d8e3d05cc13
untagged image gcr.io/tejal-test/leeroy-web:v1.14.0-42-g08b1127ec
untagged image gcr.io/tejal-test/leeroy-web:v1.14.0-6-gab9135046
deleted image sha256:39830b2cb645921195ed51c1981af5b2eb289f9d05f13a69c7f22d8e3d05cc13
deleted image sha256:78628644cb2e2e0e153852980c60220eaf3727ac95b483ed61188752482b551a
untagged image gcr.io/tejal-test/leeroy-app:afe0b21289c83b7edbbd140c88c6cd99f9e5621241ea34ebb35a3a06fa18af95
untagged image gcr.io/tejal-test/leeroy-app:v1.14.0-42-g08b1127ec
untagged image gcr.io/tejal-test/leeroy-app:v1.14.0-6-gab9135046
deleted image sha256:afe0b21289c83b7edbbd140c88c6cd99f9e5621241ea34ebb35a3a06fa18af95
deleted image sha256:e6aed69fc5d511baa3686251065e17ef58d93c3aafa89c10d9bad5f9c6416f4d

I see these messages at the end. This is not something you added but since we are now pruning aggressively, for a dev loop with many iterations, this number could be really high.
They are adding to the output. Can we please avoid these?

Looks like there is also a case where two threads try to clean up the same image and one gets done faster.
e.g. I ran skaffold dev with 3 dev iterations, meaning I made 3 changes. A warning message is shown for the following sha, sha256:273726eba366d31403364f00d4e930164169ac627ebfae89ce72a365e3b3fac1, which was already deleted.

Pruning images...
untagged image gcr.io/tejal-test/leeroy-web:5628408061c83ba51a63cb275af27df4c2218cbbe9a67736bd79b23df3df015c
untagged image gcr.io/tejal-test/leeroy-web:v1.14.0-42-g08b1127ec
deleted image sha256:5628408061c83ba51a63cb275af27df4c2218cbbe9a67736bd79b23df3df015c
deleted image sha256:cf6f6df4bf235623a37e8172fb08dc520471c0a0fbac1130c5475af6cb662d75
deleted image sha256:10f418057b3d65881a14307385fd42ebc439e834404e52ea7889b806a0a3217e
untagged image gcr.io/tejal-test/leeroy-app:9cb23b1972e6ac18780e5c5675ca64de57f1e1919a4e1353e11e73c8e0883776
untagged image gcr.io/tejal-test/leeroy-app:v1.14.0-42-g08b1127ec
deleted image sha256:9cb23b1972e6ac18780e5c5675ca64de57f1e1919a4e1353e11e73c8e0883776
deleted image sha256:17bef137cc327f9080e98e7aa0e961af76f0fcca28c273c523997ec7df27a0a5
untagged image gcr.io/tejal-test/leeroy-app:1c82b4696f7597cc2c550b81ef22e49bf4269a44110289058c93bf9455d56d09
deleted image sha256:1c82b4696f7597cc2c550b81ef22e49bf4269a44110289058c93bf9455d56d09
deleted image sha256:17dbf9658929dc54ecd582152c5596c366483e9ac56acfa500efc145849c4c8d
untagged image gcr.io/tejal-test/leeroy-app:273726eba366d31403364f00d4e930164169ac627ebfae89ce72a365e3b3fac1
untagged image gcr.io/tejal-test/leeroy-app:v1.14.0-42-g08b1127ec-dirty
deleted image sha256:273726eba366d31403364f00d4e930164169ac627ebfae89ce72a365e3b3fac1
deleted image sha256:d2ace1e5925c67f3ad74d74ee9c013d81a020295ca833737d79a95a047fb04ca
deleted image sha256:5c1302439f7cc1317471a4b9dfb8fb6f395dfc2ac6921645f1f13929d82b360e
WARN[0072] builder cleanup: pruning images: Error: No such image: sha256:273726eba366d31403364f00d4e930164169ac627ebfae89ce72a365e3b3fac1 

Ideally it would be great to only print out errors.

tejal29 (Contributor) commented Sep 28, 2020

@ilya-zuyev want to get this in before it becomes stale.

The post-pruner tries to remove all images built during all iterations of the dev loop. Some of them may already have been pruned by the pre-pruner.

It's also possible to have duplicated image IDs in the Builder.builtImages array; fix this case.
ilya-zuyev (Contributor, Author) commented Sep 28, 2020

I see these messages at the end. This is not something you added but since we are now pruning aggressively, for a dev loop with many iterations, this number could be really high.
They are adding to the output. Can we please avoid these?

Done.

Looks like there is also a case where two threads try to clean up the same image and one gets done faster.
e.g. I ran skaffold dev with 3 dev iterations, meaning I made 3 changes. A warning message is shown for the following sha, sha256:273726eba366d31403364f00d4e930164169ac627ebfae89ce72a365e3b3fac1, which was already deleted.


Ideally it would be great to only print out errors.

Actually, I had an issue there. When the dev loop completes, skaffold tries to prune all images built across all dev-loop iterations. Some of them may already have been removed by the pre-pruner, which caused the warning. Fixed.
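For illustration only, a simplified sketch of the kind of bookkeeping that fix implies: recording what the pre-pruner already removed and deduplicating the built-image list before the final prune. The type and method names are hypothetical, not the literal diff:

package local

import "sync"

// pruneTracker records image IDs already removed by the asynchronous
// pre-pruner so the final prune at the end of the dev loop does not try to
// delete them again. Illustrative sketch; the real bookkeeping may differ.
type pruneTracker struct {
	mu     sync.Mutex
	pruned map[string]bool
}

func newPruneTracker() *pruneTracker {
	return &pruneTracker{pruned: map[string]bool{}}
}

// MarkPruned records an image ID that has already been removed.
func (t *pruneTracker) MarkPruned(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pruned[id] = true
}

// Remaining returns the deduplicated list of built image IDs that still need
// pruning, skipping anything the pre-pruner already removed.
func (t *pruneTracker) Remaining(builtImages []string) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	seen := map[string]bool{}
	var out []string
	for _, id := range builtImages {
		if seen[id] || t.pruned[id] {
			continue
		}
		seen[id] = true
		out = append(out, id)
	}
	return out
}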

tejal29 (Contributor) commented Sep 29, 2020

@ilya-zuyev Can we move this

fmt.Fprintf(out, "untagged image %s\n", r.Untagged)
and
fmt.Fprintf(out, "deleted image %s\n", r.Deleted)

to debug?
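A minimal sketch of what demoting those prints to debug level could look like, assuming logrus is used for logging. The helper name and the loop over ImageDeleteResponseItem values are illustrative, not the exact code in image.go:

package docker

import (
	"github.com/docker/docker/api/types"
	"github.com/sirupsen/logrus"
)

// logPruneResults reports untagged/deleted images at debug level instead of
// writing them to the user-facing output stream. Illustrative helper only.
func logPruneResults(results []types.ImageDeleteResponseItem) {
	for _, r := range results {
		if r.Untagged != "" {
			logrus.Debugf("untagged image %s", r.Untagged)
		}
		if r.Deleted != "" {
			logrus.Debugf("deleted image %s", r.Deleted)
		}
	}
}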

ilya-zuyev (Contributor, Author) commented Sep 29, 2020

@ilya-zuyev Can we move this

fmt.Fprintf(out, "untagged image %s\n", r.Untagged)

and

fmt.Fprintf(out, "deleted image %s\n", r.Deleted)

to debug?

@tejal29 Sure, but those lines were not changed in this PR, and this output is produced only in the old pruning scenarios (the new pruner uses io.Discard for out). Is it safe to change it? Just want to double check that it's not going to break any customer UX expectations.

tejal29 (Contributor) commented Sep 29, 2020

@tejal29 Sure, but those lines were not changed in this PR, and this output is produced only in the old pruning scenarios (the new pruner uses io.Discard for out).

I can see these lines printed on your branch. Is there a way to switch to use the new pruner?

Is it safe to change it? Just want to double check that it's not going to break any customer UX expectations.

I do agree the behavior is not from newly added code, like I mentioned before:

I see these messages at the end. This is not something you added but since we are now pruning aggressively, for a dev loop with many iterations, this number could be really high.
They are adding to the output. Can we please avoid these?

@ilya-zuyev (Contributor, Author)

@tejal29 Sure, but those lines were not changed in this PR, and this output is produced only in the old pruning scenarios (the new pruner uses io.Discard for out).

I can see these lines printed on your branch. Is there a way to switch to use the new pruner?

Both pruners are controlled by the --cache-artifacts flag. Currently there's no way to enable just one of them.

Is it safe to change it? Just want to double check that it's not going to break any customer UX expectations.

I do agree the behavior is not from newly added code, like I mentioned before:

I see these messages at the end. This is not something you added but since we are now pruning aggressively, for a dev loop with many iterations, this number could be really high.
They are adding to the output. Can we please avoid these?

Yeah, I agree. This PR reuses the existing Prune() function, which prints a lot to stdout. The initial version of the PR used the same logging policy for the pre-pruner. After your comment I fixed that, and now logging is exactly as it was before this PR: we output nothing during the dev cycle and only print when we prune at exit. Just want to confirm that we want to hide the original logs as well. Not sure if some tools use skaffold's output and need those lines for diagnostics of some sort.

tejal29 (Contributor) commented Sep 29, 2020

got it. We never really thought about the output before :).
Not sure if someone would be relying on these log lines. In case they do, we can ask them to rely on the event API which sends events when "pruning" starts.

Please feel free to move these to debug.

@ilya-zuyev (Contributor, Author)

got it. We never really thought about the output before :).
Not sure if someone would be relying on these log lines. In case they do, we can ask them to rely on the event API which sends events when "pruning" starts.

Please feel free to move these to debug.

Got it. Thanks! Will update

tejal29 (Contributor) commented Sep 30, 2020

Thanks @ilya-zuyev This looks great!!

Watching for changes...
^CCleaning up...
 - deployment.apps "leeroy-web" deleted
 - service "leeroy-app" deleted
 - deployment.apps "leeroy-app" deleted
Pruning images...
➜  microservices git:(gh_4693_prune_on_dev_start) ✗ 

nkubala (Contributor) left a comment

🎉

Successfully merging this pull request may close these issues.

Prune old images at beginning of dev loop