
Add -prune option to dockerregistry #14585

Merged

merged 3 commits into openshift:master from the prune branch on Aug 2, 2017

Conversation

@dmage (Contributor) commented Jun 12, 2017

oadm policy add-cluster-role-to-user system:image-pruner system:serviceaccount:default:registry
oc -n default rsh dc/docker-registry /usr/bin/dockerregistry -prune=check
oc -n default rsh dc/docker-registry /usr/bin/dockerregistry -prune=delete

Resolves bz#1467340

@@ -45,8 +47,41 @@ import (
registryconfig "github.com/openshift/origin/pkg/dockerregistry/server/configuration"
)

var prune = flag.Bool("prune", false, "Prune blobs from the storage. DANGEROUS! Do it only in read only mode")
Contributor:
Can we check whether the registry is in read-only mode and error out if it isn't?

@dmage (Contributor Author):

We can, but only if dockerregistry -prune is executed in dc/docker-registry's container.

@dmage (Contributor Author):

Also, if we could get a blob's modification time, pruning would be a relatively safe operation even without switching to read-only mode. Conversely, on eventually consistent storage (like S3) it is a relatively unsafe operation even in read-only mode.
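
A minimal sketch of the guard being discussed (not the PR's actual code), assuming the upstream docker/distribution configuration layout, where read-only mode lives under storage.maintenance.readonly.enabled; the exact type assertion depends on how the YAML is unmarshaled:

package prune

import (
	"fmt"

	"github.com/docker/distribution/configuration"
)

// ensureReadOnly refuses to prune unless the registry config enables
// read-only maintenance mode. Hypothetical helper, for illustration only.
func ensureReadOnly(config *configuration.Configuration) error {
	if maintenance, ok := config.Storage["maintenance"]; ok {
		if readOnly, ok := maintenance["readonly"].(map[interface{}]interface{}); ok {
			if enabled, _ := readOnly["enabled"].(bool); enabled {
				return nil
			}
		}
	}
	return fmt.Errorf("registry is not in read-only mode; refusing to prune")
}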


// ExecutePruner runs the pruner.
func ExecutePruner(configFile io.Reader) {
log.Infof("prune version=%s", version.Version)
Reviewer:

The distribution version doesn't say much. Can this output the OpenShift release/git version?

@dmage (Contributor Author):

time="2017-06-14T10:39:56.705688255Z" level=info msg="start prune" distribution_version="v2.4.1+unknown" kubernetes_version=v1.6.1+5115d708d7 openshift_version=v3.6.0-alpha.1+bb45ac4-1014-dirty

Is it ok?

Reviewer:

👍
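
For reference, a sketch of how those three versions could be attached as structured logrus fields; the field names come from the sample line above, but the version packages and call site are assumptions, not the PR's confirmed code (the vendored logrus path at the time was github.com/Sirupsen/logrus):

package prune

import (
	log "github.com/sirupsen/logrus"

	"github.com/docker/distribution/version"
	oversion "github.com/openshift/origin/pkg/version"
	kversion "k8s.io/kubernetes/pkg/version"
)

// logStart emits the "start prune" line with the distribution, Kubernetes,
// and OpenShift versions as structured fields.
func logStart() {
	log.WithFields(log.Fields{
		"distribution_version": version.Version,
		"kubernetes_version":   kversion.Get().String(),
		"openshift_version":    oversion.Get().String(),
	}).Info("start prune")
}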

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func Prune(ctx context.Context, storageDriver driver.StorageDriver, registry distribution.Namespace, registryClient RegistryClient) {
Reviewer:

Some godoc would be nice: what it does, what is missing, and what is required.
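
For illustration, a godoc comment along those lines might read as follows (hypothetical wording based on this thread, not the PR's final comment):

// Prune removes blobs and manifests from the registry storage that are no
// longer referenced by any image in the cluster. It walks the repositories
// on storageDriver, asks the master API (via registryClient) which images
// still exist, and deletes everything else. The registry should be switched
// to read-only mode before running this; on eventually consistent storage
// the operation may be unsafe even then.
func Prune(ctx context.Context, storageDriver driver.StorageDriver, registry distribution.Namespace, registryClient RegistryClient) {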

logger.Printf("Deleting manifest: %s@%s", repoName, dgst)
err := manifestService.Delete(ctx, dgst)
if err != nil {
return fmt.Errorf("delete manifest %s: %s", dgst, err)
Reviewer:

Could these errors begin with something like failed to …? I'd like warnings and errors to be easily distinguishable from the Deleting … messages.
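
Illustratively, the requested rewording might look like this (hypothetical, not the PR's final text):

return fmt.Errorf("failed to delete manifest %s: %v", dgst, err)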

return nil
}

err := vacuum.RemoveBlob(string(dgst))
Reviewer:

Could you add logger.Printf("Deleting blob: %s", dgst)?

@dmage (Contributor Author), Jun 14, 2017:

It is already logged from inside RemoveBlob.

INFO[0000] Deleting blob: /docker/registry/v2/blobs/sha256/f6/f60ecfcf302b2378acfc896904b637497acb7863ded1c8c867c92e97593ae412  go.version=go1.8.3 instance.id=68dc9918-abae-47cc-b118-8f0f161c5cca

Should I log it one more time, with just the digest?

Reviewer:

Oh I see. It's OK then.

@miminar commented Jun 14, 2017:

Would it be possible to treat

time="2017-06-14T11:32:26.388294978Z" level=fatal msg="filesystem: Path not found: /docker/registry/v2/repositories" go.version=go1.7.5 instance.id=e0e8beb2-86ac-43ea-88f3-3fae75800cd0

as simply "nothing to delete" and exit with 0?

@miminar commented Jun 14, 2017:

Actually, the output looks like junk:

time="2017-06-14T12:21:29.712404174Z" level=debug msg="filesystem.List(\"/docker/registry/v2/blobs/sha256/f4/f42a62b56ab6611ac37e7a73b1d55d6684fc66516b1eea26769f44a74ba05cc8\")" go.version=go
1.7.5 instance.id=b208766c-be05-4f8a-959c-4157372d3f91 trace.duration=37.637µs trace.file="/data/src/github.com/openshift/origin/vendor/github.com/docker/distribution/registry/storage/driver/
base/base.go" trace.func="github.com/openshift/origin/vendor/github.com/docker/distribution/registry/storage/driver/base.(*Base).List" trace.id=f8694190-7524-4075-aa09-587e5fb03bcb trace.line
=150
time="2017-06-14T12:21:29.712456967Z" level=debug msg="filesystem.Stat(\"/docker/registry/v2/blobs/sha256/f4/f42a62b56ab6611ac37e7a73b1d55d6684fc66516b1eea26769f44a74ba05cc8/data\")" go.versi
on=go1.7.5 instance.id=b208766c-be05-4f8a-959c-4157372d3f91 trace.duration=27.087µs trace.file="/data/src/github.com/openshift/origin/vendor/github.com/docker/distribution/registry/storage/dr
iver/base/base.go" trace.func="github.com/openshift/origin/vendor/github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=9d84de13-8d98-42a6-aa60-42afb5acd6a5 trace
.line=137
time="2017-06-14T12:21:29.712490538Z" level=debug msg="Keep blob sha256:f42a62b56ab6611ac37e7a73b1d55d6684fc66516b1eea26769f44a74ba05cc8 (blob belongs to image 172.30.123.226:5000/extended-te
st-prune-images-dvkzq-w8njz/a@sha256:f42a62b56ab6611ac37e7a73b1d55d6684fc66516b1eea26769f44a74ba05cc8)" go.version=go1.7.5 instance.id=b208766c-be05-4f8a-959c-4157372d3f91
time="2017-06-14T12:21:29.712526544Z" level=debug msg="filesystem.Stat(\"/docker/registry/v2/blobs/sha256/fc\")" go.version=go1.7.5 instance.id=b208766c-be05-4f8a-959c-4157372d3f91 trace.dura
tion=17.871µs trace.file="/data/src/github.com/openshift/origin/vendor/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/openshift/origin/vendor/gith
ub.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=3ee3f10f-c4c3-4d92-9514-9910b32b8940 trace.line=137
time="2017-06-14T12:21:29.712586792Z" level=debug msg="filesystem.List(\"/docker/registry/v2/blobs/sha256/fc\")" go.version=go1.7.5 instance.id=b208766c-be05-4f8a-959c-4157372d3f91 trace.dura
tion=38.401µs trace.file="/data/src/github.com/openshift/origin/vendor/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/openshift/origin/vendor/gith
ub.com/docker/distribution/registry/storage/driver/base.(*Base).List" trace.id=3eb002d1-1e81-408d-9c53-0a3801b7942e trace.line=150
time="2017-06-14T12:21:29.712643789Z" level=debug msg="filesystem.Stat(\"/docker/registry/v2/blobs/sha256/fc/fc8314fda8b17e980e73f810c05d96359967c0d8202007dfc83199205f43cff8\")" go.version=go
1.7.5 instance.id=b208766c-be05-4f8a-959c-4157372d3f91 trace.duration=25.672µs trace.file="/data/src/github.com/openshift/origin/vendor/github.com/docker/distribution/registry/storage/driver/
base/base.go" trace.func="github.com/openshift/origin/vendor/github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=bcda900c-6fa4-42ff-9b09-d3acbe6c15c4 trace.line
=137

By default, some human-readable output should be produced. Could you ignore the log level from the config file (capping it at WARNING) and print the Deleting … statements to stdout?

@dmage (Contributor Author) commented Jun 14, 2017:

@miminar I can ignore the log level from the config file, but how should the log level be configured then?

@dmage force-pushed the prune branch 3 times, most recently from e6738bd to c930106 on June 14, 2017 16:45
return nil
})
if e, ok := err.(driver.PathNotFoundError); ok {
logger.Warnf("No repositories are found: %s", e)
Reviewer:

s/are found/found/

@miminar commented Jun 14, 2017:

I still think we should print the "Keeping" and "Deleting" messages to stdout without any other garbage. All the context attributes like time="2017-06-14T12:21:29.712490538Z" level=debug go.version=go1.7.5 pollute the output and make it hard to read.

Could there be a different logger used to output these messages?
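
A minimal sketch of that idea: a second logrus instance that writes bare messages to stdout, leaving the registry's own logger untouched. This is hypothetical; as @dmage explains below, the PR did not take this route because the Deleting … lines are emitted inside the distribution code. The modern import path is shown; at the time the vendored path was github.com/Sirupsen/logrus.

package prune

import (
	"os"

	"github.com/sirupsen/logrus"
)

// messageOnly drops timestamps and context fields such as go.version and
// instance.id, printing just the message text.
type messageOnly struct{}

func (messageOnly) Format(e *logrus.Entry) ([]byte, error) {
	return []byte(e.Message + "\n"), nil
}

// newHumanLogger returns a logger for the human-readable
// "Keeping …"/"Deleting …" lines.
func newHumanLogger() *logrus.Logger {
	l := logrus.New()
	l.Out = os.Stdout
	l.Formatter = messageOnly{}
	l.Level = logrus.InfoLevel
	return l
}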

for _, repoName := range reposToDelete {
err = vacuum.RemoveRepository(repoName)
if err != nil {
logger.Fatalf("Failed to remove the repository %s: %v", repoName, err)
Reviewer:

s/"F/"f/

@dmage (Contributor Author):

In which cases should we start a log message with a lowercase letter?

Reviewer:

All the other Fatal statements here either print lowercase messages or print errors that are lowercased.

@dmage (Contributor Author) commented Jun 15, 2017:

Could there be a different logger used to output these messages?

No, "Deleting repo" and "Deleting blob" come from the distribution code. And distribution can use only logrus as a logger. If we make another logrus instance that writes to stdout and ignores fields, we might end up with errors on stdout.

@miminar commented Jun 15, 2017:

@mfojtik This looks good to me except for:

  • the lack of a --dry-run option similar to oadm prune images
    • the argument against it is that it's hard to verify the correctness of what is going to be deleted
    • the argument in favor of the dry run is that an admin wants to see how many blobs and which repositories are going to be deleted before they actually are
  • the human-unfriendly output shown above (#14585 (comment)) by default

If you are fine with these, you are free to merge.

@miminar commented Jun 16, 2017:

Since two out of three people are against my pleas, let's call it democracy and merge it. @legionus

@openshift-bot (Contributor):

[Test]ing while waiting on the merge queue

@dmage (Contributor Author) commented Jun 16, 2017:

So we waited for 8 hours just to get the "Branch is Closed for Non-Bugs" error?!

@dmage (Contributor Author) commented Jun 17, 2017:

56 failures
OMG

@legionus, the branch is closed for merges; you need to say [merge][severity: blocker] or something like that.

@legionus (Contributor):

@dmage Yep, my mistake. Since this PR is not proposed for the new release, I think it would be better if @mfojtik merged it. I really don't know whether we can merge it or not.

@pweil- (Contributor) commented Jun 19, 2017:

This needs to wait until 3.6 is closed, since hard prune is scheduled for 3.7.

@pweil- (Contributor) commented Jun 19, 2017:

@miminar @dmage @legionus @mfojtik

I am in favor of a dry-run flag on this command that shows what will be deleted. It gives administrators the ability to verify that the orphan prune will actually solve their issue. In the low-disk-space use case, without it the administrator can only guess that this will help their problem, without really knowing it will. There is a chance that this command will do nothing at all and they really need to add disk space.

It is also in line with our other commands.

@sreber84:
I'd also like to see a --dry-run option. As @pweil- said, it would give the user the ability to check what is going to happen and whether anything can actually be deleted. If not, they need to think about adding storage anyway; but with --dry-run you could also regularly check for growth and run hard pruning once you reach a certain number of orphans.
Furthermore, it adds the ability to check what is going to happen before actually doing it. This matters because the registry is a critical component: if we delete too much from it, we can break a lot of things, and a --dry-run option gives us a way to avoid that.

@mfojtik (Contributor) commented Jun 19, 2017:

@pweil- are you aware that running this with dry-run will probably give you a list of 1000+ blobs to be deleted, without telling you which image each blob belongs to?

So basically the dry run gives you a pretty useless list of blobs to remove. Since the prune has to run in the docker-registry container, we would have to capture the list and send it to the user (streaming a huge list of blobs).

Given that this is only for a limited number of customers (normal users should be fine with regular prune), I would punt on dry-run here. We can do it as a follow-up, but I would rather unblock the customer first.

That said, we have to be sure which blobs we are removing so we don't break existing images; I guess @dmage has that covered in a test or something :-)

@pweil- (Contributor) commented Jun 19, 2017:

@pweil- are you aware that running this with dry-run will probably give you a list of 1000+ blobs to be deleted, without telling you which image each blob belongs to?

Yes, and that is not useful 😄

Summary of discussion: if we're going to recommend that folks use this command only when they absolutely have to, but we cannot tell them when that is, then we're not providing the information they need. Especially given that we're going to require that they shut down uploads cluster-wide.

@dmage is going to see if we can report the amount of space this command would free up if run. An admin who sees they're running out of space can oc rsh in and run this command, which will tell them either that, yes, this command will help, or that, no, it won't fix their problem.

@dmage (Contributor Author) commented Jun 19, 2017:

$ oc -n default rsh dc/docker-registry /usr/bin/dockerregistry -prune=check
Blobs deleted: 3
Disk space freed: 2.225 MiB
Use -prune=delete to actually delete the data.
$ oc -n default rsh dc/docker-registry /usr/bin/dockerregistry -prune=delete
Blobs deleted: 3
Disk space freed: 2.225 MiB
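
A sketch of how that check/delete split could be wired; the names (pruneSummary, blobRemover, actuallyDelete) are hypothetical, and only vacuum.RemoveBlob appears in the actual diff:

package prune

import (
	"context"
	"fmt"

	"github.com/docker/distribution"
	"github.com/docker/distribution/digest"
)

// blobRemover abstracts storage.Vacuum's RemoveBlob for this sketch.
type blobRemover interface {
	RemoveBlob(path string) error
}

// pruneSummary stats each candidate blob to total the space that would be
// freed; in delete mode it also removes the blobs.
func pruneSummary(ctx context.Context, statter distribution.BlobStatter, remover blobRemover, blobs []digest.Digest, actuallyDelete bool) error {
	var freed int64
	for _, dgst := range blobs {
		desc, err := statter.Stat(ctx, dgst)
		if err != nil {
			return fmt.Errorf("failed to stat blob %s: %v", dgst, err)
		}
		freed += desc.Size
		if actuallyDelete {
			if err := remover.RemoveBlob(string(dgst)); err != nil {
				return fmt.Errorf("failed to remove blob %s: %v", dgst, err)
			}
		}
	}
	fmt.Printf("Blobs deleted: %d\n", len(blobs))
	fmt.Printf("Disk space freed: %.3f MiB\n", float64(freed)/(1<<20))
	if !actuallyDelete {
		fmt.Println("Use -prune=delete to actually delete the data.")
	}
	return nil
}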

@openshift-merge-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) label on Jul 27, 2017
@0xmichalis (Contributor):

/test all

@miminar commented Jul 31, 2017:

@dmage

++ Building go targets for linux/amd64: images/pod cmd/dockerregistry cmd/gitserver vendor/k8s.io/kubernetes/cmd/hyperkube pkg/sdn/plugin/sdn-cni-plugin vendor/github.com/containernetworking/cni/plugins/ipam/host-local vendor/github.com/containernetworking/cni/plugins/main/loopback examples/hello-openshift
pkg/cmd/dockerregistry/dockerregistry.go:16:2: cannot find package "github.com/Sirupsen/logrus/formatters/logstash" in any of:
	/openshifttmp/openshift/build-rpm-release/tito/rpmbuild-originLwPcvH/BUILD/origin-3.6.0/_output/local/go/src/github.com/openshift/origin/vendor/github.com/Sirupsen/logrus/formatters/logstash (vendor tree)
	/usr/local/go/src/github.com/Sirupsen/logrus/formatters/logstash (from $GOROOT)
	/openshifttmp/openshift/build-rpm-release/tito/rpmbuild-originLwPcvH/BUILD/origin-3.6.0/_output/local/go/src/github.com/Sirupsen/logrus/formatters/logstash (from $GOPATH)
pkg/dockerregistry/server/prune/prune.go:18:2: cannot find package "github.com/openshift/origin/pkg/image/api" in any of:
	/openshifttmp/openshift/build-rpm-release/tito/rpmbuild-originLwPcvH/BUILD/origin-3.6.0/_output/local/go/src/github.com/openshift/origin/vendor/github.com/openshift/origin/pkg/image/api (vendor tree)
	/usr/local/go/src/github.com/openshift/origin/pkg/image/api (from $GOROOT)
	/openshifttmp/openshift/build-rpm-release/tito/rpmbuild-originLwPcvH/BUILD/origin-3.6.0/_output/local/go/src/github.com/openshift/origin/pkg/image/api (from $GOPATH)

You can either wait for #15423 or include it in your PR.

@miminar commented Jul 31, 2017:

Anyway, this needs a rebase. Please include #14509 while you're at it.

@openshift-merge-robot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label on Aug 1, 2017
Oleg Bulatov and others added 2 commits August 1, 2017 14:48
@openshift-merge-robot added the size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) label and removed the lgtm (Indicates that a PR is ready to be merged.), needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.), and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) labels on Aug 1, 2017
@miminar commented Aug 1, 2017:

/approve

Let's wait for green tests. I'd like to run [testextended][extended:core(ImagePrun|registry)] as well but I've been told it's not possible ATM 😢.

@miminar commented Aug 1, 2017:

Opened a new issue: openshift-eng/aos-cd-jobs#483

@smarterclayton (Contributor):

Good isolation here - I really like how cleanly this is encapsulated in the image and the lifecycle.

@dmage (Contributor Author) commented Aug 2, 2017:
dmage commented Aug 2, 2017

/test end_to_end

@miminar commented Aug 2, 2017:

I'm not an assignee, but let's see if it works:
/lgtm

Update: Looks like it does ... and suddenly, I've become an assignee.

@openshift-ci-robot added the lgtm (Indicates that a PR is ready to be merged.) label on Aug 2, 2017
@openshift-merge-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dmage, mfojtik, miminar

Associated issue: 1467340

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot (Contributor):

Automatic merge from submit-queue

@openshift-merge-robot openshift-merge-robot merged commit 8ec3b6f into openshift:master Aug 2, 2017
openshift-merge-robot added a commit that referenced this pull request Sep 8, 2017
…anes

Automatic merge from submit-queue

[3.6][Backport] Prune orphaned blobs on registry storage

Resolves [bz#1479340](https://bugzilla.redhat.com/show_bug.cgi?id=1479340)
Backports #14585