Introduce the sync command #524

flavio · 2018-07-13T13:25:40Z

The skopeo sync command can sync images between a SOURCE and a DESTINATION.

The purpose of this command is to assist with the mirroring of container images from different docker registries to a single docker registry.

Right now the following transport matrix is implemented:

docker:// -> docker://
docker:// -> dir:
dir: -> docker://

The dir: transport is supported to handle the use case of air-gapped environments.
In this context users can perform an initial sync on a trusted machine connected to the internet; that would be a docker:// -> dir: sync.
The target directory can be copied to a removable drive that can then be plugged into a node of the air-gapped environment. From there a dir: -> docker:// sync will import all the images into the registry serving the air-gapped environment.

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image registry.example.com/busybox:latest will be copied to registry.local.lan/registry.example.com/busybox:latest.

The image namespace is not changed when doing a dir: -> docker:// sync operation.

The alteration of the image namespace is used to scope images coming from different registries (the Docker Hub, quay.io, gcr, other registries). That allows all of them to be hosted on the same registry without name clashes, while keeping their origin explicit.
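To make the renaming rule concrete, here is a minimal sketch of the mapping described above. It is not the code from this PR: `scopedDestination` is a hypothetical helper, and it assumes references are plain strings with an optional `docker://` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// scopedDestination is a hypothetical helper, not skopeo's actual code.
// It places the full source reference (registry host included) under the
// destination registry, as described for docker:// -> docker:// syncs.
func scopedDestination(destRegistry, sourceImage string) string {
	// Drop an optional docker:// transport prefix from both arguments.
	destRegistry = strings.TrimPrefix(destRegistry, "docker://")
	sourceImage = strings.TrimPrefix(sourceImage, "docker://")
	// The source registry host becomes the new root namespace.
	return strings.TrimSuffix(destRegistry, "/") + "/" + sourceImage
}

func main() {
	fmt.Println(scopedDestination("docker://registry.local.lan", "docker://registry.example.com/busybox:latest"))
}
```

With the inputs from the example above, this yields registry.local.lan/registry.example.com/busybox:latest.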

TODO

I hope you like this feature and the direction it's going. Once we agree on its final design we will update this PR to extend the current test suites.

Future work

Currently sync will keep adding missing content from SOURCE to DESTINATION. It would be nice to add a --delete flag to remove from DESTINATION contents that are no longer available inside of SOURCE. That would be a bit like rsync's --delete flag.

If wanted, that should be addressed with a separate PR.
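As a rough illustration of what such a flag would need to compute, the sketch below lists the tags that exist in DESTINATION but no longer in SOURCE. It is only an assumption about the semantics: `staleTags` is a hypothetical helper, and no --delete flag exists in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// staleTags returns the tags present in the destination but absent from the
// source; a hypothetical --delete flag would remove exactly these.
func staleTags(source, destination []string) []string {
	inSource := make(map[string]bool, len(source))
	for _, tag := range source {
		inSource[tag] = true
	}
	stale := []string{}
	for _, tag := range destination {
		if !inSource[tag] {
			stale = append(stale, tag)
		}
	}
	sort.Strings(stale) // deterministic order for reporting
	return stale
}

func main() {
	source := []string{"1.29", "latest"}
	destination := []string{"1.28", "1.29", "latest"}
	fmt.Println(staleTags(source, destination)) // prints [1.28]
}
```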

Signed-off-by: Flavio Castelli [email protected]
Co-authored-by: Marco Vedovati [email protected]

runcom · 2018-07-13T13:46:43Z

@mtrmac PTAL as well

vrothberg · 2018-08-08T12:48:45Z

@runcom @rhatdan @mtrmac friendly ping. I'd love to see this feature in an upcoming Skopeo release.

giuseppe

I've added some comments. @mtrmac PTAL

giuseppe · 2018-08-16T10:28:29Z

cmd/skopeo/sync.go

+type sourceCfg map[string]registrySyncCfg
+
+func newSourceConfig(yamlFile string) (cfg sourceCfg, err error) {
+


extra empty line

Still valid.


TomSweeneyRedHat · 2018-08-16T15:19:20Z

docs/skopeo.1.md

+
+Copy all the images from _source_ (or _source-file_) to _destination_.
+
+Useful to keep in sync a local docker registry mirror. It can be used to populate also registries running inside of air-gapped environments.


Suggest:
Useful to keep in sync with a local docker registry mirror. It can also be used to populate registries running inside of air-gapped environments.


TomSweeneyRedHat · 2018-08-16T15:20:41Z

docs/skopeo.1.md

+
+_destination_ can be either a docker registry (eg: docker://my-registry.local.lan) or a local directory (eg: dir:///media/usb).
+
+When _destination_ is a local directory one directory per 'image:tag' is going to be created.


Suggest:
"is going to be" to "will be"


TomSweeneyRedHat · 2018-08-16T15:22:46Z

docs/skopeo.1.md

+    images:
+        coreos/etcd:
+            - latest
+```

 # SEE ALSO
 kpod-login(1), docker-login(1)


I know this isn't part of this change, but this should be changed to:
"podman-login(1), docker-login(1)"


rhatdan · 2018-08-16T15:30:17Z

I would like to see this be a little less "docker"-centric. Perhaps use some examples other than docker.io, such as registry.example.com.

mtrmac · 2018-08-16T15:45:54Z

docs/skopeo.1.md

+
+**skopeo sync** will copy all the tags of an image when _source_ uses the docker://' transport and no tag is specified.
+
+_destination_ can be either a docker registry (eg: docker://my-registry.local.lan) or a local directory (eg: dir:///media/usb).


The PR introduction says

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image registry.example.com/busybox:latest will be copied to registry.local.lan/registry.example.com/busybox:latest

This inflexibly hard-codes policy, and seems entirely unnecessary: whatever script is used for the regular sync can just specify registry.local.lan/registry.example.com/busybox as the destination.

Also, this policy forces sync to invent an entirely new docker://hostname syntax, which is ambiguous with the image name syntax used everywhere else, and has a different meaning (the image docker://hostname == docker://docker.io/library/hostname:latest).

… and AFAICT it's not documented in this man page.

Some kind of scoping is necessary when running sync from a YAML file, as you may be pulling the same image name from different namespaces or domains.

The reason to have this kind of scoping is that the original image location can be easily extrapolated, and this information is useful when running a server that is configured to mirror multiple registries.

Maybe the requirement can be dropped when the source refers to a docker:// location (and a flag can be added to make the current behaviour optional).

Some kind of scoping is necessary when running sync from a YAML file, as you may be pulling the same image name from different namespaces or domains.

My first instinct is to specify the destination in the YAML as well, then, not to hard-code the naming in software.

It’s not obvious to me why sync --source-yaml $src $dest would have the sources specified in a (presumably mostly fixed) config file but the destination on the (presumably mostly changing) command line.

If a user sets up a regular (e.g. weekly) sync process, won’t both the source and the destination be the same for all runs (every week the same servers and same destinations)? (My best guess is that when the destination is on a removable medium, the physical device, and thus presumably the mount path, may change each time; but aren’t mounts often named by the volume label, which is always the same?)


rhatdan · 2018-09-21T12:42:43Z

@mtrmac @runcom Can we merge this?

mtrmac · 2018-09-26T18:59:08Z

I’m afraid I still haven’t actually reviewed this. My fault entirely.

caiobegotti · 2018-09-27T12:22:52Z

Hi guys, I am currently writing a bunch of scripts to do exactly that for our Kubernetes cluster running in an air-gapped environment, and then I discovered this PR 👍 Any chance of getting this reviewed or even merged soon?

chilicat · 2018-10-03T18:27:43Z

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image
registry.example.com/busybox:latest will be copied to
registry.local.lan/registry.example.com/busybox:latest.

I would prefer to not have the original registry name as namespace in the destination registry. The source registry might be an internal registry where the name changes over time or where we want to hide the actual name on the customer side.

Maybe that is something which could be configurable?

flavio · 2018-10-04T08:13:18Z

I would prefer to not have the original registry name as namespace in the destination registry. The source registry might be an internal registry where the name changes over time or where we want to hide the actual name on the customer side.

Maybe that is something which could be configurable?

We could add a --dest-prefix flag to allow something like this:

skopeo sync --dest-prefix acme-inc docker://wip-registry.lan/busybox docker://example.com

That would copy all the busybox images from wip-registry.lan/busybox to example.com/acme-inc/busybox.

We should also cover this option inside of the source yaml file. Let's take this source.yaml as an example:

docker.io:
    images:
        busybox: []
wip-registry:
    dest-prefix: acme-inc
    images:
        coreos/etcd:
            - latest

Let's assume we run this command:

skopeo sync --source-yaml source.yaml docker://target.com

By this point, inside of the target.com registry, we would have the following images:

docker.io/busybox -> docker pull target.com/docker.io/busybox
acme-inc/coreos/etcd -> docker pull target.com/acme-inc/coreos/etcd
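The naming rule in this proposal could be sketched roughly as follows. This is only an illustration of the idea, not an implementation: `registryCfg` and `destinationRepo` are hypothetical names, the YAML is represented as Go literals rather than parsed, and dest-prefix itself is just a suggestion at this point.

```go
package main

import "fmt"

// registryCfg mirrors one top-level entry of the proposed source.yaml.
// The field names are illustrative; dest-prefix is only a proposal.
type registryCfg struct {
	DestPrefix string              // optional replacement for the registry name
	Images     map[string][]string // image name -> tags (empty means all tags)
}

// destinationRepo returns where an image would land in the target registry:
// under dest-prefix when set, otherwise under the source registry name.
func destinationRepo(target, sourceRegistry, image string, cfg registryCfg) string {
	scope := sourceRegistry
	if cfg.DestPrefix != "" {
		scope = cfg.DestPrefix
	}
	return target + "/" + scope + "/" + image
}

func main() {
	// Equivalent of the source.yaml example above.
	sources := map[string]registryCfg{
		"docker.io":    {Images: map[string][]string{"busybox": {}}},
		"wip-registry": {DestPrefix: "acme-inc", Images: map[string][]string{"coreos/etcd": {"latest"}}},
	}
	// Map iteration order is not fixed, so the two lines may print in either order.
	for registry, cfg := range sources {
		for image := range cfg.Images {
			fmt.Println(destinationRepo("target.com", registry, image, cfg))
		}
	}
}
```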

Would that work for you?

How should we proceed if everybody agrees with this design? Should we get this PR merged and then implement this new flag via a new PR (to limit the amount of code to be reviewed)?

marcov · 2018-10-04T09:00:08Z

Rebased and added a few integration tests for sync. Run with: make test-integration TESTFLAGS="-check.f Sync"

flavio · 2019-11-26T09:13:11Z

@vrothberg done, sorry about the hiccup

vrothberg · 2019-11-26T09:28:52Z

Thanks, @flavio! No need to update the branch, we can rebase and merge via GitHub.

vrothberg · 2019-11-26T09:37:18Z

@flavio, one last time, now it's vendor/modules.txt. Are you using go 1.13 for vendoring? In case not, there's a make vendor-in-container now that uses golang:1.13 for vendoring.

flavio · 2019-11-26T11:26:17Z

@vrothberg you're right. I was using go 1.12. I've followed your advice, hopefully everything is good now 🤞

vrothberg · 2019-11-26T11:49:09Z

@flavio not yet. I assume the encryption PR caused a conflict with the go.{mod,sum}. Sorry for that!

Remove the $HOME/.docker directory when tearing down a cluster, so that subsequent cluster creations can be carried out successfully. Signed-off-by: Marco Vedovati <[email protected]>

flavio · 2019-11-26T13:15:22Z

@vrothberg fixed again. I ❤️ go modules 😉

Let me know if you want me to squash some commits together to clean up the history

vrothberg · 2019-11-26T13:20:19Z

@vrothberg fixed again. I ❤️ go modules 😉

It's a very bumpy ride with go but I hope things will calm down in 2020 :D

Let me know if you want me to squash some commits together to clean up the history

Good point, sorry for missing that. Yes, please squash them into one commit. One commit can avoid some noise during bisects.

flavio · 2019-11-26T14:15:58Z

@vrothberg I squashed all the commits into a single one, except for 544fd2e

I think that was done to make the tests pass, but it doesn't seem strictly related to the sync command. I think it would be better not to squash it. What do you think?

vrothberg · 2019-11-26T14:18:05Z

SGTM

vrothberg · 2019-11-26T14:38:00Z

Test failures look legit. Feel free to ping me if you want me to dig deeper.

flavio · 2019-11-26T16:49:20Z

Don't worry, I'm looking into them

The skopeo sync command can sync images between a SOURCE and a destination.

The purpose of this command is to assist with the mirroring of container images from different docker registries to a single docker registry.

Right now the following source/destination locations are implemented:

* docker -> docker
* docker -> dir
* dir -> docker

The dir location is supported to handle the use case of air-gapped environments. In this context users can perform an initial sync on a trusted machine connected to the internet; that would be a `docker` -> `dir` sync. The target directory can be copied to a removable drive that can then be plugged into a node of the air-gapped environment. From there a `dir` -> `docker` sync will import all the images into the registry serving the air-gapped environment.

Notes when specifying the `--scoped` option:

The image namespace is changed during the `docker` to `docker` or `dir` copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image `registry.example.com/busybox:latest` will be copied to `registry.local.lan/registry.example.com/busybox:latest`.

The image namespace is not changed when doing a `dir:` -> `docker` sync operation.

The alteration of the image namespace is used to nicely scope images coming from different registries (the Docker Hub, quay.io, gcr, other registries). That allows all of them to be hosted on the same registry without incurring in clashes and making their origin explicit.

Signed-off-by: Flavio Castelli <[email protected]>
Co-authored-by: Marco Vedovati <[email protected]>

flavio · 2019-11-29T18:37:49Z

Updated to fix the broken integration tests. Everything should be green, at least it looks like that when I run the checks locally 🤞

rhatdan · 2019-11-29T19:57:59Z

@vrothberg Is this ready to go in?

vrothberg · 2019-12-02T12:43:32Z

Thanks to @flavio and @marcov for their perseverance and the great work!

rhatdan · 2019-12-02T14:31:39Z

Thanks @flavio and @marcov for getting this in.

marcov · 2019-12-02T14:44:20Z

Happy to see this merged at last! 🎉

flavio · 2019-12-03T07:57:55Z

Thanks @vrothberg and @rhatdan for the assistance!

saschagrunert · 2019-12-19T09:38:33Z

Do we want to release a new version including this change 😇 ?

vrothberg · 2019-12-19T10:23:55Z

Do we want to release a new version including this change 😇 ?

Maybe in January. We wanted the new changes to cook a bit longer in master until releasing it.

vrothberg · 2019-12-19T10:24:29Z

There's a couple of other things (e.g., login/logout support) that would be good to get in as well.

flavio force-pushed the sync branch from 3ae6c66 to 65bf6bc · July 13, 2018 16:11

marcov force-pushed the sync branch from 8f4777c to 4133e72 · July 20, 2018 18:40

giuseppe reviewed Aug 16, 2018
