Introduce the sync command #524

Merged
merged 2 commits into from
Dec 2, 2019

Conversation

flavio
Contributor

@flavio flavio commented Jul 13, 2018

The skopeo sync command can sync images between a SOURCE and a destination.

The purpose of this command is to assist with the mirroring of container images from different docker registries to a single docker registry.

Right now the following transport matrix is implemented:

  • docker:// -> docker://
  • docker:// -> dir:
  • dir: -> docker://

The dir: transport is supported to handle the use case of air-gapped environments.
In this context users can perform an initial sync on a trusted machine connected to the internet; that would be a docker:// -> dir: sync.
The target directory can be copied to a removable drive that can then be plugged into a node of the air-gapped environment. From there a dir: -> docker:// sync will import all the images into the registry serving the air-gapped environment.

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image registry.example.com/busybox:latest will be copied to registry.local.lan/registry.example.com/busybox:latest.

The image namespace is not changed when doing a
dir: -> docker:// sync operation.

The image namespace is altered to scope images coming from different registries (the Docker Hub, quay.io, gcr, other registries). That allows all of them to be hosted on the same registry without name clashes, while keeping their origin explicit.
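
The renaming rule described above can be sketched in a few lines of Go. This is only an illustration of the naming scheme, not the code in this PR: the helper name `scopedDestination` and its signature are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// scopedDestination is a hypothetical helper (not part of skopeo).
// It prefixes the full source reference, registry FQDN included,
// with the destination registry, so the origin stays visible.
func scopedDestination(destRegistry, srcImage string) string {
	return strings.TrimSuffix(destRegistry, "/") + "/" + strings.TrimPrefix(srcImage, "/")
}

func main() {
	// The example from the description above.
	fmt.Println(scopedDestination("registry.local.lan", "registry.example.com/busybox:latest"))
	// prints registry.local.lan/registry.example.com/busybox:latest
}
```

Because the source FQDN becomes the first path component, two registries that both host a `busybox` image end up in different repositories on the mirror.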

TODO

I hope you like this feature and the direction it's going. Once we agree on its final design we will update this PR to extend the current test suites.

Future work

Currently sync will keep adding missing content from SOURCE to DESTINATION. It would be nice to add a --delete flag that removes from DESTINATION any content that is no longer available in SOURCE, much like rsync's --delete flag.

If wanted, that should be addressed with a separate PR.
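
To make the idea concrete, here is a minimal sketch of the comparison such a flag would need: finding the tags that exist in DESTINATION but no longer in SOURCE. Nothing like this exists in the PR; the function name `staleTags` and the tag lists are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// staleTags is a hypothetical helper: it returns the tags that are
// present in dest but absent from src, in sorted order. A --delete
// implementation would remove exactly these from the destination.
func staleTags(src, dest []string) []string {
	inSrc := make(map[string]struct{}, len(src))
	for _, t := range src {
		inSrc[t] = struct{}{}
	}
	var stale []string
	for _, t := range dest {
		if _, ok := inSrc[t]; !ok {
			stale = append(stale, t)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	src := []string{"1.31", "latest"}
	dest := []string{"1.29", "1.30", "1.31", "latest"}
	fmt.Println(staleTags(src, dest)) // prints [1.29 1.30]
}
```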

Signed-off-by: Flavio Castelli [email protected]
Co-authored-by: Marco Vedovati [email protected]

@runcom
Member

runcom commented Jul 13, 2018

@mtrmac PTAL as well

@vrothberg
Member

@runcom @rhatdan @mtrmac friendly ping. I'd love to see this feature in an upcoming Skopeo release.

Member

@giuseppe giuseppe left a comment

I've added some comments. @mtrmac PTAL

type sourceCfg map[string]registrySyncCfg

func newSourceConfig(yamlFile string) (cfg sourceCfg, err error) {

Member

extra empty line

Member

Still valid.

docs/skopeo.1.md Outdated

Copy all the images from _source_ (or _source-file_) to _destination_.

Useful to keep in sync a local docker registry mirror. It can be used to populate also registries running inside of air-gapped environments.
Member

Suggest:
Useful to keep in sync with a local docker registry mirror. It can also be used to populate registries running inside of air-gapped environments.

docs/skopeo.1.md Outdated

_destination_ can be either a docker registry (eg: docker://my-registry.local.lan) or a local directory (eg: dir:///media/usb).

When _destination_ is a local directory one directory per 'image:tag' is going to be created.
Member

Suggest:
"is going to be" to "will be"

docs/skopeo.1.md Outdated
images:
coreos/etcd:
- latest
```

# SEE ALSO
kpod-login(1), docker-login(1)
Member

I know this isn't part of this change, but this should be changed to:
"podman-login(1), docker-login(1)"

@rhatdan
Member

rhatdan commented Aug 16, 2018

I would like to see this a little less "docker" centric. Perhaps use some examples other than docker.io, such as registry.example.com.

docs/skopeo.1.md Outdated

**skopeo sync** will copy all the tags of an image when _source_ uses the docker:// transport and no tag is specified.

_destination_ can be either a docker registry (eg: docker://my-registry.local.lan) or a local directory (eg: dir:///media/usb).
Contributor

The PR introduction says

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image registry.example.com/busybox:latest will be copied to registry.local.lan/registry.example.com/busybox:latest

This inflexibly hard-codes policy, and seems entirely unnecessary: whatever script is used for the regular sync can just specify registry.local.lan/registry.example.com/busybox as the destination.

Also, this policy forces sync to invent an entirely new docker://hostname syntax, which is ambiguous with the image name syntax used everywhere else, and has a different meaning (the image docker://hostname == docker://docker.io/library/hostname:latest).

… and AFAICT it’s not documented in this man page.
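
The ambiguity can be shown with a simplified sketch of Docker-style name normalization. This is not the containers/image parser: `normalize` is a stand-in written for this example, it handles tags only (no digests), and it skips validation. It follows the familiar rule that the first path component counts as a registry host only if it contains a dot or a colon, or is `localhost`.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize is a simplified, hypothetical stand-in for Docker-style
// reference normalization. It ignores digests and does no validation.
func normalize(name string) string {
	domain, remainder := "docker.io", name
	if i := strings.IndexRune(name, '/'); i != -1 {
		first := name[:i]
		// Only treat the first component as a host if it looks like one.
		if strings.ContainsAny(first, ".:") || first == "localhost" {
			domain, remainder = first, name[i+1:]
		}
	}
	// Single-component names on Docker Hub live under "library/".
	if domain == "docker.io" && !strings.Contains(remainder, "/") {
		remainder = "library/" + remainder
	}
	// A tag is a colon that appears after the last slash.
	if strings.LastIndex(remainder, ":") <= strings.LastIndex(remainder, "/") {
		remainder += ":latest"
	}
	return domain + "/" + remainder
}

func main() {
	// A bare hostname has no slash, so it is read as an image name.
	fmt.Println(normalize("my-registry.local.lan"))
	// prints docker.io/library/my-registry.local.lan:latest
	fmt.Println(normalize("registry.example.com/busybox"))
	// prints registry.example.com/busybox:latest
}
```

Under these rules a bare `docker://my-registry.local.lan` names an image on Docker Hub rather than a registry, which is why a registry-only destination needs a syntax of its own.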

Contributor

Some kind of scoping is necessary when running sync from a YAML file, as you may be pulling the same image name from different namespaces or domains.

The reason to have this kind of scoping is that the original image location can be easily extrapolated, and this information is useful when running a server that is configured to mirror multiple registries.

Maybe the requirement can be dropped when the source refers to a docker:// location (and a flag can be added to make the current behaviour optional).

Contributor

Some kind of scoping is necessary when running sync from a YAML file, as you may be pulling the same image name from different namespaces or domains.

My first instinct is to specify the destination in the YAML as well, then, not to hard-code the naming in software.

It’s not obvious to me why sync --source-yaml $src $dest would have the sources specified in a (presumably mostly fixed) config file but the destination in a (presumably mostly changing) command-line argument.

If a user sets up a regular (e.g. weekly) sync process, won’t both the source and the destination be the same for all runs (every week the same servers and same destinations)? (My best guess is that when the destination is on a removable medium, the physical device, and thus presumably the mount path, may change each time, but aren’t mounts often named by the volume label, which is always the same?)

@marcov marcov force-pushed the sync branch 4 times, most recently from 30faa1e to 703b644 Compare August 27, 2018 10:13
@rhatdan
Member

rhatdan commented Sep 21, 2018

@mtrmac @runcom Can we merge this?

@mtrmac
Contributor

mtrmac commented Sep 26, 2018

I’m afraid I still haven’t actually reviewed this. My fault entirely.

@caiobegotti

Hi guys, I am currently writing a bunch of scripts to do exactly that for our Kubernetes cluster running in an air-gapped environment, and then I discovered this PR 👍 ... any chance of getting this reviewed or even merged soon?

@chilicat

chilicat commented Oct 3, 2018

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image
registry.example.com/busybox:latest will be copied to
registry.local.lan/registry.example.com/busybox:latest.

I would prefer not to have the original registry name as a namespace in the destination registry. The source registry might be an internal registry whose name changes over time, or one whose actual name we want to hide on the customer side.

Maybe that is something which could be configurable?

@flavio
Contributor Author

flavio commented Oct 4, 2018

I would prefer to not have the original registry name as namespace in the destination registry. The
source registry might be an internal registry where the name changes over time or where we want to
hide the actual name on the customer side.

Maybe that is something which could be configurable?

We could add a --dest-prefix flag to allow something like this:

skopeo sync --dest-prefix acme-inc docker://wip-registry.lan/busybox docker://example.com 

That would copy all the busybox images from wip-registry.lan/busybox to example.com/acme-inc/busybox.

We should also cover this option inside of the source yaml file. Let's take this source.yaml as an example:

docker.io:
    images:
        busybox: []
wip-registry:
    dest-prefix: acme-inc
    images:
        coreos/etcd:
            - latest

Let's assume we run this command:

skopeo sync --source-yaml source.yaml docker://target.com

By this point, inside of the target.com registry, we would have the following images:

  • docker.io/busybox -> docker pull target.com/docker.io/busybox
  • acme-inc/coreos/etcd -> docker pull target.com/acme-inc/coreos/etcd
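
The mapping above can be sketched as follows. Everything here is illustrative: `dest-prefix` is only a proposal at this point, and the `registryCfg` type and `destRepo` function are names made up for the example rather than code from the PR.

```go
package main

import "fmt"

// registryCfg mirrors one top-level entry of the proposed source.yaml.
// DestPrefix stands for the proposed (not yet implemented) dest-prefix key.
type registryCfg struct {
	DestPrefix string
}

// destRepo is a hypothetical helper. It scopes the image by the
// dest-prefix when one is set, and by the source registry otherwise.
func destRepo(target, srcRegistry string, cfg registryCfg, image string) string {
	scope := srcRegistry
	if cfg.DestPrefix != "" {
		scope = cfg.DestPrefix
	}
	return target + "/" + scope + "/" + image
}

func main() {
	fmt.Println(destRepo("target.com", "docker.io", registryCfg{}, "busybox"))
	// prints target.com/docker.io/busybox
	fmt.Println(destRepo("target.com", "wip-registry", registryCfg{DestPrefix: "acme-inc"}, "coreos/etcd"))
	// prints target.com/acme-inc/coreos/etcd
}
```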

Would that work for you?

How should we proceed if everybody agrees with this design? Should we get this PR merged and then implement this new flag via a new PR (to limit the amount of code to be reviewed)?

@marcov
Contributor

marcov commented Oct 4, 2018

Rebased and added a few integration tests for sync. Run with: make test-integration TESTFLAGS="-check.f Sync"

@flavio
Contributor Author

flavio commented Nov 26, 2019

@vrothberg done, sorry about the hiccup

@vrothberg
Member

Thanks, @flavio! No need to update the branch, we can rebase and merge via GitHub.

@vrothberg
Member

@flavio, one last time, now it's vendor/modules.txt. Are you using go 1.13 for vendoring? In case not, there's a make vendor-in-container now that uses golang:1.13 for vendoring.

@flavio
Contributor Author

flavio commented Nov 26, 2019

@vrothberg you're right. I was using go 1.12. I've followed your advice, hopefully everything is good now 🤞

@vrothberg
Member

vrothberg commented Nov 26, 2019

@flavio not yet. I assume the encryption PR caused a conflict with the go.{mod,sum}. Sorry for that!

Remove the $HOME/.docker directory when tearing down a cluster,
so that subsequent cluster creations can be carried out successfully.

Signed-off-by: Marco Vedovati <[email protected]>
@flavio
Contributor Author

flavio commented Nov 26, 2019

@vrothberg fixed again. I ❤️ go modules 😉

Let me know if you want me to squash some commits together to clean up the history

@vrothberg
Member

@vrothberg fixed again. I ❤️ go modules 😉

It's a very bumpy ride with go but I hope things will calm down in 2020 :D

Let me know if you want me to squash some commits together to clean up the history

Good point, sorry for missing that. Yes, please squash them into one commit. One commit can avoid some noise during bisects.

@flavio
Contributor Author

flavio commented Nov 26, 2019

@vrothberg I squashed all the commits into a single one, except for 544fd2e

I think that was done to make the tests pass, but it doesn't seem strictly related to the sync command. I think it would be better not to squash it. What do you think?

@vrothberg
Member

SGTM

@vrothberg
Member

Test failures look legit. Feel free to ping me if you want me to dig deeper.

@flavio
Contributor Author

flavio commented Nov 26, 2019

Don't worry, I'm looking into them

The skopeo sync command can sync images between a SOURCE and a
destination.

The purpose of this command is to assist with the mirroring of
container images from different docker registries to a single
docker registry.

Right now the following source/destination locations are implemented:

  * docker -> docker
  * docker -> dir
  * dir -> docker

The dir location is supported to handle the use case
of air-gapped environments.
In this context users can perform an initial sync on a trusted machine
connected to the internet; that would be a `docker` -> `dir` sync.
The target directory can be copied to a removable drive that can then be
plugged into a node of the air-gapped environment. From there a
`dir` -> `docker` sync will import all the images into the registry serving
the air-gapped environment.

Notes when specifying the `--scoped` option:

The image namespace is changed during the `docker` to `docker` or `dir` copy.
The FQDN of the registry hosting the image will be added as new root namespace
of the image. For example, the image `registry.example.com/busybox:latest`
will be copied to
`registry.local.lan/registry.example.com/busybox:latest`.

The image namespace is not changed when doing a
`dir` -> `docker` sync operation.

The alteration of the image namespace is used to nicely scope images
coming from different registries (the Docker Hub, quay.io, gcr,
other registries). That allows all of them to be hosted on the same
registry without name clashes, while keeping their origin explicit.

Signed-off-by: Flavio Castelli <[email protected]>
Co-authored-by: Marco Vedovati <[email protected]>
@flavio
Contributor Author

flavio commented Nov 29, 2019

Updated to fix the broken integration tests. Everything should be green, at least it looks like that when I run the checks locally 🤞

@rhatdan
Member

rhatdan commented Nov 29, 2019

@vrothberg Is this ready to go in?

@vrothberg vrothberg merged commit 9c402f3 into containers:master Dec 2, 2019
@vrothberg
Member

Thanks to @flavio and @marcov for the perseverance and the great work!

@rhatdan
Member

rhatdan commented Dec 2, 2019

Thanks @flavio and @marcov for getting this in.

@marcov
Contributor

marcov commented Dec 2, 2019

Happy to see this merged at last! 🎉

@flavio
Contributor Author

flavio commented Dec 3, 2019

Thanks @vrothberg and @rhatdan for the assistance!

@saschagrunert
Member

Do we want to release a new version including this change 😇 ?

@vrothberg
Member

Do we want to release a new version including this change 😇 ?

Maybe in January. We wanted the new changes to cook a bit longer in master before releasing them.

@vrothberg
Member

There's a couple of other things (e.g., login/logout support) that would be good to get in as well.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 1, 2023