Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support disabling automatic "rslave" fallback #161

Open
AkihiroSuda opened this issue Feb 13, 2023 · 9 comments
Open

Support disabling automatic "rslave" fallback #161

AkihiroSuda opened this issue Feb 13, 2023 · 9 comments

Comments

@AkihiroSuda
Copy link
Contributor

AkihiroSuda commented Feb 13, 2023

While containerd and CRI-O consistently maps PROPAGATION_PRIVATE (CRI) to rprivate (OCI),
cri-dockerd (dockerd) conditionally maps it to either rprivate or rslave:

switch m.Propagation {
case v1.MountPropagation_PROPAGATION_PRIVATE:
// noop, dockerd will decide the propagation.
//
// dockerd's default propagation is "rprivate": https://github.com/moby/moby/blob/v20.10.23/volume/mounts/linux_parser.go#L145
//
// However, dockerd automatically changes the default propagation to "rslave"
// when the mount source contains the daemon root (/var/lib/docker):
// - https://github.com/moby/moby/blob/v20.10.23/daemon/volumes.go#L137-L143
// - https://github.com/moby/moby/blob/v20.10.23/daemon/volumes_linux.go#L11-L36
//
// This behavior was introduced in Docker 18.03: https://github.com/moby/moby/pull/36055
case v1.MountPropagation_PROPAGATION_BIDIRECTIONAL:
attrs = append(attrs, "rshared")
case v1.MountPropagation_PROPAGATION_HOST_TO_CONTAINER:
attrs = append(attrs, "rslave")
default:
logrus.Infof("Unknown propagation mode for hostPath %s", m.HostPath)
// let dockerd decide the propagation
}

cri-dockerd should have an option to disable falling back to rslave, without sacrificing backward compatibility.

  • Plan A: Introduce a custom Pod annotation like cri-dockerd.mirantis.com/deterministic-propagation=true.

  • Plan B: Add a new constant like PROPAGATION_LITERALLY_PRIVATE to the CRI API.
    In the YAML it would look like mountPropagation: LiterallyNone.

  • Plan C: Just return an error when recursivelyReadOnly == Required && propagation == None aka private && sourceContainsDaemonRoot. recursiveReadOnly is proposed in KEP-3857: Recursive Read-only (RRO) mounts kubernetes/enhancements#3858

Plan B probably needs going through the full KEP process, which may take several years.
So, my preference is Plan A or C.

@evol262
Copy link
Contributor

evol262 commented Feb 13, 2023

@AkihiroSuda is there actually a request for this upstream? If not, there are plenty of capabilities for containerd/CRI-O which aren't exposed either.

If Moby is taking an interest (which is great!), a sane way to pass headers to docker over the client, pre-seeding pull credentials (k8s expects them to be in a secret, but there's a gap when k8s bootstrapping where secrets do not exist yet when private registries are used), and implementing CRI stats which would be great.

@AkihiroSuda
Copy link
Contributor Author

If Moby is taking an interest (which is great!), a sane way to pass headers to docker over the client, pre-seeding pull credentials (k8s expects them to be in a secret, but there's a gap when k8s bootstrapping where secrets do not exist yet when private registries are used), and implementing CRI stats which would be great.

Sorry, I'm not sure how your comment is relevant to my OP.

@neersighted
Copy link
Contributor

neersighted commented Feb 13, 2023

I believe @AkihiroSuda is referring to simply explicitly setting the propagation instead of using the noop in the case statement when a magic label is set.

Re: other improvements in Moby itself, please file tickets in the ENGINE project. There are multiple Moby maintainers on the MCR team now, and we can prioritize improvements MKE would like to see if they are surfaced.

@evol262
Copy link
Contributor

evol262 commented Feb 13, 2023

The suggest improvements are not in Moby itself. They are improvements for cri-dockerd where a deeper knowledge of Moby/Docker would be helpful.

The question is: "has anyone requested this functionality"? If not, then there is not a clear reason to include it. No matter how technically doable "if this label is set, explicitly set the propagation" is, it's adding a moving piece which must now be tested, may require upkeep/maintenance if k8s and/or Moby change this behavior, and closing this as WONTFIX is a better resolution. Just because we can doesn't mean we should, and if nobody is asking for it in the long window between 18.03 and now, then there's no reason to do so.

@neersighted
Copy link
Contributor

I for one have no idea why you would want to allow the daemon root to propagate into itself/what the use cases might be.

The suggest improvements are not in Moby itself.

This seems to be contradicted by:

If Moby is taking an interest (which is great!), a sane way to pass headers to docker over the client, pre-seeding pull credentials (k8s expects them to be in a secret, but there's a gap when k8s bootstrapping where secrets do not exist yet when private registries are used), and implementing CRI stats which would be great.

I think the two got mixed together -- I was suggesting that if some of these require design work and/or changes in Moby, we can put those in the backlog.

@AkihiroSuda
Copy link
Contributor Author

AkihiroSuda commented Feb 13, 2023

has anyone requested this functionality

I request, because recursively read-only (RRO) mounts have to be used with rprivate (otherwise a writable submount may appear inside the RRO mount):

@evol262
Copy link
Contributor

evol262 commented Feb 13, 2023

I for one have no idea why you would want to allow the daemon root to propagate into itself/what the use cases might be.

That is not the question. It's basically the same argument as thockin on the actual KEP. The question for cri-dockerd is "should we implement a potentially breaking feature which does not have an accepted KEP or release version"? If the KEP is accepted, then support for it can/should be added when the feature graduates to Alpha/Beta, and not before.

When or why users would want to allow this sort of propagation is not at issue, and while I also think there are better ways to do it, cri-dockerd should follow whatever mechanism (labeling, annotations, etc) upstream decides upon as a standard, and enable it once there are feature gates for it, then let the kubelet decide what to do based on those.

What we do not want to do is be even more of a "mostly compliant CRI but slightly off from the standard" (since log streaming/etc are not supported).

This seems to be contradicted by:

If Moby is taking an interest (which is great!), a sane way to pass headers to docker over the client, pre-seeding pull credentials (k8s expects them to be in a secret, but there's a gap when k8s bootstrapping where secrets do not exist yet when private registries are used), and implementing CRI stats which would be great.

I think the two got mixed together -- I was suggesting that if some of these require design work and/or changes in Moby, we can put those in the backlog.

This is not a contradiction, and none of these require design work and/or changes in Moby. The comment is more or less "if Moby maintainers/engineers are contributing here, the header issue, passing credentials as part of a CRI-level daemon config for bootstrapping, and implementing 'pure' CRI stat collection over the Docker API/module are places where deep knowledge of the Docker API/client can meaningfully improve this project right now".

If that's a non-starter, then we'll set that conversation aside, but the top of this post (implementing non-standard annotations/labels to support an unaccepted KEP which the k8s maintainers feel is a breaking change) will not happen or be prioritized right now.

@AkihiroSuda
Copy link
Contributor Author

If that's a non-starter, then we'll set that conversation aside, but the top of this post (implementing non-standard annotations/labels to support an unaccepted KEP which the k8s maintainers feel is a breaking change) will not happen or be prioritized right now.

The current revision kubernetes/enhancements@761883a isn't considered to be a breaking change AFAICS, but I'm ok to defer this discussion until the KEP settles.

If the KEP settles probably cri-dockerd should just return an error when recursivelyReadOnly == Required && propagation == None aka private && sourceContainsDaemonRoot

@evol262
Copy link
Contributor

evol262 commented Feb 14, 2023

It depends on exactly what it settles down to, because this chain is a bit back and forth. Either way, the question still is not about when the KEP settles, but when is actually graduates (even to Alpha), at which point it becomes suitable for inclusion.

When it does, what cri-dockerd does (as far as returning errors, etc) will be whatever the specified behavior from the graduated feature says it should be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants