Mounts: allow runtimes to create the destination #955

alban · 2018-03-08T17:55:48Z

Consider the following mount:

"mounts": [
  {
    "destination": "/non-existent",
    "source": "/etc/resolv.conf",
    "options": ["bind","ro"]
  }
]

When /non-existent does not exist in the container rootfs, the spec does
not say whether the runtime can create it or if it should let the mount
fails.

runc just calls 'mkdir' if it does not exist, after creating and
entering the namespaces. It works most of the times but fails if:

the source of the bind mount is a file instead of a directory, or
the container does not have the rights for the mkdir (readonly rootfs
or rootfs belonging to an unmapped uid).

Doing the mkdir/touch before entering the namespaces is difficult to get
correctly because:

symlinks would need to be resolved manually as if we were chrooted in
the container (symlinks are resolved by the mount syscall: a symlink
itself cannot be a source or a target of a bind mount), and
ordering nested mounts is complicated by possible symlinks

So I'd prefer if the spec does not mandate to create the file or
directory in all possible scenarios because it's overly complicated. But
the runtime should be allowed to address the simple case like runc does.
Hence the "MAY" wording.

Signed-off-by: Alban Crequy [email protected]

Consider the following mount: ``` "mounts": [ { "destination": "/non-existent", "source": "/etc/resolv.conf", "options": ["bind","ro"] } ] ``` When /non-existent does not exist in the container rootfs, the spec does not say whether the runtime can create it or if it should let the mount fails. runc just calls 'mkdir' if it does not exist, after creating and entering the namespaces. It works most of the times but fails if: - the source of the bind mount is a file instead of a directory, or - the container does not have the rights for the mkdir (readonly rootfs or rootfs belonging to an unmapped uid). Doing the mkdir/touch before entering the namespaces is difficult to get correctly because: - symlinks would need to be resolved manually as if we were chrooted in the container (symlinks are resolved by the mount syscall: a symlink itself cannot be a source or a target of a bind mount), and - ordering nested mounts is complicated by possible symlinks So I'd prefer if the spec does not mandate to create the file or directory in all possible scenarios because it's overly complicated. But the runtime should be allowed to address the simple case like runc does. Hence the "MAY" wording. Signed-off-by: Alban Crequy <[email protected]>

wking · 2018-03-08T18:09:18Z

I'd rather MUST this, since otherwise runtime-callers cannot rely on it. Previous discussion in #591.

Don't validate uid mappings and gid mappings separately: containers with only user mappings or with only group mappings are not usable. Additionnally, don't rely on the runtime to create the directories to be mounted. runc mounts them in the easy cases but it does not work with user namespaces. Marking as WIP because this is in discussion in opencontainers/runtime-spec#955 Signed-off-by: Alban Crequy <[email protected]>

Don't validate uid mappings and gid mappings separately: containers with only user mappings or with only group mappings are not usable. Additionnally, don't rely on the runtime to create the directories to be mounted. runc mounts them in the easy cases but it does not work with user namespaces. Marking as WIP/RFC because this is in discussion in opencontainers/runtime-spec#955 Signed-off-by: Alban Crequy <[email protected]>

alban · 2018-03-08T19:01:32Z

This is the work that made me want to clarify the spec: opencontainers/runtime-tools#597

alban · 2018-03-08T19:28:04Z

What should the runtime do if the uid-mappings don't give enough privileges for the directories to be created after entering the namespaces? Should it give up with EPERM like runc is doing now? Or should it be able to work around this by using mkdir/touch before entering the user namespaces? rkt is doing the latter but it has a cost in code complexity: rkt evaluates symlinks manually in the rootfs (EvaluateSymlinksInsideApp and rkt#3490).

If the spec were to say "MUST" here, it should also say what to do in the case above.

/cc @s-urbaniak @iaguis

tianon · 2018-03-08T20:18:39Z

IMO it'd make more sense to make this a MUST NOT, and consider later adding a new option for allowing the runtime to do the appropriate mkdir (similar to Docker's services and their --mount behavior).

Auto-creating mounted directories isn't horrible, but when it creates a directory and a file was intended, it causes more harm than good (as can be and has been seen with docker run's default "create if not exists" behavior, and as was the impetus for services to go the other direction).

wking · 2018-03-08T20:30:00Z

IMO it'd make more sense to make this a MUST NOT, and consider later adding a new option for allowing the runtime to do the appropriate mkdir (similar to Docker's services and their --mount behavior).

This is the approach that was considered and rejected in #591 (although we can obviously revisit it). I think the crux there was @crosbymichael pointing out that, if you're about to die because of a missing destination directory, who wouldn't want the runtime to at least attempt to create the destination?

wking · 2018-03-08T20:31:38Z

Auto-creating mounted directories isn't horrible, but when it creates a directory and a file was intended, it causes more harm than good…

You can handle this by creating target files for bind mounts where the source resolves to a non-directory (e.g. wking/ccon@ce2d4a45629, fixes in wking/ccon@8b368c61fc). I'm fine with that being a configured option instead of detecting it though, if folks feel like detection is too high a runtime burden.

wking · 2018-03-09T01:15:36Z

What should the runtime do if the uid-mappings don't give enough privileges for the directories to be created after entering the namespaces? Should it give up with EPERM like runc is doing now?

I think this should be allowed, but not required. Creating the target dir/file when the container process has sufficient priveleges should be required, because it's helpful and a dozen or two lines of C.

Or should it be able to work around this by using mkdir/touch before entering the user namespaces? rkt is doing the latter but it has a cost in code complexity...

Besides symlink rooting, you also have the "maybe a previpus mount changes one of these" issue. I think you'd need to have the container process check the source type, and the tell the runtime process (over their synchronization pipe) "can you create a {file|dir} at {container path}", and the runtime would have to clone a sub-process into the mount namespace (while staying in the runtime user namespace) to create the target. So not impossible, and great for runtimes that want to support this, but not something I'm ready to impose on runtimes withough more motivation.

If the spec were to say "MUST" here, it should also say what to do in the case above.

Agreed.

lucab · 2018-03-14T16:03:07Z

@alban FYI this behavior from Docker caused some frictions in the past. Kubernetes thus introduced dedicated ${Type}OrCreate hostpath types to correctly drive the runtime. I'm personally aligned with @tianon's comment.

vbatts · 2018-04-04T21:57:56Z

+1 to @tianon position

vbatts · 2019-12-17T18:41:53Z

house keeping: i know the expectations of creating the path are unclear, but just having this as a MAY with no guidance will just cause inconsistent behaviors. A couple folks would rather this be MUST NOT, so i'm closing this for now.

alban mentioned this pull request Mar 8, 2018

validation: LinuxUIDMapping: fix tests opencontainers/runtime-tools#597

Merged

alban mentioned this pull request Mar 19, 2018

config: runtime SHOULD create mountpoints and cwd #928

Closed

vbatts closed this Dec 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mounts: allow runtimes to create the destination #955

Mounts: allow runtimes to create the destination #955

alban commented Mar 8, 2018

wking commented Mar 8, 2018

alban commented Mar 8, 2018

alban commented Mar 8, 2018

tianon commented Mar 8, 2018

wking commented Mar 8, 2018

wking commented Mar 8, 2018 •

edited

Loading

wking commented Mar 9, 2018 •

edited

Loading

lucab commented Mar 14, 2018 •

edited

Loading

vbatts commented Apr 4, 2018

vbatts commented Dec 17, 2019

Mounts: allow runtimes to create the destination #955

Mounts: allow runtimes to create the destination #955

Conversation

alban commented Mar 8, 2018

wking commented Mar 8, 2018

alban commented Mar 8, 2018

alban commented Mar 8, 2018

tianon commented Mar 8, 2018

wking commented Mar 8, 2018

wking commented Mar 8, 2018 • edited Loading

wking commented Mar 9, 2018 • edited Loading

lucab commented Mar 14, 2018 • edited Loading

vbatts commented Apr 4, 2018

vbatts commented Dec 17, 2019

wking commented Mar 8, 2018 •

edited

Loading

wking commented Mar 9, 2018 •

edited

Loading

lucab commented Mar 14, 2018 •

edited

Loading