Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mounts: allow runtimes to create the destination #955

Closed
wants to merge 1 commit into from

Conversation

alban
Copy link
Contributor

@alban alban commented Mar 8, 2018

Consider the following mount:

"mounts": [
  {
    "destination": "/non-existent",
    "source": "/etc/resolv.conf",
    "options": ["bind","ro"]
  }
]

When /non-existent does not exist in the container rootfs, the spec does
not say whether the runtime can create it or if it should let the mount
fails.

runc just calls 'mkdir' if it does not exist, after creating and
entering the namespaces. It works most of the times but fails if:

  • the source of the bind mount is a file instead of a directory, or
  • the container does not have the rights for the mkdir (readonly rootfs
    or rootfs belonging to an unmapped uid).

Doing the mkdir/touch before entering the namespaces is difficult to get
correctly because:

  • symlinks would need to be resolved manually as if we were chrooted in
    the container (symlinks are resolved by the mount syscall: a symlink
    itself cannot be a source or a target of a bind mount), and
  • ordering nested mounts is complicated by possible symlinks

So I'd prefer if the spec does not mandate to create the file or
directory in all possible scenarios because it's overly complicated. But
the runtime should be allowed to address the simple case like runc does.
Hence the "MAY" wording.

Signed-off-by: Alban Crequy [email protected]

Consider the following mount:
```
"mounts": [
  {
    "destination": "/non-existent",
    "source": "/etc/resolv.conf",
    "options": ["bind","ro"]
  }
]
```

When /non-existent does not exist in the container rootfs, the spec does
not say whether the runtime can create it or if it should let the mount
fails.

runc just calls 'mkdir' if it does not exist, after creating and
entering the namespaces. It works most of the times but fails if:
- the source of the bind mount is a file instead of a directory, or
- the container does not have the rights for the mkdir (readonly rootfs
  or rootfs belonging to an unmapped uid).

Doing the mkdir/touch before entering the namespaces is difficult to get
correctly because:
- symlinks would need to be resolved manually as if we were chrooted in
  the container (symlinks are resolved by the mount syscall: a symlink
  itself cannot be a source or a target of a bind mount), and
- ordering nested mounts is complicated by possible symlinks

So I'd prefer if the spec does not mandate to create the file or
directory in all possible scenarios because it's overly complicated. But
the runtime should be allowed to address the simple case like runc does.
Hence the "MAY" wording.

Signed-off-by: Alban Crequy <[email protected]>
@wking
Copy link
Contributor

wking commented Mar 8, 2018

I'd rather MUST this, since otherwise runtime-callers cannot rely on it. Previous discussion in #591.

alban added a commit to kinvolk-archives/runtime-tools that referenced this pull request Mar 8, 2018
Don't validate uid mappings and gid mappings separately: containers with
only user mappings or with only group mappings are not usable.

Additionnally, don't rely on the runtime to create the directories to be
mounted. runc mounts them in the easy cases but it does not work with
user namespaces.

Marking as WIP because this is in discussion in opencontainers/runtime-spec#955

Signed-off-by: Alban Crequy <[email protected]>
alban added a commit to kinvolk-archives/runtime-tools that referenced this pull request Mar 8, 2018
Don't validate uid mappings and gid mappings separately: containers with
only user mappings or with only group mappings are not usable.

Additionnally, don't rely on the runtime to create the directories to be
mounted. runc mounts them in the easy cases but it does not work with
user namespaces.

Marking as WIP/RFC because this is in discussion in opencontainers/runtime-spec#955

Signed-off-by: Alban Crequy <[email protected]>
@alban
Copy link
Contributor Author

alban commented Mar 8, 2018

This is the work that made me want to clarify the spec: opencontainers/runtime-tools#597

@alban
Copy link
Contributor Author

alban commented Mar 8, 2018

What should the runtime do if the uid-mappings don't give enough privileges for the directories to be created after entering the namespaces? Should it give up with EPERM like runc is doing now? Or should it be able to work around this by using mkdir/touch before entering the user namespaces? rkt is doing the latter but it has a cost in code complexity: rkt evaluates symlinks manually in the rootfs (EvaluateSymlinksInsideApp and rkt#3490).

If the spec were to say "MUST" here, it should also say what to do in the case above.

/cc @s-urbaniak @iaguis

@tianon
Copy link
Member

tianon commented Mar 8, 2018

IMO it'd make more sense to make this a MUST NOT, and consider later adding a new option for allowing the runtime to do the appropriate mkdir (similar to Docker's services and their --mount behavior).

Auto-creating mounted directories isn't horrible, but when it creates a directory and a file was intended, it causes more harm than good (as can be and has been seen with docker run's default "create if not exists" behavior, and as was the impetus for services to go the other direction).

@wking
Copy link
Contributor

wking commented Mar 8, 2018

IMO it'd make more sense to make this a MUST NOT, and consider later adding a new option for allowing the runtime to do the appropriate mkdir (similar to Docker's services and their --mount behavior).

This is the approach that was considered and rejected in #591 (although we can obviously revisit it). I think the crux there was @crosbymichael pointing out that, if you're about to die because of a missing destination directory, who wouldn't want the runtime to at least attempt to create the destination?

@wking
Copy link
Contributor

wking commented Mar 8, 2018

Auto-creating mounted directories isn't horrible, but when it creates a directory and a file was intended, it causes more harm than good…

You can handle this by creating target files for bind mounts where the source resolves to a non-directory (e.g. wking/ccon@ce2d4a45629, fixes in wking/ccon@8b368c61fc). I'm fine with that being a configured option instead of detecting it though, if folks feel like detection is too high a runtime burden.

@wking
Copy link
Contributor

wking commented Mar 9, 2018

What should the runtime do if the uid-mappings don't give enough privileges for the directories to be created after entering the namespaces? Should it give up with EPERM like runc is doing now?

I think this should be allowed, but not required. Creating the target dir/file when the container process has sufficient priveleges should be required, because it's helpful and a dozen or two lines of C.

Or should it be able to work around this by using mkdir/touch before entering the user namespaces? rkt is doing the latter but it has a cost in code complexity...

Besides symlink rooting, you also have the "maybe a previpus mount changes one of these" issue. I think you'd need to have the container process check the source type, and the tell the runtime process (over their synchronization pipe) "can you create a {file|dir} at {container path}", and the runtime would have to clone a sub-process into the mount namespace (while staying in the runtime user namespace) to create the target. So not impossible, and great for runtimes that want to support this, but not something I'm ready to impose on runtimes withough more motivation.

If the spec were to say "MUST" here, it should also say what to do in the case above.

Agreed.

@lucab
Copy link

lucab commented Mar 14, 2018

@alban FYI this behavior from Docker caused some frictions in the past. Kubernetes thus introduced dedicated ${Type}OrCreate hostpath types to correctly drive the runtime. I'm personally aligned with @tianon's comment.

@vbatts
Copy link
Member

vbatts commented Apr 4, 2018

+1 to @tianon position

@vbatts
Copy link
Member

vbatts commented Dec 17, 2019

house keeping: i know the expectations of creating the path are unclear, but just having this as a MAY with no guidance will just cause inconsistent behaviors. A couple folks would rather this be MUST NOT, so i'm closing this for now.

@vbatts vbatts closed this Dec 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants