feature: block-device mounts #2168

Merged (1 commit) on Jun 26, 2024

@anmaxvl (Contributor) commented Jun 13, 2024

This PR adds capability to mount virtual and passthrough disks as block devices inside containers.

We add a new "blockdev://" prefix to OCI Mount.ContainerPath, which indicates that the source should be mounted as a block device.
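
As a rough illustration of how the prefix could be recognized, here is a minimal Go sketch. The constant and function names are hypothetical and are not taken from the hcsshim code; only the "blockdev://" prefix itself comes from the description above.

```go
package main

import "strings"

// blockDevPrefix is the prefix named in the PR description. The constant
// name is an assumption made for this sketch.
const blockDevPrefix = "blockdev://"

// parseContainerPath reports whether containerPath requests a block-device
// mount and returns the path with the prefix stripped. Paths without the
// prefix are returned unchanged.
func parseContainerPath(containerPath string) (path string, isBlockDev bool) {
	if strings.HasPrefix(containerPath, blockDevPrefix) {
		return strings.TrimPrefix(containerPath, blockDevPrefix), true
	}
	return containerPath, false
}
```

With this helper, "blockdev:///my/block/mount" yields the path "/my/block/mount" with the block-device flag set, while a plain path such as "/data" is passed through with the flag unset.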

A new BlockDev field has been added to mountConfig used by mountManager, which indicates that the SCSI attachment should be mounted as a block device.

The GCS has also been updated to handle BlockDev. Instead of mounting the filesystem, GCS creates a symlink to the block device corresponding to the SCSI attachment. The shim sets the symlink path as the source of a bind mount in the OCI container spec. GCS resolves the symlink and adds the corresponding device cgroup. Without the cgroup, the container won't be able to work with the block device.
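
The symlink step can be sketched in Go with the standard library alone. This is a simplified illustration, not the GCS implementation: the function names are hypothetical, replacing a stale link is an assumption of the sketch, and the device cgroup update is left out.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
)

// linkBlockDevice creates linkPath as a symlink to devicePath instead of
// mounting a filesystem there. Removing a stale link first is an assumption
// of this sketch, not something the PR description specifies.
func linkBlockDevice(devicePath, linkPath string) error {
	if err := os.MkdirAll(filepath.Dir(linkPath), 0o755); err != nil {
		return err
	}
	if err := os.Remove(linkPath); err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	return os.Symlink(devicePath, linkPath)
}

// resolveBlockDevice follows the symlink back to the device node. The
// resolved path is what a device cgroup entry would be derived from.
func resolveBlockDevice(linkPath string) (string, error) {
	return filepath.EvalSymlinks(linkPath)
}
```

The indirection is the point of the design: the shim only needs to know the link path, and the side that created the link can recover the real device path later.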

We chose the symlink approach over bind mounting the device directly because the shim doesn't know the path at which the device will appear inside the UVM. For a direct bind mount to work, we would need either to encode the SCSI controller/LUN in the OCI mount's HostPath or to update the communication protocol between the shim and GCS, so that GCS either returns the device path or lets the shim query for it.

Below are some CRI container config examples for physical and virtual disks:

Passthrough physical disk:

{
    ...
    "mounts": [
        {
            "host_path": "\\\\.\\PHYSICALDRIVE1",
            "container_path": "blockdev:///my/block/mount",
            "readonly": false
        }
    ]
    ...
}

Virtual VHD disk:

{
    ...
    "mounts": [
        {
            "host_path": "C:\\path\\to\\my\\disk.vhdx",
            "container_path": "blockdev:///my/block/mount",
            "readonly": false
        }
    ]
    ...
}
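
For callers that build these entries programmatically, the following Go sketch produces the same mount objects. The struct is a local stand-in that mirrors only the three fields shown in the examples; it is not the CRI API type, and the helper names are hypothetical.

```go
package main

import "encoding/json"

// criMount mirrors only the three fields used in the examples above. It is
// a local stand-in for illustration, not the CRI API type.
type criMount struct {
	HostPath      string `json:"host_path"`
	ContainerPath string `json:"container_path"`
	Readonly      bool   `json:"readonly"`
}

// newBlockDevMount builds a mount entry whose container path carries the
// "blockdev://" prefix.
func newBlockDevMount(hostPath, containerPath string, readonly bool) criMount {
	return criMount{
		HostPath:      hostPath,
		ContainerPath: "blockdev://" + containerPath,
		Readonly:      readonly,
	}
}

// encodeMount renders the entry as JSON; backslashes in Windows paths are
// escaped by the encoder.
func encodeMount(m criMount) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Passing the host path as a Go raw string (for example `\\.\PHYSICALDRIVE1` in backticks) avoids escaping the backslashes twice by hand.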

The mount manager differentiates between a block-device mount and a filesystem mount. Two containers can use the same managed disk inside the UVM at the same time, one as a block device and the other as a filesystem. For a block-device mount a symlink is created; for a filesystem mount the block device is mounted in the UVM. In the listing below, /dev/sde is mounted as ext4 at m3 and is also exposed through the symlink m4:

bash-5.0# ls -l /run/mounts/scsi/
total 16
drwxr-xr-x    3 root     root          4096 Jan  1  1970 m0
drwxr-xr-x    4 root     root          4096 Jun 20 23:20 m1
drwxr-xr-x   18 root     root          4096 Jan  1  1970 m2
drwxr-xr-x    3 root     root          4096 Jun 20 23:20 m3
lrwxrwxrwx    1 root     root             8 Jun 20 23:22 m4 -> /dev/sde
bash-5.0# mount | grep sde
/dev/sde on /run/mounts/scsi/m3 type ext4 (rw,relatime)
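
One way to picture this bookkeeping is a reference-counted table whose key includes the mount mode, so the same disk can hold one filesystem entry and one block-device entry side by side. The Go sketch below is a toy model under that assumption. All type and function names are hypothetical, and only the /run/mounts/scsi/m<N> path pattern is taken from the listing above.

```go
package main

import "fmt"

// mountKey identifies one use of a SCSI attachment. Including blockDev in
// the key lets the same disk be tracked once as a filesystem mount and once
// as a block device. All names here are hypothetical.
type mountKey struct {
	controller uint
	lun        uint
	blockDev   bool
}

type mountEntry struct {
	path string // guest path, e.g. /run/mounts/scsi/m3
	refs int    // number of containers using this entry
}

type mountTracker struct {
	next    int
	entries map[mountKey]*mountEntry
}

func newMountTracker() *mountTracker {
	return &mountTracker{entries: make(map[mountKey]*mountEntry)}
}

// acquire returns the guest path for the requested use of the disk, reusing
// an existing entry when the same disk was already requested in the same mode.
func (t *mountTracker) acquire(controller, lun uint, blockDev bool) string {
	key := mountKey{controller: controller, lun: lun, blockDev: blockDev}
	if e, ok := t.entries[key]; ok {
		e.refs++
		return e.path
	}
	e := &mountEntry{path: fmt.Sprintf("/run/mounts/scsi/m%d", t.next), refs: 1}
	t.next++
	t.entries[key] = e
	return e.path
}

// release drops one reference and reports whether the entry was removed,
// that is, whether the caller should now unmount or delete the symlink.
func (t *mountTracker) release(controller, lun uint, blockDev bool) bool {
	key := mountKey{controller: controller, lun: lun, blockDev: blockDev}
	e, ok := t.entries[key]
	if !ok {
		return false
	}
	e.refs--
	if e.refs > 0 {
		return false
	}
	delete(t.entries, key)
	return true
}
```

In this model, requesting the same controller/LUN once with blockDev false and once with blockDev true yields two distinct guest paths, which matches the m3 and m4 entries that both refer to /dev/sde.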

@anmaxvl requested a review from a team as a code owner June 13, 2024 18:51
@anmaxvl force-pushed the block-device-mounts branch from b1337e9 to ca6f3c0 June 20, 2024 21:51
@anmaxvl changed the title from "feat: block-device mounts" to "featture: block-device mounts" Jun 20, 2024
@anmaxvl changed the title from "featture: block-device mounts" to "feature: block-device mounts" Jun 20, 2024
@anmaxvl force-pushed the block-device-mounts branch from ca6f3c0 to 4ef8a0d June 20, 2024 23:25
Signed-off-by: Maksim An <[email protected]>
@anmaxvl force-pushed the block-device-mounts branch from 4ef8a0d to 92ca394 June 24, 2024 20:19
@anmaxvl merged commit 53f2486 into microsoft:main Jun 26, 2024
19 checks passed
@anmaxvl deleted the block-device-mounts branch June 26, 2024 16:14
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024