Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

grpc: Handle hotplugged devices #133

Merged
merged 1 commit into from
Mar 2, 2018

Conversation

sboeuf
Copy link

@sboeuf sboeuf commented Feb 3, 2018

This commit adds a new Device structure in order to avoid confusion
between what should be mounted and what is a device. When creating
a new container, we expect the caller to pass a list of storages,
and a list of devices.

When adding a host device to the container by using things like
"docker run --device /dev/sda1:/dev/blk1", the runtime will not
be able to anticipate the major/minor numbers for the device
being hotplugged into the VM. This means the agent has to update
the list of devices provided through the spec, by relying on the
list of devices provided.

Fixes #132

Signed-off-by: Sebastien Boeuf [email protected]

@sboeuf sboeuf force-pushed the fix_device_passing branch from 2c3dec9 to 4306ec2 Compare February 3, 2018 01:07
@sboeuf
Copy link
Author

sboeuf commented Feb 3, 2018

@bergwolf
Copy link
Member

bergwolf commented Feb 3, 2018

@sboeuf looking at libcontainer, it seems that it ignores major/minor in the spec when creating containers. Can you explain a bit why the agent needs to fill in device major/minor on its own?

grpc.go Outdated

if mnt.Driver == "" {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, what's the content of mnt.Driver?
I think this code needs to be moved to the corresponding storage driver.
The behavior of this function will be a part of the ABI between runtime <-> agent.
So it needs to be documented in protocol or storage driver.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I wanted to raise this PR to discuss if you were okay using this field in order to determine if we actually have a mount or a device. I am assuming an empty driver will mean we have a mount, vs a device if the driver field contains something. Let me know if you're okay with this (@laijs @bergwolf @jodh-intel @sameo), and I will update the doc in the protocol.
Another option would be to define a new field/structure Device that we would pass the same way we pass Storage for the mounts.

Copy link
Contributor

@laijs laijs Feb 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. register drivers implementations to the agent as a list/map
  2. if the driver field contains something, get the driver from the list/map and call the handling function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming an empty driver will mean we have a mount, vs a device if the driver field contains something

If we agree on this, please can you add that comment to the definition to make this assumption clear to all:

@sboeuf
Copy link
Author

sboeuf commented Feb 5, 2018

@bergwolf if we don't provide libcontainer the appropriate major/minor numbers, it will create sort of a fake device because it won't actually be tied to the real device that we expect.
Let's say we have /dev/sda1 on the host, and we want to mount that inside the container, all we need is to give the name and the major/minor numbers to libcontainer. As a block device, the device will show up as /dev/vdaX inside the VM (the name needs to be anticipated by the runtime), and then the agent has to find the numbers since they cannot be anticipated in that case.

@bergwolf
Copy link
Member

bergwolf commented Feb 6, 2018

@sboeuf I'm afraid the place to assign major/minor of a device does not ensure it is the same device we'd expect to show up in the guest. In short, /dev/vdaX is not a reliable identifier of the device. Therefore the major/minor you get with it may refer to a different device other than /dev/sda1 on the host.

In hyperstart, we solved the issue with scsi-id of the device. The workflow is that:

  1. runtime specify the scsi id of the block device when hotplug it to the guest
  2. hyperstart scans /sys/class/scsi_disk/ directory to find out the device name in the guest
  3. check device existence and wait for it if it is not present

The final wait is necessary because Linux kernel device initialization is asynchronous.

@sboeuf
Copy link
Author

sboeuf commented Feb 6, 2018

@bergwolf I agree for SCSI device, unfortunately this case is not applicable for block devices.

@sboeuf
Copy link
Author

sboeuf commented Feb 8, 2018

@bergwolf @laijs @sameo any other comments on this PR ? What should I do here ?

@sboeuf sboeuf force-pushed the fix_device_passing branch from 4306ec2 to 530b15a Compare February 9, 2018 19:32
@sboeuf
Copy link
Author

sboeuf commented Feb 9, 2018

@laijs PR updated adding the list of handlers for different devices.

@sboeuf sboeuf force-pushed the fix_device_passing branch 2 times, most recently from 4291b41 to 15e72eb Compare February 9, 2018 19:50
@sboeuf
Copy link
Author

sboeuf commented Feb 12, 2018

ping @WeiZhang555 @bergwolf @laijs @sameo

@@ -194,7 +194,7 @@ message OnlineCPUMemRequest {
}

message Storage {
string driver = 1; // empty in most cases. it will support "drbd", "bcache" ...
string driver = 1; // Empty driver field means that we expect to do a mount. If driver field is not empty, we expect to do an operation related to a device. The type of device should be given through this field, triggering a specific function from the agent code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove "Empty driver field means that we expect to do a mount. "
the driver field is needed when the agent's general mount code can't handle it,
for example special devices, fstypes...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR updated !

@sboeuf sboeuf force-pushed the fix_device_passing branch from 15e72eb to 0c470fb Compare February 13, 2018 05:05
Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you add some unit tests to this PR.

device.go Outdated
major := int64(stat.Rdev / 256)
minor := int64(stat.Rdev % 256)

agentLog.Infof("Device: %s, Major: %d, Minor: %d", device.Source, major, minor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use fields for this? Something like:

agentLog.WithFields(logrus.Fields{
    "device-path": device.Source,
    "device-major": major,
    "device-minor": minor,
}).Info("handling block device")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return nil
}

uEvHandler, err := uevent.NewHandler()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of these handlers is created for every block device passed in the spec. Is that going to be a problem from a performance perspective do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't added this code in this PR. I only did a copy/paste from mount.go.

device.go Outdated
}

if !updated {
return fmt.Errorf("Should have found a matching device in the spec")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is the previous log call, but it might be useful to atleast log device.Source here so it's clear which device wasn't found in the spec.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

device.go Outdated
updated := false
for idx, d := range spec.Linux.Devices {
if d.Path == device.MountPoint {
spec.Linux.Devices[idx].Major = major
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overwriting the original values right (https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#devices)? If so, should we log the old ones just in case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

grpc.go Outdated

if mnt.Driver == "" {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming an empty driver will mean we have a mount, vs a device if the driver field contains something

If we agree on this, please can you add that comment to the definition to make this assumption clear to all:

@sboeuf
Copy link
Author

sboeuf commented Feb 14, 2018

@jodh-intel I have updated the comment related to the driver field. Also, I am adding some unit tests for the new functions.

@sboeuf sboeuf force-pushed the fix_device_passing branch from 0c470fb to 9f97a8f Compare February 14, 2018 20:12
@sboeuf
Copy link
Author

sboeuf commented Feb 14, 2018

@jodh-intel PR updated
@bergwolf @laijs PTAL

@jodh-intel
Copy link
Contributor

jodh-intel commented Feb 15, 2018

lgtm

Approved with PullApprove

device.go Outdated
type deviceDriversHandler func(device pb.Storage, spec *pb.Spec) error

var deviceDriversHandlerList = map[string]deviceDriversHandler{
"blk": blockDeviceHandler,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about changing the driver name to "device"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean instead of "blk" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever the name is, it needs to be defined in the protocol instead of here.

Copy link
Contributor

@laijs laijs Feb 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code will not be only used for blk devices. so "blk" is not a proper name.
Even "device" doesn't provide enough information. IIUC, it is oci-style passing the devices which can be seen by the runtime/libcontainer into the container. Using "oci-device" or "device-passthrough" for the driver name instead of "blk" could be a good choice either.

Drivers are a part of the agent protocol, but I think just putting some comments on the api documents is sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, comments are fine to me. The point is that the supported values need to be documented there.

The association with pb.Storage makes the handler block device specific. I think we can:

  1. remove the association with pb.Storage in the handler, and rename "blk` to a more generic name
  2. make sure that if a block device dev.Path matches pb.Storage.Source, it will be mounted at pb.Storage.Mountpoint

@sboeuf
Copy link
Author

sboeuf commented Feb 22, 2018

@bergwolf @laijs ping ! Could you please take a look, as I'd like to move forward with this.

Copy link
Member

@bergwolf bergwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to clearly define and document two things at the protocol level:

  1. ALL supported values in the pb.Storage.Driver field.
  2. The relationship between pb.Storage and ociSpec.Linux.Devices

device.go Outdated
type deviceDriversHandler func(device pb.Storage, spec *pb.Spec) error

var deviceDriversHandlerList = map[string]deviceDriversHandler{
"blk": blockDeviceHandler,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever the name is, it needs to be defined in the protocol instead of here.

device.go Outdated
// Update the spec
updated := false
for idx, d := range spec.Linux.Devices {
if d.Path == device.MountPoint {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d.Path in the OCI spec is the device path. How do you expect it to be a MountPoint path here?

It seems that you really need d.Path == device.Source instead? But again, this function only works for block devices that are also present in the storage spec (which is also useful since agent can mount the block devices for containers), whereas the commit message and #132 both intend to provide a generic way to pass host devices to the sandbox.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No no, I need MountPoint because this is the expected destination for the device, and we expect this destination to be the source of the device which has been passed-through.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In OCI spec, spec.Linux.Devices.Path is a device path like /dev/sdb and libcontainer would call mknodat to create it with the provided device numbers. While in kata agent protocol, pb.Storage.MountPoint the is mount point of the storage which is a directory outside of the container rootfs. How do you expect them to be equal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf I guess we have some misunderstanding here. Can you please provide an example config of spec.Linux.Devices and pb.Storage that you intend to handle with this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so let me try to clarify here. In case we want to pass-through a block device such as /dev/sdb from the host to the container, we need to hotplug this device into the VM. After it has been hotplugged, it will show up as /dev/vdX name that we can anticipate from the runtime. That's why I expect the runtime to provide the right name /dev/vdX through the Source field, and the name expected by the OCI spec through the Mountpoint field, so that we can match with the spec which entry we should modify. Once we have found the device /dev/vdX, we get the real major/minor numbers and we update them into the OCI spec for the corresponding device. If we don't update them, libcontainer will create a new device inside the container based on no real device, which means the device will be there but not linked to the real device.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, Mountpoint is not really a mount point of the device, but a device name inside the container. If we do want to make it so, I'd suggest changing the name from Mountpoint to something like Target, and document that it can either be a mount point of the device, or a device name to be represented inside the container, depending on the value of Storage.Driver.

And there must be 1:1 mapping between pb.Storage.Target and spec.Linux.Devices. Maybe we should make pb.Storage.Target a slice so that a device and be represented at multiple places in a container or to multiple containers in the sandbox?

@sboeuf
Copy link
Author

sboeuf commented Feb 23, 2018

@bergwolf sure, I can put a clear definition of what is expected into the protocol, this makes sense. But I'd like to understand what are ALL the different values possible for Driver ?
@laijs using device instead of blk might work for now, but what if we pass a scsi device for instance, how will you make the difference with a block device using the flag device ? I am asking because they won't be processed exactly the same way.

@bergwolf
Copy link
Member

@sboeuf by ALL, I mean all the currently supported values. An empty string and "device" (or any proper name picked in this PR) are the only two currently supported. We can add more when necessary.

@sboeuf
Copy link
Author

sboeuf commented Feb 26, 2018

@laijs please let me know what you think about this comment:

using device instead of blk might work for now, but what if we pass a scsi device for instance, how will you make the difference with a block device using the flag device ? I am asking because they won't be processed exactly the same way.

@sboeuf sboeuf closed this Feb 26, 2018
@sboeuf sboeuf reopened this Feb 26, 2018
@sboeuf sboeuf added the review label Feb 26, 2018
@sboeuf
Copy link
Author

sboeuf commented Feb 26, 2018

sorry, I hit the wrong button...

@laijs
Copy link
Contributor

laijs commented Feb 27, 2018

@sboeuf DriverOptions can be used in this case.

@sboeuf
Copy link
Author

sboeuf commented Feb 27, 2018

@laijs @sameo @jodh-intel @mcastelino @grahamwhaley @WeiZhang555
After some discussion with @bergwolf , we think it would be clearer to add a structure Device additionally to Storage. This would avoid confusion by trying to make all the different cases fit into the structure Storage.

@sboeuf sboeuf force-pushed the fix_device_passing branch from f5e3907 to cdba0c4 Compare February 28, 2018 05:50
@sboeuf sboeuf changed the title grpc: Handle hotplugged devices by updating spec info grpc: Handle hotplugged devices Feb 28, 2018
@sboeuf
Copy link
Author

sboeuf commented Feb 28, 2018

@bergwolf @laijs PR updated ! Let me know what you think about it.

@laijs
Copy link
Contributor

laijs commented Feb 28, 2018

LGTM

Approved with PullApprove

"blk": blockDeviceHandler,
}

func blockDeviceHandler(device pb.Device, spec *pb.Spec) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this is no longer block device specific, maybe just a call it genericDeviceHandler?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get your point here. We still need a deviceHandler for each type of device that is going to be passed, right ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but blockDeviceHandler is not really block device specific now, right? I was hoping that we can use it to handle some other types of devices as well. There are two ways of doing it:

  1. rename blk to a more generic name for devices
  2. define other types of deviceDriversHandler and use the same handler to handle them

I'm OK with either way but the handler function name needs also to be generic.

Anyway it is just a minor issue. We can get it in as is and change it later when adding other device types support.

grpc.go Outdated
@@ -340,6 +359,10 @@ func (a *agentGRPC) CreateContainer(ctx context.Context, req *pb.CreateContainer
return emptyResp, err
}

if err := addDevices(req.Devices, req.OCI); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addDevices should be done before addMounts so that the latter can find the block device.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I have definitely missed that, and this is needed since we expect to be able to mount only after the devices have been discovered inside the VM.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR updated.

@sboeuf sboeuf force-pushed the fix_device_passing branch from cdba0c4 to 7cc12b2 Compare February 28, 2018 16:45
@bergwolf
Copy link
Member

bergwolf commented Mar 1, 2018

LGTM

Approved with PullApprove

@sboeuf
Copy link
Author

sboeuf commented Mar 2, 2018

@bergwolf let me rebase on top of changes from @teawater about the fmt.Errorf( and I'll re-submit so that we can merge this.

// case of rootfs, when a device has to be waited for after it has
// been hotplugged. An equivalent Storage entry should be defined if
// any mount needs to be performed afterwards.
string ctr_path = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer calling this container_path instead of ctr_path here. We are losing readability for the sake of brevity here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed ! I have changed it into container_path

This commit adds a new Device structure in order to avoid confusion
between what should be mounted and what is a device. When creating
a new container, we expect the caller to pass a list of storages,
and a list of devices.

When adding a host device to the container by using things like
"docker run --device /dev/sda1:/dev/blk1", the runtime will not
be able to anticipate the major/minor numbers for the device
being hotplugged into the VM. This means the agent has to update
the list of devices provided through the spec, by relying on the
list of devices provided.

Fixes kata-containers#132

Signed-off-by: Sebastien Boeuf <[email protected]>
@sboeuf sboeuf force-pushed the fix_device_passing branch from 7cc12b2 to 5e7ea0f Compare March 2, 2018 05:00
@bergwolf
Copy link
Member

bergwolf commented Mar 2, 2018

It seems ci on Azure is still WiP. Merging.

@bergwolf bergwolf merged commit 33eecb2 into kata-containers:master Mar 2, 2018
@amshinde amshinde removed the review label Mar 2, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants