-
Notifications
You must be signed in to change notification settings - Fork 113
Conversation
2c3dec9
to
4306ec2
Compare
@sboeuf looking at libcontainer, it seems that it ignores major/minor in the spec when creating containers. Can you explain a bit why the agent needs to fill in device major/minor on its own? |
grpc.go
Outdated
|
||
if mnt.Driver == "" { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, what's the content of mnt.Driver?
I think this code needs to be moved to the corresponding storage driver.
The behavior of this function will be a part of the ABI between runtime <-> agent.
So it needs to be documented in protocol or storage driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I wanted to raise this PR to discuss if you were okay using this field in order to determine if we actually have a mount or a device. I am assuming an empty driver will mean we have a mount, vs a device if the driver field contains something. Let me know if you're okay with this (@laijs @bergwolf @jodh-intel @sameo), and I will update the doc in the protocol.
Another option would be to define a new field/structure Device
that we would pass the same way we pass Storage
for the mounts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- register drivers implementations to the agent as a list/map
- if the driver field contains something, get the driver from the list/map and call the handling function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming an empty driver will mean we have a mount, vs a device if the driver field contains something
If we agree on this, please can you add that comment to the definition to make this assumption clear to all:
@bergwolf if we don't provide libcontainer the appropriate major/minor numbers, it will create sort of a fake device because it won't actually be tied to the real device that we expect. |
@sboeuf I'm afraid the place to assign major/minor of a device does not ensure it is the same device we'd expect to show up in the guest. In short, In hyperstart, we solved the issue with scsi-id of the device. The workflow is that:
The final wait is necessary because Linux kernel device initialization is asynchronous. |
@bergwolf I agree for SCSI device, unfortunately this case is not applicable for block devices. |
4306ec2
to
530b15a
Compare
@laijs PR updated adding the list of handlers for different devices. |
4291b41
to
15e72eb
Compare
protocols/grpc/agent.proto
Outdated
@@ -194,7 +194,7 @@ message OnlineCPUMemRequest { | |||
} | |||
|
|||
message Storage { | |||
string driver = 1; // empty in most cases. it will support "drbd", "bcache" ... | |||
string driver = 1; // Empty driver field means that we expect to do a mount. If driver field is not empty, we expect to do an operation related to a device. The type of device should be given through this field, triggering a specific function from the agent code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove "Empty driver field means that we expect to do a mount. "
the driver field is needed when the agent's general mount code can't handle it,
for example special devices, fstypes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR updated !
15e72eb
to
0c470fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could you add some unit tests to this PR.
device.go
Outdated
major := int64(stat.Rdev / 256) | ||
minor := int64(stat.Rdev % 256) | ||
|
||
agentLog.Infof("Device: %s, Major: %d, Minor: %d", device.Source, major, minor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use fields for this? Something like:
agentLog.WithFields(logrus.Fields{
"device-path": device.Source,
"device-major": major,
"device-minor": minor,
}).Info("handling block device")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return nil | ||
} | ||
|
||
uEvHandler, err := uevent.NewHandler() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of these handlers is created for every block device passed in the spec. Is that going to be a problem from a performance perspective do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't added this code in this PR. I only did a copy/paste from mount.go
.
device.go
Outdated
} | ||
|
||
if !updated { | ||
return fmt.Errorf("Should have found a matching device in the spec") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there is the previous log call, but it might be useful to atleast log device.Source
here so it's clear which device wasn't found in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
device.go
Outdated
updated := false | ||
for idx, d := range spec.Linux.Devices { | ||
if d.Path == device.MountPoint { | ||
spec.Linux.Devices[idx].Major = major |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is overwriting the original values right (https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#devices)? If so, should we log the old ones just in case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
grpc.go
Outdated
|
||
if mnt.Driver == "" { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming an empty driver will mean we have a mount, vs a device if the driver field contains something
If we agree on this, please can you add that comment to the definition to make this assumption clear to all:
@jodh-intel I have updated the comment related to the driver field. Also, I am adding some unit tests for the new functions. |
0c470fb
to
9f97a8f
Compare
@jodh-intel PR updated |
device.go
Outdated
type deviceDriversHandler func(device pb.Storage, spec *pb.Spec) error | ||
|
||
var deviceDriversHandlerList = map[string]deviceDriversHandler{ | ||
"blk": blockDeviceHandler, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about changing the driver name to "device"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean instead of "blk" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever the name is, it needs to be defined in the protocol instead of here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code will not be only used for blk devices. so "blk" is not a proper name.
Even "device" doesn't provide enough information. IIUC, it is oci-style passing the devices which can be seen by the runtime/libcontainer into the container. Using "oci-device" or "device-passthrough" for the driver name instead of "blk" could be a good choice either.
Drivers are a part of the agent protocol, but I think just putting some comments on the api documents is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, comments are fine to me. The point is that the supported values need to be documented there.
The association with pb.Storage
makes the handler block device specific. I think we can:
- remove the association with
pb.Storage
in the handler, and rename "blk` to a more generic name - make sure that if a block device
dev.Path
matchespb.Storage.Source
, it will be mounted atpb.Storage.Mountpoint
9f97a8f
to
f5e3907
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we need to clearly define and document two things at the protocol level:
- ALL supported values in the
pb.Storage.Driver
field. - The relationship between
pb.Storage
andociSpec.Linux.Devices
device.go
Outdated
type deviceDriversHandler func(device pb.Storage, spec *pb.Spec) error | ||
|
||
var deviceDriversHandlerList = map[string]deviceDriversHandler{ | ||
"blk": blockDeviceHandler, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever the name is, it needs to be defined in the protocol instead of here.
device.go
Outdated
// Update the spec | ||
updated := false | ||
for idx, d := range spec.Linux.Devices { | ||
if d.Path == device.MountPoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d.Path
in the OCI spec is the device path. How do you expect it to be a MountPoint
path here?
It seems that you really need d.Path == device.Source
instead? But again, this function only works for block devices that are also present in the storage spec (which is also useful since agent can mount the block devices for containers), whereas the commit message and #132 both intend to provide a generic way to pass host devices to the sandbox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No no, I need MountPoint
because this is the expected destination for the device, and we expect this destination to be the source of the device which has been passed-through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In OCI spec, spec.Linux.Devices.Path
is a device path like /dev/sdb
and libcontainer would call mknodat
to create it with the provided device numbers. While in kata agent protocol, pb.Storage.MountPoint
the is mount point of the storage which is a directory outside of the container rootfs. How do you expect them to be equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sboeuf I guess we have some misunderstanding here. Can you please provide an example config of spec.Linux.Devices
and pb.Storage
that you intend to handle with this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so let me try to clarify here. In case we want to pass-through a block device such as /dev/sdb
from the host to the container, we need to hotplug this device into the VM. After it has been hotplugged, it will show up as /dev/vdX
name that we can anticipate from the runtime. That's why I expect the runtime to provide the right name /dev/vdX
through the Source
field, and the name expected by the OCI spec through the Mountpoint
field, so that we can match with the spec which entry we should modify. Once we have found the device /dev/vdX
, we get the real major/minor numbers and we update them into the OCI spec for the corresponding device. If we don't update them, libcontainer will create a new device inside the container based on no real device, which means the device will be there but not linked to the real device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, Mountpoint
is not really a mount point of the device, but a device name inside the container. If we do want to make it so, I'd suggest changing the name from Mountpoint
to something like Target
, and document that it can either be a mount point of the device, or a device name to be represented inside the container, depending on the value of Storage.Driver
.
And there must be 1:1 mapping between pb.Storage.Target
and spec.Linux.Devices
. Maybe we should make pb.Storage.Target
a slice so that a device and be represented at multiple places in a container or to multiple containers in the sandbox?
@bergwolf sure, I can put a clear definition of what is expected into the protocol, this makes sense. But I'd like to understand what are ALL the different values possible for |
@sboeuf by ALL, I mean all the currently supported values. An empty string and "device" (or any proper name picked in this PR) are the only two currently supported. We can add more when necessary. |
@laijs please let me know what you think about this comment:
|
sorry, I hit the wrong button... |
@sboeuf DriverOptions can be used in this case. |
@laijs @sameo @jodh-intel @mcastelino @grahamwhaley @WeiZhang555 |
f5e3907
to
cdba0c4
Compare
"blk": blockDeviceHandler, | ||
} | ||
|
||
func blockDeviceHandler(device pb.Device, spec *pb.Spec) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is no longer block device specific, maybe just a call it genericDeviceHandler
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get your point here. We still need a deviceHandler
for each type of device that is going to be passed, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but blockDeviceHandler
is not really block device specific now, right? I was hoping that we can use it to handle some other types of devices as well. There are two ways of doing it:
- rename
blk
to a more generic name for devices - define other types of
deviceDriversHandler
and use the same handler to handle them
I'm OK with either way but the handler function name needs also to be generic.
Anyway it is just a minor issue. We can get it in as is and change it later when adding other device types support.
grpc.go
Outdated
@@ -340,6 +359,10 @@ func (a *agentGRPC) CreateContainer(ctx context.Context, req *pb.CreateContainer | |||
return emptyResp, err | |||
} | |||
|
|||
if err := addDevices(req.Devices, req.OCI); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addDevices
should be done before addMounts
so that the latter can find the block device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I have definitely missed that, and this is needed since we expect to be able to mount only after the devices have been discovered inside the VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR updated.
cdba0c4
to
7cc12b2
Compare
protocols/grpc/agent.proto
Outdated
// case of rootfs, when a device has to be waited for after it has | ||
// been hotplugged. An equivalent Storage entry should be defined if | ||
// any mount needs to be performed afterwards. | ||
string ctr_path = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer calling this container_path instead of ctr_path here. We are losing readability for the sake of brevity here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed ! I have changed it into container_path
This commit adds a new Device structure in order to avoid confusion between what should be mounted and what is a device. When creating a new container, we expect the caller to pass a list of storages, and a list of devices. When adding a host device to the container by using things like "docker run --device /dev/sda1:/dev/blk1", the runtime will not be able to anticipate the major/minor numbers for the device being hotplugged into the VM. This means the agent has to update the list of devices provided through the spec, by relying on the list of devices provided. Fixes kata-containers#132 Signed-off-by: Sebastien Boeuf <[email protected]>
7cc12b2
to
5e7ea0f
Compare
It seems ci on Azure is still WiP. Merging. |
This commit adds a new Device structure in order to avoid confusion
between what should be mounted and what is a device. When creating
a new container, we expect the caller to pass a list of storages,
and a list of devices.
When adding a host device to the container by using things like
"docker run --device /dev/sda1:/dev/blk1", the runtime will not
be able to anticipate the major/minor numbers for the device
being hotplugged into the VM. This means the agent has to update
the list of devices provided through the spec, by relying on the
list of devices provided.
Fixes #132
Signed-off-by: Sebastien Boeuf [email protected]