Consider changing the vhost-user-backend API #279

Open

mtjhrc opened this issue Nov 20, 2024 · 6 comments

Comments

@mtjhrc
Contributor

mtjhrc commented Nov 20, 2024

Background

Hi, I have been working on a vhost-user-gpu device with @dorindabassey and we ran into some complications.
I think some aspects of how VhostUserBackend is designed, and of the API model in general, are not a great fit for the GPU device, and for other devices too - I am also familiar with the virtio-net and virtio-console devices because I worked on them in libkrun. The goal would be to make writing new devices and maintaining existing ones easier by making the API more flexible.

Current API can be clunky to use in some cases:

Currently vhost-user-backend handles spawning the worker threads for the device and calls the handle_event callback on them. I don't think this model is good, because it leads to several difficulties when writing a device.
Here are the main issues:

1. Where to store the data of a worker thread

Suppose a worker thread wants some variables that are only available to that single thread; this is currently difficult to do. You basically have two options:

  • Store the data inside the Backend struct and wrap it in a Mutex/RwLock - for example virtio-sound does this. I feel this adds unnecessary complexity, and unless I am missing something, we could get rid of some of the extra locking (by using the GuestMemoryAtomic we already have).
  • If the thread-specific data is !Send - such as the Rutabaga/virglrenderer context in vhost-user-gpu - your only option is to lazily initialize it (putting something !Send inside a Mutex doesn't help) and either use thread_local! or work around this with unsafe, as sketched after this list.
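
For illustration, here is a minimal sketch of the lazy thread_local! workaround for !Send per-thread state. RendererContext is a made-up stand-in for something like the Rutabaga/virglrenderer context, not an actual vhost-user-gpu type:

use std::cell::RefCell;
use std::marker::PhantomData;

// Stand-in for a !Send renderer context (the raw-pointer marker makes it !Send).
struct RendererContext {
    _not_send: PhantomData<*mut ()>,
}

impl RendererContext {
    fn new() -> Self {
        RendererContext { _not_send: PhantomData }
    }
}

thread_local! {
    // Lazily initialized on first use from the worker thread, because the backend
    // struct itself has to be Send + Sync and therefore cannot own this value.
    static CONTEXT: RefCell<Option<RendererContext>> = RefCell::new(None);
}

fn handle_event_like_path() {
    CONTEXT.with(|slot| {
        let mut slot = slot.borrow_mut();
        let ctx = slot.get_or_insert_with(RendererContext::new);
        // ... use ctx to process the request for this thread only ...
        let _ = ctx;
    });
}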

2. Adding file descriptors for polling after starting the backend

  • In vhost-user-gpu we needed to register an additional file descriptor for polling (so that handle_event would also be signaled for it) from inside handle_event itself, and this is very awkward to do. To do it you first need to pass the epoll handler created by the daemon to the device backend. Inside handle_event you then have access to the epoll, but you have to be careful: you cannot use the VhostUserBackendMut trait and the convenient lock around the whole struct, because if handle_event is called while the lock is held, you deadlock yourself when num_queues is called here (a minimal reproduction of the pattern follows the snippet):
    if data <= self.backend.num_queues() as u64 {
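
To make the hazard concrete, here is a minimal, self-contained reproduction of the same re-entrant locking pattern using a plain std RwLock. This is not the crate's actual code, just the shape of the problem; the final read() blocks forever (or panics, depending on the platform) because the same thread still holds the write lock:

use std::sync::RwLock;

struct Backend;

impl Backend {
    fn num_queues(&self) -> usize {
        1
    }

    fn handle_event(&mut self) {
        // Imagine this registers a new fd with the daemon's epoll handler,
        // which later needs to call num_queues() on the backend...
    }
}

fn main() {
    let backend = RwLock::new(Backend);

    // The daemon takes the write lock around the whole handle_event call.
    let mut guard = backend.write().unwrap();
    guard.handle_event();

    // ...but the epoll-handler path then takes a read lock on the same backend
    // from the same thread: this never returns (or panics), i.e. a deadlock.
    let _queues = backend.read().unwrap().num_queues();

    drop(guard);
}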

3. VhostUserBackend vs VhostUserBackendMut

In general the split into VhostUserBackend and VhostUserBackendMut traits doesn't seem like a good design choice to me. When you specify that you want 1 worker thread and use the VhostUserBackendMut trait, it mostly works fine (except for the !Send problem), but if you want 2 worker threads or something more complex, you have to switch over to VhostUserBackend and do the locking manually. Another problem is that if you don't switch to the VhostUserBackend trait, you still get 2 worker threads, but handle_event can never run on both at the same time because it is behind a lock, leading to a non-obvious performance bug (sketched below).
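
A simplified sketch of that performance issue; the two traits below are abbreviated stand-ins for the two flavours (not the exact crate signatures), and the only difference that matters here is &self vs &mut self:

use std::io;
use std::sync::{Arc, Mutex};

// Abbreviated stand-in for VhostUserBackend: shared-reference callback.
trait Backend {
    fn handle_event(&self, device_event: u16) -> io::Result<()>;
}

// Abbreviated stand-in for VhostUserBackendMut: mutable-reference callback.
trait BackendMut {
    fn handle_event(&mut self, device_event: u16) -> io::Result<()>;
}

struct Dev;

impl BackendMut for Dev {
    fn handle_event(&mut self, _device_event: u16) -> io::Result<()> {
        Ok(())
    }
}

// With a &mut self callback, two worker threads can only share the backend through
// a lock around the whole struct, so their handle_event calls are serialized even
// when they service completely independent vrings.
fn drive(dev: &Arc<Mutex<Dev>>, device_event: u16) -> io::Result<()> {
    dev.lock().unwrap().handle_event(device_event)
}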

More flexible event handling API

My main idea is to just let the device spawn its own worker threads and use epoll directly in them, instead of having the handle_event callback. This would basically change the event system from "push" to "poll". I think this would be a lot more flexible, because it would allow:

  1. any custom thread <-> vring mapping
  2. adding and removing file descriptors to watch easily
  3. having fine-grained control over which thread has access to what, leading to more understandable and efficient code

One of the problems with the current implementation is that, since handle_event is both a method and a callback, the type system basically forces self to be Send + Sync, which in practice forces you to lock everything, and this could be avoided.

struct FooDevice { ... };

fn worker(mem: GuestMemoryAtomic<GuestMemoryMmap>, vring: VringRwLock, exit_event: EventFd) {
    // Epoll abstraction details are not that important for this proposal, this could change
    let mut poller = Poller::new();
    let vring_ready_event = vring.ready_event();
    poller.add(vring_ready_event);
    poller.add(exit_event);
    
    loop {
        let event = poller.poll();

        if event == exit_event {
            return
        }

        if event == vring_ready_event {
            // handle the vring event here
            // you can even add/remove event file descriptors to the epoll
        }
    }     
}

impl VhostUserBackend for FooDevice {
    
    fn num_queues(&self) -> usize { ... }
    fn max_queue_size(&self) -> usize { .. }
    fn features(&self) -> u64 { ... }
    fn protocol_features(&self) -> VhostUserProtocolFeatures { ... }
    // Other methods like before, except the `queues_per_thread(&self)` which would be removed


    // The entry point where we start the device. This would require the device to start
    // at least 1 worker thread, but we do that today anyway, so it shouldn't be an issue.
    // (The exact mem and vring types should be more generic, this is just an example)
    fn start(&mut self, vrings: Box<[VringRwLock]>, quit_event: EventFd) -> Result<()> {
        // Notice the worker thread doesn't have to hold a reference to `self`.
        // If you need to share state between the worker threads, you can use something
        // like Arc<Mutex<YourSharedDataStruct>>, or even a channel.
        let mem = self.mem.clone();
        let vring = vrings[0].clone();
        thread::spawn(move || worker(mem, vring, quit_event));
        Ok(())
    }
}  

Consider having devices also be usable as in-process devices by VMMs

Since I am proposing changing the API, it could be beneficial to make slightly bigger breaking changes and also allow consuming the devices as in-process devices. We actually aren't that far from this, but it would of course require some more thought about the abstractions.
This could be useful for libkrun and other VMMs (anyone else interested?)

Questions

I can create a more detailed proposal of the API, but first I wanted to see:

  1. Is there interest in doing smaller breaking changes to the API like I proposed?
  2. Is there interest in doing bigger API changes to also allow consuming the devices as in-process devices?
@stefano-garzarella
Member

@mtjhrc sorry for my late reply.

I think this work will be great! I was looking to support macOS/BSD and our dependency on Linux specific stuff like epoll/eventfd is the biggest thing to be solved. Your work may simplify that.

I can create a more detailed proposals of APIs, but first I wanted to see:

1. Is there interest in doing smaller breaking changes to the API like I proposed?

I think it's fine, but we should help our users (vhost-device, virtiofsd, etc.) adapt their code easily.

2. Is there interest in doing bigger API changes to also allow consuming the devices as in-process devices?

Sure, no objection on my side.

@Ablu @eryugey @germag @jiangliu @sboeuf @slp WDYT?

@slp
Collaborator

slp commented Dec 19, 2024

2. Is there interest in doing bigger API changes to also allow consuming the devices as in-process devices?

Sure, no objection on my side.

@Ablu @eryugey @germag @jiangliu @sboeuf @slp WDYT?

I think being able to unify out-of-process and in-process device implementations alone is already worth the effort. 👍

@mtjhrc
Contributor Author

mtjhrc commented Dec 19, 2024

I think this work will be great! I was looking to support macOS/BSD and our dependency on Linux specific stuff like epoll/eventfd is the biggest thing to be solved. Your work may simplify that.

I don't have a macOS machine, but this is definitely something I am keeping in mind. libkrun supports macOS, so if it were to use these devices, macOS support is necessary.

@stefano-garzarella
Member

I think this work will be great! I was looking to support macOS/BSD and our dependency on Linux specific stuff like epoll/eventfd is the biggest thing to be solved. Your work may simplify that.

I don't have a macOS machine, but this is definitely something I am keeping in mind. libkrun supports macOS, so if it were to use these devices, macOS support is necessary.

FYI I'll talk about it at FOSDEM next year: https://fosdem.org/2025/schedule/event/fosdem-2025-5100-can-qemu-and-vhost-user-devices-be-used-on-macos-and-bsd-/

I can mention your work ;-)

@alyssais
Contributor

FWIW: libepoll-shim can be used on BSD and macOS.

@stefano-garzarella
Member

FWIW: libepoll-shim can be used on BSD and macOS.

@alyssais Thanks, I was also thinking of https://github.com/smol-rs/polling
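
For reference, a rough sketch of what a backend worker loop could look like on top of that crate. This is written against the polling 2.x API and uses a TcpListener as a placeholder event source; in a real backend it would be the vring kick fd, the exit eventfd, and so on:

use std::net::TcpListener;
use std::time::Duration;

use polling::{Event, Poller};

fn main() -> std::io::Result<()> {
    // Placeholder event source standing in for a vring kick fd.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    listener.set_nonblocking(true)?;

    let poller = Poller::new()?;
    poller.add(&listener, Event::readable(1))?;

    let mut events = Vec::new();
    // Wait with a timeout so this example terminates; a worker would loop here.
    poller.wait(&mut events, Some(Duration::from_millis(100)))?;

    for ev in &events {
        if ev.key == 1 {
            // Handle the readiness, then re-arm the interest (polling is oneshot).
            poller.modify(&listener, Event::readable(1))?;
        }
    }
    Ok(())
}

The same code maps to epoll on Linux and kqueue on macOS/BSD, which is what makes it relevant to the portability discussion above.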
