fix: setting api_event_fd behave as a semaphore

A race condition has been identified between the API thread and the VMM thread due to a misconfiguration of the `api_event_fd`. What happens behind the scene: The API thread receives API requests that need to be handled one by one. Handling a request by the API thread means the following actions: 1. Write the request on the channel to the VMM thread 2. Notify the VMM thread that it has a new request to handle 3. Call a blocking read on the channel from the VMM in order to get the results On the VMM thread side, handling such a request represents the opposite actions: 1. Wait until getting notified that a new request needs its attention 2. Handle the request 3. Write the result on the channel to the API thread 4. Clear out the notification done by the API thread about the request previously handled Step 2 from the API thread side and step 4 from the VMM thread side are actually implemented using an EventFD (`api_event_fd`) that should work as a semaphore: everytime a new event arrives at the API thread, the internal counter of the EventFD should be incremented by 1 (by the API thread by calling write(1)). When the VMM thread finishes handling a request and sends results back to the API thread, the internal counter of the EventFD should be decremented by 1 (by the VMM thread by calling read()). The actual race condition appears because the `api_event_fd` was initialized with the `initval` set to 0. According to the spec [1] an EventFD behaves as we need only if it is initialized with the `initval` set to `EFD_SEMAPHORE` which has a value of 1 actually [2] Currently, we had it initialized with a `initval` of 0 which means that calling `read()` on it will reset the internal counter of the EventFD to 0. That means that in a situation when the API thread is very fast in comparison with the VMM thread and calls `write(1)` for a second request (after the moment it gets the results for the first request but before the VMM thread calls `read()` for the first request) then the `read()` done by the VMM thread to clear out the notification for the first request will actually clear out the notification for the second request as well. Thus, the VMM thread will not know that it has to process another request and the API thread gets blocks waiting for the resutls of the second request that will never get produced leading to a timeout of the second API request. Fixing this by setting the appropiate `initval` for the `api_event_fd` which is the `EFD_SEMAPHORE` in this case. Furthermore, we switched the order of steps on VMM thread's side described above and moved step 4 in between steps 1 and 2 such that we got a barrier just by a proper ordering of the actions (now the VMM thread clears out the notifification from the `api_event_fd` before handling the request which will not allow the API thread to be unblocked and put another request in the queue). This approach is used in the pre-boot phase when the VMM thread is handling microVM configuration requests. [1]: https://man7.org/linux/man-pages/man2/eventfd.2.html [2]: https://elixir.bootlin.com/linux/v5.10.176/source/include/linux/eventfd.h#L25 (cherry-picked from ee9bb09) Signed-off-by: Trăistaru Andrei Cristian <[email protected]>
firecracker-microvm · Apr 6, 2023 · 9778a67 · 9778a67
1 parent 30ef8a3
commit 9778a67
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/src/firecracker/src/api_server_adapter.rs b/src/firecracker/src/api_server_adapter.rs
@@ -71,6 +71,7 @@ impl MutEventSubscriber for ApiServerAdapter {
         let event_set = event.event_set();
 
         if source == self.api_event_fd.as_raw_fd() && event_set == EventSet::IN {
+            let _ = self.api_event_fd.read();
             match self.from_api.try_recv() {
                 Ok(api_request) => {
                     let request_is_pause = *api_request == VmmAction::Pause;
@@ -101,7 +102,6 @@ impl MutEventSubscriber for ApiServerAdapter {
                     panic!("The channel's sending half was disconnected. Cannot receive data.");
                 }
             };
-            let _ = self.api_event_fd.read();
         } else {
             error!("Spurious EventManager event for handler: ApiServerAdapter");
         }
@@ -129,7 +129,7 @@ pub(crate) fn run_with_api(
     // FD to notify of API events. This is a blocking eventfd by design.
     // It is used in the config/pre-boot loop which is a simple blocking loop
     // which only consumes API events.
-    let api_event_fd = EventFd::new(0).expect("Cannot create API Eventfd.");
+    let api_event_fd = EventFd::new(libc::EFD_SEMAPHORE).expect("Cannot create API Eventfd.");
 
     // Channels for both directions between Vmm and Api threads.
     let (to_vmm, from_api) = channel();