i#6938 sched migrate: Separate run queue per output (#6985)
Removes the global runqueue and the global sched_lock_, replacing them with per-output runqueues. Each runqueue's lock lives inside a new struct input_queue_t, which clearly delineates what the lock protects. The unscheduled queue remains global and has its own lock as another input_queue_t. The output fields .active and .cur_time are now atomics, as they are accessed from other outputs yet are separate from the queue and its mutex. Keeps runqueue lock usage narrow, avoiding holding locks across the larger functions. Establishes a lock ordering convention: input > output > unsched.

Removing the global sched_lock_ avoids the lock contention seen on fast analyzers (the original design targeted heavyweight simulators). On a large internal trace with hundreds of threads on >100 cores, 41% of lock attempts collided on the global queue:

```
[scheduler] Schedule lock acquired : 72674364
[scheduler] Schedule lock contended : 30144911
```

With separate runqueues, fewer than 1 in 10,000 collide:

```
[scheduler] Stats for output #0
<...>
[scheduler] Runqueue lock acquired : 34594996
[scheduler] Runqueue lock contended : 29
[scheduler] Stats for output #1
<...>
[scheduler] Runqueue lock acquired : 51130763
[scheduler] Runqueue lock contended : 41
<...>
[scheduler] Runqueue lock acquired : 46305755
[scheduler] Runqueue lock contended : 44
[scheduler] Unscheduled queue lock acquired : 27834
[scheduler] Unscheduled queue lock contended : 273
$ egrep 'contend' OUT | awk '{n+=$NF}END{ print n}'
11528
$ egrep 'acq' OUT | awk '{n+=$NF}END{ print n}'
6814820713
(gdb) p 11528/6814820713.*100
$1 = 0.00016916072315753086
```

Before an output goes idle, it attempts to steal work from another output's runqueue. A new input option controls the migration threshold, to avoid moving jobs too frequently. The stealing is done inside eof_or_idle(), which now returns a new internal status code STATUS_STOLE so the various callers know to read the next record.
Adds periodic rebalancing, with a period equal to another new input option. Adds flexible_queue_t::back() so rebalancing does not take from the front of the queues. Updates an output going inactive, and the promotion of everything unscheduled, to use the new rebalancing. Makes output_info_t.active atomic, as it is read by other outputs during stealing and rebalancing. Adds statistics on the stealing and rebalancing instances.

Updates all of the unit tests, many of which now have different resulting schedules. Adds a new unit test targeting queue rebalancing.

Tested under ThreadSanitizer for race detection on a relatively large trace on 90 cores.

Issue: #6938