RCU CPU stall warning in a multi-core system simulation #51
Comments
Although implementing multi-threaded system emulation can significantly alleviate this issue, I don't think the problem will disappear entirely once multi-threaded system emulation is complete. Perhaps we can start by optimizing the timer. Currently, the function In #49, it was suggested that lowering the frequency set in To strike a balance between these two extremes, I think we can maintain a dedicated emulator timer and update it at the start of each emulation cycle. This way, I made some modifications to test this approach, and it did result in a slight performance improvement for the emulator. On my machine, with SMP=6, the RCU CPU stall warning no longer appears. However, with SMP=8, the warning still occurs. |
I completely agree that your implementation can help avoid RCU CPU stall warnings. However, as the number of simulated cores increases, the warning is likely to occur again unless the emulation period per cycle is also increased proportionally to the number of cores. Is my understanding correct? I rebuilt the Linux kernel with
But preemption may increase the frequency of context switches, causing overall CPU usage to increase. |
Yes, the improvement is limited; as the number of cores grows, the warning will occur again. |
The accuracy of the timer primarily affects user programs. I think that during the boot process, it is unnecessary to use such a high-precision timer. Instead, a less precise timer, or even one that simply increments in a straightforward manner, could be used. After the boot process is complete, the system can switch back to a more precise timer. Here is a simple example. I added a global flag static void op_sret(hart_t *vm)
{
/* Restore from stack */
vm->pc = vm->sepc;
mmu_invalidate(vm);
vm->s_mode = vm->sstatus_spp;
vm->sstatus_sie = vm->sstatus_spie;
/* After the boot process completes, the initrd is loaded and the system
 * switches to U mode for the first time. Checking whether that switch has
 * already happened therefore tells us whether the boot process is done.
 */
if (!boot_complete && !vm->s_mode)
boot_complete = true;
/* Reset stack */
vm->sstatus_spp = false;
vm->sstatus_spie = true;
} Before the boot process is complete, I didn't use bool boot_complete = false;
static struct timespec host_time;
// ...
void semu_timer_init(semu_timer_t *timer, uint64_t freq)
{
timer->freq = freq;
clock_gettime(CLOCKID, &host_time);
semu_timer_rebase(timer, 0);
}
static uint64_t semu_timer_clocksource(uint64_t freq)
{
#if defined(HAVE_POSIX_TIMER)
if (boot_complete) {
clock_gettime(CLOCKID, &host_time);
return host_time.tv_sec * freq +
mult_frac(host_time.tv_nsec, freq, 1e9);
} else {
return host_time.tv_sec * freq +
mult_frac(host_time.tv_nsec++, freq / 1000, 1e9);
}
// ...
#endif
}
uint64_t semu_timer_get(semu_timer_t *timer)
{
/* Rebase the timer to the current time after the boot process. */
static bool first = true;
if (first && boot_complete) {
first = false;
timer->begin = semu_timer_clocksource(timer->freq);
}
return semu_timer_clocksource(timer->freq) - timer->begin;
}
// ... This is the sample output:
|
I agree that your changes can quickly and easily resolve the RCU stall warning issue! However, your log cannot represent the actual boot time in this situation ( I tried to reproduce your work, and the RCU stall warning is resolved when simulating SMP=32 [ 0.007006] Run /init as init process
[ 0.026121] hrtimer: interrupt took 12000723 ns
Starting syslogd: OK another message A quick search suggests that this warning is produced by I'm not sure whether there are any side effects here |
Yes, the primary reason for incrementing Once the system switches to U mode for the first time, the As for the HRT warning, I think it was triggered by a sudden jump in the system clock right after the boot process completed. I'm also not sure whether there are any side effects here. |
This option was set for benchmarking purposes. RT-Tests relies on HRT features. |
The timer increments should align with the frequency defined in the device tree. We could use an approach similar to BogoMips to make the necessary adjustments. |
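One way to read the BogoMips suggestion is as a calibration step: measure how many timer reads the emulator performs during a known host interval, then derive how many ticks each read should add so that emulated time still advances at the device-tree frequency. The sketch below assumes this interpretation; `calibrate_ticks_per_call` and its parameters are hypothetical names, not part of semu.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper, not part of semu.
 * freq            : timer frequency from the device tree, in Hz
 * host_elapsed_ns : host time spent in the calibration window
 * calls           : timer reads observed in that window
 * Returns the number of ticks to add per timer read. */
static uint64_t calibrate_ticks_per_call(uint64_t freq,
                                         uint64_t host_elapsed_ns,
                                         uint64_t calls)
{
    assert(calls > 0);

    /* Ticks that should have elapsed: freq * host_elapsed_ns / 1e9,
     * split into whole seconds and a remainder to limit overflow. */
    uint64_t ticks = (host_elapsed_ns / NSEC_PER_SEC) * freq +
                     (host_elapsed_ns % NSEC_PER_SEC) * freq / NSEC_PER_SEC;

    uint64_t per_call = ticks / calls;
    return per_call ? per_call : 1; /* never let the clock stand still */
}
```

For instance, with an assumed 65 MHz timer and 6,500,000 reads observed in one host second, each read would add 10 ticks. The calibration would need to be repeated whenever the number of harts changes, since the read rate depends on it.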
After multiple attempts, I realize that independently maintaining In contrast, I believe maintaining a frequency scaling factor is a better solution. As mentioned in #49, this achieves the purpose of slowing down time during the boot process. Since it’s merely a scaling factor, we can still derive real-time values from it. As for the Interestingly, on my machine, the warning doesn't appear at all with SMP=16, regardless of how the frequency is adjusted. Moreover, the current implementation only affects the timer during the boot process; after switching to U-mode, the timer behaves exactly as before. Therefore, I believe this warning is due to the current sequential emulation approach. After the multi-threaded emulation is implemented, I think the situation would be mitigated a lot. Here’s a diagram of the overall flow: Below is a simple example: static uint64_t semu_timer_clocksource(uint64_t freq)
{
#if defined(HAVE_POSIX_TIMER)
struct timespec t;
clock_gettime(CLOCKID, &t);
if (boot_complete)
return t.tv_sec * freq + mult_frac(t.tv_nsec, freq, 1e9);
else
return t.tv_sec * (freq / 100) + mult_frac(t.tv_nsec, (freq / 100), 1e9);
#elif defined(HAVE_MACH_TIMER)
static mach_timebase_info_data_t t;
if (t.denom == 0)
(void) mach_timebase_info(&t);
return mult_frac(mult_frac(mach_absolute_time(), t.numer, t.denom), freq,
1e9);
#else
return time(0) * freq;
#endif
}
uint64_t semu_timer_get(semu_timer_t *timer)
{
static bool first = true;
if (first && boot_complete) {
first = false;
semu_timer_rebase(timer, 0);
printf("\033[1;33m[SEMU LOG]: Switch to real time\033[0m\n");
}
return semu_timer_clocksource(timer->freq) - timer->begin;
} I think this approach is better than maintaining two separate timers during the boot process. Dividing the frequency by 100 means the boot process operates at one one-hundredth of real-time, allowing us to easily derive the actual boot time. This scaling factor can also be configured in the Makefile. I attempted to dynamically measure the cost of Here is the output of the test:
Another output for the factor set to 50:
Also another output of the factor set to 10:
In my environment, even with varying scale factors, the hrtimer warning consistently appeared at approximately 60000000 ns. I think this observation supports my hypothesis. |
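The relationship between the scaled boot-time clock and real time can be sketched as follows. This is a minimal illustration, not semu code: `BOOT_SCALE`, `scaled_ticks`, `guest_ns` and `real_ns` are hypothetical names, and the factor of 100 is taken from the example earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define BOOT_SCALE 100ULL /* assumed scale factor, as in the example above */

/* Ticks reported during boot for a host time of sec + nsec,
 * computed with the scaled-down frequency freq / BOOT_SCALE. */
static uint64_t scaled_ticks(uint64_t sec, uint64_t nsec, uint64_t freq)
{
    uint64_t f = freq / BOOT_SCALE;
    return sec * f + nsec * f / NSEC_PER_SEC;
}

/* Time the guest believes has passed: it still divides by the full freq. */
static uint64_t guest_ns(uint64_t ticks, uint64_t freq)
{
    assert(freq > 0);
    return (ticks / freq) * NSEC_PER_SEC +
           (ticks % freq) * NSEC_PER_SEC / freq;
}

/* Recover real elapsed time from a guest timestamp taken during boot. */
static uint64_t real_ns(uint64_t guest_elapsed_ns)
{
    return guest_elapsed_ns * BOOT_SCALE;
}
```

With an assumed 65 MHz timer, two host seconds show up as 20 ms in the guest log, and multiplying by the scale factor gives back two seconds. If `freq` is not a multiple of the scale factor, the integer division introduces a small error.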
Here is a summary of the two approaches we now have for mitigating RCU CPU stalls under the current sequential emulation scheme.
Methods
1. Scale Frequency
This method involves calling
Pros
Cons
2. Manually Increment
|
SMP | Calls to semu_timer_clocksource | Boot time (sec) | hrtimer warning |
---|---|---|---|
1 | 223,992,364 | 3.40001 | |
2 | 382,486,686 | 8.01002 | |
3 | 577,491,593 | 13.44003 | |
4 | 774,125,110 | 17.85185 | |
5 | 973,274,729 | 22.94007 | |
6 | 1,174,038,398 | 27.11009 | |
7 | 1,377,244,622 | 31.80010 | |
8 | 1,605,001,986 | 37.52011 | |
9 | 1,793,136,295 | 41.41014 | |
10 | 2,005,988,752 | 45.53015 | |
11 | 2,220,126,569 | 51.66018 | |
12 | 2,440,897,255 | 56.13018 | |
13 | 2,651,860,790 | 60.71019 | |
14 | 2,882,701,067 | 65.92020 | |
15 | 3,103,978,838 | 70.30022 | |
16 | 3,343,030,072 | 76.31025 | |
17 | 3,566,365,881 | 80.24026 | |
18 | 3,800,214,669 | 86.59028 | |
19 | 4,031,961,176 | 92.00030 | |
20 | 4,280,331,336 | 94.47030 | |
21 | 4,516,731,902 | 101.68033 | |
22 | 4,883,959,327 | 104.95035 | |
23 | 5,143,022,258 | 110.69036 | |
24 | 5,260,058,753 | 118.59098 | |
25 | 5,526,277,854 | 125.30041 | |
26 | 5,790,681,086 | 132.98045 | 50000184 ns |
27 | 6,044,658,240 | 140.04046 | 80000307 ns |
28 | 6,328,119,424 | 146.18047 | 60000231 ns |
29 | 6,598,156,499 | 154.15050 | 80000261 ns |
30 | 6,868,480,625 | 159.83052 | 90000308 ns |
31 | 7,129,979,196 | 163.82054 | 50000169 ns |
32 | 7,410,129,712 | 170.80054 | 80000508 ns |
Tests were also conducted on my workstation:
SMP | Calls to semu_timer_clocksource | Boot time (sec) | hrtimer warning |
---|---|---|---|
1 | 223,450,834 | 15.21302 | |
2 | 388,551,174 | 31.45406 | |
3 | 586,279,749 | 48.33009 | |
4 | 791,644,232 | 68.00714 | |
5 | 1,003,639,012 | 83.64418 | |
6 | 1,216,761,778 | 99.95122 | 12000031 ns |
7 | 1,438,276,507 | 120.21144 | 14000047 ns |
8 | 1,704,344,789 | 122.50440 | 11000030 ns |
9 | 1,900,605,464 | 156.91848 | 10000031 ns |
10 | 2,140,147,966 | 176.43249 | 11000031 ns |
11 | 2,451,031,756 | 179.20599 | 12000062 ns |
12 | 2,633,717,918 | 217.70393 | 14000046 ns |
13 | 2,993,790,985 | 216.13076 | 15000046 ns |
14 | 3,165,383,012 | 262.75081 | 14000046 ns |
15 | 3,437,855,090 | 286.43180 | 15000015 ns |
Because the workstation is slow and each run takes a long time, I only collected statistics up to SMP=15.
Target Time Configuration
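To use the second method, a target time needs to be determined. If a target boot time of 10 seconds is set, the nsec increment can be calculated from the SMP parameter.
For example, with SMP=4 and a target time of 10 seconds ($10^{10}$ ns), each call to semu_timer_clocksource adds approximately
$$\frac{10^{10}}{774{,}125{,}110} \approx 12.9$$
to nsec, where 774,125,110 is the call count measured for SMP=4 in the first table above.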
However, this method may introduce timing discrepancies across different environments. For instance, with SMP=1, the boot process takes approximately 3 seconds on my personal computer but 18 seconds on my workstation, resulting in a sixfold difference.
This leads to an implicit problem: if adding a core increases the number of semu_timer_clocksource calls by approximately $2 \times 10^8$, then semu_timer_clocksource will increment nsec by approximately:
$$\frac{\text{target time (ns)}}{\text{SMPs} \times 2 \times 10^8}$$
where SMPs is the value given to the SMP parameter.
Under this method, the time in emulator during boot process is calculated as:
If the assumption of semu_timer_clocksource
calls exceeds
The value $2 \times 10^8$ was derived from tests on my personal computer and workstation. Despite the sixfold difference in execution times, the number of semu_timer_clocksource calls was remarkably consistent, leading to this assumption. The corresponding numbers can be checked in the tables above: for each increment of the SMP parameter, the number of calls to semu_timer_clocksource increases by roughly $2 \times 10^8$.
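The estimate described above can be sketched in C. This is only an illustration under the stated assumption of roughly $2 \times 10^8$ calls per hart; `CALLS_PER_HART`, `nsec_step_for_smp` and `emulated_boot_ns` are hypothetical names that do not exist in semu.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define CALLS_PER_HART 200000000ULL /* assumed: ~2e8 calls per hart */

/* Per-call nsec increment so that boot takes about target_sec seconds of
 * emulated time, assuming smp * CALLS_PER_HART calls in total. */
static uint64_t nsec_step_for_smp(uint64_t target_sec, uint64_t smp)
{
    assert(smp > 0);
    uint64_t step = target_sec * NSEC_PER_SEC / (smp * CALLS_PER_HART);
    return step ? step : 1; /* integer truncation must not stop the clock */
}

/* Emulated boot time that results if actual_calls calls really happen. */
static uint64_t emulated_boot_ns(uint64_t step, uint64_t actual_calls)
{
    return step * actual_calls;
}
```

With a 10-second target, SMP=1 gives a step of 50 ns. Feeding in the SMP=1 call count from the first table (223,992,364) then yields about 11.2 s of emulated boot time instead of 10 s, which shows how the result drifts when the real call count differs from the assumed one. At high SMP values the integer truncation of the step also becomes noticeable.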
If we still want a coarse-grained timer during the boot process to roughly approximate real-world time, clock_gettime could be called at specific intervals (e.g., every
However, if the actual number of semu_timer_clocksource calls exceeds
clock_gettime may lead to time regression if real-world time is less than emulation time.
In contrast, if the number of semu_timer_clocksource calls is too low, time will continue to increment, leading only to a deviation that can be corrected via rebase.
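The periodic resynchronization and the time-regression concern can be illustrated with a small sketch. It assumes a hypothetical `boot_clock_t` that advances by a fixed step per call and is only occasionally compared against a host timestamp. None of these names come from semu, and the host time is passed in by the caller (for example from `clock_gettime`) so the logic stays easy to test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical boot-time clock, not part of semu. */
typedef struct {
    uint64_t now_ns;       /* emulated time in nanoseconds */
    uint64_t step_ns;      /* increment applied on every read */
    uint64_t calls;        /* reads since the last resync */
    uint64_t resync_every; /* resync interval, in reads */
} boot_clock_t;

/* Advance by one step. Returns true when the caller should fetch the host
 * time and call boot_clock_resync(). */
static bool boot_clock_tick(boot_clock_t *c)
{
    assert(c->resync_every > 0);
    c->now_ns += c->step_ns;
    if (++c->calls < c->resync_every)
        return false;
    c->calls = 0;
    return true;
}

/* Adopt the host time only if it moves the clock forward, so emulated
 * time never regresses when it has run ahead of real time. */
static void boot_clock_resync(boot_clock_t *c, uint64_t host_ns)
{
    if (host_ns > c->now_ns)
        c->now_ns = host_ns;
}
```

The forward-only check is one possible answer to the regression problem: when the emulated clock is ahead, the host reading is ignored and the clock keeps stepping until real time catches up.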
Although manually incrementing nsec avoids calling clock_gettime, differences in execution time across environments make the RCU CPU stall mitigation less stable. It also removes the ability to correlate boot process logs with real-world time.
Nonetheless, boot process timing may not be critical, and the boot process clearly takes longer and longer as the number of harts increases, so I think the benefit of avoiding clock_gettime calls remains attractive.
In my opinion,
- If boot time accuracy matters: Use scaled frequency and continue relying on clock_gettime to update nsec. There is a simple example in the previous discussion of this issue.
- If boot time accuracy does not matter: Manually update nsec without scaling frequency. Increment nsec by the method mentioned above.
  - If the boot process timing is completely irrelevant, I think even updating nsec by a really small PRNG number like 1~3 is okay.
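For the last option, a tiny PRNG-driven increment might look like the sketch below. It uses a standard xorshift32 generator as a stand-in; the choice of generator and the names `xorshift32` and `advance_nsec` are assumptions for illustration, not something proposed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Standard 32-bit xorshift step; the state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    assert(*state != 0);
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Hypothetical helper: advance nsec by a pseudo-random 1 to 3 ns. */
static uint64_t advance_nsec(uint64_t nsec, uint32_t *state)
{
    return nsec + 1 + xorshift32(state) % 3;
}
```

Because the increment is always at least 1, the clock stays strictly monotonic, but the resulting timestamps carry no relation to real-world time.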
Maybe we can discuss here which method to adopt, or any better modifications. Once that is decided, I can start working on a PR.
@chiangkd and @RinHizakura, please comment on the above. |
I tend to prefer using "scaled frequency", based on your analysis.
In my opinion, using real-world time offers valuable benefits for developers. It aids in analyzing and identifying potential improvements to accelerate the booting process (e.g., multi-threaded simulations). |
Execute semu with multi-core system simulation
The RCU CPU stall warning, as discussed in #49, is accompanied by an increase in timer interrupts and clock_gettime system calls to produce a real-time timer. This causes the CPU to wait longer than a typical grace period, which is usually 21 seconds.
Implementing multi-thread support for semu, as discussed in PR #49, might improve the performance of the booting process.