README: add a few more results
graysky authored and graysky committed Sep 14, 2024
1 parent 042e80e commit 67e052d
Showing 7 changed files with 378 additions and 60 deletions.
206 changes: 183 additions & 23 deletions README.md
@@ -19,8 +19,8 @@ The kernel uses its own set of CFLAGS, KCFLAGS. For example, see:
### Alternative way to define a -march= option without this patch
As pointed out by codemac in [this topic](https://bbs.archlinux.org/viewtopic.php?id=281639), one can simply export the value/values for the `KCFLAGS` and `KCPPFLAGS` before calling `make` to achieve the same result, see [here](https://github.com/torvalds/linux/blob/88603b6dc419445847923fcb7fe5080067a30f98/Makefile#L1112).
```
export KCFLAGS=' -march=znver3 -mtune=znver3'
export KCPPFLAGS=' -march=znver3 -mtune=znver3'
export KCFLAGS=' -march=znver3'
export KCPPFLAGS=' -march=znver3'
make all
```

@@ -276,66 +276,226 @@ make all

# Benchmarks
## Setup
A test machine with an AMD Ryzen 9 5950X CPU was used to measure the time it took to `make -j33 bzImage` of the linux kernel source v6.10.10 (`.config` generated by `make x86_64_defconfig` prior).

Three separate kernels (v6.10.10) were first compiled from source patched with [more-uarches-for-kernel-6.8-rc4+.patch](https://github.com/graysky2/kernel_compiler_patch/blob/master/more-uarches-for-kernel-6.8-rc4%2B.patch).
* Kernel 1 used the default menu config option for Processor family = `Generic-x86-64`
* Kernel 2 used the menu config option for Processor family = `AMD-x86-64-v3`
* Kernel 3 used the menu config option for Processor family = `AMD Zen 3`
Each test machine measured the time it took to `make bzImage` of the Linux kernel source (`.config` generated beforehand by `make x86_64_defconfig`).

The machine was booted into each kernel and the make test was conducted. Then the next kernel was installed and the machine was booted into it to run the test.
Three separate test machines were evaluated:
1. AMD Ryzen 9 5950X
2. Intel i7-4790K
3. Intel N100

Separate kernels were first compiled from source patched with [more-uarches-for-kernel-6.8-rc4+.patch](https://github.com/graysky2/kernel_compiler_patch/blob/master/more-uarches-for-kernel-6.8-rc4%2B.patch).
* Kernel 1 used the default menu config option for Processor family = `Generic x86-64`
* Kernel 2 used the menu config option for Processor family = `AMD x86-64-v3` or `Intel x86-64-v3`
* Kernel 3 used the menu config option for Processor family = `AMD Zen 3` or `Intel Haswell` or `Intel Alder Lake`

Each machine was booted into its respective kernel and the make test was run. The next kernel was then installed, the machine was booted into it, and the make test was run again.
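
The replicate timing described above can be sketched as a small shell helper. This is only an illustration and not the `make_bench.sh` script used for the published numbers: the function name `bench_cmd`, its argument order, and the CSV layout are assumptions made for this example, and it relies on the `%N` (nanoseconds) extension of GNU `date`.

```shell
# Hypothetical helper, NOT the repo's make_bench.sh.
# Usage: bench_cmd RUNS PREP_CMD CMD [ARGS...]
#   RUNS      number of replicates
#   PREP_CMD  untimed preparation step, run through sh -c before each replicate
#   CMD       the command whose wall-clock time is recorded
bench_cmd() {
    runs=$1; prep=$2; shift 2
    echo "run,seconds"
    i=1
    while [ "$i" -le "$runs" ]; do
        sh -c "$prep" >/dev/null 2>&1      # e.g. "make clean", not timed
        start=$(date +%s.%N)               # %N is a GNU date extension
        "$@" >/dev/null 2>&1               # the timed build
        end=$(date +%s.%N)
        # Print the replicate number and elapsed seconds as one CSV row.
        awk -v i="$i" -v s="$start" -v e="$end" \
            'BEGIN { printf "%d,%.3f\n", i, e - s }'
        i=$((i + 1))
    done
}

# Example (assumes a kernel tree configured with `make x86_64_defconfig`):
#   bench_cmd 12 'make clean' make bzImage > results.csv
```

Keeping the preparation step outside the timed region means each replicate measures only the build itself.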

## Conclusion
Both of the kernels built with the optimized processor family (AMD-x86-v3 and AMD Zen 3) ran the compile test faster than the kernel compiled with the default processor family (x86-64) by a small but statistically significant amount as measured by this make compilation.
On all three test machines, the kernels built with the optimized processor family options introduced by the patch in this repo completed the compile test faster than the kernel built with the default processor family option. The difference is small (<1%) but statistically significant, as measured by this make benchmark.
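
As a rough check of the "<1%" figure, the relative difference can be computed from the mean compile times reported in the stats tables. The helper name `pct_faster` is made up for this sketch.

```shell
# Relative speed-up of an optimized kernel over the generic one, in percent.
# Usage: pct_faster GENERIC_MEAN OPTIMIZED_MEAN   (hypothetical helper)
pct_faster() {
    awk -v base="$1" -v opt="$2" \
        'BEGIN { printf "%.2f\n", (base - opt) / base * 100 }'
}

pct_faster 79.800 79.456     # Ryzen 9 5950X: Generic x86-64 vs AMD x86-64-v3
pct_faster 344.280 342.035   # i7-4790K:      Generic x86-64 vs Intel x86-64-v3
pct_faster 589.457 588.797   # N100:          Generic x86-64 vs Intel Alder Lake
```

Each call prints a value well below 1, which matches the size of the effect stated above.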

## Discussion
1. All the assumptions for ANOVA are met:
* Data are normally distributed
* The population variances are fairly equal
2. The boxplot plot clearly show significance for either pair-wise comparison (x86-64 vs x86-64-v3 or znver3)
* Pair-wise analysis by Tukey-Kramer shows significance at the p<0.001 level
2. The box plots clearly show significance for the pair-wise comparisons
* Pair-wise analysis by Tukey-Kramer is shown for all pairs (see tables)

In other words, x86-64-v3 is significantly different from x86-64 and znver3 is significantly different from x86-64, but x86-64-v3 is not different from znver3.
In other words, x86-64-v3 is significantly different from generic x86-64, and the CPU-specific targets (Zen 3, Haswell, Alder Lake) are also significantly different from generic x86-64.
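
For readers who want to see where a Tukey HSD Q statistic comes from, the sketch below uses the textbook form for equal group sizes: the absolute difference of two means divided by `sqrt(MSE / n)`, where MSE is the average of the group variances. It is an assumption that the online calculator credited below works the same way, and because the function (hypothetical name `tukey_q`) only sees the rounded summary statistics from the tables, its output is approximate.

```shell
# Tukey HSD Q statistic from summary statistics, assuming equal group sizes.
# Usage: tukey_q N MEAN_A MEAN_B SD_1 SD_2 [SD_3 ...]   (hypothetical helper)
#   N         replicates per group
#   MEAN_A/B  means of the two groups being compared
#   SD_*      standard deviations of ALL groups (for the pooled variance)
tukey_q() {
    n=$1; a=$2; b=$3; shift 3
    echo "$@" | awk -v n="$n" -v a="$a" -v b="$b" '{
        ss = 0
        for (k = 1; k <= NF; k++) ss += $k * $k   # sum of group variances
        mse = ss / NF                             # pooled variance (equal n)
        d = a - b; if (d < 0) d = -d              # absolute mean difference
        printf "%.2f\n", d / sqrt(mse / n)
    }'
}

# Intel N100: Generic x86-64 vs Intel x86-64-v3, 12 replicates per group
tukey_q 12 589.457 589.217 0.1596 0.1382 0.1532
```

For the N100 figures this lands close to the Q stat of 5.5076 listed in the table for that pair.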

### Stats
### Stats for Machine 1. AMD Ryzen 9 5950X
<table>
<tr>
<th>Processor family</th>
<th>Processor family option</th>
<th>Mean compile time</th>
<th>Std dev</th>
<th># of replicates</th>
</tr>
<tr>
<td>x86-64 generic</td>
<td>79.8001 sec</td>
<td>Generic x86-64</td>
<td>79.800 sec</td>
<td>0.1076 sec</td>
<td>12</td>
<td>12</td>
</tr>
<tr>
<td>AMD-x64-64-v3</td>
<td>79.4559 sec</td>
<td>AMD x86-64-v3</td>
<td>79.456 sec</td>
<td>0.0772 sec</td>
<td>12</td>
</tr>
<tr>
<td>AMD Zen 3</td>
<td>79.4400 sec</td>
<td>79.440 sec</td>
<td>0.0912 sec</td>
<td>12</td>
</tr>
</table>

### Box plot
![X9550](https://github.com/graysky2/kernel_compiler_patch/blob/master/benchmark/boxplot.svg)
![5950X](https://github.com/graysky2/kernel_compiler_patch/blob/master/benchmark/boxplot1.svg)

<table>
<tr>
<th>Treatment pairs</th>
<th>Tukey HSD Q stat</th>
<th>Tukey HSD p-value</th>
<th>Tukey HSD inference</th>
</tr>
<tr>
<td>Generic x86-64 vs AMD x86-64-v3</td>
<td>12.8771</td>
<td>0.0010053</td>
<td>$${\color{green} p<0.01}$$</td>
</tr>
<tr>
<td>Generic x86-64 vs AMD Zen 3</td>
<td>13.4675</td>
<td>0.0010053</td>
<td>$${\color{green} p<0.01}$$</td>
</tr>
<tr>
<td>AMD x86-64-v3 vs AMD Zen 3</td>
<td>0.5904</td>
<td>0.8999947</td>
<td>$${\color{red}insignificant}$$</td>
</tr>
</table>

### Stats for Machine 2. Intel i7-4790K
<table>
<tr>
<th>Processor family option</th>
<th>Mean compile time</th>
<th>Std dev</th>
<th># of replicates</th>
</tr>
<tr>
<td>Generic x86-64</td>
<td>344.280 sec</td>
<td>0.6455 sec</td>
<td>12</td>
</tr>
<tr>
<td>Intel x86-64-v3</td>
<td>342.035 sec</td>
<td>0.4971 sec</td>
<td>12</td>
</tr>
<tr>
<td>Intel Haswell</td>
<td>342.189 sec</td>
<td>0.2415 sec</td>
<td>12</td>
</tr>
</table>

![i7-4790k](https://github.com/graysky2/kernel_compiler_patch/blob/master/benchmark/boxplot2.svg)

<table>
<tr>
<th>Treatment pairs</th>
<th>Tukey HSD Q stat</th>
<th>Tukey HSD p-value</th>
<th>Tukey HSD inference</th>
</tr>
<tr>
<td>Generic x86-64 vs Intel x86-64-v3</td>
<td>28.9652</td>
<td>0.0010053</td>
<td>$${\color{green} p<0.01}$$</td>
</tr>
<tr>
<td>Generic x86-64 vs Intel Haswell</td>
<td>24.8335</td>
<td>0.0010053</td>
<td>$${\color{green} p<0.01}$$</td>
</tr>
<tr>
<td>Intel x86-64-v3 vs Intel Haswell</td>
<td>4.1317</td>
<td>0.0167155</td>
<td>$${\color{lightgreen} p<0.05}$$</td>
</tr>
</table>

### Stats for Machine 3. Intel N100
<table>
<tr>
<th>Processor family option</th>
<th>Mean compile time</th>
<th>Std dev</th>
<th># of replicates</th>
</tr>
<tr>
<td>Generic x86-64</td>
<td>589.457 sec</td>
<td>0.1596 sec</td>
<td>12</td>
</tr>
<tr>
<td>Intel x86-64-v3</td>
<td>589.217 sec</td>
<td>0.1382 sec</td>
<td>12</td>
</tr>
<tr>
<td>Intel Alder Lake</td>
<td>588.797 sec</td>
<td>0.1532 sec</td>
<td>12</td>
</tr>
</table>

![N100](https://github.com/graysky2/kernel_compiler_patch/blob/master/benchmark/boxplot3.svg)

<table>
<tr>
<th>Treatment pairs</th>
<th>Tukey HSD Q stat</th>
<th>Tukey HSD p-value</th>
<th>Tukey HSD inference</th>
</tr>
<tr>
<td>Generic x86-64 vs Intel x86-64-v3</td>
<td>5.5076</td>
<td>0.0012818</td>
<td>$${\color{green} p<0.01}$$</td>
</tr>
<tr>
<td>Generic x86-64 vs Intel Alder Lake</td>
<td>15.1600</td>
<td>0.0010053</td>
<td>$${\color{green} p<0.01}$$</td>
</tr>
<tr>
<td>Intel x86-64-v3 vs Intel Alder Lake</td>
<td>9.6524</td>
<td>0.0010053</td>
<td>$${\color{green} p<0.01}$$</td>
</tr>
</table>

## Software versions used

All machines ran Arch Linux with all stock repo packages, with the exception of the kernel (see below). At the time of the work, the following toolchain versions were used:
* binutils 2.43+r4+g7999dae6961-1
* gcc 14.2.1+r134+gab884fffe3fc-1
* gcc-libs 14.2.1+r134+gab884fffe3fc-1
* glibc 2.40+r16+gaa533d58ff-2
* linux-api-headers 6.10-1

The kernel packages were built from the official Arch Linux PKGBUILD for kernel version 6.10.10-arch1-1 using the distro config, differing only by the modifications introduced by the aforementioned patch from this repo.

The benchmark compiled the vanilla Linux kernel version 6.10.10; as mentioned above, the `.config` used was generated by running `make x86_64_defconfig`.

## References
* Bash script that controls the benchmark: https://github.com/graysky2/bin/blob/master/make_bench.sh
* Bash script to run the benchmark: [make_bench.sh](https://github.com/graysky2/kernel_compiler_patch/blob/master/benchmark/make_bench.sh)
* Log file generated by script: [results.csv](https://github.com/graysky2/kernel_compiler_patch/blob/master/benchmark/results.csv)

## Credit
* Original author: jeroen AT linuxforge DOT net
* Link to original version: http://www.linuxforge.net/docs/linux/linux-gcc.php
* Box plot generated with [statisty.app](https://statisty.app/anova-calculator)
* ANOVA stats generated with [astatsa.com](https://astatsa.com/OneWay_Anova_with_TukeyHSD/)

## Legacy support
Find support for older versions of the Linux kernel and of gcc in the outdated_versions directory.