all: instruction alignment optimizations for assembly routines, good starter projects #63678
MemclrUnaligned/0_5-16   1.821n ± 1%   1.803n ± 2%   ~ (p=0.076 n=20+10)
MemclrUnaligned/0_16-16   1.879n ± 1%   1.855n ± 1%   ~ (p=0.210 n=20+10)
MemclrUnaligned/0_64-16   2.044n ± 1%   2.044n ± 2%   ~ (p=0.871 n=20+10)
MemclrUnaligned/0_256-16   3.614n ± 1%   3.600n ± 3%   ~ (p=0.552 n=20+10)
MemclrUnaligned/0_4096-16   32.63n ± 2%   32.34n ± 3%   ~ (p=0.948 n=20+10)
MemclrUnaligned/0_65536-16   483.5n ± 3%   479.1n ± 5%   ~ (p=0.588 n=20+10)
MemclrUnaligned/1_5-16   1.800n ± 1%   1.808n ± 1%   ~ (p=0.333 n=20+10)
MemclrUnaligned/1_16-16   1.863n ± 1%   1.847n ± 2%   ~ (p=0.345 n=20+10)
MemclrUnaligned/1_64-16   2.929n ± 1%   2.107n ± 2%   -28.05% (p=0.000 n=20+10)
MemclrUnaligned/1_256-16   4.942n ± 1%   4.973n ± 3%   ~ (p=0.302 n=20+10)
MemclrUnaligned/1_4096-16   40.09n ± 1%   39.49n ± 2%   ~ (p=0.210 n=20+10)
MemclrUnaligned/1_65536-16   650.0n ± 3%   653.7n ± 4%   ~ (p=0.530 n=20+10)
MemclrUnaligned/4_5-16   1.806n ± 1%   1.812n ± 1%   ~ (p=0.291 n=20+10)
MemclrUnaligned/4_16-16   1.867n ± 1%   1.862n ± 1%   ~ (p=0.551 n=20+10)
MemclrUnaligned/4_64-16   2.946n ± 2%   2.752n ± 2%   -6.59% (p=0.000 n=20+10)
MemclrUnaligned/4_256-16   4.942n ± 1%   5.144n ± 2%   +4.08% (p=0.000 n=20+10)
MemclrUnaligned/4_4096-16   39.88n ± 1%   40.21n ± 4%   ~ (p=0.346 n=20+10)
MemclrUnaligned/4_65536-16   643.7n ± 2%   647.8n ± 4%   ~ (p=0.657 n=20+10)
MemclrUnaligned/7_5-16   1.802n ± 1%   1.801n ± 3%   ~ (p=0.481 n=20+10)
MemclrUnaligned/7_16-16   1.863n ± 1%   1.863n ± 2%   ~ (p=0.626 n=20+10)
MemclrUnaligned/7_64-16   2.947n ± 1%   2.125n ± 2%   -27.91% (p=0.000 n=20+10)
MemclrUnaligned/7_256-16   4.967n ± 1%   5.005n ± 3%   ~ (p=0.302 n=20+10)
MemclrUnaligned/7_4096-16   39.52n ± 3%   40.07n ± 3%   ~ (p=0.650 n=20+10)
MemclrUnaligned/7_65536-16   651.5n ± 3%   649.2n ± 4%   ~ (p=0.846 n=20+10)
MemclrUnaligned/0_1M-16   7.646µ ± 2%   7.618µ ± 5%   ~ (p=0.373 n=20+10)
MemclrUnaligned/0_4M-16   54.15µ ± 3%   119.05µ ± 66%   ~ (p=0.350 n=20+10)
MemclrUnaligned/0_8M-16   108.8µ ± 3%   107.0µ ± 3%   ~ (p=0.559 n=20+10)
MemclrUnaligned/0_16M-16   216.2µ ± 2%   216.3µ ± 3%   ~ (p=0.681 n=20+10)
MemclrUnaligned/0_64M-16   888.4µ ± 2%   867.3µ ± 6%   ~ (p=0.055 n=20+10)
MemclrUnaligned/1_1M-16   10.85µ ± 2%   11.00µ ± 5%   +1.37% (p=0.028 n=20+10)
MemclrUnaligned/1_4M-16   48.66µ ± 2%   47.79µ ± 1%   ~ (p=0.120 n=20+10)
MemclrUnaligned/1_8M-16   96.18µ ± 4%   97.18µ ± 5%   ~ (p=0.373 n=20+10)
MemclrUnaligned/1_16M-16   232.7µ ± 2%   276.9µ ± 19%   ~ (p=0.286 n=20+10)
MemclrUnaligned/1_64M-16   883.2µ ± 2%   892.3µ ± 4%   ~ (p=0.502 n=20+10)
MemclrUnaligned/4_1M-16   10.97µ ± 2%   11.04µ ± 4%   ~ (p=0.073 n=20+10)
MemclrUnaligned/4_4M-16   48.53µ ± 2%   45.21µ ± 11%   ~ (p=0.082 n=20+10)
MemclrUnaligned/4_8M-16   97.31µ ± 2%   96.12µ ± 3%   ~ (p=0.311 n=20+10)
MemclrUnaligned/4_16M-16   234.7µ ± 6%   241.0µ ± 42%   ~ (p=0.328 n=20+10)
MemclrUnaligned/4_64M-16   891.9µ ± 2%   875.8µ ± 3%   ~ (p=0.448 n=20+10)
MemclrUnaligned/7_1M-16   11.03µ ± 3%   10.85µ ± 4%   ~ (p=0.495 n=20+10)
MemclrUnaligned/7_4M-16   51.37µ ± 2%   48.38µ ± 2%   -5.83% (p=0.000 n=20+10)
MemclrUnaligned/7_8M-16   97.66µ ± 3%   97.83µ ± 3%   ~ (p=0.846 n=20+10)
MemclrUnaligned/7_16M-16   231.4µ ± 7%   274.7µ ± 29%   ~ (p=0.286 n=20+10)
MemclrUnaligned/7_64M-16   891.8µ ± 3%   868.4µ ± 4%   ~ (p=0.061 n=20+10)

MemclrUnaligned/0_5-16   2.558Gi ± 1%   2.583Gi ± 2%   ~ (p=0.076 n=20+10)
MemclrUnaligned/0_16-16   7.931Gi ± 1%   8.030Gi ± 1%   ~ (p=0.214 n=20+10)
MemclrUnaligned/0_64-16   29.15Gi ± 1%   29.17Gi ± 2%   ~ (p=0.914 n=20+10)
MemclrUnaligned/0_256-16   65.97Gi ± 1%   66.23Gi ± 3%   ~ (p=0.559 n=20+10)
MemclrUnaligned/0_4096-16   116.9Gi ± 2%   117.9Gi ± 3%   ~ (p=0.948 n=20+10)
MemclrUnaligned/0_65536-16   126.2Gi ± 3%   127.4Gi ± 5%   ~ (p=0.588 n=20+10)
MemclrUnaligned/1_5-16   2.587Gi ± 1%   2.575Gi ± 1%   ~ (p=0.328 n=20+10)
MemclrUnaligned/1_16-16   7.998Gi ± 1%   8.066Gi ± 2%   ~ (p=0.373 n=20+10)
MemclrUnaligned/1_64-16   20.35Gi ± 1%   28.29Gi ± 2%   +39.02% (p=0.000 n=20+10)
MemclrUnaligned/1_256-16   48.24Gi ± 1%   47.94Gi ± 3%   ~ (p=0.307 n=20+10)
MemclrUnaligned/1_4096-16   95.16Gi ± 1%   96.61Gi ± 2%   ~ (p=0.214 n=20+10)
MemclrUnaligned/1_65536-16   93.90Gi ± 3%   93.38Gi ± 4%   ~ (p=0.530 n=20+10)
MemclrUnaligned/4_5-16   2.578Gi ± 1%   2.569Gi ± 1%   ~ (p=0.286 n=20+10)
MemclrUnaligned/4_16-16   7.979Gi ± 1%   8.005Gi ± 1%   ~ (p=0.588 n=20+10)
MemclrUnaligned/4_64-16   20.24Gi ± 2%   21.67Gi ± 2%   +7.06% (p=0.000 n=20+10)
MemclrUnaligned/4_256-16   48.24Gi ± 1%   46.35Gi ± 2%   -3.92% (p=0.000 n=20+10)
MemclrUnaligned/4_4096-16   95.65Gi ± 2%   94.87Gi ± 4%   ~ (p=0.350 n=20+10)
MemclrUnaligned/4_65536-16   94.82Gi ± 2%   94.22Gi ± 5%   ~ (p=0.650 n=20+10)
MemclrUnaligned/7_5-16   2.584Gi ± 1%   2.585Gi ± 3%   ~ (p=0.475 n=20+10)
MemclrUnaligned/7_16-16   7.999Gi ± 1%   7.999Gi ± 2%   ~ (p=0.619 n=20+10)
MemclrUnaligned/7_64-16   20.22Gi ± 1%   28.05Gi ± 2%   +38.72% (p=0.000 n=20+10)
MemclrUnaligned/7_256-16   48.00Gi ± 1%   47.65Gi ± 3%   ~ (p=0.328 n=20+10)
MemclrUnaligned/7_4096-16   96.54Gi ± 3%   95.19Gi ± 3%   ~ (p=0.650 n=20+10)
MemclrUnaligned/7_65536-16   93.69Gi ± 3%   94.02Gi ± 4%   ~ (p=0.846 n=20+10)
MemclrUnaligned/0_1M-16   127.7Gi ± 2%   128.2Gi ± 5%   ~ (p=0.373 n=20+10)
MemclrUnaligned/0_4M-16   72.14Gi ± 3%   46.75Gi ± 66%   ~ (p=0.350 n=20+10)
MemclrUnaligned/0_8M-16   71.82Gi ± 3%   72.98Gi ± 3%   ~ (p=0.559 n=20+10)
MemclrUnaligned/0_16M-16   72.29Gi ± 2%   72.24Gi ± 3%   ~ (p=0.681 n=20+10)
MemclrUnaligned/0_64M-16   70.35Gi ± 2%   72.07Gi ± 5%   ~ (p=0.055 n=20+10)
MemclrUnaligned/1_1M-16   89.98Gi ± 2%   88.77Gi ± 5%   -1.35% (p=0.028 n=20+10)
MemclrUnaligned/1_4M-16   80.28Gi ± 2%   81.74Gi ± 1%   ~ (p=0.120 n=20+10)
MemclrUnaligned/1_8M-16   81.23Gi ± 4%   80.39Gi ± 5%   ~ (p=0.373 n=20+10)
MemclrUnaligned/1_16M-16   67.16Gi ± 2%   57.17Gi ± 22%   ~ (p=0.286 n=20+10)
MemclrUnaligned/1_64M-16   70.77Gi ± 2%   70.04Gi ± 4%   ~ (p=0.502 n=20+10)
MemclrUnaligned/4_1M-16   88.99Gi ± 2%   88.48Gi ± 4%   ~ (p=0.074 n=20+10)
MemclrUnaligned/4_4M-16   80.49Gi ± 2%   86.41Gi ± 10%   ~ (p=0.082 n=20+10)
MemclrUnaligned/4_8M-16   80.28Gi ± 2%   81.28Gi ± 3%   ~ (p=0.328 n=20+10)
MemclrUnaligned/4_16M-16   66.58Gi ± 6%   64.83Gi ± 29%   ~ (p=0.328 n=20+10)
MemclrUnaligned/4_64M-16   70.07Gi ± 2%   71.37Gi ± 3%   ~ (p=0.448 n=20+10)
MemclrUnaligned/7_1M-16   88.51Gi ± 3%   89.98Gi ± 4%   ~ (p=0.502 n=20+10)
MemclrUnaligned/7_4M-16   76.04Gi ± 2%   80.74Gi ± 2%   +6.19% (p=0.000 n=20+10)
MemclrUnaligned/7_8M-16   80.00Gi ± 3%   79.86Gi ± 3%   ~ (p=0.846 n=20+10)
MemclrUnaligned/7_16M-16   67.53Gi ± 7%   58.23Gi ± 24%   ~ (p=0.286 n=20+10)
MemclrUnaligned/7_64M-16   70.08Gi ± 3%   71.98Gi ± 4%   ~ (p=0.061 n=20+10)

For golang#63678
Change https://go.dev/cl/537055 mentions this issue: |
goos: linux
goarch: amd64
pkg: crypto/subtle
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
│ master │ HEAD │
│ sec/op │ sec/op vs base │
XORBytes/8Bytes-8   10.90n ± 1%   10.96n ± 5%   ~ (p=0.617 n=10)
XORBytes/128Bytes-8   14.85n ± 2%   12.05n ± 2%   -18.82% (p=0.000 n=10)
XORBytes/2048Bytes-8   88.30n ± 2%   72.64n ± 1%   -17.73% (p=0.000 n=10)
XORBytes/32768Bytes-8   1.489µ ± 2%   1.442µ ± 1%   -3.12% (p=0.000 n=10)
geomean   67.91n   60.99n   -10.19%

│ master │ HEAD │
│ B/s │ B/s vs base │
XORBytes/8Bytes-8   700.5Mi ± 1%   696.5Mi ± 5%   ~ (p=0.631 n=10)
XORBytes/128Bytes-8   8.026Gi ± 2%   9.890Gi ± 2%   +23.22% (p=0.000 n=10)
XORBytes/2048Bytes-8   21.60Gi ± 2%   26.26Gi ± 1%   +21.55% (p=0.000 n=10)
XORBytes/32768Bytes-8   20.50Gi ± 2%   21.16Gi ± 1%   +3.21% (p=0.000 n=10)
geomean   7.022Gi   7.819Gi   +11.34%

For golang#63678
Change https://go.dev/cl/537856 mentions this issue: |
For #63678
Change-Id: I3996873773748a6f78acc6575e70e09bb6aea979
GitHub-Last-Rev: d9129cb
GitHub-Pull-Request: #63754
Reviewed-on: https://go-review.googlesource.com/c/go/+/537856
Reviewed-by: David Chase
Reviewed-by: Keith Randall
Auto-Submit: Keith Randall
Reviewed-by: Keith Randall
LUCI-TryBot-Result: Go LUCI
goos: linux
goarch: amd64
pkg: bytes
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
│ master │ HEAD │
│ sec/op │ sec/op vs base │
Equal/0-8   0.2800n ± 22%   0.2865n ± 26%   ~ (p=0.075 n=10)
Equal/1-8   18.57n ± 2%   19.34n ± 6%   +4.15% (p=0.014 n=10)
Equal/6-8   19.07n ± 1%   19.38n ± 2%   +1.63% (p=0.014 n=10)
Equal/9-8   19.39n ± 2%   19.05n ± 1%   -1.78% (p=0.005 n=10)
Equal/15-8   19.46n ± 1%   19.10n ± 1%   -1.85% (p=0.000 n=10)
Equal/16-8   19.36n ± 2%   18.95n ± 1%   -2.09% (p=0.011 n=10)
Equal/20-8   20.20n ± 1%   19.83n ± 1%   -1.86% (p=0.001 n=10)
Equal/32-8   20.95n ± 1%   20.84n ± 1%   -0.57% (p=0.010 n=10)
Equal/4K-8   97.40n ± 2%   81.34n ± 3%   -16.49% (p=0.000 n=10)
Equal/4M-8   81.74µ ± 3%   71.52µ ± 4%   -12.49% (p=0.000 n=10)
Equal/64M-8   1.319m ± 1%   1.139m ± 3%   -13.68% (p=0.000 n=10)
EqualBothUnaligned/64_0-8   8.707n ± 4%   8.588n ± 3%   ~ (p=0.353 n=10)
EqualBothUnaligned/64_1-8   8.513n ± 3%   8.614n ± 2%   ~ (p=0.481 n=10)
EqualBothUnaligned/64_4-8   8.752n ± 3%   8.637n ± 4%   ~ (p=0.148 n=10)
EqualBothUnaligned/64_7-8   8.742n ± 3%   8.514n ± 2%   ~ (p=0.052 n=10)
EqualBothUnaligned/4096_0-8   89.87n ± 3%   70.44n ± 5%   -21.63% (p=0.000 n=10)
EqualBothUnaligned/4096_1-8   91.67n ± 5%   70.89n ± 3%   -22.67% (p=0.000 n=10)
EqualBothUnaligned/4096_4-8   90.43n ± 2%   70.52n ± 3%   -22.01% (p=0.000 n=10)
EqualBothUnaligned/4096_7-8   89.53n ± 3%   72.02n ± 5%   -19.56% (p=0.000 n=10)
EqualBothUnaligned/4194304_0-8   86.43µ ± 3%   73.40µ ± 4%   -15.07% (p=0.000 n=10)
EqualBothUnaligned/4194304_1-8   85.48µ ± 2%   75.35µ ± 1%   -11.85% (p=0.000 n=10)
EqualBothUnaligned/4194304_4-8   86.51µ ± 3%   75.44µ ± 4%   -12.80% (p=0.000 n=10)
EqualBothUnaligned/4194304_7-8   86.40µ ± 3%   74.41µ ± 3%   -13.88% (p=0.000 n=10)
EqualBothUnaligned/67108864_0-8   1.374m ± 3%   1.171m ± 3%   -14.75% (p=0.000 n=10)
EqualBothUnaligned/67108864_1-8   1.401m ± 4%   1.198m ± 4%   -14.49% (p=0.000 n=10)
EqualBothUnaligned/67108864_4-8   1.393m ± 4%   1.205m ± 4%   -13.53% (p=0.000 n=10)
EqualBothUnaligned/67108864_7-8   1.396m ± 3%   1.199m ± 4%   -14.11% (p=0.000 n=10)
geomean   735.7n   666.7n   -9.39%

│ master │ HEAD │
│ B/s │ B/s vs base │
Equal/1-8   51.36Mi ± 2%   49.32Mi ± 6%   -3.98% (p=0.015 n=10)
Equal/6-8   300.0Mi ± 1%   295.3Mi ± 2%   -1.57% (p=0.011 n=10)
Equal/9-8   442.5Mi ± 2%   450.6Mi ± 1%   +1.82% (p=0.005 n=10)
Equal/15-8   734.9Mi ± 1%   748.8Mi ± 1%   +1.90% (p=0.000 n=10)
Equal/16-8   788.4Mi ± 2%   805.2Mi ± 1%   +2.14% (p=0.011 n=10)
Equal/20-8   944.2Mi ± 1%   961.8Mi ± 1%   +1.87% (p=0.002 n=10)
Equal/32-8   1.422Gi ± 0%   1.430Gi ± 1%   +0.58% (p=0.011 n=10)
Equal/4K-8   39.17Gi ± 2%   46.90Gi ± 3%   +19.74% (p=0.000 n=10)
Equal/4M-8   47.79Gi ± 3%   54.62Gi ± 4%   +14.27% (p=0.000 n=10)
Equal/64M-8   47.38Gi ± 1%   54.89Gi ± 3%   +15.85% (p=0.000 n=10)
EqualBothUnaligned/64_0-8   6.845Gi ± 4%   6.940Gi ± 3%   ~ (p=0.353 n=10)
EqualBothUnaligned/64_1-8   7.002Gi ± 3%   6.919Gi ± 2%   ~ (p=0.481 n=10)
EqualBothUnaligned/64_4-8   6.811Gi ± 3%   6.901Gi ± 4%   ~ (p=0.165 n=10)
EqualBothUnaligned/64_7-8   6.819Gi ± 3%   7.002Gi ± 2%   ~ (p=0.052 n=10)
EqualBothUnaligned/4096_0-8   42.45Gi ± 3%   54.16Gi ± 5%   +27.60% (p=0.000 n=10)
EqualBothUnaligned/4096_1-8   41.61Gi ± 6%   53.82Gi ± 3%   +29.33% (p=0.000 n=10)
EqualBothUnaligned/4096_4-8   42.19Gi ± 2%   54.09Gi ± 3%   +28.22% (p=0.000 n=10)
EqualBothUnaligned/4096_7-8   42.61Gi ± 3%   52.97Gi ± 5%   +24.33% (p=0.000 n=10)
EqualBothUnaligned/4194304_0-8   45.20Gi ± 3%   53.22Gi ± 4%   +17.75% (p=0.000 n=10)
EqualBothUnaligned/4194304_1-8   45.70Gi ± 2%   51.84Gi ± 1%   +13.43% (p=0.000 n=10)
EqualBothUnaligned/4194304_4-8   45.15Gi ± 3%   51.78Gi ± 4%   +14.68% (p=0.000 n=10)
EqualBothUnaligned/4194304_7-8   45.21Gi ± 3%   52.50Gi ± 4%   +16.12% (p=0.000 n=10)
EqualBothUnaligned/67108864_0-8   45.50Gi ± 3%   53.37Gi ± 3%   +17.30% (p=0.000 n=10)
EqualBothUnaligned/67108864_1-8   44.63Gi ± 4%   52.17Gi ± 4%   +16.89% (p=0.000 n=10)
EqualBothUnaligned/67108864_4-8   44.86Gi ± 4%   51.88Gi ± 4%   +15.65% (p=0.000 n=10)
EqualBothUnaligned/67108864_7-8   44.76Gi ± 3%   52.12Gi ± 4%   +16.43% (p=0.000 n=10)
geomean   9.734Gi   10.79Gi   +10.88%

For golang#63678
Change https://go.dev/cl/537995 mentions this issue: |
Change https://go.dev/cl/538315 mentions this issue: |
Change https://go.dev/cl/538116 mentions this issue: |
@mauri870 Is such a change worth submitting?
Hash8Bytes/New-16 46.67Mi ± 1% 47.56Mi ± 0% +1.91% (p=0.000 n=10)
@qiulaidongfeng Probably not. The change is small enough that it could be just a spurious alignment effect. Generally, if a routine really benefits from instruction alignment, you'll see a noticeable improvement (more than about 5%) that is consistently reproducible with a higher -count in benchmarks.
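The rule of thumb above can be sketched as a small filter over benchstat rows. This is only an illustration: the `row` type and the `deltaPercent` and `likelyRealWin` helpers are hypothetical names, the 5% bar comes from the comment above, and the 0.05 significance level is an assumed default rather than anything stated in this thread. The sample rows reuse IndexByte numbers quoted elsewhere in this issue.

```go
package main

import "fmt"

// row is a hypothetical record holding one benchstat sec/op comparison.
type row struct {
	name   string
	oldNs  float64 // median time per op before the change, in ns
	newNs  float64 // median time per op after the change, in ns
	pValue float64 // p-value as printed by benchstat
}

// deltaPercent returns the relative change in time per op.
// Negative values mean the new code is faster.
func deltaPercent(r row) float64 {
	return (r.newNs - r.oldNs) / r.oldNs * 100
}

// likelyRealWin reports whether a row is both statistically significant
// (pValue below alpha) and faster by at least minGainPct percent.
// Both thresholds are assumptions chosen by the caller.
func likelyRealWin(r row, minGainPct, alpha float64) bool {
	return r.pValue < alpha && deltaPercent(r) <= -minGainPct
}

func main() {
	rows := []row{
		{"IndexByte/10-16", 2.613, 2.558, 0.014},
		{"IndexByte/4K-16", 57.20, 39.58, 0.000},
	}
	for _, r := range rows {
		fmt.Printf("%-16s %+7.2f%%  likely real: %v\n",
			r.name, deltaPercent(r), likelyRealWin(r, 5, 0.05))
	}
}
```

A filter like this does not replace rerunning the benchmark with a higher -count; it only makes the "large and significant" bar explicit when skimming a long table.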
@mauri870 Is it worth submitting results that show significant changes in only one benchmark?
IndexByte/10-16 3.663Gi ± 2% 3.669Gi ± 1% ~ (p=0.739 n=10)
@qiulaidongfeng That might be happening because you are optimizing just one branch of the assembly code? Either way, it seems to be a good optimization for that particular case, without negatively impacting the others.
Change https://go.dev/cl/538715 mentions this issue: |
For #63678

goos: darwin
goarch: arm64
pkg: strings
│ count_old.txt │ count_new.txt │
│ sec/op │ sec/op vs base │
CountHard1-8   368.7µ ± 11%   332.0µ ± 1%   -9.95% (p=0.002 n=10)
CountHard2-8   348.8µ ± 5%   333.1µ ± 1%   -4.51% (p=0.000 n=10)
CountHard3-8   402.7µ ± 25%   359.5µ ± 1%   -10.75% (p=0.000 n=10)
CountTorture-8   10.536µ ± 23%   9.913µ ± 0%   -5.91% (p=0.000 n=10)
CountTortureOverlapping-8   74.86µ ± 9%   67.56µ ± 1%   -9.75% (p=0.000 n=10)
CountByte/10-8   6.905n ± 3%   6.690n ± 1%   -3.11% (p=0.001 n=10)
CountByte/32-8   3.247n ± 13%   3.207n ± 2%   -1.23% (p=0.030 n=10)
CountByte/4096-8   83.72n ± 1%   82.58n ± 1%   -1.36% (p=0.007 n=10)
CountByte/4194304-8   85.17µ ± 5%   84.02µ ± 8%   ~ (p=0.075 n=10)
CountByte/67108864-8   1.497m ± 8%   1.397m ± 2%   -6.69% (p=0.000 n=10)
geomean   9.977µ   9.426µ   -5.53%

│ count_old.txt │ count_new.txt │
│ B/s │ B/s vs base │
CountByte/10-8   1.349Gi ± 3%   1.392Gi ± 1%   +3.20% (p=0.002 n=10)
CountByte/32-8   9.180Gi ± 11%   9.294Gi ± 2%   +1.24% (p=0.029 n=10)
CountByte/4096-8   45.57Gi ± 1%   46.20Gi ± 1%   +1.38% (p=0.007 n=10)
CountByte/4194304-8   45.86Gi ± 5%   46.49Gi ± 7%   ~ (p=0.075 n=10)
CountByte/67108864-8   41.75Gi ± 8%   44.74Gi ± 2%   +7.16% (p=0.000 n=10)
geomean   16.10Gi   16.55Gi   +2.85%

Change-Id: Ifc2173ba3a926b0fa9598372d4404b8645929d45
Reviewed-on: https://go-review.googlesource.com/c/go/+/538116
Reviewed-by: Keith Randall
Reviewed-by: Bryan Mills
Run-TryBot: shuang cui
Auto-Submit: Keith Randall
Reviewed-by: Keith Randall
LUCI-TryBot-Result: Go LUCI
goos: windows
goarch: amd64
pkg: bytes
cpu: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
IndexByte/10-16   2.613n ± 1%   2.558n ± 1%   -2.09% (p=0.014 n=10)
IndexByte/32-16   3.034n ± 1%   3.010n ± 2%   ~ (p=0.305 n=10)
IndexByte/4K-16   57.20n ± 2%   39.58n ± 2%   -30.81% (p=0.000 n=10)
IndexByte/4M-16   34.48µ ± 1%   33.83µ ± 2%   -1.87% (p=0.023 n=10)
IndexByte/64M-16   1.493m ± 2%   1.450m ± 2%   -2.89% (p=0.000 n=10)
IndexBytePortable/10-16   3.172n ± 4%   3.163n ± 2%   ~ (p=0.684 n=10)
IndexBytePortable/32-16   8.465n ± 2%   8.375n ± 3%   ~ (p=0.631 n=10)
IndexBytePortable/4K-16   852.0n ± 1%   846.6n ± 3%   ~ (p=0.971 n=10)
IndexBytePortable/4M-16   868.2µ ± 2%   856.6µ ± 2%   ~ (p=0.393 n=10)
IndexBytePortable/64M-16   13.81m ± 2%   13.88m ± 3%   ~ (p=0.684 n=10)
geomean   1.204µ   1.148µ   -4.63%

│ old.txt │ new.txt │
│ B/s │ B/s vs base │
IndexByte/10-16   3.565Gi ± 1%   3.641Gi ± 1%   +2.15% (p=0.015 n=10)
IndexByte/32-16   9.821Gi ± 1%   9.899Gi ± 2%   ~ (p=0.315 n=10)
IndexByte/4K-16   66.70Gi ± 2%   96.39Gi ± 2%   +44.52% (p=0.000 n=10)
IndexByte/4M-16   113.3Gi ± 1%   115.5Gi ± 2%   +1.91% (p=0.023 n=10)
IndexByte/64M-16   41.85Gi ± 2%   43.10Gi ± 2%   +2.98% (p=0.000 n=10)
IndexBytePortable/10-16   2.936Gi ± 4%   2.945Gi ± 2%   ~ (p=0.684 n=10)
IndexBytePortable/32-16   3.521Gi ± 2%   3.559Gi ± 3%   ~ (p=0.631 n=10)
IndexBytePortable/4K-16   4.477Gi ± 1%   4.506Gi ± 3%   ~ (p=0.971 n=10)
IndexBytePortable/4M-16   4.499Gi ± 2%   4.560Gi ± 2%   ~ (p=0.393 n=10)
IndexBytePortable/64M-16   4.525Gi ± 2%   4.504Gi ± 3%   ~ (p=0.684 n=10)
geomean   10.04Gi   10.53Gi   +4.86%

For #63678
Change-Id: I0571c2b540a816d57bd6ed8bb1df4191c7992d92
GitHub-Last-Rev: 7e95b8b
GitHub-Pull-Request: #63847
Reviewed-on: https://go-review.googlesource.com/c/go/+/538715
Reviewed-by: David Chase
Reviewed-by: Keith Randall
Reviewed-by: Keith Randall
Auto-Submit: Keith Randall
TryBot-Result: Gopher Robot
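The sec/op and B/s columns in results like these describe the same measurement from two sides: for a fixed number of bytes per operation, throughput is inversely proportional to time, so a 30% drop in time shows up as a gain of more than 30% in B/s. A minimal sketch of that arithmetic, using the IndexByte/4K-16 medians from the results above (the function name `throughputGainPercent` is made up for this example):

```go
package main

import "fmt"

// throughputGainPercent converts an old and new time per op into the
// corresponding percentage change in throughput, assuming each op
// processes the same number of bytes before and after the change.
func throughputGainPercent(oldTime, newTime float64) float64 {
	return (oldTime/newTime - 1) * 100
}

func main() {
	// IndexByte/4K-16 median times in ns, before and after.
	gain := throughputGainPercent(57.20, 39.58)
	fmt.Printf("throughput change: %+.2f%%\n", gain)
}
```

The value computed from the rounded medians lands close to the B/s delta benchstat reports for that row; small differences are expected because benchstat works from unrounded samples.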
Change https://go.dev/cl/539976 mentions this issue: |
Change https://go.dev/cl/541756 mentions this issue: |
For #63678

goos: linux
goarch: amd64
pkg: runtime
cpu: AMD EPYC Processor
                       │   base.txt   │                16.txt                │
                       │    sec/op    │    sec/op     vs base                │
Hash5-2                  4.969n ± 1%    4.583n ± 1%   -7.75% (n=100)
Hash16-2                 4.966n ± 1%    4.536n ± 1%   -8.65% (n=100)
Hash64-2                 5.687n ± 1%    5.726n ± 1%        ~ (p=0.181 n=100)
Hash1024-2               26.73n ± 1%    25.72n ± 1%   -3.76% (n=100)
Hash65536-2              1.345µ ± 0%    1.331µ ± 0%   -1.04% (p=0.000 n=100)
HashStringSpeed-2        12.76n ± 1%    12.53n ± 1%   -1.76% (p=0.000 n=100)
HashBytesSpeed-2         20.13n ± 1%    19.96n ± 1%        ~ (p=0.176 n=100)
HashInt32Speed-2         9.065n ± 1%    9.007n ± 1%        ~ (p=0.247 n=100)
HashInt64Speed-2         9.076n ± 1%    9.027n ± 1%        ~ (p=0.179 n=100)
HashStringArraySpeed-2   33.33n ± 1%    32.94n ± 3%   -1.19% (p=0.028 n=100)
FastrandHashiter-2       16.47n ± 0%    16.54n ± 1%   +0.39% (p=0.013 n=100)
geomean                  17.85n         17.43n        -2.33%

                       │   base.txt   │                16.txt                │
                       │     B/s      │     B/s       vs base                │
Hash5-2                  959.7Mi ± 1%   1040.4Mi ± 1%  +8.41% (p=0.000 n=100)
Hash16-2                 3.001Gi ± 1%   3.285Gi ± 1%   +9.48% (p=0.000 n=100)
Hash64-2                 10.48Gi ± 1%   10.41Gi ± 1%        ~ (p=0.179 n=100)
Hash1024-2               35.68Gi ± 1%   37.08Gi ± 1%   +3.92% (p=0.000 n=100)
Hash65536-2              45.41Gi ± 0%   45.86Gi ± 0%   +1.01% (p=0.000 n=100)
geomean                  8.626Gi        9.001Gi        +4.35%

Change-Id: Icf98dc935181ea5d30f7cbd5dcf284ec7aef8e9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/539976
Run-TryBot: qiulaidongfeng <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Auto-Submit: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: David Chase <[email protected]>
For #63678 Change-Id: I427b8756e361fd4d36984c2bdb8bc3661ac3a0b8 GitHub-Last-Rev: 981d272 GitHub-Pull-Request: #63757
Reviewed-on: https://go-review.googlesource.com/c/go/+/537995 Reviewed-by: David Chase <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: qiulaidongfeng <[email protected]> Reviewed-by: Keith Randall <[email protected]> Reviewed-by: Mauri de Souza Meneguzzo <[email protected]> Auto-Submit: Keith Randall <[email protected]> Reviewed-by: Keith Randall <[email protected]>
For #63678

Benchmark on Milk-V Mars CM eMMC (Starfive/JH7110 SoC)

goos: linux
goarch: riscv64
pkg: bytes
                 │ /root/bytes.old.bench │        /root/bytes.pc16.bench        │
                 │        sec/op         │    sec/op      vs base               │
Count/10           223.9n ± 1%             220.8n ± 1%     -1.36% (p=0.001 n=10)
Count/32           571.6n ± 0%             571.3n ± 0%          ~ (p=0.054 n=10)
Count/4K           38.56µ ± 0%             38.55µ ± 0%     -0.01% (p=0.010 n=10)
Count/4M           40.13m ± 0%             39.21m ± 0%     -2.28% (p=0.000 n=10)
Count/64M          627.5m ± 0%             627.4m ± 0%     -0.01% (p=0.019 n=10)
CountEasy/10       101.3n ± 0%             101.3n ± 0%          ~ (p=1.000 n=10) ¹
CountEasy/32       139.3n ± 0%             139.3n ± 0%          ~ (p=1.000 n=10) ¹
CountEasy/4K       5.565µ ± 0%             5.564µ ± 0%     -0.02% (p=0.001 n=10)
CountEasy/4M       5.619m ± 0%             5.619m ± 0%          ~ (p=0.190 n=10)
CountEasy/64M      89.94m ± 0%             89.93m ± 0%          ~ (p=0.436 n=10)
CountSingle/10     53.80n ± 0%             46.06n ± 0%    -14.39% (p=0.000 n=10)
CountSingle/32     104.30n ± 0%            79.64n ± 0%    -23.64% (p=0.000 n=10)
CountSingle/4K     10.413µ ± 0%            7.247µ ± 0%    -30.40% (p=0.000 n=10)
CountSingle/4M     11.603m ± 0%            8.388m ± 0%    -27.71% (p=0.000 n=10)
CountSingle/64M    230.9m ± 0%             172.3m ± 0%    -25.40% (p=0.000 n=10)
CountHard1         9.981m ± 0%             9.981m ± 0%          ~ (p=0.810 n=10)
CountHard2         9.981m ± 0%             9.981m ± 0%          ~ (p=0.315 n=10)
CountHard3         9.981m ± 0%             9.981m ± 0%          ~ (p=0.159 n=10)
geomean            144.6µ                  133.5µ          -7.70%
¹ all samples are equal

                 │ /root/bytes.old.bench │        /root/bytes.pc16.bench        │
                 │          B/s          │      B/s       vs base               │
Count/10           42.60Mi ± 1%            43.19Mi ± 1%    +1.39% (p=0.001 n=10)
Count/32           53.38Mi ± 0%            53.42Mi ± 0%    +0.06% (p=0.049 n=10)
Count/4K           101.3Mi ± 0%            101.3Mi ± 0%         ~ (p=0.077 n=10)
Count/4M           99.68Mi ± 0%            102.01Mi ± 0%   +2.34% (p=0.000 n=10)
Count/64M          102.0Mi ± 0%            102.0Mi ± 0%         ~ (p=0.076 n=10)
CountEasy/10       94.18Mi ± 0%            94.18Mi ± 0%         ~ (p=0.054 n=10)
CountEasy/32       219.1Mi ± 0%            219.1Mi ± 0%    +0.01% (p=0.016 n=10)
CountEasy/4K       702.0Mi ± 0%            702.0Mi ± 0%    +0.00% (p=0.000 n=10)
CountEasy/4M       711.9Mi ± 0%            711.9Mi ± 0%         ~ (p=0.133 n=10)
CountEasy/64M      711.6Mi ± 0%            711.7Mi ± 0%         ~ (p=0.447 n=10)
CountSingle/10     177.2Mi ± 0%            207.0Mi ± 0%   +16.81% (p=0.000 n=10)
CountSingle/32     292.7Mi ± 0%            383.2Mi ± 0%   +30.91% (p=0.000 n=10)
CountSingle/4K     375.1Mi ± 0%            539.0Mi ± 0%   +43.70% (p=0.000 n=10)
CountSingle/4M     344.7Mi ± 0%            476.9Mi ± 0%   +38.33% (p=0.000 n=10)
CountSingle/64M    277.2Mi ± 0%            371.5Mi ± 0%   +34.05% (p=0.000 n=10)
geomean            199.7Mi                 219.8Mi        +10.10%

Change-Id: I1abf6b220b9802028f8ad5eebc8d3b7cfa3e89ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/541756
Reviewed-by: David Chase <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Reviewed-by: Joel Sing <[email protected]>
Run-TryBot: M Zhuo <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Wang Yaduo <[email protected]>
Reviewed-by: Mark Ryan <[email protected]>
Issue #56474 added support for instruction alignment on the amd64 architecture. This is achieved with the PCALIGN assembly pseudo-instruction, which inserts NOPs to align the next instruction to a given boundary.
Since this feature is pretty new on amd64, we haven't had much time to check which assembly routines in the runtime and libraries would benefit from instruction alignment. In most cases the effect of instruction alignment is minimal, but in critical subroutines and innermost loops it can deliver a significant boost in performance.
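To make the padding idea concrete, here is a minimal Go sketch of the arithmetic involved. It is not the assembler's actual implementation: `padTo` is a hypothetical helper, the example addresses are made up, and the alignment is assumed to be a power of two. It only shows how many filler bytes would be needed to move a program counter up to the next boundary.

```go
package main

import "fmt"

// padTo is a hypothetical helper (not part of the Go toolchain). It returns
// the number of filler bytes needed so that pc becomes a multiple of align.
// align is assumed to be a power of two.
func padTo(pc, align uint64) uint64 {
	return (align - pc%align) % align
}

func main() {
	// In Go assembly, alignment is requested just before the hot label:
	//
	//	PCALIGN $16
	// loop:
	//	...
	//
	// The addresses below are invented for illustration.
	for _, pc := range []uint64{0x40, 0x41, 0x4f} {
		fmt.Printf("pc=%#x pad16=%d pad32=%d\n", pc, padTo(pc, 16), padTo(pc, 32))
	}
}
```

A larger boundary can require more padding, which is one reason overusing alignment may grow the code without any benefit.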
Some examples:
There are multiple places where we might get interesting results using PCALIGN in amd64 assembly:
- `runtime` functions (memmove, memclr, etc.)
- `internal/bytealg`
- `crypto/*`
- `*amd64*.s`: hot loops/critical sections in assembly code

Generally, 16-byte alignment works fine, while 32-byte alignment is better when aligning AVX2 instructions. Be careful not to overuse it: routines may end up slower than before.
To verify whether there are any speedups, run the benchmarks for the affected package with a higher count, at least `-count=10`. Then use benchstat to compare the before/after results. I'd say anything consistently higher than 3-5% is worth submitting.

Code alignment is already supported on ppc64, arm64, loong64 and riscv64. Feel free to look into improvements for these architectures as well! You can rely on qemu-static to run the benchmarks after compiling the tests with `go test -c`.

Happy hacking!
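As a rough illustration of the 3-5% rule of thumb, the Go sketch below computes the relative change between a before and an after timing. It is only the arithmetic: benchstat additionally checks statistical significance, which this does not. The helper names `relChange` and `worthSubmitting` and the 3% threshold passed in are assumptions for the example; the sample values are the Equal/4K timings from the bytes benchmark quoted earlier.

```go
package main

import "fmt"

// relChange is a hypothetical helper: the percent change going from
// before to after. Negative values mean the new code is faster (sec/op).
func relChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// worthSubmitting applies the rule of thumb from the issue text: the
// improvement must be at least thresholdPct percent. The threshold value
// is chosen by the caller; 3 is used below as an assumption.
func worthSubmitting(before, after, thresholdPct float64) bool {
	return relChange(before, after) <= -thresholdPct
}

func main() {
	// Equal/4K: 97.40ns on master, 81.34ns with alignment applied.
	before, after := 97.40, 81.34
	fmt.Printf("change: %.2f%%\n", relChange(before, after))
	fmt.Println("worth submitting:", worthSubmitting(before, after, 3))
}
```

A single pair of numbers says nothing about noise, so the result should be consistent across repeated runs before it is trusted.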