Scalar/Packed conversions for floating point to integer #97529
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details: NO NEED FOR REVIEW AS OF NOW
Diff results for #97529Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 95,287 contexts (40,999 MinOpts, 54,288 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 54,574 (35.48%) Overall (+12,884 bytes)
MinOpts (-512 bytes)
FullOpts (+13,396 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 90,586 contexts (40,285 MinOpts, 50,301 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 47,361 (33.47%) Overall (+12,056 bytes)
MinOpts (-640 bytes)
FullOpts (+12,696 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 95,065 contexts (36,507 MinOpts, 58,558 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 54,513 (35.50%) Overall (+10,872 bytes)
MinOpts (-248 bytes)
FullOpts (+11,120 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 104,582 contexts (34,422 MinOpts, 70,160 FullOpts). MISSED contexts: base: 2,785 (1.74%), diff: 52,364 (32.62%) Overall (+7,880 bytes)
MinOpts (-226 bytes)
FullOpts (+8,106 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 64,348 contexts (22,203 MinOpts, 42,145 FullOpts). MISSED contexts: base: 3 (0.00%), diff: 74,344 (52.26%) Overall (+6,474 bytes)
MinOpts (-42 bytes)
FullOpts (+6,516 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64Overall (-0.36% to +0.25%)
MinOpts (-0.01% to +1.09%)
FullOpts (-0.55% to +0.25%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.20% to +0.21%)
MinOpts (0.00% to +0.17%)
FullOpts (-0.43% to +0.21%)
Details here |
Diff results for #97529Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.56% to +0.10%)
MinOpts (-0.00% to +0.01%)
FullOpts (-0.65% to +0.10%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.55% to +0.55%)
MinOpts (-0.00% to +0.24%)
FullOpts (-0.63% to +0.74%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.45% to +0.09%)
MinOpts (-0.00% to +0.01%)
FullOpts (-0.57% to +0.09%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.77% to +0.09%)
MinOpts (-0.01% to +0.01%)
FullOpts (-0.93% to +0.09%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-11.45% to -0.38%)
MinOpts (-9.46% to +0.00%)
FullOpts (-11.45% to +0.34%)
Details here Throughput diffs for linux/arm ran on linux/x86Overall (-0.35% to +0.14%)
MinOpts (0.00% to +0.02%)
FullOpts (-0.39% to +0.14%)
Details here |
Diff results for #97529Throughput diffsThroughput diffs for windows/x86 ran on windows/x86Overall (-1.58% to +0.11%)
MinOpts (-0.01% to +0.01%)
FullOpts (-1.80% to +0.11%)
Details here |
Force-pushed from 78e0353 to a730a9b.
Diff results for #97529Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64Overall (-0.36% to +0.25%)
MinOpts (-0.01% to +1.09%)
FullOpts (-0.55% to +0.25%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.21% to +0.20%)
MinOpts (0.00% to +0.15%)
FullOpts (-0.44% to +0.20%)
Details here |
Diff results for #97529Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 95,287 contexts (40,999 MinOpts, 54,288 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 54,574 (35.48%) Overall (+12,884 bytes)
MinOpts (-512 bytes)
FullOpts (+13,396 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 90,586 contexts (40,285 MinOpts, 50,301 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 47,361 (33.47%) Overall (+12,056 bytes)
MinOpts (-640 bytes)
FullOpts (+12,696 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 95,065 contexts (36,507 MinOpts, 58,558 FullOpts). MISSED contexts: base: 2 (0.00%), diff: 54,513 (35.50%) Overall (+10,872 bytes)
MinOpts (-248 bytes)
FullOpts (+11,120 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 104,582 contexts (34,422 MinOpts, 70,160 FullOpts). MISSED contexts: base: 2,785 (1.74%), diff: 52,364 (32.62%) Overall (+7,880 bytes)
MinOpts (-226 bytes)
FullOpts (+8,106 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 64,348 contexts (22,203 MinOpts, 42,145 FullOpts). MISSED contexts: base: 3 (0.00%), diff: 74,344 (52.26%) Overall (+6,474 bytes)
MinOpts (-42 bytes)
FullOpts (+6,516 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-0.58% to +0.07%)
FullOpts (-0.67% to +0.07%)
Throughput diffs for linux/x64 ran on windows/x64Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-0.55% to +0.55%)
MinOpts (-0.01% to +0.23%)
FullOpts (-0.63% to +0.73%)
Throughput diffs for osx/arm64 ran on windows/x64Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-0.46% to +0.06%)
FullOpts (-0.59% to +0.07%)
Throughput diffs for windows/arm64 ran on windows/x64Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-0.78% to +0.06%)
MinOpts (-0.01% to +0.00%)
FullOpts (-0.95% to +0.07%)
Throughput diffs for windows/x64 ran on windows/x64Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-11.46% to -0.39%)
MinOpts (-9.46% to +0.00%)
FullOpts (-11.46% to +0.34%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.36% to +0.25%)
MinOpts (-0.01% to +1.09%)
FullOpts (-0.55% to +0.25%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.21% to +0.20%)
MinOpts (0.00% to +0.15%)
FullOpts (-0.44% to +0.20%)
Details here |
Details here Throughput diffs for linux/arm ran on linux/x86Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-0.35% to +0.14%)
MinOpts (0.00% to +0.02%)
FullOpts (-0.39% to +0.14%)
Details here |
Diff results for #97529Throughput diffsThroughput diffs for windows/x86 ran on windows/x86Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-1.58% to +0.11%)
MinOpts (-0.01% to +0.01%)
FullOpts (-1.80% to +0.11%)
Details here Throughput diffs for linux/arm ran on linux/x86Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-0.35% to +0.14%)
MinOpts (0.00% to +0.02%)
FullOpts (-0.39% to +0.15%)
Details here |
Diff results for #97529Throughput diffsThroughput diffs for linux/arm ran on windows/x86Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-0.35% to +0.14%)
MinOpts (0.00% to +0.02%)
FullOpts (-0.39% to +0.15%)
Throughput diffs for windows/x86 ran on windows/x86Warning: Different compilers used for base and diff JITs. Results may be misleading. Overall (-1.58% to +0.11%)
MinOpts (-0.01% to +0.01%)
FullOpts (-1.80% to +0.11%)
Details here |
Diff results for #97529Assembly diffsAssembly diffs for linux/arm ran on windows/x86Diffs are based on 108,959 contexts (31,685 MinOpts, 77,274 FullOpts). MISSED contexts: base: 3,443 (1.99%), diff: 60,787 (35.10%) Overall (+11,058 bytes)
MinOpts (-100 bytes)
FullOpts (+11,158 bytes)
Details here Assembly diffs for linux/arm64 ran on windows/x64Diffs are based on 97,556 contexts (38,769 MinOpts, 58,787 FullOpts). MISSED contexts: base: 3 (0.00%), diff: 57,529 (36.17%) Overall (+14,796 bytes)
MinOpts (-524 bytes)
FullOpts (+15,320 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 83,914 contexts (36,117 MinOpts, 47,797 FullOpts). MISSED contexts: base: 9 (0.01%), diff: 41,251 (32.04%) Overall (+11,236 bytes)
MinOpts (-892 bytes)
FullOpts (+12,128 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 87,579 contexts (27,614 MinOpts, 59,965 FullOpts). MISSED contexts: base: 7 (0.00%), diff: 56,114 (38.00%) Overall (+13,532 bytes)
MinOpts (-496 bytes)
FullOpts (+14,028 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.50% to +0.12%)
FullOpts (-0.58% to +0.23%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.49% to +0.70%)
MinOpts (0.00% to +0.07%)
FullOpts (-0.56% to +0.91%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.20% to +0.09%)
MinOpts (-0.01% to +0.01%)
FullOpts (-0.29% to +0.09%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.80% to +0.62%)
FullOpts (-0.99% to +0.83%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-2.25% to +0.00%)
MinOpts (-13.96% to +0.00%)
FullOpts (-2.22% to +0.38%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.19% to +0.18%)
MinOpts (-0.00% to +1.09%)
FullOpts (-0.31% to +0.26%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.27% to +0.54%)
MinOpts (0.00% to +0.09%)
FullOpts (-0.55% to +0.54%)
Details here |
Details here Throughput diffs for linux/arm ran on windows/x86Overall (-0.21% to +0.14%)
MinOpts (-0.02% to +0.03%)
FullOpts (-0.21% to +0.14%)
Throughput diffs for windows/x86 ran on windows/x86Overall (-1.42% to +0.13%)
MinOpts (-0.02% to +0.01%)
FullOpts (-1.78% to +0.13%)
Details here |
Diff results for #97529Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64Overall (-2.39% to -0.00%)
MinOpts (-52.67% to 0.00%)
FullOpts (-2.39% to -0.00%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-1.77% to +0.00%)
MinOpts (-1.42% to 0.00%)
FullOpts (-1.77% to +0.00%)
Details here |
Diff results for #97529Assembly diffsAssembly diffs for linux/arm ran on windows/x86Diffs are based on 82,026 contexts (29,474 MinOpts, 52,552 FullOpts). MISSED contexts: base: 2,213 (1.72%), diff: 43,415 (33.68%) Overall (-49,260 bytes)
MinOpts (-15,744 bytes)
FullOpts (-33,516 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 56,610 contexts (16,610 MinOpts, 40,000 FullOpts). MISSED contexts: base: 1,422 (1.16%), diff: 62,829 (51.08%) Overall (-49,481 bytes)
MinOpts (-24,699 bytes)
FullOpts (-24,782 bytes)
Details here Assembly diffs for linux/arm64 ran on windows/x64Diffs are based on 91,015 contexts (37,196 MinOpts, 53,819 FullOpts). MISSED contexts: base: 159 (0.11%), diff: 52,639 (35.66%) Overall (-96,164 bytes)
MinOpts (-46,440 bytes)
FullOpts (-49,724 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 79,033 contexts (35,101 MinOpts, 43,932 FullOpts). MISSED contexts: base: 44 (0.04%), diff: 37,673 (31.32%) Overall (-81,444 bytes)
MinOpts (-43,828 bytes)
FullOpts (-37,616 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 80,760 contexts (27,005 MinOpts, 53,755 FullOpts). MISSED contexts: base: 223 (0.17%), diff: 49,330 (36.80%) Overall (-99,128 bytes)
MinOpts (-40,184 bytes)
FullOpts (-58,944 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 79,883 contexts (35,475 MinOpts, 44,408 FullOpts). MISSED contexts: base: 47 (0.04%), diff: 46,452 (35.54%) Overall (-93,689 bytes)
MinOpts (-46,375 bytes)
FullOpts (-47,314 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-2.04% to -0.00%)
MinOpts (-0.61% to +0.00%)
FullOpts (-2.13% to -0.00%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-1.82% to +0.35%)
MinOpts (-0.66% to +0.00%)
FullOpts (-1.82% to +0.47%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-2.02% to -0.00%)
MinOpts (-0.79% to +0.00%)
FullOpts (-2.03% to -0.00%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-4.57% to -0.00%)
MinOpts (-2.46% to 0.00%)
FullOpts (-5.26% to -0.00%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-4.22% to +0.00%)
MinOpts (-13.96% to +0.00%)
FullOpts (-4.20% to +0.00%)
Details here Throughput diffs for linux/arm ran on linux/x86Overall (-4.29% to +0.00%)
MinOpts (-1.65% to +0.00%)
FullOpts (-4.30% to +0.00%)
Details here |
Diff results for #97529Throughput diffsThroughput diffs for windows/x86 ran on windows/x86Overall (-2.30% to -0.00%)
MinOpts (-1.18% to +0.00%)
FullOpts (-2.31% to -0.00%)
Details here |
Force-pushed from 8e3191f to 488d306.
Diff results for #97529Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 90,968 contexts (39,217 MinOpts, 51,751 FullOpts). MISSED contexts: base: 3,450 (2.42%), diff: 47,377 (33.29%) Overall (-92,984 bytes)
MinOpts (-48,108 bytes)
FullOpts (-44,876 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 92,598 contexts (38,516 MinOpts, 54,082 FullOpts). MISSED contexts: base: 3,640 (2.59%), diff: 43,898 (31.25%) Overall (-96,473 bytes)
MinOpts (-46,977 bytes)
FullOpts (-49,496 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 78,929 contexts (41,462 MinOpts, 37,467 FullOpts). MISSED contexts: base: 2,856 (2.27%), diff: 43,225 (34.38%) Overall (-92,764 bytes)
MinOpts (-55,640 bytes)
FullOpts (-37,124 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 89,877 contexts (39,512 MinOpts, 50,365 FullOpts). MISSED contexts: base: 3,991 (2.85%), diff: 46,273 (33.03%) Overall (-90,944 bytes)
MinOpts (-53,200 bytes)
FullOpts (-37,744 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 93,749 contexts (40,209 MinOpts, 53,540 FullOpts). MISSED contexts: base: 3,301 (2.30%), diff: 45,642 (31.75%) Overall (-98,879 bytes)
MinOpts (-51,463 bytes)
FullOpts (-47,416 bytes)
Details here Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 53,492 contexts (17,492 MinOpts, 36,000 FullOpts). MISSED contexts: base: 3,510 (3.05%), diff: 58,094 (50.45%) Overall (-52,308 bytes)
MinOpts (-27,508 bytes)
FullOpts (-24,800 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64Overall (-2.89% to -0.00%)
MinOpts (-1.88% to 0.00%)
FullOpts (-2.89% to -0.00%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-2.22% to -0.01%)
MinOpts (-2.34% to 0.00%)
FullOpts (-1.84% to -0.01%)
Details here |
Diff results for #97529Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-9.63% to -0.00%)
MinOpts (-0.63% to +0.00%)
FullOpts (-14.40% to -0.00%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-9.59% to +0.01%)
MinOpts (-0.56% to +0.07%)
FullOpts (-13.41% to +0.01%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-6.05% to -0.20%)
MinOpts (-0.79% to 0.00%)
FullOpts (-10.53% to -0.20%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-3.07% to -0.00%)
MinOpts (-0.83% to +0.00%)
FullOpts (-5.35% to -0.00%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-4.31% to -0.52%)
MinOpts (-13.95% to +0.01%)
FullOpts (-5.73% to -0.20%)
Details here Throughput diffs for linux/arm ran on linux/x86Overall (-4.71% to -0.00%)
MinOpts (-0.69% to +0.00%)
FullOpts (-6.56% to -0.00%)
Details here |
Details here Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64Overall (-9.63% to -0.00%)
MinOpts (-0.63% to +0.00%)
FullOpts (-14.40% to -0.00%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-2.22% to -0.01%)
MinOpts (-2.34% to 0.00%)
FullOpts (-1.84% to -0.01%)
Throughput diffs for osx/arm64 ran on linux/x64Overall (-6.05% to -0.20%)
MinOpts (-0.79% to +0.00%)
FullOpts (-10.53% to -0.20%)
Throughput diffs for windows/arm64 ran on linux/x64Overall (-3.07% to -0.00%)
MinOpts (-0.83% to +0.00%)
FullOpts (-5.35% to -0.00%)
Details here Throughput diffs for windows/x86 ran on linux/x86Overall (-2.29% to -0.00%)
MinOpts (-1.11% to +0.00%)
FullOpts (-2.60% to -0.00%)
Details here |
Tracking #40234 in dotnet/docs for the breaking change documentation.
@tannergooding |
LGTM, minus a couple bits of code I think can be removed now.
I do think there are some more optimization opportunities available, but we can handle those in a follow-up PR.
CC. @dotnet/jit-contrib for secondary sign-off
Force-pushed from 8ad1be3 to c4f28c7.
@@ -791,7 +791,7 @@ void Lowering::LowerPutArgStkOrSplit(GenTreePutArgStk* putArgNode)
 //         don't expect to see them here.
 //    i) GT_CAST(float/double, int type with overflow detection)
 //
-void Lowering::LowerCast(GenTree* tree)
+GenTree* Lowering::LowerCast(GenTree* tree)
Can you add a comment about the return value to the summary docs?
        break;
    {
        GenTree* nextNode = LowerCast(node);
        if (nextNode != nullptr)
Can we #ifdef TARGET_XARCH here to check nextNode != nullptr?
IMO such micro-optimizations should be left up to the compiler. An #ifdef here just adds a potential point of confusion and source of bugs if someone ever modifies an architecture-specific LowerCast outside xarch.
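To make the point above concrete, here is a minimal sketch of the calling pattern being discussed. Everything in it is hypothetical: `Node`, `NewNode`, `LowerCastInPlace`, `LowerCastReplacing`, and `LowerNodeSketch` are illustration-only stand-ins, not the JIT's `GenTree` or `Lowering` types, and the meaning assigned to the return value (nullptr means "continue with node->next") is an assumption.

```cpp
#include <deque>
#include <string>

// Hypothetical stand-in for an IR node; not the JIT's GenTree.
struct Node
{
    std::string name;
    Node*       next;
};

// Simple arena so nodes stay alive; std::deque keeps existing elements
// at stable addresses when new ones are appended with push_back.
std::deque<Node> g_arena;

Node* NewNode(const std::string& name)
{
    g_arena.push_back(Node{name, nullptr});
    return &g_arena.back();
}

// Variant for a target that lowers the cast in place: nothing new to visit,
// so it returns nullptr and the caller falls back to node->next.
Node* LowerCastInPlace(Node* cast)
{
    cast->name += " (lowered)";
    return nullptr;
}

// Variant for a target that replaces the cast with a new node. The old node
// is treated as unlinked, so the caller must continue from the return value.
Node* LowerCastReplacing(Node* cast)
{
    Node* replacement = NewNode("saturating-convert");
    replacement->next = cast->next;
    cast->next        = nullptr; // the old cast's next pointer is no longer usable
    return replacement;
}

// The caller performs the same null check for every target, with no #ifdef.
Node* LowerNodeSketch(Node* node, Node* (*lowerCast)(Node*))
{
    Node* nextNode = lowerCast(node);
    if (nextNode != nullptr)
    {
        return nextNode;
    }
    return node->next;
}
```

With an unconditional check, a target whose lowering starts returning a replacement node later on needs no change at the call site.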
I don't have any comments beyond those already written.
Yes. The general summary is that floating-point to integer conversions have implementation-defined behavior (per both the IEEE 754 and .NET runtime specifications) when a floating-point value has a greater magnitude than the maximum or minimum value of the target integer type.
Historically, x86/x64 architectures have returned a sentinel value for these cases, while newer platforms such as Arm64 and WASM have opted to saturate instead. .NET is opting to start standardizing this behavior across platforms and make it consistent by performing saturating conversions everywhere.
This saturating behavior more closely mirrors the typical IEEE 754 floating-point rules around other operations, which is that operations are done as if to infinite precision and unbounded range, then rounded to the nearest representable result. It also mirrors similar decisions for deterministic behavior made by other languages/ecosystems including Rust, Java, etc.
Compatibility APIs (as detailed in #61885) are being added in a follow-up PR for cases where the platform-specific behavior is desired. This is primarily provided for cases where a developer wants the most performance and doesn’t mind getting different results between x64 and Arm64 machines or between an x64 machine that supports AVX2 (~2013 or later) and an x64 machine that supports AVX512 (~2017 or later).
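As a rough illustration of the saturating semantics described above, the following C++ sketch shows a double to int32 conversion. `SaturatingToInt32` is a hypothetical name used only here, not a runtime or JIT function, and mapping NaN to 0 follows the "Saturating NaN to 0" commit note in this PR.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration; not a .NET runtime or JIT function.
// NaN maps to 0, out-of-range values clamp to the nearest representable
// bound, and everything else truncates toward zero.
int32_t SaturatingToInt32(double value)
{
    constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
    constexpr int32_t kMax = std::numeric_limits<int32_t>::max();

    if (std::isnan(value))
    {
        return 0;
    }
    // Both bounds are exactly representable as double, so these comparisons are exact.
    if (value <= static_cast<double>(kMin))
    {
        return kMin;
    }
    if (value >= static_cast<double>(kMax))
    {
        return kMax;
    }
    return static_cast<int32_t>(value); // in range: truncates toward zero
}
```

Under this sketch an input of 1e10 yields the maximum int32 value, whereas the x86 truncating conversion instruction would produce the sentinel 0x80000000 for the same input.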
Thanks @khushal1996 for the contribution, and for persisting through a long review process!
Thanks @BruceForstall @tannergooding @jkotas for the review.
* Saturating floating point to integer conversions on Arm32 Follow up on #97529 (comment) * Fixes, cleanup
Just noting per the above that the regressions are expected. We do want to track them and see if we can minimize the cost further where possible (I had a few ideas to improve the codegen and will take a deeper look). The regressions here are also basically "worst case" scenarios, as they are mainly the ones testing floating-point to integer conversion perf.
* merging with main Initial changes for scalar conversion double -> ulong
* Basic working version of double -> ulong saturation
* Moving the code in a do-while with proper checks to make sure we are adding the fixup node in all cases
* adjusting comments
* Merging with main Saturating NaN to 0 and also adding Dbl2Ulng implementation in MathHelpers. Adding vector conversion support for double/float -> ulong conversion
* removing conflicts from gentree.h flags merging with main double to uint conversion
* float to uint conversion verified. removing commented code
* merging with main. Making changes to simdashwintrinsic.cpp and listxarch.h float -> uint packed conversion
* progress on double to long morphing
* another attempt at double to long conversion
* Merge with main Merge with main adding a new helper function for float to uint scalar conversion for SSE2.
* adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.
* partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working
* adding float to int working scalar conversion case. Working on vector case from here on.
* partial work on float to int packed conversion
* partial version of float to int conversion
* working version of float to int scalar/packed for avx512
* complete conversions code for floating point to integral conversions for scalar/packed for SSE / avx512
* Merging with main. fixing out of range test case and adding conversion changes to simdashwintrinsic
* fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level
* adding JIT_Dbl2Int for target_x86 and other architectures.
* Supporting x86 for saturating conversions as well
* fixing errors in packed conversion
* accommodate unsigned in IR
* adding evex support for cvttss2si
* Merge with main defining nativeaot helpers for x86
* Catch divide by zero exception
* Handle overflow cases
* Fix tests to check saturating behavior
* Correct mapping of instructions
* Convert float -> ulong / long as float -> double -> ulong / long
* Merging with main Initial changes for scalar conversion double -> ulong
* Merging with main adjusting comments
* removing conflicts from gentree.h flags merging with main double to uint conversion
* merging with main. Making changes to simdashwintrinsic.cpp and listxarch.h float -> uint packed conversion
* adding a new helper function for float to uint scalar conversion for SSE2.
* Merging with main adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.
* partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working
* partial version of float to int conversion
* working version of float to int scalar/packed for avx512
* Merging with main. fixing out of range test case and adding conversion changes to simdashwintrinsic
* Changing the way helper functions are handled in morph fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level
* adding JIT_Dbl2Int for target_x86 and other architectures.
* Supporting x86 for saturating conversions as well
* fixing errors in packed conversion
* Correct mapping of instructions
* delete extra files
* Merging main review changes
* Merge with main and adding new helpers in nativeaot Rebasing with main
* changing type of cast node as signed when making cast nodes
* Avoiding removing extra element from the stack
* Fix formatting, Change comp->IsaSupportedDebugOnly to IsBaselineVector512SupportedDebugOnly
* Reverting some changes to maintain uniformity in code
* Handling cases where AVX512 is not supported in simdashwintrinsic.cpp
* fixing exit conditions for ConvertVectorT_ToDouble
* Check for AVX512 support for TARGET_XARCH
* Avoid avx512 path for x86
* Enable AVX512F codepath for conversions in x86 arch. Move x86 to using c++ helpers
* Add SSE41 path for scalar conversions and 128 bit float to int packed conversions
* Adding SSE41 path for floating point to UINT scalar conversions
* Add AVX path for ConvertToInt32
* Adding comments and cleaning the code
* Fix errors in double to ulong
* Addressing review comments
* Fix tests
* Reverse val < 0 check in dbltoUint and dbltoUlng helpers
* Add overflow conversions for x86/x64, remove FastDbl2Lng and inline it
* Apply suggestions from code review Co-authored-by: Jan Kotas <[email protected]>
* Correct Dbl2UlngOvf
* Apply suggestions from code review
* Apply suggestions from code review
* Update src/coreclr/vm/jithelpers.cpp
* Disable failing mono tests
* Working version of saturating logic moved to lowering for x86/x64
* Making changes for pre SSE41
* Apply suggestions from code review Co-authored-by: Jan Kotas <[email protected]>
* Removing dead code
* Fix formatting
* Address review comments, add proper docstrings
--------- Co-authored-by: Jan Kotas <[email protected]>
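Two items in the commit list above, converting float -> ulong by way of double and the reversed `val < 0` check, can be illustrated with a small C++ sketch. The names below are hypothetical and the body is only a guess at the shape of such a helper, not the code in jithelpers.cpp. In particular, the idea that the check was reversed so that NaN takes the zero path is an assumption on my part.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper; the PR's actual helpers may differ in detail.
uint64_t SaturatingToUInt64(double value)
{
    const double two63 = 9223372036854775808.0;  // 2^63, exact in double
    const double two64 = 18446744073709551616.0; // 2^64, exact in double

    // Written as !(value > 0) rather than (value < 0): every ordered comparison
    // with NaN is false, so NaN also takes the zero path here.
    if (!(value > 0.0))
    {
        return 0;
    }
    if (value >= two64)
    {
        return std::numeric_limits<uint64_t>::max();
    }
    if (value < two63)
    {
        // Fits in the signed range, so a signed truncating conversion suffices.
        return static_cast<uint64_t>(static_cast<int64_t>(value));
    }
    // Upper half: subtract 2^63 (exact for these inputs), convert, then add it back.
    return static_cast<uint64_t>(static_cast<int64_t>(value - two63)) + (UINT64_C(1) << 63);
}

// float -> double is exact, so widening first does not change the result.
uint64_t SaturatingToUInt64(float value)
{
    return SaturatingToUInt64(static_cast<double>(value));
}
```

The split at 2^63 mirrors what a target with only a signed 64-bit conversion would need. Standard C++ could cast in-range values to `uint64_t` directly, so the split is there purely for illustration.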
This PR covers the following cases -
The following is an overview of what has been optimized.
In the image below:
Helper - The conversion falls back to a C++ helper function.
Optimized - The conversion is optimized using a combination of a conversion instruction and other instructions to mimic saturating behavior.
Semi-Optimized - The conversion falls back to the optimized scalar version.
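The "Optimized" and "Semi-Optimized" categories can be sketched in portable C++. This models the idea only and is not the instruction sequence the JIT emits. `ConvertWithSentinel` stands in for an x86-style truncating conversion that returns 0x80000000 for NaN and out-of-range input, and all three function names are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Portable model of a truncating conversion that reports NaN and out-of-range
// input with the sentinel 0x80000000 (INT32_MIN).
int32_t ConvertWithSentinel(double value)
{
    // The truncated result fits in int32 exactly when -2^31 - 1 < value < 2^31.
    if (value > -2147483649.0 && value < 2147483648.0)
    {
        return static_cast<int32_t>(value);
    }
    return INT32_MIN; // also reached for NaN, since both comparisons are false
}

// "Optimized" shape: convert first, then fix up only when the sentinel appears.
int32_t SaturatingViaFixup(double value)
{
    const int32_t raw = ConvertWithSentinel(value);
    if (raw != INT32_MIN)
    {
        return raw; // common case: no fixup needed
    }
    if (std::isnan(value))
    {
        return 0;
    }
    // Either a genuine INT32_MIN result or an overflow; the sign of the input decides.
    return (value > 0.0) ? INT32_MAX : INT32_MIN;
}

// "Semi-Optimized" shape: a packed conversion that reuses the scalar path per lane.
std::array<int32_t, 4> SaturatingLanes(const std::array<float, 4>& lanes)
{
    std::array<int32_t, 4> result{};
    for (std::size_t i = 0; i < lanes.size(); ++i)
    {
        result[i] = SaturatingViaFixup(static_cast<double>(lanes[i]));
    }
    return result;
}
```

The fixup runs only when the raw result equals the sentinel, so in-range inputs pay for a single comparison beyond the conversion itself.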
Case: Double to Ulong scalar
Test code
AVX512
Before
After
Non-AVX512
NO CHANGE
Case: Double to Ulong vector
Test code
AVX512
Before
After
Non-AVX512
NO CHANGE
Case: Float to Ulong scalar
Test code
AVX512
Before
After
Non-AVX512
NO CHANGES
Case: Double to Long scalar
Test code
AVX512
Before
After
AVX
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Case: Double to Long packed
Test code
AVX512
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Case: float to Long scalar
Test code
AVX512
Before
After
AVX
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Case: double to uint scalar
Test code
AVX512
Before
After
AVX
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Case: float to uint scalar
Test code
AVX512
Before
After
AVX
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Case: float to uint packed
Test code
AVX512
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Case: double to int scalar
Test code
AVX512
Before
After
AVX
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Case: float to int scalar
Test code
AVX512
Before
After
AVX
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Case: float to int packed
Test code
AVX512
Before
After
AVX
Before
After
SSE41
Before
After
Pre-SSE41
Before
After
Perf
Base - No changes. Same environment variables as Diff.
Diff - PR changes. Same environment variables as Base.
Diff / Base - Ratio of Diff to Base.
Scalar
Packed
tracking #40234