System.Numerics.Tensors tests timing out under DOTNET_JITMinOpts=1 on x86 #97629
Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
Issue Details
Error Blob
{
  "ErrorMessage": "'System.Numerics.Tensors.Tests' END OF WORK ITEM LOG: Command timed out, and was killed",
  "BuildRetry": false,
  "ErrorPattern": "",
  "ExcludeConsoleLog": false
}
Reproduction Steps
These tests are timing out when run with DOTNET_JITMinOpts=1.
Example run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=543546&view=ms.vss-test-web.build-test-results-tab
@tannergooding @stephentoub are these tests long-running enough for this to be expected? Should we just disable them in stress configurations?
Known issue validation
Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=543546
They take ~40s in release on my machine. They also rely heavily on generic specialization, inlining, and unused code being eliminated early. It might be worth a short investigation to understand what's causing the timeouts, e.g. is it just the tests taking longer, is the JIT itself actually taking too long, etc.
I took a quick look at one of the timeouts; that one seems to make good progress until
runtime/src/libraries/System.Private.CoreLib/src/System/Half.cs, lines 1948 to 2000 in a78ddcc
and
runtime/src/libraries/System.Private.CoreLib/src/System/Int128.cs, lines 1731 to 1786 in a78ddcc
which, when unfolded, invokes the JIT_GetRuntimeType fcall a number of times, which is itself quite slow in checked builds. A PerfView trace for a couple of minutes when running this test single-threaded gives
[PerfView trace image]
I didn't see any indication that something is deadlocked, so it seems like we should just disable these under minopts/stress.
That test |
The whole test suite takes ~40s on my machine. I'm surprised that one test would be taking minutes. That's with a release runtime?
No, sorry (I edited my comment) -- with a checked runtime but with optimized codegen, I meant.
I'm not sure how much more lightweight such tests can really be made either. We have complex functionality here that needs to be tested. Due to the API signature and the need to use generics, we need to type check to ensure efficiency, and we need to test a range of edge-case behaviors. Would splitting it out into more tests help here?
I think that would help as well. I do wonder if the overall total of conversions could be reduced, however. Currently it is doing 17 * 17 (all from/to combinations) * 256 * 255 (different tensor lengths) conversions. Perhaps not every combination of from/to needs to try all possible tensor lengths in that range. |
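The scale of that test matrix can be checked with quick arithmetic. The sketch below uses Python purely for illustration (the tests themselves are C#). The figures 17, 256, and 255 are taken from the comment above. The "reduced" plan, its choice of which pairs keep the full sweep, and SAMPLE_LENGTH_FACTOR are hypothetical assumptions, not something proposed in the thread.

```python
# Figures quoted in the comment above.
NUM_TYPES = 17                    # element types used as both source and destination
FULL_LENGTH_FACTOR = 256 * 255    # per-pair length factor as stated in the thread

# Hypothetical: a small per-pair factor for pairs that skip the full sweep.
SAMPLE_LENGTH_FACTOR = 8


def total_conversions(pairs_full: int, pairs_sampled: int) -> int:
    """Count conversions when some pairs get the full sweep and the rest a sample."""
    return pairs_full * FULL_LENGTH_FACTOR + pairs_sampled * SAMPLE_LENGTH_FACTOR


all_pairs = NUM_TYPES * NUM_TYPES

# Current shape: every from/to pair runs the full length sweep.
current = total_conversions(all_pairs, 0)

# Hypothetical reduced shape: one full sweep per source type, samples elsewhere.
reduced = total_conversions(NUM_TYPES, all_pairs - NUM_TYPES)

print(f"current: {current:,}")
print(f"reduced: {reduced:,}")
print(f"ratio:   {current / reduced:.1f}x")
```

With the quoted figures the current shape comes to 18,865,920 conversions. Under the assumed reduced plan the total drops to 1,111,936, which suggests most of the cost sits in repeating the full length sweep for every from/to pair.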
Error Blob
Reproduction Steps
These tests are timing out when run with DOTNET_JITMinOpts=1.
Example run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=543546&view=ms.vss-test-web.build-test-results-tab
Example log: https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-heads-main-07f52de3725e4b168a/System.Numerics.Tensors.Net8.Tests/1/console.78bd6b1b.log?helixlogtype=result
@tannergooding @stephentoub are these tests long-running enough for this to be expected? Should we just disable them in stress configurations?
Known issue validation
Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=543546
Error message validated:
'System.Numerics.Tensors.Tests' END OF WORK ITEM LOG: Command timed out, and was killed
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 1/29/2024 9:36:30 AM UTC