
Some JSON test suites crash in ConsumeStringAndValidate on Red Hat CI machines #41582

Closed
tmds opened this issue Aug 31, 2020 · 9 comments

@tmds
Member

tmds commented Aug 31, 2020

We do a daily build+test run of dotnet/runtime on Fedora 32 and RHEL 8. Since Aug 6th, a number of tests have been crashing in ConsumeStringAndValidate:

  • Microsoft.Extensions.Configuration.Json.Tests
  • Microsoft.Extensions.Configuration.Functional.Tests
  • Microsoft.Extensions.DependencyModel.Tests
  • System.Net.Http.Json.Unit.Tests
  • System.Net.Http.Json.Functional.Tests
  • System.Text.Json.Tests

Example stack trace:

~/runtime/artifacts/bin/Microsoft.Extensions.Configuration.Json.Tests/net5.0-Debug ~/runtime/src/libraries/Microsoft.Extensions.Configuration.Json/tests
    Discovering: Microsoft.Extensions.Configuration.Json.Tests (method display = ClassAndMethod, method display options = None)
    Discovered:  Microsoft.Extensions.Configuration.Json.Tests (found 41 test cases)
    Starting:    Microsoft.Extensions.Configuration.Json.Tests (parallel test collections = on, max threads = 4)
  Process terminated. Assertion failed.
     at System.Text.Json.Utf8JsonReader.ConsumeStringAndValidate(ReadOnlySpan`1 data, Int32 idx) in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs:line 1316
     at System.Text.Json.Utf8JsonReader.ConsumeString() in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs:line 1295
     at System.Text.Json.Utf8JsonReader.ConsumeValue(Byte marker) in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs:line 1038
     at System.Text.Json.Utf8JsonReader.ReadSingleSegment() in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs:line 881
     at System.Text.Json.Utf8JsonReader.Read() in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs:line 275
     at System.Text.Json.JsonDocument.Parse(ReadOnlySpan`1 utf8JsonSpan, JsonReaderOptions readerOptions, MetadataDb& database, StackRowStack& stack) in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.cs:line 943
     at System.Text.Json.JsonDocument.Parse(ReadOnlyMemory`1 utf8Json, JsonReaderOptions readerOptions, Byte[] extraRentedBytes) in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.Parse.cs:line 549
     at System.Text.Json.JsonDocument.Parse(ReadOnlyMemory`1 json, JsonDocumentOptions options) in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.Parse.cs:line 214
     at System.Text.Json.JsonDocument.Parse(String json, JsonDocumentOptions options) in /home/tester/runtime/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.Parse.cs:line 246
     at Microsoft.Extensions.Configuration.Json.JsonConfigurationFileParser.ParseStream(Stream input) in /home/tester/runtime/src/libraries/Microsoft.Extensions.Configuration.Json/src/JsonConfigurationFileParser.cs:line 34
     at Microsoft.Extensions.Configuration.Json.JsonConfigurationFileParser.Parse(Stream input) in /home/tester/runtime/src/libraries/Microsoft.Extensions.Configuration.Json/src/JsonConfigurationFileParser.cs:line 21
     at Microsoft.Extensions.Configuration.Json.JsonConfigurationProvider.Load(Stream stream) in /home/tester/runtime/src/libraries/Microsoft.Extensions.Configuration.Json/src/JsonConfigurationProvider.cs:line 30
     at Microsoft.Extensions.Configuration.ConfigurationProviderJsonTest.<>c__DisplayClass1_0.<LoadThroughProvider>b__0() in /home/tester/runtime/src/libraries/Microsoft.Extensions.Configuration.Json/tests/ConfigurationProviderJsonTest.cs:line 34
     at Microsoft.Extensions.Configuration.Test.ConfigurationProviderTestBase.BuildConfigRoot(ValueTuple`2[] providers) in /home/tester/runtime/src/libraries/Microsoft.Extensions.Configuration/tests/ConfigurationProviderTestBase.cs:line 335
     at Microsoft.Extensions.Configuration.Test.ConfigurationProviderTestBase.Has_debug_view() in /home/tester/runtime/src/libraries/Microsoft.Extensions.Configuration/tests/ConfigurationProviderTestBase.cs:line 25
     at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
     at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters) in /home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBase.cs:line 49
     at Xunit.Sdk.TestInvoker`1.CallTestMethod(Object testClassInstance) in C:\Dev\xunit\xunit\src\xunit.execution\Sdk\Frameworks\Runners\TestInvoker.cs:line 150
     at Xunit.Sdk.TestInvoker`1.<>c__DisplayClass48_1.<<InvokeTestMethodAsync>b__1>d.MoveNext() in C:\Dev\xunit\xunit\src\xunit.execution\Sdk\Frameworks\Runners\TestInvoker.cs:line 257
     at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) in /home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs:line 42
     at Xunit.Sdk.TestInvoker`1.<>c__DisplayClass48_1.<InvokeTestMethodAsync>b__1()
...

When I run on my development machine, the tests pass without crashing.

cc @omajid @RheaAyase

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Aug 31, 2020
@layomia
Contributor

layomia commented Sep 1, 2020

The code paths cited in the description have not been modified since 3.1, so I don't believe this is a regression in 5.0. Will triage as a 6.0 issue.

@tmds when did you start running these tests? You mention Aug 6th, but I don't see any commits in System.Text.Json around that date that would have triggered new failures.

Also, are the stack traces identical in each failure case? Are you able to point me to any actual CI logs/artifacts?

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Sep 1, 2020
@layomia layomia added this to the 6.0.0 milestone Sep 1, 2020
@tmds
Member Author

tmds commented Sep 3, 2020

I debugged the issue and traced it down to this function returning an invalid index:

// Vectorized search for either quote, backslash, or any control character.
// If the first found byte is a quote, we have reached an end of string, and
// can avoid validation.
// Otherwise, in the uncommon case, iterate one character at a time and validate.
int idx = localBuffer.IndexOfQuoteOrAnyControlOrBackSlash();

This performs a vectorized search, and System.Numerics.Vectors.Tests are also failing on our CI machine: #41584. It is likely the same root cause, related to vectorization.
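One way to test that theory (my own sketch, not something done in this thread): rerun a failing suite with hardware intrinsics disabled via the CoreCLR configuration knob `COMPlus_EnableHWIntrinsic`. If the crash goes away, the vectorized path is implicated.

```shell
# Hypothetical check: disable hardware intrinsics for the test run.
# COMPlus_EnableHWIntrinsic is a CoreCLR config environment variable;
# whether it isolates this particular failure is an assumption.
export COMPlus_EnableHWIntrinsic=0
echo "COMPlus_EnableHWIntrinsic=$COMPlus_EnableHWIntrinsic"
# Then rerun the suite the same way the CI does, e.g.:
# .../testhost/net5.0-Linux-Debug-x64/dotnet exec ... xunit.console.dll System.Text.Json.Tests.dll
```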

@mikem8361 to debug this I have configured our CI machine to capture coredumps and to preserve the testhost.
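For reference, a minimal sketch of enabling core dump capture on a Linux machine (an assumption about the setup, not necessarily how this CI machine was configured):

```shell
# Lift the core size limit for the current shell and its children.
ulimit -c unlimited
# Collect cores in a predictable directory.
mkdir -p /tmp/coredumps
# Route core files there with the pid in the name (requires root):
# echo '/tmp/coredumps/core.%p' | sudo tee /proc/sys/kernel/core_pattern
ulimit -c
```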

When I pass the testhost dotnet and a core file to lldb, SOS debugging doesn't work:

$ lldb /tmp/testhost/artifacts/bin/testhost/net5.0-Linux-Debug-x64/dotnet --core core.14813 
(lldb) target create "/tmp/testhost/artifacts/bin/testhost/net5.0-Linux-Debug-x64/dotnet" --core "core.14813"
Core file '/tmp/coredumps/core.14813' (x86_64) was loaded.
(lldb) clrstack
Failed to find runtime module (libcoreclr.so), 0x80070057
Extension commands need it in order to have something to do.
ClrStack  failed

On the CI machine, the testhost lives at /home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64. When I put the testhost at the same location on my development machine sos debugging works.

Can I make it work without putting the testhost at the same spot?

@mikem8361
Member

mikem8361 commented Sep 3, 2020 via email

@layomia
Contributor

layomia commented Sep 4, 2020

@tmds it looks like this is caused by the other issue you opened - #41584. Can we close this issue and reopen it if the solution there doesn't fix this issue? cc @tannergooding

@ahsonkhan
Member

Also cc @benaadams who has been working in this space recently:
#40729
#40747
#41097

@tmds
Member Author

tmds commented Sep 4, 2020

@mikem8361

Is the testhost called “dotnet”? Just checking.

Yes, this is the command line that gets executed:

/home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/dotnet exec --runtimeconfig Microsoft.Extensions.Configuration.Functional.Tests.runtimeconfig.json --depsfile Microsoft.Extensions.Configuration.Functional.Tests.deps.json xunit.console.dll Microsoft.Extensions.Configuration.Functional.Tests.dll -xml testResults.xml -nologo -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 

As far as I know, putting the full path to the host program on the command line should work no matter where it is.

That doesn't work for me:

$ lldb /tmp/testhost/artifacts/bin/testhost/net5.0-Linux-Debug-x64/dotnet --core core.14813 
(lldb) target create "/tmp/testhost/artifacts/bin/testhost/net5.0-Linux-Debug-x64/dotnet" --core "core.14813"
Core file '/tmp/coredumps/core.14813' (x86_64) was loaded.
(lldb) clrstack
Failed to find runtime module (libcoreclr.so), 0x80070057
Extension commands need it in order to have something to do.
ClrStack  failed

The way you can tell that lldb picked up libcoreclr.so is that in the module list it doesn't have the pointer after the name:

This doesn't work for me either:

$ ls
core.14813  dotnet  host  shared
$ lldb dotnet --core core.14813 
(lldb) target create "dotnet" --core "core.14813"
Core file '/tmp/testhost/artifacts/bin/testhost/net5.0-Linux-Debug-x64/core.14813' (x86_64) was loaded.
(lldb) clrstack
Failed to find runtime module (libcoreclr.so), 0x80070057
Extension commands need it in order to have something to do.
ClrStack  failed

I can only make it work when I put the testhost folder at the exact same location on my development machine as where it was on the CI machine (/home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/).

The way you can tell if you got the right host in the right location is that target modules list displays the full set of modules (including libcoreclr.so). A module lldb picked up correctly doesn't have the pointer after the name:

This is the output of target modules list (with the core file in the same folder as the dotnet executable):

(lldb) target modules list
[  0] 3A0B0829-FB6E-1052-BFE1-9B5D10133EBC-C0D6E4FD 0x0000556413437000 /tmp/testhost/artifacts/bin/testhost/net5.0-Linux-Debug-x64/dotnet 
[  1] 4A2EA00E-AEB6-26C0-FE0B-4825BF71BB38-9B9CDE62 0x00007ffdbd35c000 [vdso] (0x00007ffdbd35c000)
[  2] E63E30F8 0x00007ffdbd35c000 linux-vdso.so.1 (0x00007ffdbd35c000)
[  3] D7525319-1E7D-61F6-945C-8268759AB2AF-BABF2BC1 0x00007f1dc8a2c000 /lib64/libpthread.so.0 
[  4] 0BABFB17-6B9E-8740-B54B-689C51A6EBFC-1679777B 0x00007f1dc8a25000 /lib64/libdl.so.2 
[  5] 7AFD1CED-0AD3-B1FD-FE1B-121AE2928814-39AF5E57 0x00007f1dc8835000 /lib64/libstdc++.so.6 
[  6] B879FFC6-4219-C355-4850-4494C08999EF-2DE88308 0x00007f1dc86ef000 /lib64/libm.so.6 
[  7] C113C8AF-01F8-512E-1470-FBAE7D7C18BD-BE52FF91 0x00007f1dc86d4000 /lib64/libgcc_s.so.1 
[  8] 7CA24D4D-C3DE-9D62-D9AD-6BB25E5B70A3-E57A342F 0x00007f1dc850a000 /lib64/libc.so.6 
[  9] B840F4E4-E3D5-77B8-705F-1309D8CAA517-2F65E11C 0x00007f1dc8a66000 /lib64/ld-linux-x86-64.so.2 
[ 10] B0768CFC 0x00007f1dc82a1000 /home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/host/fxr/6.0.0/libhostfxr.so (0x00007f1dc82a1000)
[ 11] B8524B42 0x00007f1dc8049000 /home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/libhostpolicy.so (0x00007f1dc8049000)
[ 12] A217988E                    /home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/libcoreclr.so (0x00007f1dc795d000)
[ 13] 640EFE2F-F2F5-4AE9-4CFE-83A5B6F4A8D2-0CE7D492 0x00007f1dc7952000 /lib64/librt.so.1 
[ 14] D8697A59                    /home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/libcoreclrtraceptprovider.so (0x00007f1dc789d000)
[ 15] A496EB46-1301-A219-4917-2A4FED6FC026-3D51A07C 0x00007f1dc781f000 /lib64/liblttng-ust.so.0 
[ 16] 19549AD8-19E7-1F85-295E-953B98684739-9D9DC853 0x00007f1dc7801000 /lib64/liblttng-ust-tracepoint.so.0 
[ 17] C3E9F2B8-96E1-1699-DC6D-29BF72B98467-3BC99A59 0x00007f1dc77f3000 /lib64/libnuma.so.1 
[ 18] 5C23895A-FF47-B2A2-21AE-7DBDD436A99D-50297B34 0x00007f1dc77e8000 /lib64/liburcu-bp.so.6 
[ 19] 255E95D7-3EE9-89FB-6D02-5DD13EC85A27-8A1D74EF 0x00007f1dc77dc000 /lib64/liburcu-cds.so.6 
[ 20] 47E5204E-3CA6-C424-BED7-DE9C0C9F0DB8-5EB08043 0x00007f1dc77d5000 /lib64/liburcu-common.so.6 
[ 21] 4C04B0CD                    /home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/libclrjit.so (0x00007f1db7a37000)
[ 22] E7410302                    /home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/libSystem.Native.so (0x00007f1dc40ac000)
[ 23] C467F806-B6D4-1352-E6B9-0DB88AB546B5-635F90E6 0x00007f1d4d384000 /lib64/libicuuc.so.65 
[ 24] 60CDE2C0-1E3F-2B35-1078-101C6B5C5E74-47B00A62 0x00007f1d1e54b000 /lib64/libicudata.so.65 
[ 25] B90A697B-370D-3910-6646-097ECE46DB5F-77939801 0x00007f1d4d07d000 /lib64/libicui18n.so.65 
[ 26] 7BC9F980                    /home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/libSystem.IO.Compression.Native.so (0x00007f1cfdebb000)
[ 27] 5AAE69EA-C7B4-C4D0-2837-C950D6616DAF-53FCB0D3 0x00007f1d1c0d0000 /lib64/libz.so.1 
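Since debugging only works when the testhost sits at the CI machine's original path, one possible workaround (editor's sketch, not verified in this thread) is to recreate that path on the dev machine with a symlink instead of copying the tree there:

```shell
# Assumption: the testhost was extracted to /tmp/testhost/... as in the
# lldb session above. Recreate the CI path so the module paths recorded
# in the core file resolve. Creating /home/tester may require root.
CI_PATH=/home/tester/runtime/artifacts/bin/testhost/net5.0-Linux-Debug-x64
DEV_COPY=/tmp/testhost/artifacts/bin/testhost/net5.0-Linux-Debug-x64
mkdir -p "$DEV_COPY"
mkdir -p "$(dirname "$CI_PATH")"
ln -sfn "$DEV_COPY" "$CI_PATH"
ls -ld "$CI_PATH"
```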

@benaadams
Member

I debugged the issue and traced it down to this function returning an invalid index: IndexOfQuoteOrAnyControlOrBackSlash

#41097 changes the method IndexOfQuoteOrAnyControlOrBackSlash but it hasn't been merged; could you test that change to see if it works in this scenario? 🤔

@mikem8361
Member

mikem8361 commented Sep 4, 2020 via email

@tmds
Member Author

tmds commented Sep 7, 2020

@mikem8361 thanks for the additional info and FAQ link.

@tmds it looks like this is caused by the other issue you opened - #41584. Can we close this issue and reopen it if the solution there doesn't fix this issue? cc @tannergooding

I'll close this now.

@tmds tmds closed this as completed Sep 7, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020