System.Text.Json.JsonSerializer.Deserialize crashes when parsing large string using Blazor #41604

Closed
vsfeedback opened this issue Aug 28, 2020 · 24 comments
Labels
arch-wasm WebAssembly architecture area-VM-meta-mono bug tenet-reliability Reliability/stability related issue (stress, load problems, etc.)

Comments

@vsfeedback

This issue has been moved from a ticket on Developer Community.


[severity:Other] [regression] [worked-in:somewhere before 16.7.1]
Using Blazor/Wasm (3.2.0) and trying to parse a large JSON string (135123600 chars) loaded from a file on my local Windows machine, System.Text.Json.JsonSerializer.Deserialize crashes with the following internal exception (or similar; it varies somewhat from run to run):

Microsoft.AspNetCore.Components.WebAssembly.Rendering.WebAssemblyRenderer[100]
Unhandled exception rendering component: 'r' is an invalid start of a property name. Expected a '"'. Path: $[4888].ID | LineNumber: 63547 | BytePositionInLine: 0.
System.Text.Json.JsonException: 'r' is an invalid start of a property name. Expected a '"'. Path: $[4888].ID | LineNumber: 63547 | BytePositionInLine: 0. ---> System.Text.Json.JsonReaderException: 'r' is an invalid start of a property name. Expected a '"'. LineNumber: 63547 | BytePositionInLine: 0.
at System.Text.Json.ThrowHelper.ThrowJsonReaderException (System.Text.Json.Utf8JsonReader& json, System.Text.Json.ExceptionResource resource, System.Byte nextByte, System.ReadOnlySpan`1[T] bytes) <0x8575f78 + 0x00020> in :0
at System.Text.Json.Utf8JsonReader.ConsumeNextToken (System.Byte marker) <0x285e1d8 + 0x00264> in :0
at System.Text.Json.Utf8JsonReader.ConsumeNextTokenOrRollback (System.Byte marker) <0x285dfa8 + 0x0003a> in :0
at System.Text.Json.Utf8JsonReader.ReadSingleSegment () <0x2851dd8 + 0x001de> in :0
at System.Text.Json.Utf8JsonReader.Read () <0x2851960 + 0x0000e> in :0
at System.Text.Json.JsonSerializer.ReadCore (System.Text.Json.JsonSerializerOptions options, System.Text.Json.Utf8JsonReader& reader, System.Text.Json.ReadStack& readStack) <0x28799e0 + 0x00060> in :0

The line number and position in the file where it fails vary from run to run, and therefore so does what the error says is expected (a number, a string or a property name).

The string contains a list of objects like:

[
  {
    "ID": 0,
    "DateTime": "2018-08-09T08:57:34",
    "SiteWrapper": null,
    "SiteWrapperID": 1,
    "Lev": [
      40.3,
      0,
      78.9,
      65.6,
      55.5
    ]
  },
  {
    "ID": 0,
    "DateTime": "2018-08-09T08:57:35",
    "SiteWrapper": null,
    "SiteWrapperID": 1,
    "Lev": [
      70.4,
      72.1,
      71.2,
      0,
      64.5
    ]
  },

and so on. (all in all 604621 instances).
(so far tested on Chrome 84.0.4147.135 and Microsoft Edge 44.18362.449.0)
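
For readers following along, here is a minimal sketch of a model that matches the sample above, together with the synchronous call the report describes. The type names `Reading` and `ReadingParser` are hypothetical (the reporter's actual types are in the attached project, not shown here); only the property names are taken from the JSON, and `double[]` for `Lev` is taken from the later error message.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model: class name is an assumption, property names mirror the JSON sample.
public class Reading
{
    public int ID { get; set; }
    public DateTime DateTime { get; set; }
    public object SiteWrapper { get; set; }   // null in every sample element shown
    public int SiteWrapperID { get; set; }
    public double[] Lev { get; set; }
}

public static class ReadingParser
{
    // Synchronous overload: the whole payload is already in memory as one string.
    public static List<Reading> Parse(string json) =>
        JsonSerializer.Deserialize<List<Reading>>(json);
}
```

With a well-formed payload this call should not throw; values such as `0` inside `Lev` are valid JSON numbers and convert to `double`.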


Original Comments

Feedback Bot on 8/27/2020, 01:09 AM:

We have directed your feedback to the appropriate engineering team for further evaluation. The team will review the feedback and notify you about the next steps.


Original Solutions

(no solutions)

@NTaylorMullen

@mkArtakMSFT could you label/assign this appropriately.

@benaadams
Member

Is the property starting with r at line 63547 in quotes?

@erikthysell

No, not at all. Every time it reports something that is not there.

@erikthysell

erikthysell commented Aug 28, 2020

@benaadams : Row # 63547 (first row =1) is an "ID" property and row 63547 (first row =0) is a "DateTime" property.
It reaches different positions each run, with the same kind of error but always incorrect info about what's on that row.

@erikthysell

It is just an array of instances all with the same properties as shown above.

@mkArtakMSFT mkArtakMSFT transferred this issue from dotnet/aspnetcore Aug 31, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Aug 31, 2020
@layomia
Contributor

layomia commented Sep 1, 2020

@erikthysell are you able to share a small repro app (sanitized if necessary) that highlights the issue? This would include all types being (de)serialized, any custom converters, and the JSON payload being passed to the serializer.

Also, can you try using .NET 5 preview 8 and see if you still face the issue?

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Sep 1, 2020
@layomia layomia added this to the 5.0.0 milestone Sep 1, 2020
@erikthysell

erikthysell commented Sep 2, 2020

@layomia : Sure..
Here is a zip with the test data
Testdata-2018-08-09-15.zip
And here is a test solution/project - it needs the Tewr Blazor FileReader

BlazorTesting.zip

I hope that is all that is needed...
Thanks for all the good you guys do!!

@erikthysell

erikthysell commented Sep 3, 2020

@layomia : I can confirm that it happens using .NET 5.0 preview 8 also. But it seems to be working faster and reaches higher row/line numbers before it throws an error.

Unable to convert file content:The JSON value could not be converted to System.Double[]. Path: $[8282].Lev[1] | LineNumber: 107674 | BytePositionInLine: 22.

@layomia
Contributor

layomia commented Sep 3, 2020

@erikthysell thanks a lot. I'll take a look.

@ericstj ericstj added arch-wasm WebAssembly architecture bug tenet-reliability Reliability/stability related issue (stress, load problems, etc.) labels Sep 3, 2020
@ericstj
Member

ericstj commented Sep 8, 2020

Just grabbed this data file and checked it out. The offending double array looks like this:

"Lev": [
      77.2,
      66.5,
      64.6,
      76,
      70.9
    ]

Looks fine to me. This isn't the first occurrence of a double without a decimal point in the file. @layomia were you able to repro? @erikthysell did this only repro in blazor? When it repros is it always on the same position in the JSON?

@erikthysell

@ericstj : I have only tried with Blazor wasm (client side) on core 3.1 and .NET 5 preview 8, in Chrome 85ish and Edge 44. It always fails at a different position, sometimes in the Lev array, other times in other properties.

@layomia
Contributor

layomia commented Sep 9, 2020

I was able to repro this in a Blazor wasm app as described by @erikthysell and also in a test CI run (#42004) where the scenario fails for Blazor wasm and Mono interpreter builds. The issue doesn't repro in a plain console app (CoreCLR JIT).

Trying to root-cause this now. The JSON payload, while very large, is fine and shouldn't cause exceptions to be thrown here. Might be an error with the IndexOfQuoteOrAnyControlOrBackSlash method, similar to the issue in #41582 (comment).

fwiw, the deserialization appeared to work as expected when I refactored the repro code slightly to use the JsonSerializer.DeserializeAsync<T> overload instead of Deserialize<T>.
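
A minimal sketch of the stream-based workaround mentioned above, assuming the payload is available as a UTF-8 `Stream`. The names `LevelReading` and `AsyncParser` are hypothetical and the model is trimmed to two properties for brevity; this is not the refactored repro code itself.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical, trimmed-down model; JSON properties without a match are ignored by default.
public class LevelReading
{
    public int ID { get; set; }
    public double[] Lev { get; set; }
}

public static class AsyncParser
{
    // Reads the JSON incrementally from the stream instead of from one large string.
    public static async Task<List<LevelReading>> ParseAsync(Stream utf8Json)
    {
        return await JsonSerializer.DeserializeAsync<List<LevelReading>>(utf8Json);
    }
}
```

In a Blazor component the call would be awaited directly, for example `var readings = await AsyncParser.ParseAsync(stream);`, where `stream` comes from whatever file API the app uses.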

@layomia layomia self-assigned this Sep 9, 2020
@steveharter
Member

@layomia in order to help determine interpreter issue vs. IndexOfQuoteOrAnyControlOrBackSlash issue, running the repro on CoreCLR with Vector.IsHardwareAccelerated == false may help. This can be done by modifying the source of IndexOfQuoteOrAnyControlOrBackSlash or by running with environment variable COMPlus_EnableHWIntrinsic=0. If it fails on CoreCLR, then it's an issue in IndexOfQuoteOrAnyControlOrBackSlash.

FWIW for performance, this method was one that was considered for intrinsifying in the interpreter or re-write in C# for mono but that never happened. See #41097, #40705 and #39733.
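
To confirm which code path a given run exercises, the flag named above can be read at startup. This is only a sketch: `HardwareCheck` is a hypothetical helper, and whether the `COMPlus_EnableHWIntrinsic=0` setting flips the flag on a particular runtime version is taken from the comment above rather than verified here.

```csharp
using System.Numerics;

public static class HardwareCheck
{
    // Vector.IsHardwareAccelerated is a real runtime property; the labels are illustrative.
    public static string Describe() =>
        Vector.IsHardwareAccelerated ? "vectorized path" : "scalar fallback path";
}
```

Logging `HardwareCheck.Describe()` once before running the repro makes it clear whether the non-accelerated configuration actually took effect.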

@erikthysell

erikthysell commented Sep 10, 2020

@layomia thanks for the tip about the async version! I can just confirm that it works.

@layomia
Contributor

layomia commented Sep 10, 2020

@steveharter thanks! I ran the tests with COMPlus_EnableHWIntrinsic=0 on CoreCLR and verified that there's no issue, so IndexOfQuoteOrAnyControlOrBackSlash doesn't seem to be the issue for now. Will reach out offline to discuss further.

@steveharter
Member

steveharter commented Sep 15, 2020

Based on additional testing by @layomia by using a simple implementation of IndexOfQuoteOrAnyControlOrBackSlash, this is now considered an issue in the mono\wasm area and not STJ.
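
For illustration, a "simple implementation" in the sense used above could be a plain scalar loop like the sketch below. This is an assumption about what such a replacement might look like, not the code that was actually used in the testing, and `SimpleScanner` is a hypothetical class name.

```csharp
using System;

public static class SimpleScanner
{
    // Returns the index of the first '"', '\' or control byte (below 0x20), or -1 if none.
    public static int IndexOfQuoteOrAnyControlOrBackSlash(ReadOnlySpan<byte> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            byte b = span[i];
            if (b == (byte)'"' || b == (byte)'\\' || b < 0x20)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Because the loop has no vectorized branch, a failure that persists with it in place cannot be blamed on the SIMD code path.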

Async may be working because it allocates a lot less memory than the non-async version. The async mode starts with a 16K buffer and will re-use that for all JSON from the Stream until there is a property value that is too large to fit in the remaining space left in the buffer, then it will double the buffer, etc. So a 110MB payload will not allocate anywhere near 110MB in async mode unless the 110MB is a single property value...
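
The doubling rule described above can be put into numbers with a small sketch. It is a simplified model, assuming the buffer only has to hold the largest single value and ignoring any leftover bytes; `BufferGrowth` is a hypothetical helper, not serializer code.

```csharp
public static class BufferGrowth
{
    public const long InitialSize = 16 * 1024;   // 16K starting buffer, per the comment above

    // Doubles the buffer until the largest single value fits (simplified model).
    public static long RequiredBufferSize(long largestValueBytes)
    {
        long size = InitialSize;
        while (size < largestValueBytes)
        {
            size *= 2;
        }
        return size;
    }
}
```

Under this model the payload in this issue, whose values are short numbers and timestamps, never outgrows the initial 16K buffer, whereas a single 110MB value would need 13 doublings.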

@BrzVlad BrzVlad self-assigned this Sep 16, 2020
@layomia layomia removed their assignment Sep 16, 2020
@ghost

ghost commented Sep 17, 2020

Tagging subscribers to this area: @CoffeeFlux
See info in area-owners.md if you want to be subscribed.

@SamMonoRT
Member

@lewing @marek-safar - possibly RC candidate

@lewing
Member

lewing commented Sep 18, 2020

@SamMonoRT agreed

@BrzVlad
Member

BrzVlad commented Sep 21, 2020

This doesn't look related to Json or the runtime, therefore not a release blocker. It seems to be an issue on the JS interop layer in a third party library. Submitted bug Tewr/BlazorFileReader#161.

@CoffeeFlux
Contributor

I believe this was fixed by #42486, which should be backported soon. Can you validate that fix if you have the repro handy?

@BrzVlad
Member

BrzVlad commented Sep 21, 2020

@CoffeeFlux it was not fixed

@CoffeeFlux
Contributor

Gotcha. If so, we probably should close the issue or move it off 5.0.

@lewing
Member

lewing commented Sep 23, 2020

This was an issue in the library not the runtime and discussion has moved to Tewr/BlazorFileReader#161. Closing.

@lewing lewing closed this as completed Sep 23, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020