
Slowdown in JSON3 from 1.8 to 1.9 #48229

Open
KristofferC opened this issue Jan 11, 2023 · 12 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@KristofferC
Member

This is a write up from the issue reported in https://twitter.com/mberesewicz/status/1612721251209908224.

To reproduce, add version 1.12.0 of JSON3 and run this repro script:

```julia
using JSON3
import Downloads
import ZipFile

kprm_file = "https://github.com/SKNEstymator/kprm-job-offers/raw/main/data-raw/kprm-with-salary-update-20230102.json.zip"
kprm_file_zip = basename(kprm_file)
kprm_file_json = replace(kprm_file_zip, ".zip" => "")
if !isfile(kprm_file_zip)
    Downloads.download(kprm_file, kprm_file_zip)
end

if !isfile(kprm_file_json)
    zarchive = ZipFile.Reader(kprm_file_zip)
    out = open(kprm_file_json, "w")
    write(out, read(zarchive.files[1], String))
    close(out)
end

col_names = ["id", "poz", "prac", "miej", "adr_urz", "adr_miej", "adr_urz_str", "adr_gdzie",
             "data_um", "data_kon", "data_zgl", "wynik", "wyn", "wakaty", "etat", "status",
             "zakres", "wykszt", "wymagania", "wymagania_dod", "kontakt", "liczba_odw"]

function json3_readlines(kprm_file_json)
    for line in eachline(kprm_file_json)
        JSON3.read(line)
    end
end

@time json3_readlines(kprm_file_json)
```

On 1.9 this takes about:

```julia
julia> @time json3_readlines(kprm_file_json)
  2.537686 seconds (6.47 M allocations: 1019.936 MiB, 2.50% gc time, 0.46% compilation time)
```

while on 1.8 it takes:

```julia
julia> @time json3_readlines(kprm_file_json)
  0.599276 seconds (1.07 M allocations: 665.072 MiB, 21.89% gc time)
```

Looking at the profile on 1.9, there is a bunch of subtyping work coming from the call to `read`:

```
     ╎ 1262 @JSON3/src/read.jl:64; read(str::String; jsonlines::Bool, numbertype::Nothing, kw::Base.Pairs{Symbol, Union{}, Tup...
  2╎      ╎  1231 @JSON3/src/JSON3.jl:22; Array
     ╎    ╎   1217 /Users/kristoffercarlsson/julia1.9/src/gf.c:2828; ijl_apply_generic
  2╎      ╎    1217 /Users/kristoffercarlsson/julia1.9/src/gf.c:0; _jl_invoke
  1╎      ╎     1085 /Users/kristoffercarlsson/julia1.9/src/builtins.c:1599; jl_f__compute_sparams
     ╎    ╎    ╎ 1072 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1682; jl_inst_arg_tuple_type
     ╎    ╎    ╎  1072 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1907; jl_wrap_Type
     ╎    ╎    ╎   1072 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1079; ijl_instantiate_unionall
     ╎    ╎    ╎    1072 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1863; inst_type_w_
     ╎    ╎    ╎     106  /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1464; inst_datatype_inner
     ╎    ╎    ╎    ╎ 54   /Users/kristoffercarlsson/julia1.9/src/subtype.c:1986; ijl_types_equal
     ╎    ╎    ╎    ╎  54   /Users/kristoffercarlsson/julia1.9/src/subtype.c:1432; forall_exists_subtype
     ╎    ╎    ╎    ╎   54   /Users/kristoffercarlsson/julia1.9/src/subtype.c:1396; exists_subtype
```

which seems to come from the constructor in https://github.com/quinnj/JSON3.jl/blob/faa521e37317826a38bd211898b512be592be5c7/src/JSON3.jl#L22. This is not present in 1.8:

```
     ╎ 263 @JSON3/src/read.jl:30; read
     ╎  207 @JSON3/src/read.jl:56; read(str::String; jsonlines::Bool, numbertype::Nothing, kw...
     ╎   207 @JSON3/src/read.jl:86; read!
     ╎    207 @JSON3/src/read.jl:90; read!(buf::Base.CodeUnits{UInt8, String}, pos::Int64, len...
     ╎     207 @JSON3/src/read.jl:311; (::JSON3.var"#read!##kw")(::NamedTuple{(:allow_inf,), T...
 1   ╎    ╎ 203 @JSON3/src/read.jl:332; read!(buf::Base.CodeUnits{UInt8, String}, pos::Int64, l...
     ╎    ╎  202 @JSON3/src/read.jl:86; (::JSON3.var"#read!##kw")(::NamedTuple{(:allow_inf,), T...
     ╎    ╎   142 @JSON3/src/read.jl:90; read!(buf::Base.CodeUnits{UInt8, String}, pos::Int64, ...
     ╎    ╎    142 @JSON3/src/read.jl:311; (::JSON3.var"#read!##kw")(::NamedTuple{(:allow_inf,),...
```

Interestingly, the performance issue has been somewhat fixed on the main branch of JSON3 by commit quinnj/JSON3.jl@faa521e:

```julia
# With JSON3#main
julia> @time json3_readlines(kprm_file_json)
  0.700113 seconds (1.07 M allocations: 663.462 MiB, 6.81% gc time)
```

However, as far as I can tell, the difference this commit made is actually *worse* type inference: https://github.com/quinnj/JSON3.jl/blob/faa521e37317826a38bd211898b512be592be5c7/src/read.jl#L65 now gets inferred as follows (looking at the optimized IR):

(1.12.0) (slow)

```
   │    %147 = Core.apply_type(JSON3.Array, %143, Base.CodeUnits{UInt8, String}, Vector{UInt64})::Type{JSON3.Array{_A, Base.CodeUnits{UInt8, String}, Vector{UInt64}}} where _A│         
   │    %148 = %new(%147, %1, %45, %139)::JSON3.Array{_A, Base.CodeUnits{UInt8, String}, Vector{UInt64}} where _A                                                             ││╻          Array
65 │           JSON3.populateinds!(%148)       
```

main (fast)

```
   │     %257 = Core.apply_type(JSON3.Array, %256)::Type{JSON3.Array{_A}} where _A                                                                        │          
   └──── %258 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, 0, 0))::Vector{Int64}           ││╻          Array
   90 ──        nothing                                                                                                                                   │          
   91 ──        goto #92                                                                                                                                  │╻          getindex
   92 ── %261 = (%257)(%61, %122, %258)::JSON3.Array{T, S, Vector{UInt64}} where {T, S<:AbstractVector{UInt8}}                                            │          
66 │            JSON3.populateinds!(%261) 
```

So in the fast case we only know that the second type parameter `S` is `<: AbstractVector{UInt8}`, but in the slow case we know that `S == Base.CodeUnits{UInt8, String}`.

It seems bad that *more* precise type information causes a bunch of subtyping work that slows down the code.
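The runtime sparam computation can be reproduced outside JSON3. Here is a minimal sketch (the `Wrapper`/`make_wrapper` names are hypothetical; the struct only mirrors the general shape of the `JSON3.Array` constructor linked above):

```julia
# Hypothetical struct mirroring the shape of JSON3.Array.
struct Wrapper{T, S <: AbstractVector{UInt8}, V <: AbstractVector{UInt64}}
    buf::S
    pos::Int
    inds::V
end

# Constructor in the style of JSON3.jl:22: only T is supplied, so S and V
# must be matched against the argument types ("sparams computation",
# jl_f__compute_sparams in the profile) whenever the call is not inferred.
Wrapper{T}(buf::S, pos::Integer, inds::V) where {T, S <: AbstractVector{UInt8}, V <: AbstractVector{UInt64}} =
    Wrapper{T, S, V}(buf, pos, inds)

# Fully specifying every parameter at the call site sidesteps the runtime
# sparam computation entirely.
make_wrapper(::Type{T}, buf, pos, inds) where {T} =
    Wrapper{T, typeof(buf), typeof(inds)}(buf, pos, inds)

w = Wrapper{Any}(codeunits("[]"), 1, UInt64[])
```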

@KristofferC KristofferC added performance Must go faster regression Regression in behavior compared to a previous version labels Jan 11, 2023
@quinnj
Member

quinnj commented May 30, 2023

I can confirm that the original issue, run against JSON3 1.12, is fixed on master:

```julia
julia> @time json3_readlines(kprm_file_json)
  0.776450 seconds (3.40 M allocations: 759.053 MiB, 8.05% gc time, 1.24% compilation time)
```

but not on the current release-1.9 branch. @N5N3 or @vtjnash, could we request one of you to help backport a fix for this? At RelationalAI, we've tracked one of our last regressions for 1.9 down to this issue, so it would be excellent if we could figure out a 1.9 solution.

@quinnj
Member

quinnj commented May 31, 2023

(I could do the bisection process again, i.e. identify which commit introduced the regression and which commit fixed it, if that would be helpful for figuring out a backport solution here.)

@N5N3
Member

N5N3 commented May 31, 2023

I’ll have time to take a look this weekend. Both a bisection and a subtyping MWE would be appreciated.

@quinnj
Member

quinnj commented Jun 1, 2023

Here's the smallest repro I could come up with; going to try and do a bisection using it today:

```julia
using Pkg
Pkg.add(Pkg.PackageSpec(; name="JSON3", version="1.12.0"))
using JSON3
JSON3.read("[]"); # warmup
t = @elapsed JSON3.read("[]");
if t > 0.0001
    # bad: regression present
    exit(1)
else
    exit(0)
end
```
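The exit-code convention in the script above is what makes it usable with `git bisect run`. A runnable toy sketch of the mechanics (the real bisection ran `julia` with the script above against a JuliaLang/julia checkout; the repo, commits, and "bad" condition here are fabricated purely for illustration):

```shell
# Build a throwaway repo with 5 commits, where commits >= 4 are "bad",
# then let `git bisect run` find the first bad one via an exit-code test,
# just like the Julia script exits 1 when the timing regresses.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name bisect
for i in 1 2 3 4 5; do
    echo "$i" > state.txt
    git add state.txt
    git commit -qm "commit $i"
done
git bisect start HEAD HEAD~4        # HEAD is bad, the first commit is good
git bisect run sh -c 'test "$(cat state.txt)" -lt 4'
git bisect log | grep "first bad commit"
```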

@quinnj
Member

quinnj commented Jun 1, 2023

Bisect identifies 19f44b6 as the first bad commit.

@KristofferC
Member Author

There have been other regressions bisected to that commit: #48612, #49145.

@quinnj
Member

quinnj commented Jun 1, 2023

Ooof, so this is really probably just a dup of #48612, where the answer is "won't fix" for 1.9?

@quinnj
Member

quinnj commented Jun 1, 2023

Hmmmm, I actually identify 6337cf6 as a different fix commit though (in #48612, #48144 was identified as the fix), so maybe not totally the same issue. Maybe @vtjnash can comment on whether 6337cf6 is backportable?

@quinnj
Member

quinnj commented Jun 1, 2023

Hmmm, hold up just a bit; it looks like the 633 commit helped, but on latest master, I'm actually still seeing a fairly sizeable regression. Let me dig in a bit more.

@N5N3
Member

N5N3 commented Jun 2, 2023

The runtime sparam computation is still there on master. I tried some benchmarks locally, which show that the regression was mitigated by some performance tuning on

```c
jl_datatype_t *tt = jl_inst_arg_tuple_type(args[0], &args[1], nargs, 0);
```

but the cost of the subsequent subtyping stays the same.

@quinnj
Member

quinnj commented Jun 2, 2023

Thanks for the help investigating, @N5N3; I'm a little unclear on what you're concluding here, though. Are you confirming what I mentioned? That this still seems regressed on master, even though it's slightly better than current release-1.9?

Also curious if you have thoughts on what to do now? It sounds like further bisection isn't really valuable; I could probably come up with a smaller reproduction if helpful. Anything else I could do? Or anyone else we could ping for help here?

@N5N3
Member

N5N3 commented Jun 2, 2023

> That this still seems regressed on master, even though it's slightly better than current release-1.9?

Yes.

> Also curious if you have thoughts on what to do now?

StaticArrays.jl tried to fix this problem by removing unneeded sparams from method definitions. I’m not familiar with the optimization pipeline, but since this is caused by inlining, perhaps we could form a cost model for runtime sparam computation? @aviatesk @ianatol
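As a sketch of that StaticArrays.jl-style workaround (hypothetical functions, not the actual StaticArrays change): a static parameter the method body never needs can simply be dropped from the signature, so a dynamic call no longer has to solve for it.

```julia
# Declares an sparam T the body never uses; on a non-inferred call the
# runtime must still compute T (the sparam computation seen in the profile).
len_with_sparam(v::Vector{T}) where {T} = length(v)

# Same behavior, nothing to solve for at the call site.
len_without_sparam(v::Vector) = length(v)
```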

quinnj added a commit to quinnj/JSON3.jl that referenced this issue Aug 2, 2023
Due to several 1.9 performance regressions that heavily impact JSON3,
I decided to just dig in and see if we can fix it on our side, since
the JuliaLang changes seem pretty involved (see issues
[here](JuliaLang/julia#50762) and
[here](JuliaLang/julia#48229) for context).

From what I can tell, the issue really boils down to this one `getvalue`
internal method on `JSON3.Array`, where it was having to do some expensive
"sparams" computation to instantiate the `JSON3.Array` type.
Manually unrolling this method a bit and fully specifying all the type
parameters seems to resolve the performance issues. Benchmarks look
like:

Julia 1.8:
```julia
julia> @benchmark JSON3.defaultminimum(x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  31.625 μs … 140.959 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     32.250 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   32.602 μs ±   1.562 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▂█▃▂▃
  ▁▂▅██████▆▄▅▂▂▂▂▂▃▂▂▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  31.6 μs         Histogram: frequency by time         36.6 μs <

 Memory estimate: 6.70 KiB, allocs estimate: 84.
```

Julia 1.9:
```julia
julia> @benchmark JSON3.defaultminimum(x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  165.584 μs …  10.519 ms  ┊ GC (min … max): 0.00% … 94.46%
 Time  (median):     168.166 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   170.886 μs ± 103.652 μs  ┊ GC (mean ± σ):  0.58% ±  0.94%

     ▄█▆▃
  ▁▂▇████▇▇▅▄▄▄▄▃▃▂▂▂▃▃▃▄▄▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  166 μs           Histogram: frequency by time          185 μs <

 Memory estimate: 10.11 KiB, allocs estimate: 166.
```

This PR:
```julia
julia> @benchmark JSON3.defaultminimum(x)
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):  8.431 μs … 497.806 μs  ┊ GC (min … max): 0.00% … 95.49%
 Time  (median):     8.820 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.157 μs ±   9.056 μs  ┊ GC (mean ± σ):  1.92% ±  1.91%

   ▂▄▃▂  ▅██▆▄▂▁▃▅▆▅▄▃▂▂▂▂▂▂▁▁▁                               ▂
  ▅████▇▆██████████████████████▇▇▇▇▇▇▇▇▇▇▆▇▇▅▆▆▅▆▆▆▅▄▅▅▅▄▄▅▅▄ █
  8.43 μs      Histogram: log(frequency) by time      10.8 μs <

 Memory estimate: 6.20 KiB, allocs estimate: 72.
```
quinnj added a commit to quinnj/JSON3.jl that referenced this issue Aug 2, 2023

* Refactor internal getvalue for JSON3.Array to improve performance

* Remove print-related test that breaks post 1.9