package loading time regression #128

Closed
daviehh opened this issue Jan 22, 2023 · 6 comments · Fixed by #132

@daviehh
Contributor

daviehh commented Jan 22, 2023

Looks like it's due to #121 and #127

With version 1.1.1

@time_imports using NonlinearSolve

gives

142.4 ms NonlinearSolve

while with version 1.2.0 it shot up to ~13 seconds:

13191.1 ms NonlinearSolve

Tested with:

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores

Reverting the changes in #127 fixes the long using NonlinearSolve time. So maybe it would be better to revert #127 until we figure out where it went wrong?

Thanks!

@ChrisRackauckas
Member

Can you check v1.9-beta3 and test a full TTFX example? The using time needs to be put into context, since increased using time can be associated with decreased first-call times.

@daviehh
Contributor Author

daviehh commented Jan 22, 2023

With 1.9-beta3, TTFX does not appear to be affected much by whether TrustRegion() is included in the precompile workload, but loading the new 1.9 cached pkgimage takes a long time when TrustRegion() is included.

using NonlinearSolve

@time begin
    prob = NonlinearProblem{false}((u, p) -> u .* u .- p, 0.1, 2)
    solve(prob, TrustRegion(), abstol = 1e-2)
end

With the precompile_algs = (NewtonRaphson(), TrustRegion()) in the precompile script, the problem solving example above gives

0.177028 seconds (778.01 k allocations: 55.147 MiB, 5.30% gc time)

and the total using time (after precompiling, in a fresh Julia session) is

14.901253 seconds (71.82 M allocations: 14.932 GiB, 8.68% gc time, 0.04% compilation time)


Without TrustRegion() in the precompile workload (precompile_algs = (NewtonRaphson(),)), the example takes

0.174154 seconds (791.75 k allocations: 56.127 MiB, 5.85% gc time)

and the total using time

2.448613 seconds (8.03 M allocations: 578.580 MiB, 6.01% gc time, 0.18% compilation time)

(The one-time precompile time is actually fine; it's the using time, paid every time Julia loads NonlinearSolve, that suffers.)


One side effect is that this also affects packages that depend on NonlinearSolve, such as OrdinaryDiffEq.jl, whose loading time is now

julia> @time using OrdinaryDiffEq
 15.894831 seconds (74.76 M allocations: 15.169 GiB, 8.63% gc time, 0.03% compilation time)

compared with the earlier NonlinearSolve release pinned via ] pin:

julia> @time using OrdinaryDiffEq
  3.735147 seconds (10.99 M allocations: 822.078 MiB, 6.35% gc time, 0.13% compilation time)

I only glanced through OrdinaryDiffEq.jl a long time ago, but if I understand correctly, NonlinearSolve is not actually used there for simple stepping/non-stiff ODEs?

@ChrisRackauckas
Member

Hmm, I wonder why that one in particular has such a huge effect. That points to some kind of type instability we should probably handle.

Only glanced through the OrdinaryDiffEq.jl a long time ago, but iiuc for simple stepping/non-stiff odes the NonlinearSolve is not actually used there?

It's used for DAE initialization.

@Deltadahl
Contributor

I can try to find the type instability.
I may not have enough time today, but if not, I'll do my best tomorrow :)

@daviehh
Contributor Author

daviehh commented Jan 22, 2023

Thanks! Not sure if it will help with debugging, but it looks like the using time is fine if one only precompiles for Float64 u, and bad if Float32 is included:

SnoopPrecompile.@precompile_all_calls begin for T in (Float32, Float64)

@daviehh
Contributor Author

daviehh commented Jan 23, 2023

@CCsimon123 I had some time to do a bit of digging. With the help of JET.jl, the type instability with Float32 types can be tracked with:

using NonlinearSolve, JET

T = Float32
alg = TrustRegion()
prob = NonlinearProblem{false}((u, p) -> u .* u .- p, T(0.1), T(2))
@report_opt target_modules=(NonlinearSolve,) skip_noncompileable_calls=false solve(prob, alg, abstol = T(1e-2))

[Screenshot (2023-01-22): JET @report_opt output]

You can see that with T=Float64 JET reports no errors, but with T=Float32 it points out a runtime dispatch at

return TrustRegionCache{iip}(f, alg, u, fu, p, uf, linsolve, J, jac_config,

of the function

function SciMLBase.__init(prob::NonlinearProblem{uType, iip}, alg::TrustRegion,

It looks like with a Float32 or Vector{Float32} uType, initial_trust_radius or max_trust_radius can be type-unstable:

max_trust_radius = alg.max_trust_radius
initial_trust_radius = alg.initial_trust_radius
if max_trust_radius == 0.0
    max_trust_radius = max(norm(fu), maximum(u) - minimum(u))
end
if initial_trust_radius == 0.0
    initial_trust_radius = max_trust_radius / 11
end

Not sure how best to tackle this though...
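One possible way to tackle it is to convert both radii to the element type of u before branching, so both paths of each if produce the same type. The sketch below is self-contained and assumes, as the excerpt's `== 0.0` comparisons suggest, that the algorithm stores its radius defaults as Float64 with 0.0 meaning "compute automatically". ToyTrustRegion, radii_unstable and radii_stable are hypothetical names for illustration, not NonlinearSolve API, and this is not necessarily what the fix in #132 does.

```julia
using LinearAlgebra: norm
using Test: @inferred

# Hypothetical stand-in for the algorithm struct (assumption: Float64 fields,
# where 0.0 means "pick the radius automatically").
struct ToyTrustRegion
    initial_trust_radius::Float64
    max_trust_radius::Float64
end
ToyTrustRegion() = ToyTrustRegion(0.0, 0.0)

# Mirrors the excerpt: each radius stays Float64 if the user set it, but takes
# the element type of u/fu (e.g. Float32) if it is computed, so for Float32
# inputs the inferred type is Union{Float32, Float64}.
function radii_unstable(alg, u, fu)
    max_trust_radius = alg.max_trust_radius
    initial_trust_radius = alg.initial_trust_radius
    if max_trust_radius == 0.0
        max_trust_radius = max(norm(fu), maximum(u) - minimum(u))
    end
    if initial_trust_radius == 0.0
        initial_trust_radius = max_trust_radius / 11
    end
    return initial_trust_radius, max_trust_radius
end

# Converting to one element type up front makes both branches agree.
function radii_stable(alg, u, fu)
    T = eltype(u)
    max_trust_radius = convert(T, alg.max_trust_radius)
    initial_trust_radius = convert(T, alg.initial_trust_radius)
    if iszero(max_trust_radius)
        max_trust_radius = convert(T, max(norm(fu), maximum(u) - minimum(u)))
    end
    if iszero(initial_trust_radius)
        initial_trust_radius = max_trust_radius / 11  # Float32 / Int stays Float32
    end
    return initial_trust_radius, max_trust_radius
end

u32, fu32 = Float32[1, 2, 3], Float32[0.5, 0.5]
@inferred radii_stable(ToyTrustRegion(), u32, fu32)       # passes
# @inferred radii_unstable(ToyTrustRegion(), u32, fu32)   # would throw: inferred a Union
```

With Float64 inputs both versions infer a concrete Tuple{Float64, Float64}, which matches JET reporting nothing for T = Float64 and a runtime dispatch only for T = Float32.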

avik-pal pushed a commit that referenced this issue Nov 1, 2024
3 participants