Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.7 on 64-bit Windows four times slower than 0.6.2 #25329

Closed
PetrKryslUCSD opened this issue Dec 29, 2017 · 16 comments
Closed

0.7 on 64-bit Windows four times slower than 0.6.2 #25329

PetrKryslUCSD opened this issue Dec 29, 2017 · 16 comments
Labels
compiler:latency Compiler latency

Comments

@PetrKryslUCSD
Copy link

PetrKryslUCSD commented Dec 29, 2017

When I run tests for FinEtools I find that with the current 0.7 the tests take four times as long as with 0.6.2. I haven't tried to track down a particular source of the slowdown. First I wanted to find out if there was a known reason for this, something that perhaps was planned to be fixed soon.

@KristofferC
Copy link
Member

Spending some work trying to pinpoint this more than "things got slower" would be very helpful.

@StefanKarpinski
Copy link
Member

It's probably compilation time.

@PetrKryslUCSD
Copy link
Author

Whatever it is the hit is painful. There are around 400 tests, organized into groups. Each group is timed separately. The slowdown varies, for one group of tests the slowdown of 0.7 is TWENTY TIMES compared to 0.6.2.

julia> t0 = time(); Pkg.test("FinEtools"); time()-t0
[ Info: Testing FinEtools @ Base.Pkg.Entry entry.jl:723
WARNING: Method definition replace(AbstractString, Any, Any) in module Base at deprecated.jl:55 overwritten in module MeshImportModule at C:\Users\PetrKrysl\.julia\v0.7\FinEtools\src\MeshImportModule.jl:15.
Test Summary: | Pass  Total
Miscellaneous |  100    100
timing = 28.41700005531311
Test Summary: | Pass  Total
Acoustics     |   20     20
timing = 262.54500007629395
Test Summary:  | Pass  Total
Heat diffusion |   25     25
timing = 338.8789999485016
Test Summary:      | Pass  Total
Linear deformation |  159    159
timing = 519.7249999046326
Test Summary: | Pass  Total
Meshing       |   97     97
timing = 891.2630000114441
Test Summary: | Pass  Total
Voxel box     |   27     27
timing = 2.931999921798706
[ Info: FinEtools tests passed @ Base.Pkg.Entry entry.jl:737
2045.1859998703003

julia> versioninfo()
Julia Version 0.7.0-DEV.3207
Commit 4d166fa76d* (2017-12-28 21:38 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
Environment:

versus

julia> t0 = time(); Pkg.test("FinEtools"); time()-t0
INFO: Testing FinEtools
Test Summary: | Pass  Total
Miscellaneous |  100    100
timing = 21.27300000190735
Test Summary: | Pass  Total
Acoustics     |   20     20
timing = 30.319000005722046
Test Summary:  | Pass  Total
Heat diffusion |   25     25
timing = 14.526000022888184
Test Summary:      | Pass  Total
Linear deformation |  159    159
timing = 74.09899997711182
Test Summary: | Pass  Total
Meshing       |   97     97
timing = 156.40899991989136
Test Summary: | Pass  Total
Voxel box     |   27     27
timing = 2.9030001163482666
INFO: FinEtools tests passed
300.05400013923645

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17* (2017-12-13 18:08 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

Is there any value in pursuing this further? What is the plan concerning compilation, if that is what it is?

Thanks.

@KristofferC
Copy link
Member

  • Is it one function in particular or is it uniform slowdown?
  • Is it only first call slowness or is it slower for repeated calls?
  • If you find a function that is much slower, can you profile it and see if there is something specific that is slower?

@PetrKryslUCSD
Copy link
Author

Interesting: it appears to be somehow related to the operations on a sparse matrix. In particular, memory allocation went through the roof.

Closer investigation then revealed that the sparse matrix was in fact not sparse, it was a two-dimensional array in 0.7.. So instead of SparseMatrixCSC{Float64,Int64} I got Array{Float64,2}. It is not clear to me why this should happen.

@KristofferC
Copy link
Member

Alright, that makes sense, I think compilation regressions like this would have been spotted earlier. So I guess the interesting thing now is to find the place that used to give a sparse matrix.

@PetrKryslUCSD
Copy link
Author

It was S = S+S': this resulted in a dense matrix. Changing this to S = S+transpose(S); preserved the sparse character.

@PetrKryslUCSD
Copy link
Author

PetrKryslUCSD commented Dec 30, 2017

Apparently this is not the whole story yet.

A. Every single test set is now uniformly around 50% slower than for 0.6.2.
B. One particular test set is still around six times slower for 0.7 than for 0.6.2. I will investigate some more.

@PetrKryslUCSD
Copy link
Author

The culprit of the remaining major slowdown is intersect().

@PetrKryslUCSD
Copy link
Author

The operations that caused MAJOR slowdowns have been reported separately. What remains is the universal increase of runtime (compilation?) of around 50%.

@StefanKarpinski
Copy link
Member

Should this stay open to track the overall slowdown?

@PetrKryslUCSD PetrKryslUCSD reopened this Dec 30, 2017
@ihnorton
Copy link
Member

xref #25254

@JeffBezanson
Copy link
Member

I believe #25254 is about load time regressions from 0.5.x to 0.6.x, which for now persist in 0.7 but should be no worse.

@ViralBShah ViralBShah added the compiler:latency Compiler latency label May 25, 2018
@ChrisRackauckas
Copy link
Member

ChrisRackauckas commented Aug 25, 2018

Is FinEtools updated for v1.0 to see if this is still an issue? I think a lot of the early compile time regressions have been addressed so I'd be interested to see if it still exists for this package.

@PetrKryslUCSD
Copy link
Author

Unfortunately at this point I don't have a version of the package that would run both with 0.6 and 1.0. However most of the tests are one-to-one, even when the number of tests changed. Overall I would say 1.0 runs the tests as fast as 0.6, with some exceptions: on one test set 1.0 is considerably faster, and on one test set 1.0 is about 20% slower. I might try to find out what precisely is slower.

@KristofferC
Copy link
Member

Doesn't seem like there is anything concrete to do here anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:latency Compiler latency
Projects
None yet
Development

No branches or pull requests

7 participants