Dramatic performance degradation in Julia 0.5 #290
Comments
Solving linear systems will use multiple cores unless you call …
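The exact call referenced above is cut off in the quote; assuming it refers to capping BLAS threads, a minimal sketch (with illustrative matrix sizes) would be:

```julia
# Assumption: the truncated suggestion above refers to capping BLAS threads.
# Linear solves such as A \ b call into OpenBLAS, which uses its own thread pool
# regardless of how many Julia workers were started with -p.
Base.BLAS.set_num_threads(1)   # on Julia 1.x: using LinearAlgebra; BLAS.set_num_threads(1)

A = rand(1000, 1000)
b = rand(1000)
x = A \ b                      # now runs on a single BLAS thread
```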
As mentioned above, external calls from Julia to another library will not honor `-p`. Are you absolutely sure it is the same version of Optim?
Yes, I use the same fork with both Julia 0.4 and 0.5. The performance degradation actually reproduces across a range of Optim commits from the last 2 months. Trying to look into BLAS threads now...
julia> versioninfo()
Tried with …
Correction: Julia 0.5 also writes the `@simd` warning. Tried removing all `@simd`; perhaps there is more to it than just forcing Optim to use a single worker core.
An interesting inverse scaling:
Can you post some sample code that I can test with?
@amitmurthy Here is a simple mockup of what I am running. Note that in this particular case the list that …

Julia v0.4 runs (second @time call):

    ~/julia04 pmap_optim_time.jl
    ~/julia04 -p 8 pmap_optim_time.jl

Julia v0.5 runs (second @time call):

    julia pmap_optim_time.jl
    julia -p 8 pmap_optim_time.jl

Note also that simd-warnings are generated for the second function call as well, as opposed to v0.4 behaviour.

Version specifics:

    ~/julia04 -e "versioninfo()"
    julia -e "versioninfo()"
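The mockup script itself is not preserved in this thread; a hypothetical stand-in with the same shape (pmap over a function that runs an Optim solve, with the second `@time` call as the reported number) might look like this:

```julia
# Hypothetical stand-in for pmap_optim_time.jl -- the original script is not
# shown above. It only mirrors the shape of the benchmark: pmap over a function
# that performs an Optim solve, timed twice so the second run excludes
# first-call compilation.
@everywhere using Optim

@everywhere function run_fit(i)
    target = fill(Float64(i), 10)
    # unconstrained Nelder-Mead, one of the methods mentioned in the issue
    optimize(x -> sum(abs2, x .- target), zeros(10), NelderMead())
end

Ntimes = 3
@time pmap(run_fit, 1:Ntimes)   # first call: includes compilation on the workers
@time pmap(run_fit, 1:Ntimes)   # second call: the number compared across 0.4 / 0.5
```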
The numbers above are the first-time compilation overhead. I just added a few more runs with Ntimes=3 and Ntimes=8.
In 0.5 all the workers are being used in a round-robin fashion. With Ntimes=3, the first run primes workers 1-3, the second 4-6 and the third 7-8. Only from the fourth run onwards do you see the optimized timings. With Ntimes=8, the first run itself causes all the workers to be warmed up. In 0.4 only the first 3 workers are used every time, and hence you see fast run times from the second run itself.
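A small sketch of the warm-up effect described above (the task function is made up):

```julia
# Illustration of the round-robin warm-up effect: with fewer items than workers,
# a pmap call only compiles the task on the workers it happens to hit, so later
# calls still pay compilation on the remaining workers. Priming every worker
# once removes that from the timings.
@everywhere work(x) = sum(abs2, randn(200, 200) * randn(200))   # stand-in task

pmap(work, 1:nworkers())   # one item per worker: compile everywhere up front
@time pmap(work, 1:3)      # steady-state timing, no per-worker compilation left
```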
How much longer? There is a slight slowdown due to changes in closure serialization. A CachingPool (http://docs.julialang.org/en/latest/stdlib/parallel/#Base.CachingPool) may help if that is the cause. Also see JuliaLang/julia#16508
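A rough illustration of the CachingPool suggestion (the objective and data below are made up, not taken from the issue):

```julia
# CachingPool keeps the serialized function cached on each worker, so repeated
# pmap calls with the same closure avoid re-sending it every time.
# It is Base.CachingPool on Julia 0.5/0.6 and lives in Distributed on 1.x.
@everywhere using Optim

data = rand(10)                        # captured by the closure; shipped once per worker
fit  = x0 -> optimize(z -> sum(abs2, z .- data), x0, NelderMead())

pool   = CachingPool(workers())
starts = [rand(10) for _ in 1:16]
@time pmap(pool, fit, starts)          # first call ships `fit` (and `data`) to each worker
@time pmap(pool, fit, starts)          # later calls reuse the cached copy
```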
This explains a lot of my confusion, thanks @amitmurthy. After extensive testing taking this into account, the Julia 0.5 slowdown (compared to 0.4) is about 5 to 10% extra time. Interestingly, the memory allocation is half of what it was in 0.4. Win on memory, lose on closures, I guess. I'll stay with 0.4 in production to avoid additional computing costs, for as long as package dependencies work with 0.4. Closing the issue since apparently nothing was found specific to Optim.
Hm, you say closures... Is this related to JuliaLang/julia#15276 as well? Maybe when that is fixed, performance will make v0.5 interesting for you again.
Could be; frankly, I have not had a chance to dig into the closure issues for 0.5 in detail. I will definitely time with each subsequent 0.5.* update. Looking into another performance degradation, possibly related to …
Moving this to a new issue.
Use case: `pmap` with a function that uses Optim within its body.

Problem: the `@time` results for the pmap call on Julia 0.4 vs. Julia 0.5 show that the multicore pmap call is 18 times slower on Julia 0.5.

It appears that the problem is related to Optim.jl; other parts of the function body applied via `pmap` do not produce this slowdown. Starting Julia with even a single core (`julia`, not `julia -p 8`) and tracking CPU load, I noticed that the "single core" run still loads multiple CPU cores intermittently. Not sure if this is some type of multi-threading in Julia 0.5, or something else that was disabled in 0.4.

Tried `fminbox` with `cg`; also tried unconstrained Nelder-Mead. Same problem. Tried removing all `@simd` in the Optim sources without much change. Note that Julia prints `WARNING: could not attach metadata for @simd loop.` (#157 (comment))

Any suggestions on how to force Optim to use a single worker core? Or did I misunderstand what is happening?
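If, as suggested earlier in the thread, the extra load comes from multithreaded BLAS rather than from Optim itself, one way to keep each worker on a single core is to cap the BLAS threads on every process, not just the master:

```julia
# Assumption: the extra CPU load comes from multithreaded BLAS inside the
# objective. Capping it on every process (master and the -p workers) keeps
# each pmap task on a single core.
@everywhere Base.BLAS.set_num_threads(1)   # on 1.x: @everywhere using LinearAlgebra; BLAS.set_num_threads(1)
```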