1400% performance regression for custom project on julia 0.5 #16047
Comments
Just a guess: if you are using closures or comprehensions a lot, you might be hitting #15276.
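For context, #15276 is about the performance of variables captured by closures. A hedged illustration of the kind of pattern that triggers it (the function and variable names here are hypothetical, not taken from the project in question):

```julia
# A variable that is reassigned and also captured by an inner function gets
# boxed, so downstream code loses type stability.
function running_total(data)
    total = 0.0
    for x in data
        total += x        # `total` is reassigned...
    end
    f = () -> total       # ...and captured by a closure, so it is boxed
    return f()
end

# `@code_warntype running_total(rand(100))` shows `total` as a `Core.Box`.
```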
That might be it; there are certainly some closures that are affected by this. I'll rerun my test here once #15276 is fixed.
This is 0.5.x, not 0.5.0 – it's not a release blocker, although it is important.
Yes, that's right. However, it would be nice not to release 0.5 with known performance regressions, especially if many users will have a slower experience.
I just reran with the latest master
I had hoped that #16057 and #16060 might have improved things, but no luck so far; if anything, the julia 0.5 version got a bit slower than the previous build I had tried:
I think @JeffBezanson wanted to try one more thing on the closure side to address #16050, although I see that issue on julia 0.4 as well, so I'm not sure it is going to help narrow the difference between 0.4 and 0.5. Of course, this could also be caused by something entirely unrelated to closures... Maybe the best thing is to wait until @JeffBezanson has done everything he wants to do about closures, and then I'll rerun this and see how things look?
If you have the time, does a profile reveal anything? Comparing profiles in the two versions might help narrow it down.
I've profiled the code under julia 0.4 and 0.5; the profiles are here. I'm not really sure what I'm looking for, though, or how to go about this... One thing I did notice is that the 0.4 profile shows simdloop.jl a couple of times, whereas the 0.5 profile never does. Not sure whether that indicates anything, though...
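For anyone reproducing this comparison, a minimal profiling recipe of the kind being discussed (`run_model` is a placeholder for the actual entry point, not a function from the project):

```julia
using Profile            # a standard library on recent Julia; it lived in Base on 0.4/0.5

run_model()              # warm-up run so compilation does not dominate the profile
Profile.clear()
@profile run_model()     # collect samples for the measured run
Profile.print()          # compare the hottest call paths between 0.4 and 0.5
```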
Can you try 0.5 with -O3?
I just tried with the latest nightly, and with that build performance is completely gone; the whole algorithm now takes something like 2 hours. This run was without the -O3 flag. The version I tried today was:
So some commit between 0.5.0-dev+3782 and 0.5.0-dev+3839 seems to have caused a really massive perf regression.
Could be b0c54f0, just a guess.
Just for completeness, the exact timing I get is:
So a little over two hours with the latest master. GC time also seems up relative to the 0.5 build from last week. Let me know if there is anything else I can provide in terms of diagnostics at this point to help.
Can you give me access to the repo? Running ...
@vtjnash I gave you access to the two repos that make up this problem. The readme here has the instructions on how to get this running. Having said that, right now the whole thing doesn't even run on 0.5; some recent change, either in 0.5 or in the Requires package, prevents the code from running on 0.5, and I'm unable to figure out what is wrong. The whole thing does work on 0.4... As an aside, the Requires package is pretty remarkable: it seems to have a lot of users (I'm not even using it directly, but some of the packages I depend on apparently do), and at the same time it has not a single test...
What that package does is also not really compatible with precompilation. Most users of Requires should consider refactoring to avoid it. Optional dependencies at the moment are better handled by moving functionality into separate packages that depend on both.
Just a good guess :) I think the problem child is ForwardDiff.jl. It seems to have a number of questionable functions that rely heavily on code generation (such as generated functions over ...), for example:

```julia
@inline function promote_eltype{F<:ForwardDiffNumber}(::Type{F}, types::DataType...)
    return switch_eltype(F, promote_type(eltype(F), types...))
end
```
I think ForwardDiff.jl actually has performance benchmarks, but I wasn't able to run them; I seem to be missing a package. @jrevels Any chance you could run the ForwardDiff.jl benchmarks on julia 0.4 and the latest nightly and see whether there is a performance regression?
Just a heads up that I don't think ForwardDiff.jl is the culprit. I modified my code to use an old version of ForwardDiff (0.0.3) that doesn't have any of the fancy optimizations of the current version. That old ForwardDiff.jl just uses DualNumbers.jl internally, and all of that seems entirely straightforward; I don't see any of the things that @vtjnash mentioned as problems there. The whole thing is still running, but so far it looks likely that everything is about a factor of 2 slower on julia 0.5. I'll report back once the runs are done. So, @jrevels, probably no need to run those benchmarks.
Ok, here are the final numbers:

julia 0.4: 9478.776417 seconds (4.41 G allocations: 14.830 TB, 3.82% gc time)

So, overall julia 0.5 takes about 75% longer to finish this. The julia 0.5 build is 0.5.0-dev+3993 (bf73102). These numbers use ForwardDiff.jl v0.0.3. Overall things are much slower with that old version of ForwardDiff.jl, but the relative difference between julia 0.4 and 0.5 is large, and I believe ForwardDiff.jl v0.0.3 is not using any of the problematic things that @vtjnash mentioned. Just running this is tricky now: I had to patch ForwardDiff.jl v0.0.3 slightly to run on julia 0.5, and I also had to patch Requires.jl to run on 0.5... So to run my example one should use this branch of ForwardDiff, and on julia 0.5 this branch of Requires.jl. But, long story short: ForwardDiff.jl is not the culprit, and there is an even more severe performance regression in julia 0.5 than the title of this issue indicates. No clue what the root cause might be.
Unless the performance benchmarks were written very carefully, I'm not going to put any stock in them. It's way too easy to craft benchmarks that don't reflect application performance.
@davidanthoff How easy is it to run your code if someone were to dig in?
@vtjnash This is not just a benchmark, as I understand it: it is user code that was fast in 0.4 and has slowed down.
Are you basically saying that it doesn't matter if code written in the wild gets slower with increasing version numbers, as long as microbenchmarks are good? Or are you talking about the ...
@pkofod I think he meant precisely the contrary: that benchmarks can often be misleading, and that actual useful code is the best way to measure performance.
Ah, okay. I misunderstood then. My bad.
Thanks @davidanthoff for the reduced case. I think I found at least another chunk of the problem.
This fixes a bug plus optimizes more variables. Should help #16047.
@JeffBezanson I tried the latest master with #16386. It improves my repro case a lot! I'm now starting to add back more of the complexity of my "real" problem to the repro case. I updated the gist with the repro here with a new version that is now again much slower on 0.5 than 0.4. What I added is one ... I'm not sure this is the most efficient way to go about this. I could also give you access to the full problem, but it is involved and messy, and I probably have an easier time isolating these relatively straightforward repro cases than someone who isn't familiar with the full problem... So unless you have a better idea, I'll just continue with this setup.
@davidanthoff – thanks for having the patience to go through this process. It's a huge benefit to everyone that we have these use cases to figure out where the performance regressions are.
Any news on this? Just a warning: the current example that shows a slowdown is still not up to the complexity level of the whole problem, so we might need a couple more iterations until this is fully resolved.
Bump: what is the plan for this? This is a really bad performance regression, and I am worried that it will take a whole bunch of further back and forth until this is completely resolved. This clearly doesn't have to be resolved before a feature freeze, but I do think it should be resolved before a 0.5 release.
Yes, I'm still ruminating on how to address this for 0.5. We do at least have a workaround now, which is to declare types for variables used by inner functions (e.g. ...).
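The example in that comment is cut off above; the following is a hedged sketch of the kind of declaration meant (the function and variable names are hypothetical), applying the workaround to the captured-variable pattern illustrated earlier in the thread:

```julia
# Declaring a concrete type for a variable that an inner function captures
# lets the compiler keep accesses to the captured binding type-stable even
# though the variable itself is still boxed.
function running_total(data)
    total::Float64 = 0.0      # <- the workaround: annotate the captured variable
    f = x -> (total += x)     # inner function capturing and reassigning `total`
    for v in data
        f(v)
    end
    return total
end
```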
I guess I could try to fix all the type instabilities by declaring types for the variables that are used in inner functions, as you suggest, and then rerun my performance comparison. That would at least give us a sense of whether there are some other perf problems lurking somewhere, or whether the type instabilities we have identified so far are the whole story. Not sure when I'll have time for this, but it is on my list.
I reran this yesterday. Here are my new numbers:
For a while we thought #15276 (or #16048) might be the culprit, but I don't think that is the case. I changed the repro case so that it avoids the #15276 issue, and then I get these numbers:
So while #15276 doesn't help performance, it doesn't explain the huge regression from 0.4 to 0.5. I have a much simpler repro story now, but after a ...
If you run with --track-allocation, do you see where all those extra allocations come from?
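For reference, a hedged sketch of the usual allocation-tracking workflow (the `run_model` name is a placeholder for the actual entry point; the flag and `Profile.clear_malloc_data` are real Julia features):

```julia
using Profile                  # provides clear_malloc_data on recent Julia versions

# Launch as:  julia --track-allocation=user run_model.jl
run_model()                    # first run: allocations include compilation
Profile.clear_malloc_data()    # reset the per-line allocation counters
run_model()                    # second run: this is what gets reported

# After Julia exits, each source file gets a companion *.jl.mem file with
# per-line allocation counts; the largest numbers point at the hot spots.
```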
I'm seeing something similar. I'm running a simple computational physics problem (the Ising model) as a benchmark, found here: https://github.com/aristotle2497/SimpleJuliaTests/blob/master/Ising.jl The code is simple function declarations and in-place array manipulations; no packages are imported. I'm using the Ubuntu PPAs to get releases and nightly builds. The code run times:

A comparison of the mem files produced by --track-allocation=user shows that the extra memory is being generated at the end statement of the outer for loop in the step_time function? Thanks
Note that the PPA is more than a thousand commits behind master. |
And on current master (...)?
@aristotle2497 If you still see some problems, probably better to open a new issue for your case.
But please test a recent version before opening a new issue. |
Wait, ppa:staticfloat/julianightlies is a thousand commits behind master? Someone should edit this page http://julialang.org/downloads/platform.html, which currently indicates that the ppa is a build of master from last night. Anyway, I'll pull master, recompile, and let you know if I still see any problems. Thanks. I've been following Julia for a couple of years, but I'm just starting to use it seriously.
We've mentioned in many places that those should be removed, and I thought we already had, but apparently not...
Yes, we need to edit that page; the PPA is not recommended any more because it's not actively maintained. There are generic Linux binaries that are up to date.
We are also seeing a roughly 30% slowdown in our real-world julia app. Any tips for debugging this?
Hi Jack. I would look at ...
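The original list of suggestions is truncated above. As a generic starting point (an assumption on my part, not necessarily what was being recommended), checking the hot functions for type instabilities with `@code_warntype` is a common first step; the function below is a hypothetical stand-in for the app's hot path:

```julia
using InteractiveUtils    # where @code_warntype lives on recent Julia versions

# Hypothetical hot function standing in for the real workload.
function kernel(xs::Vector{Float64})
    acc = 0.0
    for x in xs
        acc += x * x
    end
    return acc
end

# `Any` or `Core.Box` entries in the output indicate type instabilities
# worth comparing between 0.4 and 0.5.
@code_warntype kernel(rand(1000))
```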
I tracked down the primary cause of the slowdown @jminardi mentioned. Here is an abbreviated example:

```julia
@time begin
    scale = Any[1,2,3]
    data = [Vector{Float64}(rand(3)) for i in 1:47000]
    Vector{Float64}[Vector{Float64}(v .* scale) for v in data]
end
```
I get the output:

Oddly the slowdown is not linear: changing the size of ...

Changing ...

While the fix is simple, I am surprised to see a non-linear performance impact.
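The exact change is cut off above, but here is a hedged sketch of the kind of fix implied (an assumption: giving `scale` a concrete element type so the broadcast stays type-inferable):

```julia
@time begin
    scale = Float64[1, 2, 3]                  # concretely typed instead of Any[1,2,3]
    data = [rand(3) for i in 1:47000]         # Vector{Vector{Float64}}
    Vector{Float64}[v .* scale for v in data]
end
```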
Thanks @maxvoxel8, great to have this reduced example. Glad there is a workaround and that 0.5 is faster in some cases.
I suspect this is ok to close and that if these issues are still relevant in the 0.7 release cycle, we can have new issues.
UPDATE 7/15/16: These numbers are outdated. Current results and info are here.
I brought this up on julia-dev: I'm seeing a 30% increase in run time for one internal project on julia 0.5 relative to the latest julia 0.4, all on Windows:
The repo code is here https://github.com/davidanthoff/4sD.jl. The repo is private, but I'd be happy to give access to any core developer who wants to diagnose this.
This is full end-to-end code, i.e. I haven't isolated the root cause. Unfortunately I don't have the time (and probably not the expertise) to narrow it down. I know these sorts of reports are not ideal, but I guess they are still better than ignoring this regression...
Just pinging @tkelman: this is the Bellman equation solving code we developed last summer with your help. Just for your info, no expectation that you do anything about this!