Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance: mark REPL and Doc code as non-specializeable #28065

Merged
merged 1 commit into from
Jul 20, 2018

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Jul 11, 2018

Here, we introduce a new meaning to the @nospecialize macro and allow it to be used at global scope and then apply to an entire module (and mostly any type signature).

When used without arguments, it applies to all arguments of the parent.
In local scope, this means the containing function.
In global scope, this means all methods subsequently defined.

When used in a function body, the macro must occur in statement position and
before any code.

When used without arguments, it applies to all arguments of the parent scope.
In local scope, this means all arguments of the containing function.
In global (top-level) scope, this means all methods subsequently defined in the current module.
Copy link
Member

@stevengj stevengj Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more natural to me to use a syntax like

@nospecialize foo(...) = ... # applies to all arguments
@nospecialize begin ... end # applies to all methods defined in this block
@nospecialize module Foo ... end # applies to all methods in module Foo

rather than acting like an imperative side-effect, and to have a @nospecialize false begin ... end antonym.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, all of those can be added later to the macro. They'd just be syntax transforms to one of these forms, since this form – as an imperative side-effect – represents where that information needs to be present in the processing order (since method declaration is itself also imperative).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, @nospecialize false is double negation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem clearer for the low-level control to be @specialize! true|false defaulting to true to avoid the double negative. Then @nospecialize block forms like @stevengj outlined can be built on top of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for the module-level form @specialize is better, but I'd keep @nospecialize as well since it's much more convenient when annotating a single argument. Then we could potentially use those instead of the true/false argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

@nospecialize [begin]
define() = functions
@nospecialize [end]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can’t tell if you’re trolling

@KristofferC
Copy link
Member

KristofferC commented Jul 13, 2018

Any numbers to show the impact of this change?

@@ -26,7 +28,7 @@ end
print_ssa(io::IO, @nospecialize(val), argnames) = Base.show(io, val)


function print_node(io::IO, idx::Int, @nospecialize(stmt), used, argnames, maxsize; color = true, print_typ=true)
function print_node(io::IO, idx::Int, @nospecialize(stmt), used, argnames, maxsize; color::Bool=true, print_typ::Bool=true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the @nospecialize be removed here since it is in a @nospecialize true block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but I like the reminders that this code will run very slowly if you aren't careful

@vtjnash vtjnash force-pushed the jn/nospecialize-module branch 3 times, most recently from b3538a5 to 483a668 Compare July 18, 2018 19:11
Implicit return is bad for compiler performance (and sometimes runtime
performance) and can adversely affect code readability,
so every function which does _not_ return a value should end in a `return` statement.

Here, we also introduce a new meaning to the `@nospecialize` macro, and
a new macro `@specialize` to reverse its effect.
When used without arguments, it applies to all arguments of the parent.
In local scope, this means the containing function.
In global scope, this means all methods subsequently defined.
@vtjnash vtjnash force-pushed the jn/nospecialize-module branch from 483a668 to 03dfc52 Compare July 19, 2018 16:30
@vtjnash vtjnash merged commit 9ed6287 into master Jul 20, 2018
@vtjnash vtjnash deleted the jn/nospecialize-module branch July 20, 2018 04:01
@JeffBezanson
Copy link
Member

So far I don't see any improvement from this in the sysimg build or using CSV or using PyPlot. It seems to make corecompiler.ji take ~10 seconds longer to build. Delays for actions in the REPL seem to improve a bit but that's harder to measure.

@@ -466,7 +468,7 @@ add_tfunc(<:, 2, 2,
return Bool
end, 0)

function const_datatype_getfield_tfunc(sv, fld)
function const_datatype_getfield_tfunc(@nospecialize(sv), @nospecialize(fld))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fld should be an Int.

@JeffBezanson
Copy link
Member

Ok, the 10 seconds is not reliable. Maybe closer to 5 seconds, and seems to possibly be caused by the fld argument to const_datatype_getfield_tfunc.

@JeffBezanson
Copy link
Member

This increased the time to get to the prompt (timed by modifying the repl to exit as soon as it gets to the prompt) from 0.7 seconds to 1.18 seconds. It's probably preventing us from recursively finding more code during precompilation. I assume it will be fixed by #28118.

@KristofferC
Copy link
Member

Could you share the timing code you use and I can compare.

@JeffBezanson
Copy link
Member

I added exit() to the beginning of write_prompt(terminal, p::Prompt) in LineEdit.jl, and ran time ./julia -q.

@KristofferC
Copy link
Member

KristofferC commented Jul 25, 2018

Master: 1.05s, #28118: 0.18s.

@StefanKarpinski
Copy link
Member

Why was this merged? There was a request for some performance numbers which was completely ignored. Jeff has timed it and it turns out it had a significant detrimental impact on startup time.

@vtjnash
Copy link
Member Author

vtjnash commented Jul 26, 2018

I think KristofferC just posted that this’ll bring startup time down to 0.25s. This causes too many transient effects (precompile changes) to give a useful demo. But it’s just another API to trigger an existing optimization, so timing isn’t important. Just don’t use it if it doesn’t help you.

@StefanKarpinski
Copy link
Member

This increased the time to get to the prompt (timed by modifying the repl to exit as soon as it gets to the prompt) from 0.7 seconds to 1.18 seconds.

?? 0.25s is not mentioned anywhere that I can see.

@KristofferC
Copy link
Member

But it’s just another API to trigger an existing optimization, so timing isn’t important. Just don’t use it if it doesn’t help you.

I don't understand this comment. The discussions here are not about the new API itself but rather where this was used and the effects on timing this had. If the REPL is slower to start now than before, the what was the point?

@martinholters
Copy link
Member

Assuming we get #28118 into 0.7 (fingers crossed), the relevant question seems to be what the timing of #28118 vs. #28118 with this reverted is.

@vtjnash
Copy link
Member Author

vtjnash commented Jul 27, 2018

@KristofferC The point is to make compilation more effective, since this guarantees that we can precompile the entirety of this module. We can always regenerate the precompile statements for whatever specific cases are not running fast enough, and alter the precompilation heuristics to make more effective use of this information later for specific use cases, so performance numbers of specific examples aren't that meaningful.

@StefanKarpinski
Copy link
Member

Much of the confusion on this issue could have been headed off by simply saying that.

@JeffBezanson
Copy link
Member

Yes, on the one hand it's true that the purpose of this is to introduce a new tool that can be used to address latency, but it's important to have an example use case that demonstrates the utility.

As for compilation itself, I think what's happening here is that the REPL code calls other code (e.g. in Base) that does need to be specialized, but since the REPL code isn't specialized we don't find those signatures until run time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:latency Compiler latency
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants