Allow chunksize choice in autodiff setup? #84

Closed
ChrisRackauckas opened this issue Jan 23, 2017 · 13 comments

@ChrisRackauckas (Contributor)

I was wondering if you think allowing the chunksize choice in the autodiff setup is a good idea. I have a working implementation that I am using since it's currently not allowed:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/misc_utils.jl#L33

The issue is that if I were to make a PR, I can't think of a good way to pass it in. You'd want to make it a kwarg like autodiff, but then it wouldn't be type-stable, since you'd need to pass the chunk size as a value type for type stability. I handle this with a "pre-stage" where I can use a @pure constructor, but I'm not sure that approach applies here.

@sglyon (Collaborator) commented Jan 24, 2017

I'm not up to speed on the latest best practices on the use of the autodiff packages, but I think I recall seeing that chunksize is or will soon be deprecated. Is that possible, or am I remembering wrong?

cc @jrevels

@jrevels commented Jan 24, 2017

> I think I recall seeing that chunksize is or will soon be deprecated.

Yeah, that keyword argument is deprecated - chunk size is now specified by a type parameter of the *Config types. You can ask the user to pass you a *Config type, so that you have the preallocated work buffers and predetermined chunk size. (I'm sure @ChrisRackauckas knows this already, though, since I've seen him use the new API before.)
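
For example, a minimal sketch against the post-deprecation API (with a placeholder input vector):

using ForwardDiff

initial_x = rand(4)
# chunk size 4 is fixed as a type parameter instead of a chunksize kwarg:
cfg = ForwardDiff.JacobianConfig{4}(initial_x, initial_x)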

> You'd want to make it a kwarg like autodiff, but then it wouldn't be type-stable, since you'd need to pass the chunk size as a value type for type stability.

Keyword argument performance is unfortunately quite a pain point for Julia right now... type-inferable/fast keyword arguments are one of my most wanted features. I'm really hoping something like JuliaLang/julia#16580 makes it into 1.0. I struggled with keyword argument performance when I first started working on ForwardDiff, and it's the reason I refactored ForwardDiff's API to get rid of keyword arguments and use an inferable configuration type instead.

@ChrisRackauckas (Contributor, Author)

> Yeah, that keyword argument is deprecated - chunk size is now specified by a type parameter of the *Config types. You can ask the user to pass you a *Config type, so that you have the preallocated work buffers and predetermined chunk size. (I'm sure @ChrisRackauckas knows this already, though, since I've seen him use the new API before.)

What I am proposing is a way to pass the chosen chunksize to the type parameter of the *Config type. Specifically, from the code I linked:

Base.@pure function autodiff_setup{CS}(f!, initial_x::Vector, chunk_size::Type{Val{CS}})
    # NLsolve's f! takes (x, fx); ForwardDiff's takes (fx, x), so swap the arguments
    permf! = (fx, x) -> f!(x, fx)

    fx2 = copy(initial_x)
    # the chunk size CS is baked into the config's type parameter
    jac_cfg = ForwardDiff.JacobianConfig{CS}(initial_x, initial_x)
    g! = (x, gx) -> ForwardDiff.jacobian!(gx, permf!, fx2, x, jac_cfg)

    fg! = (x, fx, gx) -> begin
        jac_res = DiffBase.DiffResult(fx, gx)
        ForwardDiff.jacobian!(jac_res, permf!, fx2, x, jac_cfg)
        DiffBase.value(jac_res)
    end

    return DifferentiableMultivariateFunction(f!, g!, fg!)
end

In NLsolve.jl there is currently no way to set the config unless you construct the DifferentiableMultivariateFunction yourself. That's why I have this setup in my own package, but I was wondering about a good way to add it to NLsolve itself.

@sglyon (Collaborator) commented Jan 24, 2017

Ahh I see, ok sounds like you guys are the ones to figure out how to implement this!

@ChrisRackauckas (Contributor, Author) commented Jan 24, 2017

Well, that code would just add to this:

https://github.com/EconForge/NLsolve.jl/blob/master/src/autodiff.jl#L1

Chunksize would have to be an optional argument that falls back to ForwardDiff's default chunk-size function when it's not specified. My earlier question was how to set this all up type-stably, but from the other thread it sounds like I shouldn't worry about that and should instead just pass the number as a keyword argument here, @KristofferC?
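
Something like this rough sketch, say (here `pickchunksize` stands in for whatever ForwardDiff's default chunk-size heuristic is called, and the body is abbreviated from my code above):

function autodiff_setup(f!, initial_x::Vector;
                        chunksize::Int = ForwardDiff.pickchunksize(length(initial_x)))
    permf! = (fx, x) -> f!(x, fx)
    fx2 = copy(initial_x)
    # dynamic chunk size: not type-stable, but hidden behind a function barrier
    jac_cfg = ForwardDiff.JacobianConfig{chunksize}(initial_x, initial_x)
    g! = (x, gx) -> ForwardDiff.jacobian!(gx, permf!, fx2, x, jac_cfg)
    return DifferentiableMultivariateFunction(f!, g!)
end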

@jrevels commented Jan 24, 2017

Luckily, if your API uses ForwardDiff's in-place methods (like in your code), then it's not as big of a deal for the chunk size to be dynamic (since the result type is known regardless). It does mean you probably can't take advantage of certain optimizations (e.g. SIMD), though.

> Chunksize would have to be an optional argument that falls back to ForwardDiff's default chunk-size function when it's not specified.

Why not use autodiff_setup(f!, initial_x::Vector, cfg::JacobianConfig = JacobianConfig(initial_x, initial_x))? That allows the chunk size setting to have the behaviors you want. It also allows preallocation of the work buffers.
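
Concretely, that could look something like this (a sketch, with the body trimmed down from the snippet above):

function autodiff_setup(f!, initial_x::Vector,
                        cfg::ForwardDiff.JacobianConfig = ForwardDiff.JacobianConfig(initial_x, initial_x))
    permf! = (fx, x) -> f!(x, fx)
    fx2 = copy(initial_x)
    g! = (x, gx) -> ForwardDiff.jacobian!(gx, permf!, fx2, x, cfg)
    return DifferentiableMultivariateFunction(f!, g!)
end

# the caller picks the chunk size and owns the work buffers, e.g.:
# autodiff_setup(f!, x0, ForwardDiff.JacobianConfig{2}(x0, x0))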

I'm not super familiar with NLsolve, but why can't the code you linked simply be the API for using AD with NLsolve? In other words, remove autodiff = true and replace it with an easy-to-use constructor for making a DifferentiableMultivariateFunction with AD. IMO, this is a better API because it inverts control to the caller and allows for more flexibility. I feel like most APIs that use AD underneath should adopt this kind of convention rather than catch-all options like autodiff = true.

@KristofferC (Collaborator)

> It does mean you probably can't take advantage of certain optimizations (e.g. SIMD), though.

I am surprised about this. Isn't it the case that, as long as the function doesn't get inlined, the generated code for the keyword-argument function will be exactly the same as for the one without keywords?

For example:

julia> using BenchmarkTools  # provides @btime

julia> @inline f(;x=1.0) = (z = 0.0; for i in 1:10^5; z += sin(x); end; z)
f (generic function with 2 methods)

julia> @noinline g(;x=1.0) = (z = 0.0; for i in 1:10^5; z += sin(x); end; z)
g (generic function with 1 method)

julia> @btime f(x = 0.5)
  3.228 ms (200001 allocations: 3.05 MiB)
47942.55386052721

julia> @btime g(x = 0.5)
  642.804 μs (2 allocations: 112 bytes)
47942.55386052721

@jrevels commented Jan 25, 2017

> It does mean you probably can't take advantage of certain optimizations (e.g. SIMD), though.
> I am surprised about this...

This statement doesn't (directly) have anything to do with keyword arguments - apologies if I was unclear. I was saying that certain optimizations aren't going to be available to you if the chunk size is purely dynamic, i.e. isn't known at compile time. SIMD on dual numbers is a good example, because AFAIK, the tuple length has to be statically inferable for the SLP vectorizer to work.

My earlier problem with keyword arguments is that values passed via keyword arguments are inferred poorly. For example:

julia> f{N}(; x::Type{Val{N}} = Val{3}) = Val{N}()
f (generic function with 1 method)

# yes, this is fine because the default 
# value for `x` is well inferred
julia> @code_warntype f()
Variables:
  #self#::#f

Body:
  begin
      return $(QuoteNode(Val{3}()))
  end::Val{3}

# ...but as soon as you actually use
# the kwarg, you're poisoning the 
# well. Note this happens 
# regardless of whether
# or not f is inlined (on v0.6).
julia> @code_warntype f(x = Val{2})
Variables:
  #unused#::#kw##f
  #temp#@_2::Array{Any,1}
  ::#f
  #temp#@_4::Int64
  #temp#@_5::Int64
  #temp#@_6::Any
  #temp#@_7::Int64
  #temp#@_8::Bool
  x::Any

Body:
  begin
      NewvarNode(:(x::Any))
      #temp#@_8::Bool = true
      SSAValue(2) = (Base.arraylen)(#temp#@_2::Array{Any,1})::Int64
      SSAValue(3) = (Base.select_value)((Base.sle_int)(0,1)::Bool,(Base.ashr_int)(SSAValue(2),(Base.bitcast)(UInt64,1))::Int64,(Base.shl_int)(SSAValue(2),(Base.bitcast)(UInt64,(Base.neg_int)(1)::Int64))::Int64)::Int64
      SSAValue(4) = (Base.select_value)((Base.sle_int)(1,SSAValue(3))::Bool,SSAValue(3),(Base.sub_int)(1,1)::Int64)::Int64
      #temp#@_7::Int64 = 1
      7:
      unless (Base.not_int)((#temp#@_7::Int64 === (Base.add_int)(SSAValue(4),1)::Int64)::Bool)::Bool goto 23
      SSAValue(5) = #temp#@_7::Int64
      SSAValue(6) = (Base.add_int)(#temp#@_7::Int64,1)::Int64
      #temp#@_4::Int64 = SSAValue(5)
      #temp#@_7::Int64 = SSAValue(6)
      #temp#@_5::Int64 = (Base.sub_int)((Base.mul_int)(#temp#@_4::Int64,2)::Int64,1)::Int64
      #temp#@_6::Any = (Core.arrayref)(#temp#@_2::Array{Any,1},#temp#@_5::Int64)::Any
      unless (#temp#@_6::Any === :x)::Bool goto 19
      x::Any = (Core.arrayref)(#temp#@_2::Array{Any,1},(Base.add_int)(#temp#@_5::Int64,1)::Int64)::Any
      #temp#@_8::Bool = false
      goto 21
      19:
      (Base.throw)($(Expr(:new, :(Base.MethodError), :((Core.getfield)((Core.getfield)((Core.getfield)(#f,:name)::TypeName,:mt),:kwsorter)), :((Core.tuple)(#temp#@_2,)::Tuple{Array{Any,1},#f}), :($(QuoteNode(0xffffffffffffffff))))))::Union{}
      21:
      goto 7
      23:
      unless #temp#@_8::Bool goto 26
      x::Any = Val{3}
      26:
      return (Main.#f#1)(x::Any,::#f)::Any
  end::Any

@KristofferC (Collaborator) commented Jan 25, 2017

> because AFAIK, the tuple length has to be statically inferable for the SLP vectorizer to work.

It has to be available at compilation time of the function. My point is that there is a function barrier between the keyword-argument function and the function actually doing the work. For example, in the generated code you posted, all of that is just keyword-argument sorting machinery until it finally calls (Main.#f#1)(x::Any, ::#f), which is where the actual work is done. At that point, Julia will do dynamic dispatch to figure out the type of x and then compile an optimized version of that function to call. So you pay for the dynamic dispatch, but not in the generated code of the inner function (as long as it is not inlined).

Note that I am not talking about type inference of the return value, only about the generated code of the function.
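
In other words, a simplified sketch of the pattern (with made-up names):

work(x) = (z = 0.0; for i in 1:10^5; z += sin(x); end; z)  # compiled per concrete typeof(x)
g(; x = 1.0) = work(x)  # the kwarg "sorter" just unpacks the kwargs and calls `work`
g(x = 0.5)  # pays one dynamic dispatch to reach `work`, then runs the same optimized code as work(0.5)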

@KristofferC (Collaborator)

This also matches my understanding: JuliaLang/julia#10443 (comment)

@jrevels commented Jan 26, 2017

Ah, that makes sense. What's the right way to test whether or not the operations below the barrier are getting SIMD vectorized? For example:

julia> using ForwardDiff: Dual

julia> f(x::Dual) = x + x
f (generic function with 1 method)

julia> @code_llvm f(Dual(2,3,4,5))

define void @julia_f_65900(%Dual* noalias sret, %Dual*) #0 !dbg !5 {
top:
  %2 = bitcast %Dual* %1 to <4 x i64>*
  %3 = load <4 x i64>, <4 x i64>* %2, align 8
  %4 = shl <4 x i64> %3, <i64 1, i64 1, i64 1, i64 1>
  %5 = bitcast %Dual* %0 to <4 x i64>*
  store <4 x i64> %4, <4 x i64>* %5, align 8
  ret void
}

julia> f2(; x::Dual=Dual(1,1)) = x + x

# If my understanding is correct, this call 
# should dynamically dispatch to a compiled
# representation that is essentially equivalent
# to the compiled representation of 
#`f(::Dual{4,Int})`, which we confirmed
# above uses SIMD.
#
# How do we confirm this? Just using 
# `@code_llvm` etc. will only show
# the dispatch instructions, yes? 
julia> f2(x=Dual(1,2,3,4))

Of course, a few points still stand; for keyword arguments to be considered "performant", the work being done below the function barrier must be expensive enough that the cost of dynamic dispatch is negligible, and the caller's return type can't depend on the keyword argument if we want it to be inferable.

@KristofferC (Collaborator)

> Of course, a few points still stand; for keyword arguments to be considered "performant", the work being done below the function barrier must be expensive enough that the cost of dynamic dispatch is negligible, and the caller's return type can't depend on the keyword argument if we want it to be inferable.

Agree on all points.

I'm not sure how to "look into" the inner function though...
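
One workaround might be to write the barrier explicitly yourself, so the worker function can be inspected directly (a sketch, with a hypothetical inner_f2):

using ForwardDiff: Dual

inner_f2(x::Dual) = x + x                   # the function doing the actual work
f2(; x::Dual = Dual(1, 1)) = inner_f2(x)    # the kwarg wrapper just forwards to it

# inspect the code the kwarg call ultimately runs:
@code_llvm inner_f2(Dual(1, 2, 3, 4))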

@pkofod (Member) commented Jan 27, 2018

pkofod closed this as completed Jan 27, 2018