
Possibly interesting features of Julia v1.8 #1075

Open
ranocha opened this issue Feb 28, 2022 · 9 comments

ranocha (Member) commented Feb 28, 2022

From https://github.com/JuliaLang/julia/blob/v1.8.0-beta1/NEWS.md:

  • Mutable struct fields may now be annotated as const to prevent changing them after construction, providing for greater clarity and optimization ability of these objects (#43305).

    May be interesting for things like the MHD equations, where some fields of a mutable struct should stay fixed after construction.
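
    A minimal sketch of the new syntax (the Cache type is made up for illustration, not existing Trixi code):

    mutable struct Cache
        velocity::Float64     # remains mutable
        const gamma::Float64  # cannot be reassigned after construction (Julia v1.8+)
    end

    c = Cache(0.0, 1.4)
    c.velocity = 2.5          # fine
    # c.gamma = 5 / 3         # would throw an error, the field is const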

  • Type annotations can now be added to global variables to make accessing them type stable (#43671).

    May be interesting for replacing our global constants that wrap `Ref`s.

  • @inline and @noinline annotations can now be applied to a function call site or block to enforce the involved function calls to be (or not to be) inlined (#41312).

    See #836 (Think about callsite inlining when it's shipped officially).
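
    For reference, the call-site form looks roughly like this (the function names are placeholders, not Trixi code):

    compute_fluxes!(du, u) = (du .= u .^ 2)           # stand-in for an expensive kernel
    log_state(u) = @info "u norm" sqrt(sum(abs2, u))  # stand-in for rarely-taken code

    function rhs!(du, u)
        @inline compute_fluxes!(du, u)  # force inlining of this call
        @noinline log_state(u)          # forbid inlining of this call
        return nothing
    end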

  • Base.ifelse is now defined as a generic function rather than a builtin one, allowing packages to extend its definition (#37343).

    Should allow us to get rid of our dependency on IfElse.jl.
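
    Since `ifelse` is now generic, a package can just add methods for its own types. A sketch with a made-up wrapper type:

    struct TracedBool  # hypothetical condition type, for illustration only
        value::Bool
    end

    # Extend the now-generic Base.ifelse for the custom type
    Base.ifelse(c::TracedBool, a, b) = ifelse(c.value, a, b)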

  • Inference now tracks various effects such as side-effectful-ness and nothrow-ness on a per-specialization basis. Code heavily dependent on constant propagation should see significant compile-time performance improvements and certain cases (e.g. calls to uninlinable functions that are nevertheless effect free) should see runtime performance improvements. Effects may be overwritten manually with the Base.@assume_effects macro (#43852).

  • The LazyString type and the lazy"str" macro were added to support delayed construction of error messages in error paths (#33711).
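
    A sketch of the intended use: the interpolation work is deferred until the message is actually printed, keeping the error path cheap.

    function checked_sqrt(x)
        # The lazy"..." string is only materialized if the error is thrown
        x >= 0 || throw(DomainError(x, lazy"cannot take the square root of x = $x"))
        return sqrt(x)
    end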

  • New macro @time_imports for reporting any time spent importing packages and their dependencies (#41612).

    Mostly for development
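
    Usage is a one-liner that prints the load time of each package in the dependency tree, e.g.,

    julia> @time_imports using Trixi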

  • The standard library LinearAlgebra.jl is now completely independent of SparseArrays.jl, both in terms of the source code as well as unit testing (#43127). As a consequence, sparse arrays are no longer (silently) returned by methods from LinearAlgebra applied to Base or LinearAlgebra objects. Specifically, this results in the following breaking changes
    ...
    New sparse concatenation functions sparse_hcat, sparse_vcat, and sparse_hvcat return SparseMatrixCSC output independent from the types of the input arguments. They make concatenation behavior available, in which the presence of some special "sparse" matrix argument resulted in sparse output by multiple dispatch. This is no longer possible after making LinearAlgebra.jl independent from SparseArrays.jl (#43127).

    We should check whether we rely on this changed behavior somewhere (DGMulti, I'm looking at you)
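
    A quick sketch of requesting sparse output explicitly on v1.8 (untested, API as given in the NEWS entry):

    using SparseArrays

    A = spzeros(2, 2)
    B = ones(2, 2)
    C = SparseArrays.sparse_hcat(A, B)  # always returns a SparseMatrixCSC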

  • CPU profiling now records sample metadata including thread and task. Profile.print() has a new groupby kwarg that allows grouping by thread, task, or nested thread/task, task/thread, and threads and tasks kwargs to allow filtering. Further, percent utilization is now reported as a total or per-thread, based on whether the thread is idle or not at each sample. Profile.fetch() includes the new metadata by default. For backwards compatibility with external profiling data consumers, it can be excluded by passing include_meta=false (#41742).

    Should be helpful for investigating multithreaded performance
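
    For reference, a sketch of the new options (the workload is a toy placeholder; kwarg names as given in the NEWS entry):

    using Profile

    work() = sum(sin, 1:10^7)                    # toy workload for illustration
    @profile work()
    Profile.print(groupby = [:thread, :task])    # group samples by thread, then task
    Profile.print(threads = 1:2)                 # only show samples from threads 1 and 2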

sloede (Member) commented Feb 28, 2022

Thanks a lot for this summary! You preempted my adding this as a topic to tomorrow's agenda ;-)

  • Type annotations can now be added to global variables to make accessing them type stable (#43671).

    May be interesting for replacing our global constants that wrap `Ref`s.
    

How would that work? I understand that adding type annotations may make the code faster due to type stability, but what's the role of `Ref`s here?

ranocha (Member, Author) commented Feb 28, 2022

what's the role of `Ref`s here?

Our way of doing this right now, e.g.,

const MPI_INITIALIZED = Ref(false)
const MPI_RANK = Ref(-1)
const MPI_SIZE = Ref(-1)
const MPI_IS_PARALLEL = Ref(false)
const MPI_IS_SERIAL = Ref(true)
const MPI_IS_ROOT = Ref(true)

sloede (Member) commented Feb 28, 2022

Ah, you mean that with the upcoming change we can get rid of the `const VAR = Ref(something)` pattern for global variables whose values are not constant, changing them to something like `VAR::TYPE = something`, e.g., `MPI_INITIALIZED::Bool = false`?

ranocha (Member, Author) commented Feb 28, 2022

Yes, that's how I understand it.
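
A minimal sketch of the two patterns side by side (the second module requires Julia v1.8; untested):

module Old
const MPI_RANK = Ref(-1)  # constant binding to a mutable Ref cell
mpi_rank() = MPI_RANK[]   # read through the Ref
end

module New
MPI_RANK::Int = -1        # typed non-const global, reassignment stays type-stable
mpi_rank() = MPI_RANK     # plain global read
end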

jlchan (Contributor) commented Feb 28, 2022

Regarding the sparse behavior: I don't think we use that, but I'll check.

ranocha (Member, Author) commented Feb 28, 2022

I also think that we sparsify the arrays explicitly, but checking is always better.

ranocha (Member, Author) commented Mar 3, 2022

If JuliaLang/julia#44359 gets backported to Julia v1.8, we should check our new CI times with code coverage.

giordano commented Mar 3, 2022

As far as I understand, truly constant globals are still the best option. The point of being able to type-annotate globals is to make non-constant globals not horribly slow; if you care a lot about performance, you still shouldn't get rid of `const`. See for example:

julia> const A = 3.14
3.14

julia> f_A() = A + 1.0
f_A (generic function with 1 method)

julia> B::Float64 = 3.14
3.14

julia> f_B() = B + 1.0
f_B (generic function with 1 method)

julia> C = 3.14
3.14

julia> f_C() = C + 1.0
f_C (generic function with 1 method)

julia> @code_llvm debuginfo=:none f_A()
define double @julia_f_A_149() #0 {
top:
  ret double 0x40108F5C28F5C290
}

julia> @code_llvm debuginfo=:none f_B()
define double @julia_f_B_167() #0 {
top:
  %0 = load atomic double*, double** inttoptr (i64 139850997151064 to double**) unordered, align 8
  %1 = load double, double* %0, align 8
  %2 = fadd double %1, 1.000000e+00
  ret double %2
}

julia> @code_llvm debuginfo=:none f_C()
define nonnull {}* @julia_f_C_171() #0 {
top:
  %0 = alloca [2 x {}*], align 8
  %gcframe2 = alloca [3 x {}*], align 16
  %gcframe2.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 0
  %.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %0, i64 0, i64 0
  %1 = bitcast [3 x {}*]* %gcframe2 to i8*
  call void @llvm.memset.p0i8.i32(i8* noundef nonnull align 16 dereferenceable(24) %1, i8 0, i32 24, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #3
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
  %2 = bitcast [3 x {}*]* %gcframe2 to i64*
  store i64 4, i64* %2, align 16
  %3 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 1
  %4 = bitcast {}** %3 to {}***
  %5 = load {}**, {}*** %pgcstack, align 8
  store {}** %5, {}*** %4, align 8
  %6 = bitcast {}*** %pgcstack to {}***
  store {}** %gcframe2.sub, {}*** %6, align 8
  %7 = load atomic {}*, {}** inttoptr (i64 139850997203800 to {}**) unordered, align 8
  %8 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 2
  store {}* %7, {}** %8, align 16
  store {}* %7, {}** %.sub, align 8
  %9 = getelementptr inbounds [2 x {}*], [2 x {}*]* %0, i64 0, i64 1
  store {}* inttoptr (i64 139850977822832 to {}*), {}** %9, align 8
  %10 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 139850795186000 to {}*), {}** nonnull %.sub, i32 2)
  %11 = load {}*, {}** %3, align 8
  %12 = bitcast {}*** %pgcstack to {}**
  store {}* %11, {}** %12, align 8
  ret {}* %10
}

julia> @benchmark f_A()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.370 ns … 15.942 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.387 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.438 ns ±  0.607 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █ ▆                                                        
  ██▄█▃▂▁▂▆▃▅▂▁▁▁▁▂▃▂▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂ ▂
  1.37 ns        Histogram: frequency by time        1.66 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f_B()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.734 ns … 21.955 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.815 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.933 ns ±  0.980 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     █                                                        
  ▄▁▄█▂▃▄▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▁▂▁▂▁▂ ▂
  2.73 ns        Histogram: frequency by time        4.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f_C()
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  19.486 ns …  1.241 μs  ┊ GC (min … max): 0.00% … 97.75%
 Time  (median):     20.803 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.922 ns ± 17.157 ns  ┊ GC (mean ± σ):  1.01% ±  1.38%

  ▄█▇▇▅   ▁▁▃▅▅▄▄▂▁       ▁▁▁▁                                ▂
  █████▆▅▅█████████▇▆▆▆▆▇▇████▇▅▄▄▅▅▇▇▇▇▇▆▅▄▂▄▄▄▅▃▅▆▆▅▃▄▄▄▄▄▅ █
  19.5 ns      Histogram: log(frequency) by time      45.1 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.

Edit: however, a constant `Ref` is probably not much different from a type-annotated global:

julia> const D = Ref(3.14)
Base.RefValue{Float64}(3.14)

julia> f_D() = D[] + 1.0
f_D (generic function with 1 method)

julia> @code_llvm debuginfo=:none f_D()
define double @julia_f_D_743() #0 {
top:
  %0 = load double, double* inttoptr (i64 140703666509008 to double*), align 16
  %1 = fadd double %0, 1.000000e+00
  ret double %1
}

julia> @benchmark f_D()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.727 ns … 22.539 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.805 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.903 ns ±  0.897 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▅    ▁                                                  
  █▄▂▁█▂▄▁▁█▆▃▂▁▁▂▃▂▂▁▂▂▁▂▂▁▂▃▂▁▂▂▁▁▁▂▁▂▁▁▁▂▂▁▂▂▁▁▁▁▂▂▂▂▁▂▂▂ ▂
  2.73 ns        Histogram: frequency by time        3.64 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

The LLVM IR is similar (but not identical: the access to the non-constant global is atomic).

ranocha (Member, Author) commented Mar 4, 2022

Yeah, the few global variables that we use are constant global `Ref`s. So the basic difference is whether we get a plain load or an atomic load for them, but these first benchmarks seem to indicate that this does not matter much, right?
