Make sure bivariate_normals vectorizes properly #483

Keno · 2016-12-19T19:55:09Z

The loops in bivariate_normals showed up as a hot spot in profiling. We should make sure the compiler takes adequate advantage of the constancy of the trip counts, etc.

I'll be using this as a benchmark:

using Celeste
using StaticArrays
using BenchmarkTools

bvn_derivs = Celeste.Model.BivariateNormalDerivatives{Float64}()
sig_sf = Celeste.Model.GalaxySigmaDerivs(1.0,1.0,1.0,
    zeros(SMatrix{2,2,Float64,4}))

const gal_shape_ids = Celeste.Model.gal_shape_ids


function benchmark(sig_sf, bvn_derivs)
    bvn_s_d = bvn_derivs.bvn_s_d
    bvn_sig_d = bvn_derivs.bvn_sig_d
    bvn_ss_h = bvn_derivs.bvn_ss_h
    bvn_us_h = bvn_derivs.bvn_us_h

    # bvn_s_d[shape_id] = sum over sig_id of bvn_sig_d[sig_id] * sig_sf.j[sig_id, shape_id]
    fill!(bvn_s_d, 0.0)
    @inbounds for shape_id in 1:length(gal_shape_ids), sig_id in 1:3
        bvn_s_d[shape_id] += bvn_sig_d[sig_id] * sig_sf.j[sig_id, shape_id]
    end

    # Reset the Hessian blocks; bvn_us_h is zeroed but not otherwise touched in this benchmark.
    fill!(bvn_ss_h, 0.0)
    fill!(bvn_us_h, 0.0)

    # Upper triangle of bvn_ss_h: contract bvn_sig_d with sig_sf.t over sig_id1.
    @inbounds for shape_id2 in 1:length(gal_shape_ids), shape_id1 in 1:shape_id2
      @inbounds for sig_id1 in 1:3
        bvn_ss_h[shape_id1, shape_id2] +=
          bvn_sig_d[sig_id1] * sig_sf.t[sig_id1, shape_id1, shape_id2]
      end
    end

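    # Add sig_sf.j[sig_id1, shape_id1] * bvn_sigsig_h[sig_id1, sig_id2] * sig_sf.j[sig_id2, shape_id2]
    # to the upper triangle of bvn_ss_h.
    bvn_sigsig_h = bvn_derivs.bvn_sigsig_h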
    @inbounds for sig_id1 in 1:3, sig_id2 in 1:3,
                  shape_id2 in 1:length(gal_shape_ids)
      inner_term =
        bvn_sigsig_h[sig_id1, sig_id2] * sig_sf.j[sig_id2, shape_id2]
      @inbounds for shape_id1 in 1:shape_id2
        bvn_ss_h[shape_id1, shape_id2] +=
          inner_term * sig_sf.j[sig_id1, shape_id1]
      end
    end

    # Mirror the upper triangle into the lower triangle so bvn_ss_h is symmetric.
    @inbounds for shape_id2 in 1:length(gal_shape_ids), shape_id1 in 1:shape_id2
      bvn_ss_h[shape_id2, shape_id1] = bvn_ss_h[shape_id1, shape_id2]
    end
end

benchmark(sig_sf, bvn_derivs)
@benchmark benchmark(sig_sf, bvn_derivs)

Keno · 2016-12-19T20:17:18Z

Current results:
On KNL: 36383.779 ns
On a 6-year-old Westmere: 35935.601 ns

[NOTE: These are not the actual numbers, because publication of performance numbers is restricted. See Slack for how to derive the actual numbers.]

Keno · 2016-12-19T22:16:54Z

Ok, the problem here turns out to be that LLVM thinks those arrays may alias.
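
To illustrate the kind of pattern involved, here is a minimal sketch, not taken from Celeste and not the fix tracked in the linked Julia PRs. The function names `accumulate_in_place!` and `accumulate_local!` are hypothetical, and plain `Array`s stand in for the real derivative buffers. The first version mirrors the shape of the first loop nest in the benchmark: it reads and writes `dst[j]` on every inner iteration, so if the compiler cannot prove that `dst` does not overlap `src` or `w`, it may have to keep those loads and stores in program order. The second version keeps the running sum in a local variable and stores once per column, which is one common way to sidestep the question. Whether this actually helps in Celeste would have to be measured.

```julia
# Hypothetical stand-in for the first loop nest of the benchmark.
# Read-modify-write on dst[j] in the inner loop: a store to dst[j] might
# (as far as the compiler knows) change a later load from src or w.
function accumulate_in_place!(dst, src, w)
    fill!(dst, 0.0)
    @inbounds for j in 1:length(dst), i in 1:length(src)
        dst[j] += src[i] * w[i, j]
    end
    return dst
end

# Same arithmetic in the same order, but the running sum lives in a local
# variable, so dst is written exactly once per column.
function accumulate_local!(dst, src, w)
    @inbounds for j in 1:length(dst)
        acc = 0.0
        for i in 1:length(src)
            acc += src[i] * w[i, j]
        end
        dst[j] = acc
    end
    return dst
end

src = [1.0, 2.0, 3.0]
w = reshape(collect(1.0:18.0), 3, 6)   # 3 sigma terms by 6 shape terms (illustrative sizes)
a = accumulate_in_place!(zeros(6), src, w)
b = accumulate_local!(zeros(6), src, w)
```

Both functions produce the same values; comparing their `@code_llvm` output (from `InteractiveUtils`) is one way to see whether the inner loop was vectorized on a given Julia version.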

Keno · 2017-01-26T18:23:44Z

JuliaLang/julia#20257 has a WIP for this.

jeff-regier · 2017-02-14T18:30:11Z

@Keno @andreasnoack Is this going to be ready in time for GB? There's only about a month left for doing all the runs.

Keno mentioned this issue Dec 19, 2016

RFC: Macro for expression noalias hints JuliaLang/julia#19658

Open

jeff-regier assigned Keno Dec 21, 2016

jeff-regier added this to the pre-January hackathon milestone Jan 11, 2017

jeff-regier removed this from the pre-January hackathon milestone Mar 23, 2017

jeff-regier closed this as completed Aug 7, 2017
