Make sure bivariate_normals vectorizes properly #483

Closed
Keno opened this issue Dec 19, 2016 · 4 comments
@Keno
Collaborator

Keno commented Dec 19, 2016

The loops in bivariate_normals showed up as a hot spot in profiling. We should make sure the compiler takes full advantage of the constant trip counts, etc.

I'll be using this as a benchmark:

using Celeste
using StaticArrays
using BenchmarkTools

bvn_derivs = Celeste.Model.BivariateNormalDerivatives{Float64}()
sig_sf = Celeste.Model.GalaxySigmaDerivs(1.0,1.0,1.0,
    zeros(SMatrix{2,2,Float64,4}))

const gal_shape_ids = Celeste.Model.gal_shape_ids


function benchmark(sig_sf, bvn_derivs)
    bvn_s_d = bvn_derivs.bvn_s_d
    bvn_sig_d = bvn_derivs.bvn_sig_d
    bvn_ss_h = bvn_derivs.bvn_ss_h
    bvn_us_h = bvn_derivs.bvn_us_h

    fill!(bvn_s_d, 0.0)
    @inbounds for shape_id in 1:length(gal_shape_ids), sig_id in 1:3
        bvn_s_d[shape_id] += bvn_sig_d[sig_id] * sig_sf.j[sig_id, shape_id]
    end

    fill!(bvn_ss_h, 0.0)
    fill!(bvn_us_h, 0.0)

    @inbounds for shape_id2 in 1:length(gal_shape_ids), shape_id1 in 1:shape_id2
      @inbounds for sig_id1 in 1:3
        bvn_ss_h[shape_id1, shape_id2] +=
          bvn_sig_d[sig_id1] * sig_sf.t[sig_id1, shape_id1, shape_id2]
      end
    end

    bvn_sigsig_h = bvn_derivs.bvn_sigsig_h
    @inbounds for sig_id1 in 1:3, sig_id2 in 1:3,
                  shape_id2 in 1:length(gal_shape_ids)
      inner_term =
        bvn_sigsig_h[sig_id1, sig_id2] * sig_sf.j[sig_id2, shape_id2]
      @inbounds for shape_id1 in 1:shape_id2
        bvn_ss_h[shape_id1, shape_id2] +=
          inner_term * sig_sf.j[sig_id1, shape_id1]
      end
    end

    @inbounds for shape_id2 in 1:length(gal_shape_ids), shape_id1 in 1:shape_id2
      bvn_ss_h[shape_id2, shape_id1] = bvn_ss_h[shape_id1, shape_id2]
    end
end

benchmark(sig_sf, bvn_derivs)
# Interpolate the non-constant globals so the timing excludes global-variable access
@benchmark benchmark($sig_sf, $bvn_derivs)
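
One rough way to check whether a loop like the first one above vectorizes is to dump the optimized LLVM IR and look for vector types. The sketch below is not Celeste code: `accumulate_s_d!` and `has_vector_ops` are hypothetical names, plain `Vector`/`Matrix` arguments stand in for the real fields, and the regex match is only a heuristic. The result depends on the Julia version and the CPU, so no expected output is shown.

```julia
using InteractiveUtils  # provides code_llvm

# Hypothetical stand-in for the first loop nest in the benchmark:
# out[shape_id] = sum over sig_id of d[sig_id] * j[sig_id, shape_id]
function accumulate_s_d!(out::Vector{Float64}, d::Vector{Float64},
                         j::Matrix{Float64})
    fill!(out, 0.0)
    @inbounds for shape_id in 1:length(out), sig_id in 1:3
        out[shape_id] += d[sig_id] * j[sig_id, shape_id]
    end
    return out
end

# Capture the optimized LLVM IR as a string and search it for
# vector-of-double types such as `<4 x double>` (a heuristic only).
function has_vector_ops(f, argtypes)
    io = IOBuffer()
    code_llvm(io, f, argtypes)
    ir = String(take!(io))
    return occursin(r"<\d+ x double>", ir)
end

out = zeros(6)
d = [1.0, 2.0, 3.0]
j = ones(3, 6)
accumulate_s_d!(out, d, j)

argtypes = (Vector{Float64}, Vector{Float64}, Matrix{Float64})
println(has_vector_ops(accumulate_s_d!, argtypes))
```

Reading the IR by hand (or using `@code_llvm` at the REPL) is more reliable than the regex, which only shows that some vector instruction is present somewhere in the function.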
@Keno
Collaborator Author

Keno commented Dec 19, 2016

Current results:
On KNL: 36383.779 ns
On a 6-year-old Westmere: 35935.601 ns

[NOTE: These are not the actual numbers, because publication of performance numbers is restricted. See Slack for how to derive the actual numbers.]

@Keno
Collaborator Author

Keno commented Dec 19, 2016

OK, the problem here turns out to be that LLVM thinks the arrays in these loops may alias.
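
To illustrate why possible aliasing matters, here is a minimal sketch that is unrelated to the Celeste code (`shift_add!` is a hypothetical function). When the destination and source are the same array, each store changes a value that a later iteration loads, so the loop's result depends on running the iterations in order. Unless the compiler can prove the arrays do not overlap, or inserts a runtime overlap check, it has to preserve that order and cannot simply batch the loads and stores into vector operations.

```julia
# Each iteration stores to dst[i+1] and loads from src[i]. If dst and src
# are the same array, the store in iteration i feeds the load in i+1.
function shift_add!(dst::Vector{Float64}, src::Vector{Float64})
    @inbounds for i in 1:length(dst)-1
        dst[i+1] += src[i]
    end
    return dst
end

# Separate arrays: every load sees the original value 1.0.
disjoint = shift_add!(ones(5), ones(5))

# Same array passed twice: the additions accumulate from one
# iteration to the next.
a = ones(5)
aliased = shift_add!(a, a)

println(disjoint)  # [1.0, 2.0, 2.0, 2.0, 2.0]
println(aliased)   # [1.0, 2.0, 3.0, 4.0, 5.0]
```

A vectorized loop that loaded several `src` elements before storing would produce the first result in both cases, which is why the compiler must be conservative when it cannot rule out overlap.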

@jeff-regier jeff-regier added this to the pre-January hackathon milestone Jan 11, 2017
@Keno
Collaborator Author

Keno commented Jan 26, 2017

JuliaLang/julia#20257 has a WIP for this.

@jeff-regier
Owner

@Keno @andreasnoack Is this going to be ready in time for GB? There's only about a month left for doing all the runs.

@jeff-regier jeff-regier removed this from the pre-January hackathon milestone Mar 23, 2017