Add patch to set `-O2` for suitesparse build #20165

staticfloat · 2017-01-20T23:31:43Z

This fixes the compiler regression noticed in GCC 6.3.0 in #20123 by disabling -ftree-slp-vectorize as a side-effect of moving from -O3 to -O2. We prefer to compile with -O2 by default anyway, so let's do so.

tkelman · 2017-01-21T00:23:13Z

We prefer to compile with -O2 by default anyway, so let's do so.

Where are you getting that from? I see a heck of a lot more instances of -O3 in our makefiles, only a handful of -O2.

@nanosoldier runbenchmarks("sparse", vs = ":master")

tkelman · 2017-01-21T00:24:07Z

deps/suitesparse.mk

 ifeq ($(USE_SYSTEM_BLAS), 0)
 $(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/build-compiled: | $(build_prefix)/manifest/openblas
 else ifeq ($(USE_SYSTEM_LAPACK), 0)
 $(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/build-compiled: | $(build_prefix)/manifest/lapack
 endif
-$(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/build-compiled: $(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/source-extracted $(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/SuiteSparse-winclang.patch-applied
+$(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/build-compiled: $(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/source-extracted $(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/SuiteSparse-O2.patch-applied 


run make check-whitespace locally if you ever do ci skip

nanosoldier · 2017-01-21T02:26:57Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels

tkelman · 2017-01-21T02:31:10Z

Ah some might be under linalg... @nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2017-01-21T06:39:45Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

ViralBShah · 2017-01-21T12:05:45Z

As @nalimilan pointed out elsewhere, we really ought to be using -O2 as the default except in cases where we can prove that -O3 does better. This is the policy in many linux distros apparently.

staticfloat · 2017-02-11T20:34:34Z

@tkelman So it looks like we've got significant performance regressions with this changeset, huh.

tkelman · 2017-02-11T20:39:48Z

I'm not so sure those were related, but I'd also prefer this be targeted to the specific architectures where it's known to cause a problem. And, you know, report the gcc bug and see how long it takes them to fix it (they can be very quick on some bugs).

In this case I'm not sure the benchmarks on nanosoldier include good test cases for suitesparse functionality. A cholfact or lufact of some large known sparse matrix (say from the formerly-UF collection) would be worth timing.

pabloferz · 2017-02-11T21:00:45Z

Those regressions might be due to #19976

tkelman

need to fix the whitespace, and benchmark on something that exercises suitesparse thoroughly

tkelman · 2017-02-12T19:56:37Z

-O3 definitely enables a whole range of tree vectorization passes in gcc that should be improving performance on code like suitesparse. Since we've been using O3 for so long, the burden should be on proving that changing to O2 doesn't hurt.

staticfloat · 2017-02-24T23:53:04Z

On this branch (newly rebased on top of master), built on x86_64 using gcc 5.4.0:

Test (Worker)             | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
sparse/umfpack (5)        |    9.74  |  0.16  |  1.6 | 325.47     | 241.17
sparse/cholmod (5)        |   21.69  |  0.39  |  1.8 | 636.43     | 313.36
sparse/spqr (5)           |    2.47  |  0.04  |  1.5 | 86.65      | 313.36
sparse/sparse (2)         |   85.97  | 17.31  | 20.1 | 2012.92    | 387.27
sparse/higherorderfns (4) |   87.18  |  1.44  |  1.7 | 3554.29    | 413.45
sparse/sparsevector (3)   |   87.83  |  1.05  |  1.2 | 2588.59    | 381.42

On master (d321300):

Test (Worker)             | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
sparse/umfpack (5)        |    9.60  |  0.17  |  1.8 | 325.55     | 235.16
sparse/cholmod (5)        |   19.08  |  0.36  |  1.9 | 636.44     | 309.36
sparse/spqr (5)           |    2.49  |  0.03  |  1.4 | 86.65      | 309.36
sparse/sparse (2)         |   82.86  | 17.63  | 21.3 | 2012.94    | 387.06
sparse/sparsevector (3)   |   84.38  |  0.97  |  1.2 | 2581.58    | 384.82
sparse/higherorderfns (4) |   84.73  |  1.40  |  1.7 | 3554.21    | 411.90

@ViralBShah are there any other tests that it might be instructive to run to make sure that suitesparse isn't getting severely slowed down? So far it looks like cholmod is ~10% slower.

I should note that the only difference between these two builds is to switch branches and make -C deps distclean-suitesparse. The speed of the compiler itself is not changing here.

staticfloat · 2017-02-25T01:28:53Z

Here are the same results on the same machine but using gcc 6.3.0, like we use on the buildbots. (All hail Docker 🐳 )

With -O3 on master:

Test (Worker)             | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
sparse/umfpack (5)        |    9.39  |  0.18  |  1.9 | 326.13     | 238.03
sparse/cholmod (5)        |   18.44  |  0.34  |  1.9 | 636.91     | 312.15
sparse/spqr (5)           |    2.42  |  0.03  |  1.4 | 86.65      | 312.15
sparse/sparse (2)         |   80.55  | 17.08  | 21.2 | 2015.10    | 390.14
sparse/sparsevector (3)   |   82.44  |  0.98  |  1.2 | 2590.26    | 385.08
sparse/higherorderfns (4) |   83.13  |  1.38  |  1.7 | 3553.82    | 410.05

With -O2 on this branch:

Test (Worker)             | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
sparse/umfpack (5)        |    8.75  |  0.20  |  2.2 | 302.62     | 228.37
sparse/cholmod (5)        |   17.34  |  0.30  |  1.7 | 603.76     | 302.43
sparse/spqr (5)           |    2.38  |  0.03  |  1.4 | 88.33      | 302.43
sparse/sparse (2)         |   77.48  | 16.12  | 20.8 | 1993.22    | 391.83
sparse/higherorderfns (4) |   79.89  |  1.28  |  1.6 | 3236.99    | 416.92
sparse/sparsevector (3)   |   81.31  |  1.13  |  1.4 | 2754.84    | 357.80

I.....huh. Looks like GCC 6.3.0's -O2 has improved.

tkelman · 2017-02-25T01:46:23Z

test runtime isn't as useful as a perf benchmark run via BenchmarkTools, the tests are likely driven by julia JIT compile time rather than library function runtime

staticfloat · 2017-02-28T04:48:28Z

I ran the sparse perf benchmark on x86_64 with both -O2 and -O3 versions of suitesparse built. I tested the performance using the following script:

# sparse_perf.jl
using BaseBenchmarks
using BenchmarkTools

benchmarks = BaseBenchmarks.load!("sparse")

warmup(benchmarks)
results = run(benchmarks; verbose = true)
BenchmarkTools.save("../master.jld", "results", results)

With an identical one that output to ../pr.jld for the -O2 version of Julia. I analyzed with the following script:

# compare.jl
using BenchmarkTools, BaseBenchmarks, JLD

pr = load("pr.jld", "results")
master = load("master.jld", "results")
pairs = leaves(regressions(judge(minimum(pr), minimum(master))))

Which found no consistent regressions. The full results are available as a .jld file, if you're interested.

tkelman · 2017-02-28T08:51:42Z

what does that benchmark run though? is it actually doing anything that exercises suitesparse, or only the pure-julia sparse code?

this needs to be reported as a gcc bug.

staticfloat · 2017-02-28T21:15:38Z

this needs to be reported as a gcc bug.

As I'm assuming I would have to boil this down into an MWE, I do not have the time or energy to report this as a GCC bug at the moment.

is it actually doing anything that exercises suitesparse, or only the pure-julia sparse code?

I downloaded ~~three~~ two matrices (apparently we can't decompose integer-valued matrices) from the University of Florida dataset and ran factorize() on them. Here's the script:

# sparse_perf.jl
using BenchmarkTools
using MatrixMarket

#M1 = MatrixMarket.mmread("rosen2/rosen2.mtx")
M2 = MatrixMarket.mmread("bayer09/bayer09.mtx")
M3 = MatrixMarket.mmread("bcsstk23/bcsstk23.mtx")

#t1 = @benchmark factorize(M1)
t2 = @benchmark factorize(M2)
t3 = @benchmark factorize(M3)


println("Real unsymmetric (bayer09)")
display(t2)
println("\nReal symmetric (bcsstk23)")
display(t3)
println()

Here's the -O3 output:

Real unsymmetric (bayer09)
BenchmarkTools.Trial: 
  memory estimate:  3.17 MiB
  allocs estimate:  64
  --------------
  minimum time:     4.811 ms (0.00% GC)
  median time:      4.878 ms (0.00% GC)
  mean time:        4.946 ms (0.74% GC)
  maximum time:     16.998 ms (1.51% GC)
  --------------
  samples:          1004
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
Real symmetric (bcsstk23)
BenchmarkTools.Trial: 
  memory estimate:  9.49 MiB
  allocs estimate:  48
  --------------
  minimum time:     11.374 ms (0.00% GC)
  median time:      11.447 ms (0.00% GC)
  mean time:        11.701 ms (1.97% GC)
  maximum time:     62.680 ms (81.45% GC)
  --------------
  samples:          425
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Here's the -O2 output:

Real unsymmetric (bayer09)
BenchmarkTools.Trial: 
  memory estimate:  3.17 MiB
  allocs estimate:  64
  --------------
  minimum time:     4.752 ms (0.00% GC)
  median time:      4.833 ms (0.00% GC)
  mean time:        4.884 ms (0.78% GC)
  maximum time:     7.559 ms (3.49% GC)
  --------------
  samples:          1018
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
Real symmetric (bcsstk23)
BenchmarkTools.Trial: 
  memory estimate:  9.49 MiB
  allocs estimate:  48
  --------------
  minimum time:     11.782 ms (0.00% GC)
  median time:      11.830 ms (0.00% GC)
  mean time:        12.111 ms (1.98% GC)
  maximum time:     63.003 ms (81.00% GC)
  --------------
  samples:          411
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

It looks to me that we don't have too much difference here.

tkelman · 2017-02-28T21:55:06Z

We already have the Julia-side example here, right? #20123 (comment) Translating that into ccalls shouldn't be that hard. If upstream defaults to -O3 then we should consider this temporary until the GCC bug gets fixed, or convince SuiteSparse to change their defaults.

staticfloat · 2017-03-04T07:48:57Z

I'd like to get this merged for 0.6 so that building with the latest GCC doesn't cause problems.

[ci skip]

tkelman · 2017-03-04T08:50:45Z

I still don't like doing this on platforms where it doesn't cause problems, or merging without an upstream bug report to gcc.

ViralBShah · 2017-03-04T10:09:44Z

Sorry @tkelman I am going ahead with this.

ViralBShah · 2017-03-04T10:11:18Z

Agree about the upstream bug report to gcc and we should perhaps also notify Tim Davis.

tkelman · 2017-03-04T15:08:37Z

And if we don't do these things before merging, they never happen.

tkelman · 2017-03-04T19:08:24Z

The passes enabled by -O3 probably can't do much unless you also let gcc use newer instructions with -march=native.

staticfloat · 2017-03-05T05:05:40Z

And if we don't do these things before merging, they never happen.

Here's a ccall-based MWE with a dockerfile that sets up the environment for the segfault.

tkelman · 2017-03-05T08:03:51Z

deps/suitesparse.mk

@@ -37,12 +37,17 @@ $(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/SuiteSparse-winclang.patch-applied: $
 	cd $(dir $@) && patch -p0 < $(SRCDIR)/patches/SuiteSparse-winclang.patch
 	echo 1 > $@

+$(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/SuiteSparse-O2.patch-applied: $(BUILDDIR)/SuiteSparse-$(SUITESPARSE_VER)/SuiteSparse-winclang.patch-applied


this is also wrong, these belong in srccache

Should we move all the .patch-applied files into srccache then? I was just following what was already in the file.

ah on second thought it depends whether the dependency is capable of being built out of tree. we could use comments in these makefiles that explain inconsistencies like this

This reverts commit 145eae8. This was actually a bug in the power kernels in OpenBLAS, not a GCC or SuiteSparse problem

tkelman reviewed Jan 21, 2017

View reviewed changes

ararslan added building Build system, or building Julia or its dependencies sparse Sparse arrays labels Jan 23, 2017

tkelman mentioned this pull request Feb 11, 2017

[release-0.5] Add patch to set -O2 for suitesparse build #20166

Closed

tkelman suggested changes Feb 12, 2017

View reviewed changes

staticfloat force-pushed the sf/suitesparse_o2_master branch from 238b9cc to 8dcf9ca Compare February 24, 2017 23:44

Add patch to set -O2 for suitesparse build

2c62f07

[ci skip]

staticfloat force-pushed the sf/suitesparse_o2_master branch from 8dcf9ca to 2c62f07 Compare March 4, 2017 07:50

ViralBShah merged commit 145eae8 into master Mar 4, 2017

tkelman deleted the sf/suitesparse_o2_master branch March 4, 2017 15:06

tkelman reviewed Mar 5, 2017

View reviewed changes

tkelman added a commit that referenced this pull request Mar 19, 2017

Revert "Add patch to set -O2 for suitesparse build (#20165)"

d1cd951

This reverts commit 145eae8. This was actually a bug in the power kernels in OpenBLAS, not a GCC or SuiteSparse problem

tkelman mentioned this pull request Mar 19, 2017

Carry an openblas patch to fix bug in power kernels #21091

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add patch to set `-O2` for suitesparse build #20165

Add patch to set `-O2` for suitesparse build #20165

staticfloat commented Jan 20, 2017

tkelman commented Jan 21, 2017

tkelman Jan 21, 2017

nanosoldier commented Jan 21, 2017

tkelman commented Jan 21, 2017

nanosoldier commented Jan 21, 2017

ViralBShah commented Jan 21, 2017

staticfloat commented Feb 11, 2017

tkelman commented Feb 11, 2017

pabloferz commented Feb 11, 2017

tkelman left a comment

tkelman commented Feb 12, 2017

staticfloat commented Feb 24, 2017 •

edited

Loading

staticfloat commented Feb 25, 2017

tkelman commented Feb 25, 2017

staticfloat commented Feb 28, 2017 •

edited

Loading

tkelman commented Feb 28, 2017 •

edited

Loading

staticfloat commented Feb 28, 2017

tkelman commented Feb 28, 2017

staticfloat commented Mar 4, 2017

tkelman commented Mar 4, 2017

ViralBShah commented Mar 4, 2017

ViralBShah commented Mar 4, 2017

tkelman commented Mar 4, 2017

tkelman commented Mar 4, 2017

staticfloat commented Mar 5, 2017

tkelman Mar 5, 2017

staticfloat Mar 5, 2017

tkelman Mar 5, 2017

Add patch to set -O2 for suitesparse build #20165

Add patch to set -O2 for suitesparse build #20165

Conversation

staticfloat commented Jan 20, 2017

tkelman commented Jan 21, 2017

tkelman Jan 21, 2017

Choose a reason for hiding this comment

nanosoldier commented Jan 21, 2017

tkelman commented Jan 21, 2017

nanosoldier commented Jan 21, 2017

ViralBShah commented Jan 21, 2017

staticfloat commented Feb 11, 2017

tkelman commented Feb 11, 2017

pabloferz commented Feb 11, 2017

tkelman left a comment

Choose a reason for hiding this comment

tkelman commented Feb 12, 2017

staticfloat commented Feb 24, 2017 • edited Loading

staticfloat commented Feb 25, 2017

tkelman commented Feb 25, 2017

staticfloat commented Feb 28, 2017 • edited Loading

tkelman commented Feb 28, 2017 • edited Loading

staticfloat commented Feb 28, 2017

tkelman commented Feb 28, 2017

staticfloat commented Mar 4, 2017

tkelman commented Mar 4, 2017

ViralBShah commented Mar 4, 2017

ViralBShah commented Mar 4, 2017

tkelman commented Mar 4, 2017

tkelman commented Mar 4, 2017

staticfloat commented Mar 5, 2017

tkelman Mar 5, 2017

Choose a reason for hiding this comment

staticfloat Mar 5, 2017

Choose a reason for hiding this comment

tkelman Mar 5, 2017

Choose a reason for hiding this comment

Add patch to set `-O2` for suitesparse build #20165

Add patch to set `-O2` for suitesparse build #20165

staticfloat commented Feb 24, 2017 •

edited

Loading

staticfloat commented Feb 28, 2017 •

edited

Loading

tkelman commented Feb 28, 2017 •

edited

Loading