allow the use of openblas #475

Closed
mikestillman opened this issue Jun 2, 2016 · 15 comments · Fixed by #3461
Assignees
Labels
build issue platform specific issues involving compiling M2, generating examples, or running tests enhancement Linear Algebra

Comments

@mikestillman
Member

Word on the street is that OpenBLAS is far better than the default BLAS, on Ubuntu at least, and probably on other Linux distributions too. After compiling it, I think the new library just needs to be added to the list of link libraries. It was recommended to me that we compile it from source, to make better use of the facilities on each target machine. However, we should probably also allow the use of the Ubuntu openblas package for building distributions.

It would be nice to allow the use of openblas and, if it is actually far superior, make it the default. On Macs, though, we currently use the Accelerate framework, which seems to be very good. Even there, it might be good to compare them.

There are several reasons for this request, but my main interest right now is to improve the speed of rank computations in ffpack (which is used in the fast non-minimal free resolution code). Currently, comparing across machines, I find that such computations are perhaps 5-10 times slower on Ubuntu than on my Mac laptop, which is a year or two old.

I will add benchmarks to check this, so that we can see what the actual improvement is.

@DanGrayson DanGrayson self-assigned this Jun 2, 2016
@DanGrayson DanGrayson added this to the version 2.0 milestone Jun 2, 2016
@DanGrayson
Member

It might be good to include a benchmark written in Fortran that could be run immediately after compiling openblas or another BLAS. Give it to me and I'll put it somewhere appropriate.

@DanGrayson
Member

Here's some info on a possibly useful debian/ubuntu package for testing blas:

Package: libblas-test
Priority: optional
Section: universe/libs
Installed-Size: 1882
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: Debian Science Team <[email protected]>
Architecture: amd64
Source: lapack
Version: 3.6.0-2ubuntu2
Depends: libblas3 | libblas.so.3, libc6 (>= 2.4), libgfortran3 (>= 4.6)
Filename: pool/universe/l/lapack/libblas-test_3.6.0-2ubuntu2_amd64.deb
Size: 303704
MD5sum: 45894116ac90759bd2c8fbb965aeaa31
SHA1: a58cbfca37a5885ba8a519d3ad65a114ff2c59f2
SHA256: 805804aa6844249da5acbc508273d52a43f25db55fcb0cfec2e6a5c027351a8e
Description-en: Basic Linear Algebra Subroutines 3, testing programs
 BLAS (Basic Linear Algebra Subroutines) is a set of efficient
 routines for most of the basic vector and matrix operations.
 They are widely used as the basis for other high quality linear
 algebra software, for example lapack and linpack.  This
 implementation is the Fortran 77 reference implementation found
 at netlib.
 .
 This package contains a set of programs which test the integrity of an
 installed blas-compatible shared library. These programs may therefore be used
 to test the libraries provided by the blas package as well as those provided
 by the libatlas3-base and libopenblas-base packages. The programs are
 dynamically linked -- one can explicitly select a library to test by setting
 the libblas.so.3 alternative, or by using the LD_LIBRARY_PATH or LD_PRELOAD
 environment variables. Likewise, one can display the library selected using
 the ldd program in an identical environment.
Description-md5: 7e697a3bd80892afd85df0f1b0596433
Homepage: http://www.netlib.org/lapack/
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu
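
The package description above mentions choosing the library through the libblas.so.3 alternative or environment variables, and checking the result with ldd. As a rough complement, here is a minimal Python sketch that asks the system which BLAS-like shared libraries it can locate. The base names probed ("openblas", "blas", "lapack") are assumptions about how a distribution names its libraries, and this does not show which library a particular binary actually loads; ldd remains the tool for that.

```python
from ctypes.util import find_library

# Base names to probe. These are assumptions: what is installed, and under
# which name, depends on the distribution.
CANDIDATES = ("openblas", "blas", "lapack")


def locate_blas_libraries(names=CANDIDATES):
    """Map each base name to the library name the system reports, or None."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_blas_libraries().items():
        print(f"{name}: {found if found else 'not found'}")
```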

@dimpase
Contributor

dimpase commented Jun 3, 2016

Sage uses Atlas, which is a pain to install (automatic tuning is very slow), but is reasonably fast on Linux and OSX. It would be interesting to see how it compares with openblas, once M2 on Sage is working...

@mikestillman
Member Author

Here is an example in M2 which, I hope, shows that using a better BLAS would have a significant effect on these computations. (Note: for one free resolution example, a similar rank computation on SL took 4 days, so a speedup by even a constant factor would be excellent!)

restart
debug Core
kk = ZZp(32003, Strategy=>"Ffpack")
kk1 = ZZp(32003, Strategy=>"Flint")
elapsedTime M = random(ZZ^4000, ZZ^4000, Height=>32000, Density=>.2);
time M0 = mutableMatrix promote(M,kk);
time M1 = mutableMatrix promote(M,kk1);
time rank M0  -- this line uses the blas heavily
time rank M1  -- this line doesn't use the blas as far as I know.

elapsedTime M = random(ZZ^6000, ZZ^6000, Height=>32000, Density=>.2);
time M2 = mutableMatrix promote(M,kk);
time M3 = mutableMatrix promote(M,kk1);
time rank M2  -- this line uses the blas heavily
time rank M3  -- this line doesn't use the blas as far as I know.

-- the times for the 4 rank commands
-- MacBookPro, running 10.10.5, 16 GB ram, Mid 2014 Retina MacBookPro.
time rank M0 -- 2.27 sec
time rank M1 -- 7.82 sec
time rank M2 -- 7.01 sec
time rank M3 -- 40.99 sec

-- On an SL machine, which seems to be about the same speed as (perhaps a bit faster than)
-- my mac:
time rank M0 -- 16.72 sec
time rank M1 -- 7.85 sec
time rank M2 -- 52.32 sec
time rank M3 -- 23.9 sec

-- the blas code appears to be running somewhat more than 7 times slower on
-- SL than on the mac.  I think ubuntu is similar to SL in speed here.
-- perhaps openblas can improve this?
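
The "somewhat more than 7 times slower" estimate can be checked directly from the timings quoted above. The following Python sketch only redoes that arithmetic; the dictionary names are made up for illustration, and the numbers are copied from the comments in the post.

```python
# Timings in seconds, copied from the comments above.
# M0 and M2 use the Ffpack strategy (BLAS-heavy); M1 and M3 use Flint.
mac = {"M0": 2.27, "M1": 7.82, "M2": 7.01, "M3": 40.99}
sl = {"M0": 16.72, "M1": 7.85, "M2": 52.32, "M3": 23.9}


def slowdown(slow, fast):
    """Ratio of the `slow` timing to the `fast` timing for each matrix."""
    return {name: slow[name] / fast[name] for name in fast}


ratios = slowdown(sl, mac)

if __name__ == "__main__":
    for name in sorted(ratios):
        print(f"rank {name}: SL / mac = {ratios[name]:.2f}")
```

Both Ffpack runs (M0 and M2) come out between 7.3 and 7.5, while the Flint runs do not show that gap, which is consistent with the BLAS being the part that differs between the two machines.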

@mikestillman
Member Author

By the way, about the code in my previous post: sorry, I chose a time-inefficient way to create these matrices.

@mikestillman
Member Author

@DanGrayson this one is important :)

@dimpase
Contributor

dimpase commented Jun 6, 2017

by the way, Sage has switched to openblas.

@DanGrayson DanGrayson modified the milestones: version 1.11, version 1.12 Feb 19, 2018
@DanGrayson DanGrayson modified the milestones: version 1.13, version 1.14 Jan 1, 2019
@DanGrayson DanGrayson modified the milestones: version 1.14, version 1.15 May 23, 2019
@DanGrayson DanGrayson modified the milestones: version 1.15, version 1.16 Dec 2, 2019
@mahrud mahrud self-assigned this Apr 19, 2020
@mahrud
Member

mahrud commented May 30, 2020

Generic LAPACK and BLAS don't take advantage of multiple CPU cores or CPU vectorization (e.g. SSE2, which is ubiquitous now). Here's some information about benchmarking BLAS implementations with numpy: https://markus-beuckelmann.de/blog/boosting-numpy-blas.html
On that note, Eigen's API is different, so we would have to change our code, but it seems to be a strong contender: http://eigen.tuxfamily.org/index.php?title=Benchmark
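
To give a feel for the kind of timing harness such benchmarks use, here is a small Python sketch. It times a plain pure-Python matrix product and, only if NumPy happens to be installed, the same product through NumPy, which hands it to whatever BLAS it was linked against. This is an illustration under those assumptions, not a comparison of reference BLAS with OpenBLAS; for that, one needs NumPy builds linked against each library, as in the linked post. The function names and matrix size are made up.

```python
import random
import time


def naive_matmul(a, b):
    """Straightforward pure-Python product of two matrices (lists of rows)."""
    columns = list(zip(*b))  # transpose b once so columns are easy to walk
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


def time_call(fn, *args):
    """Run fn(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run(n=120, seed=0):
    """Time an n-by-n product; include a NumPy timing when NumPy is present."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    _, t_naive = time_call(naive_matmul, a, b)
    timings = {"naive": t_naive}
    try:
        import numpy as np  # optional; skipped when not installed
    except ImportError:
        return timings
    _, t_numpy = time_call(np.matmul, np.array(a), np.array(b))
    timings["numpy"] = t_numpy
    return timings


if __name__ == "__main__":
    for label, seconds in run().items():
        print(f"{label}: {seconds:.4f} sec")
```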

@dimpase
Contributor

dimpase commented May 30, 2020

By the way, Sage switched to openblas years ago.

@mahrud
Member

mahrud commented May 30, 2020

With the CMake build, we have, too! Hopefully the autotools build is next.

@mahrud mahrud added the build issue platform specific issues involving compiling M2, generating examples, or running tests label Jun 18, 2020
@mahrud
Member

mahrud commented Jul 2, 2020

Here's a quick benchmark. First I had to comment out everything after line 320 of quarantine/lapack.m2 since an engine routine is failing for matrices with zero rows or columns.

Using OpenBLAS:

[mahrud@noether build]$ ctest -R lapack --repeat-until-fail 10
Test project /home/mahrud/Projects/M2/M2/M2/BUILD/build
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    1.72 sec
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    1.78 sec
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    1.77 sec
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    1.89 sec
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    1.84 sec
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    1.93 sec
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    1.92 sec
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    2.01 sec
    Start 3243: quarantine/lapack.m2
    Test #3243: quarantine/lapack.m2 .............   Passed    1.99 sec
    Start 3243: quarantine/lapack.m2
1/1 Test #3243: quarantine/lapack.m2 .............   Passed    2.09 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =  19.18 sec

Compared with LAPACK/BLAS:

[mahrud@noether blas]$ ctest -R lapack --repeat-until-fail 10
Test project /home/mahrud/Projects/M2/M2/M2/BUILD/blas
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    3.04 sec
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    3.12 sec
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    3.17 sec
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    3.20 sec
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    3.16 sec
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    3.13 sec
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    2.93 sec
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    2.96 sec
    Start 522: quarantine/lapack.m2
    Test #522: quarantine/lapack.m2 .............   Passed    2.97 sec
    Start 522: quarantine/lapack.m2
1/1 Test #522: quarantine/lapack.m2 .............   Passed    2.94 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =  30.72 sec

That's an improvement of roughly 37% in total test time (19.18 sec versus 30.72 sec).
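
For reference, the percentage is the relative reduction in total time between the two ctest runs above. A short Python sketch of that arithmetic, using only the two totals from the logs (the function names are made up for illustration):

```python
def percent_reduction(before, after):
    """Relative reduction in running time, as a percentage of `before`."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster the `after` run is than the `before` run."""
    return before / after


# Totals in seconds from the two ctest runs above.
BLAS_TOTAL = 30.72
OPENBLAS_TOTAL = 19.18

lapack_reduction = percent_reduction(BLAS_TOTAL, OPENBLAS_TOTAL)
lapack_speedup = speedup(BLAS_TOTAL, OPENBLAS_TOTAL)

if __name__ == "__main__":
    print(f"reduction: {lapack_reduction:.1f}%  speedup: {lapack_speedup:.2f}x")
```

With these totals the reduction works out to about 37.6%, or a speedup factor of about 1.6.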

@mahrud
Member

mahrud commented Jul 2, 2020

The effect on the gb tests is even more significant, with a 58% improvement.

OpenBLAS:

[mahrud@noether build]$ ctest -R normal/gb
Test project /home/mahrud/Projects/M2/M2/M2/BUILD/build
      Start 2974: normal/gb-matrix-lift.m2
 1/14 Test #2974: normal/gb-matrix-lift.m2 .........   Passed    0.51 sec
      Start 2975: normal/gb-skew-ZZ.m2
 2/14 Test #2975: normal/gb-skew-ZZ.m2 .............   Passed    0.53 sec
      Start 2976: normal/gb-snapp-bug.m2
 3/14 Test #2976: normal/gb-snapp-bug.m2 ...........   Passed    0.50 sec
      Start 2977: normal/gb2.m2
 4/14 Test #2977: normal/gb2.m2 ....................   Passed    0.51 sec
      Start 2978: normal/gbQQbug.m2
 5/14 Test #2978: normal/gbQQbug.m2 ................   Passed    0.57 sec
      Start 2979: normal/gbZZ-2.m2
 6/14 Test #2979: normal/gbZZ-2.m2 .................   Passed    0.52 sec
      Start 2980: normal/gbZZ-mingens.m2
 7/14 Test #2980: normal/gbZZ-mingens.m2 ...........   Passed    0.56 sec
      Start 2981: normal/gbZZ13.m2
 8/14 Test #2981: normal/gbZZ13.m2 .................   Passed    0.78 sec
      Start 2982: normal/gbZZautoreduction.m2
 9/14 Test #2982: normal/gbZZautoreduction.m2 ......   Passed    0.52 sec
      Start 2983: normal/gbZZbug.m2
10/14 Test #2983: normal/gbZZbug.m2 ................   Passed    1.33 sec
      Start 2984: normal/gbZZbug2-a.m2
11/14 Test #2984: normal/gbZZbug2-a.m2 .............   Passed    0.61 sec
      Start 2985: normal/gbZZbug2.m2
12/14 Test #2985: normal/gbZZbug2.m2 ...............   Passed    0.62 sec
      Start 2986: normal/gbinhom.m2
13/14 Test #2986: normal/gbinhom.m2 ................   Passed    0.52 sec
      Start 2987: normal/gblimits.m2
14/14 Test #2987: normal/gblimits.m2 ...............   Passed    0.54 sec

100% tests passed, 0 tests failed out of 14

Total Test time (real) =   8.84 sec

BLAS/LAPACK:

[mahrud@noether blas]$ ctest -R normal/gb
Test project /home/mahrud/Projects/M2/M2/M2/BUILD/blas
      Start 3007: normal/gb-matrix-lift.m2
 1/14 Test #3007: normal/gb-matrix-lift.m2 .........   Passed    1.33 sec
      Start 3008: normal/gb-skew-ZZ.m2
 2/14 Test #3008: normal/gb-skew-ZZ.m2 .............   Passed    1.32 sec
      Start 3009: normal/gb-snapp-bug.m2
 3/14 Test #3009: normal/gb-snapp-bug.m2 ...........   Passed    1.35 sec
      Start 3010: normal/gb2.m2
 4/14 Test #3010: normal/gb2.m2 ....................   Passed    1.33 sec
      Start 3011: normal/gbQQbug.m2
 5/14 Test #3011: normal/gbQQbug.m2 ................   Passed    1.38 sec
      Start 3012: normal/gbZZ-2.m2
 6/14 Test #3012: normal/gbZZ-2.m2 .................   Passed    1.35 sec
      Start 3013: normal/gbZZ-mingens.m2
 7/14 Test #3013: normal/gbZZ-mingens.m2 ...........   Passed    1.42 sec
      Start 3014: normal/gbZZ13.m2
 8/14 Test #3014: normal/gbZZ13.m2 .................   Passed    1.73 sec
      Start 3015: normal/gbZZautoreduction.m2
 9/14 Test #3015: normal/gbZZautoreduction.m2 ......   Passed    1.41 sec
      Start 3016: normal/gbZZbug.m2
10/14 Test #3016: normal/gbZZbug.m2 ................   Passed    2.25 sec
      Start 3017: normal/gbZZbug2-a.m2
11/14 Test #3017: normal/gbZZbug2-a.m2 .............   Passed    1.59 sec
      Start 3018: normal/gbZZbug2.m2
12/14 Test #3018: normal/gbZZbug2.m2 ...............   Passed    1.55 sec
      Start 3019: normal/gbinhom.m2
13/14 Test #3019: normal/gbinhom.m2 ................   Passed    1.45 sec
      Start 3020: normal/gblimits.m2
14/14 Test #3020: normal/gblimits.m2 ...............   Passed    1.37 sec

100% tests passed, 0 tests failed out of 14

Total Test time (real) =  21.09 sec
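
To compare the two logs test by test rather than only by their totals, the ctest output can be parsed. Below is a minimal Python sketch, assuming the line layout shown in the logs above. The regular expression and names are made up for illustration, and only three lines from each log are included as sample input.

```python
import re

# Matches lines such as:
#  1/14 Test #2974: normal/gb-matrix-lift.m2 .........   Passed    0.51 sec
LINE = re.compile(r"Test\s+#\d+:\s+(\S+)\s+\.+\s+Passed\s+([\d.]+)\s+sec")


def parse_ctest(output):
    """Map test name to seconds; a repeated name keeps its last timing."""
    return {m.group(1): float(m.group(2)) for m in LINE.finditer(output)}


# Three lines copied from each log above.
OPENBLAS_SAMPLE = """
 1/14 Test #2974: normal/gb-matrix-lift.m2 .........   Passed    0.51 sec
 8/14 Test #2981: normal/gbZZ13.m2 .................   Passed    0.78 sec
10/14 Test #2983: normal/gbZZbug.m2 ................   Passed    1.33 sec
"""
BLAS_SAMPLE = """
 1/14 Test #3007: normal/gb-matrix-lift.m2 .........   Passed    1.33 sec
 8/14 Test #3014: normal/gbZZ13.m2 .................   Passed    1.73 sec
10/14 Test #3016: normal/gbZZbug.m2 ................   Passed    2.25 sec
"""

openblas = parse_ctest(OPENBLAS_SAMPLE)
blas = parse_ctest(BLAS_SAMPLE)

# Absolute gap in seconds for each test present in both logs.
gaps = {name: round(blas[name] - openblas[name], 2) for name in openblas}

# Relative reduction in the reported totals (21.09 sec versus 8.84 sec).
total_reduction = 100.0 * (21.09 - 8.84) / 21.09

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name}: {gap:.2f} sec slower with BLAS/LAPACK")
    print(f"total reduction: {total_reduction:.1f}%")
```

In this excerpt the absolute gap is between 0.8 and 0.95 sec per test regardless of how long the test runs, so the percentage depends on the length of each test. The excerpt alone does not say where that time goes.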

@DanGrayson DanGrayson modified the milestones: version 1.16, version 1.17 Jul 3, 2020
@mahrud mahrud removed their assignment Aug 6, 2020
@DanGrayson DanGrayson removed their assignment Sep 28, 2020
@DanGrayson
Member

What remains is to switch the autotools build over to openblas.

@dimpase
Contributor

dimpase commented Sep 28, 2020

If you don't want to build your own openblas, note that openblas comes with openblas.pc, i.e. you can get info about it via pkg-config, or rather, PKG_CHECK_MODULES etc.
Here is what we do in Sage: https://github.com/sagemath/sage/blob/develop/build/pkgs/openblas/spkg-configure.m4

Admittedly, it's complicated. The problem is that different Linux distros package openblas differently; sometimes you need a separate libcblas, etc. (But please ask questions about it; I wrote an initial version of that monster, after all :-))
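
As a rough illustration of the pkg-config route, here is a minimal Python sketch that asks pkg-config about a module and splits the link flags into directories and library names. It assumes the module is called "openblas" (matching openblas.pc); as noted above, distros package it differently, so the name and the resulting flags may vary. The function names and the sample flag string in the usage note are hypothetical.

```python
import shlex
import shutil
import subprocess


def split_link_flags(libs_output):
    """Split `pkg-config --libs` output into (library dirs, library names)."""
    dirs, libs = [], []
    for token in shlex.split(libs_output):
        if token.startswith("-L"):
            dirs.append(token[2:])
        elif token.startswith("-l"):
            libs.append(token[2:])
    return dirs, libs


def query_pkg_config(module="openblas"):
    """Return version and link info for `module`, or None if unavailable."""
    tool = shutil.which("pkg-config")
    if tool is None:
        return None  # pkg-config itself is not installed
    if subprocess.run([tool, "--exists", module]).returncode != 0:
        return None  # no .pc file found for this module

    def ask(flag):
        done = subprocess.run([tool, flag, module],
                              capture_output=True, text=True, check=True)
        return done.stdout.strip()

    dirs, libs = split_link_flags(ask("--libs"))
    return {"version": ask("--modversion"),
            "library_dirs": dirs,
            "libraries": libs}


if __name__ == "__main__":
    info = query_pkg_config()
    print(info if info else "openblas not found via pkg-config")
```

For example, `split_link_flags("-L/opt/openblas/lib -lopenblas")` separates the directory `/opt/openblas/lib` from the library name `openblas`. A configure script would do the equivalent through PKG_CHECK_MODULES.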

@d-torrance
Member

Fixed in #3461

5 participants