Add Elmar Peise's ReLAPACK
martin-frbg authored Jun 28, 2017
1 parent 482015f commit 9b7b5f7
Showing 82 changed files with 20,579 additions and 0 deletions.
22 changes: 22 additions & 0 deletions relapack/LICENSE
@@ -0,0 +1,22 @@
The MIT License (MIT)

Copyright (c) 2016 Elmar Peise

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

64 changes: 64 additions & 0 deletions relapack/Makefile
@@ -0,0 +1,64 @@
TOPDIR = ..
include $(TOPDIR)/Makefile.system



SRC = $(wildcard src/*.c)
OBJS = $(SRC:%.c=%.o)

TEST_SUITS = \
slauum dlauum clauum zlauum \
spotrf dpotrf cpotrf zpotrf \
spbtrf dpbtrf cpbtrf zpbtrf \
ssygst dsygst chegst zhegst \
ssytrf dsytrf csytrf chetrf zsytrf zhetrf \
sgetrf dgetrf cgetrf zgetrf \
sgbtrf dgbtrf cgbtrf zgbtrf \
strsyl dtrsyl ctrsyl ztrsyl \
stgsyl dtgsyl ctgsyl ztgsyl \
sgemmt dgemmt cgemmt zgemmt
TESTS = $(TEST_SUITS:%=test/%.pass) # dummies
TEST_EXES = $(TEST_SUITS:%=test/%.x)

LINK_TEST = -L$(TOPDIR) -lopenblas -lgfortran -lm

.SECONDARY: $(TEST_EXES)
.PHONY: test

# ReLAPACK compilation

libs: $(OBJS)
	@echo "Building ReLAPACK library $(LIBNAME)"
	$(AR) -r $(TOPDIR)/$(LIBNAME) $(OBJS)
	$(RANLIB) $(TOPDIR)/$(LIBNAME)

%.o: %.c config.h
	$(CC) $(CFLAGS) -c $< -o $@


# ReLAPACK testing

test: $(TEST_EXES) $(TESTS)
	@echo "passed all tests"

test/%.pass: test/%.x
	@echo -n $*:
	@./$< > /dev/null && echo " pass" || (echo " FAIL" && ./$<)

test/s%.x: test/x%.c test/util.o $(TOPDIR)/$(LIBNAME) test/config.h test/test.h
	$(CC) $(CFLAGS) -DDT_PREFIX=s $< test/util.o -o $@ $(LINK_TEST) $(TOPDIR)/$(LIBNAME) $(LINK_TEST)

test/d%.x: test/x%.c test/util.o $(TOPDIR)/$(LIBNAME) test/config.h test/test.h
	$(CC) $(CFLAGS) -DDT_PREFIX=d $< test/util.o -o $@ $(LINK_TEST) $(TOPDIR)/$(LIBNAME) $(LINK_TEST)

test/c%.x: test/x%.c test/util.o $(TOPDIR)/$(LIBNAME) test/config.h test/test.h
	$(CC) $(CFLAGS) -DDT_PREFIX=c $< test/util.o -o $@ $(LINK_TEST) $(TOPDIR)/$(LIBNAME) $(LINK_TEST)

test/z%.x: test/x%.c test/util.o $(TOPDIR)/$(LIBNAME) test/config.h test/test.h
	$(CC) $(CFLAGS) -DDT_PREFIX=z $< test/util.o -o $@ $(LINK_TEST) $(TOPDIR)/$(LIBNAME) $(LINK_TEST)


# cleaning up

clean:
	rm -f $(OBJS) test/util.o test/*.x
68 changes: 68 additions & 0 deletions relapack/README.md
@@ -0,0 +1,68 @@
ReLAPACK
========

[![Build Status](https://travis-ci.org/HPAC/ReLAPACK.svg?branch=master)](https://travis-ci.org/HPAC/ReLAPACK)

[Recursive LAPACK Collection](https://github.com/HPAC/ReLAPACK)

ReLAPACK offers a collection of recursive algorithms for many of LAPACK's
compute kernels. Since it preserves LAPACK's established interfaces, ReLAPACK
integrates effortlessly into existing application codes. ReLAPACK's routines
not only outperform the reference LAPACK but also improve upon the performance
of tuned implementations, such as OpenBLAS and MKL.
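
To give a flavor of what "recursive" means here, the following is a minimal, self-contained sketch of a recursive variant of the `xLAUUM` kernel (lower-triangular case, overwriting `L` with `L^T * L`). It is not ReLAPACK's source: the names `lauum_recursive`, `lauum_unblocked`, `SKETCH_CROSSOVER`, and `AT` are made up for illustration, and plain loops stand in for the BLAS calls (SYRK, TRMM) that a tuned implementation would typically delegate to.

```c
#include <stddef.h>

/* Hypothetical crossover size for this sketch: at or below it, recursion
   stops and an unblocked kernel takes over. */
#define SKETCH_CROSSOVER 4

/* Element (i, j) of a column-major matrix with leading dimension ld. */
#define AT(A, ld, i, j) ((A)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Unblocked kernel: overwrite the lower triangle of the n x n matrix L
   with the lower triangle of L^T * L. */
static void lauum_unblocked(int n, double *L, int ld) {
    for (int i = 0; i < n; i++) {
        /* Row i only reads rows k >= i, so earlier rows may already be
           overwritten. The diagonal (j == i) comes last because the old
           L(i, i) is needed for every j < i. */
        for (int j = 0; j <= i; j++) {
            double s = 0.0;
            for (int k = i; k < n; k++)
                s += AT(L, ld, k, i) * AT(L, ld, k, j);
            AT(L, ld, i, j) = s;
        }
    }
}

/* Recursive variant: split L into [L11 0; L21 L22] and combine the halves. */
void lauum_recursive(int n, double *L, int ld) {
    if (n <= SKETCH_CROSSOVER) {
        lauum_unblocked(n, L, ld);
        return;
    }
    int n1 = n / 2, n2 = n - n1;
    double *L11 = L;
    double *L21 = L + n1;
    double *L22 = L + (size_t)n1 * (size_t)ld + (size_t)n1;

    /* 1. Recurse on the top-left block: A11 = L11^T * L11. */
    lauum_recursive(n1, L11, ld);

    /* 2. A11 += L21^T * L21, lower triangle only (a SYRK in BLAS terms). */
    for (int j = 0; j < n1; j++)
        for (int i = j; i < n1; i++) {
            double s = 0.0;
            for (int k = 0; k < n2; k++)
                s += AT(L21, ld, k, i) * AT(L21, ld, k, j);
            AT(L11, ld, i, j) += s;
        }

    /* 3. A21 = L22^T * L21 (a TRMM), in place: row i only reads rows k >= i. */
    for (int j = 0; j < n1; j++)
        for (int i = 0; i < n2; i++) {
            double s = 0.0;
            for (int k = i; k < n2; k++)
                s += AT(L22, ld, k, i) * AT(L21, ld, k, j);
            AT(L21, ld, i, j) = s;
        }

    /* 4. Recurse on the bottom-right block: A22 = L22^T * L22. */
    lauum_recursive(n2, L22, ld);
}
```

Calling `lauum_recursive(n, L, ld)` on a column-major array touches only the lower triangle. The crossover of 4 is deliberately small so that modest inputs exercise the recursion; `config.h` in this commit sets the default `CROSSOVER` to 24.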


Coverage
--------
For a detailed list of covered operations and an overview of operations to which
recursion is not efficiently applicable, see [coverage.md](coverage.md).


Installation
------------
To compile with the default configuration, simply run `make` to create the
library `librelapack.a`.

### Linking with MKL
Note that to link with MKL, you currently need to set the flag
`COMPLEX_FUNCTIONS_AS_ROUTINES` to `1` to avoid problems in `ctrsyl` and
`ztrsyl`. For further configuration options see [config.md](config.md).
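
The flag exists because Fortran libraries differ in how a complex-valued function hands back its result: either as an ordinary return value, or through an extra leading argument. The sketch below mimics both conventions with mock functions and selects one at compile time. Everything in it is hypothetical (`cfloat`, `mock_dotc_function`, `mock_dotc_routine`, `dotc`, `SKETCH_AS_ROUTINES`), and the mocks omit the stride arguments that a real BLAS dot product takes.

```c
/* Hypothetical stand-in for the COMPLEX_FUNCTIONS_AS_ROUTINES flag. */
#ifndef SKETCH_AS_ROUTINES
#define SKETCH_AS_ROUTINES 1
#endif

/* Single-precision complex number laid out as two consecutive floats. */
typedef struct { float re, im; } cfloat;

/* Shared arithmetic: conjugated dot product sum(conj(x[i]) * y[i]). */
static cfloat dotc_kernel(int n, const cfloat *x, const cfloat *y) {
    cfloat s = {0.0f, 0.0f};
    for (int i = 0; i < n; i++) {
        s.re += x[i].re * y[i].re + x[i].im * y[i].im;
        s.im += x[i].re * y[i].im - x[i].im * y[i].re;
    }
    return s;
}

/* Convention A (mock): the result is the function's return value. */
cfloat mock_dotc_function(const int *n, const cfloat *x, const cfloat *y) {
    return dotc_kernel(*n, x, y);
}

/* Convention B (mock): the result is written through a leading pointer. */
void mock_dotc_routine(cfloat *result, const int *n,
                       const cfloat *x, const cfloat *y) {
    *result = dotc_kernel(*n, x, y);
}

/* Caller-side wrapper: one call site, convention chosen at compile time. */
cfloat dotc(int n, const cfloat *x, const cfloat *y) {
#if SKETCH_AS_ROUTINES
    cfloat result;
    mock_dotc_routine(&result, &n, x, y);
    return result;
#else
    return mock_dotc_function(&n, x, y);
#endif
}
```

Calling a library through the wrong convention shifts every argument by one position, which is why the setting has to match the library that is linked.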


### Dependencies
ReLAPACK builds on top of [BLAS](http://www.netlib.org/blas/) and unblocked
kernels from [LAPACK](http://www.netlib.org/lapack/). There are many optimized
and machine specific implementations of these libraries, which are commonly
provided by hardware vendors or available as open source (e.g.,
[OpenBLAS](http://www.openblas.net/)).


Testing
-------
ReLAPACK's test suite compares its routines numerically with LAPACK's
counterparts. To set up the tests (located in `test/`) you need to specify
link flags for BLAS and LAPACK (version 3.5.0 or newer) in `make.inc`; then
`make test` runs the tests. For details on the performed tests, see
[test/README.md](test/README.md).
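
A numerical comparison of this kind usually boils down to a scaled error measure checked against a tolerance. The sketch below shows one simple way to do that. It is not the code in `test/`: the function names, the choice of error measure, and the tolerance parameter are all assumptions made for illustration.

```c
#include <stddef.h>

/* Absolute value without pulling in the math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Largest entrywise difference, scaled by the largest reference magnitude. */
double max_scaled_error(size_t n, const double *result, const double *reference) {
    double max_diff = 0.0, max_ref = 0.0;
    for (size_t i = 0; i < n; i++) {
        double diff = abs_d(result[i] - reference[i]);
        double mag = abs_d(reference[i]);
        if (diff > max_diff) max_diff = diff;
        if (mag > max_ref) max_ref = mag;
    }
    /* An all-zero reference leaves only the absolute difference to report. */
    return max_ref > 0.0 ? max_diff / max_ref : max_diff;
}

/* Pass/fail decision against a caller-chosen tolerance (hypothetical). */
int vectors_agree(size_t n, const double *result, const double *reference,
                  double tol) {
    return max_scaled_error(n, result, reference) <= tol;
}
```

In a real test the `result` array would hold the output of the routine under test and `reference` the output of the LAPACK counterpart on the same input.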


Examples
--------
Since ReLAPACK replaces parts of LAPACK, any LAPACK example involving the
covered routines applies directly to ReLAPACK. A few separate examples are
given in `examples/`. For details, see [examples/README.md](examples/README.md).


Citing
------
When referencing ReLAPACK, please cite the preprint of the paper
[Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection](http://arxiv.org/abs/1602.06763):

    @article{relapack,
      author    = {Elmar Peise and Paolo Bientinesi},
      title     = {Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection},
      journal   = {CoRR},
      volume    = {abs/1602.06763},
      year      = {2016},
      url       = {http://arxiv.org/abs/1602.06763},
    }
208 changes: 208 additions & 0 deletions relapack/config.h
@@ -0,0 +1,208 @@
#ifndef RELAPACK_CONFIG_H
#define RELAPACK_CONFIG_H

// ReLAPACK configuration file.
// See also config.md


////////////////////////////////
// BLAS/LAPACK object symbols //
////////////////////////////////

// BLAS routines linked against have a trailing underscore
#define BLAS_UNDERSCORE 1
// LAPACK routines linked against have a trailing underscore
#define LAPACK_UNDERSCORE BLAS_UNDERSCORE

// Complex BLAS/LAPACK routines return their result in the first argument
// This option must be enabled when linking to MKL for ctrsyl and ztrsyl to
// work.
// (defined exactly once, so the Intel case does not redefine the macro)
#ifdef F_INTERFACE_INTEL
#define COMPLEX_FUNCTIONS_AS_ROUTINES 1
#else
#define COMPLEX_FUNCTIONS_AS_ROUTINES 0
#endif
#define BLAS_COMPLEX_FUNCTIONS_AS_ROUTINES COMPLEX_FUNCTIONS_AS_ROUTINES
#define LAPACK_BLAS_COMPLEX_FUNCTIONS_AS_ROUTINES COMPLEX_FUNCTIONS_AS_ROUTINES

// The BLAS-like extension xgemmt is provided by an external library.
#define HAVE_XGEMMT 0


////////////////////////////
// Use malloc in ReLAPACK //
////////////////////////////

#define ALLOW_MALLOC 1
// allow malloc in xsygst for improved performance
#define XSYGST_ALLOW_MALLOC ALLOW_MALLOC
// allow malloc in xsytrf if the passed work buffer is too small
#define XSYTRF_ALLOW_MALLOC ALLOW_MALLOC


////////////////////////////////
// LAPACK routine replacement //
////////////////////////////////
// The following macros specify which routines are included in the library under
// LAPACK's symbol names: 1 included, 0 not included

#define INCLUDE_ALL 1

#define INCLUDE_XLAUUM INCLUDE_ALL
#define INCLUDE_SLAUUM INCLUDE_XLAUUM
#define INCLUDE_DLAUUM INCLUDE_XLAUUM
#define INCLUDE_CLAUUM INCLUDE_XLAUUM
#define INCLUDE_ZLAUUM INCLUDE_XLAUUM

#define INCLUDE_XSYGST INCLUDE_ALL
#define INCLUDE_SSYGST INCLUDE_XSYGST
#define INCLUDE_DSYGST INCLUDE_XSYGST
#define INCLUDE_CHEGST INCLUDE_XSYGST
#define INCLUDE_ZHEGST INCLUDE_XSYGST

#define INCLUDE_XTRTRI INCLUDE_ALL
#define INCLUDE_STRTRI INCLUDE_XTRTRI
#define INCLUDE_DTRTRI INCLUDE_XTRTRI
#define INCLUDE_CTRTRI INCLUDE_XTRTRI
#define INCLUDE_ZTRTRI INCLUDE_XTRTRI

#define INCLUDE_XPOTRF INCLUDE_ALL
#define INCLUDE_SPOTRF INCLUDE_XPOTRF
#define INCLUDE_DPOTRF INCLUDE_XPOTRF
#define INCLUDE_CPOTRF INCLUDE_XPOTRF
#define INCLUDE_ZPOTRF INCLUDE_XPOTRF

#define INCLUDE_XPBTRF INCLUDE_ALL
#define INCLUDE_SPBTRF INCLUDE_XPBTRF
#define INCLUDE_DPBTRF INCLUDE_XPBTRF
#define INCLUDE_CPBTRF INCLUDE_XPBTRF
#define INCLUDE_ZPBTRF INCLUDE_XPBTRF

#define INCLUDE_XSYTRF INCLUDE_ALL
#define INCLUDE_SSYTRF INCLUDE_XSYTRF
#define INCLUDE_DSYTRF INCLUDE_XSYTRF
#define INCLUDE_CSYTRF INCLUDE_XSYTRF
#define INCLUDE_CHETRF INCLUDE_XSYTRF
#define INCLUDE_ZSYTRF INCLUDE_XSYTRF
#define INCLUDE_ZHETRF INCLUDE_XSYTRF
#define INCLUDE_SSYTRF_ROOK INCLUDE_SSYTRF
#define INCLUDE_DSYTRF_ROOK INCLUDE_DSYTRF
#define INCLUDE_CSYTRF_ROOK INCLUDE_CSYTRF
#define INCLUDE_CHETRF_ROOK INCLUDE_CHETRF
#define INCLUDE_ZSYTRF_ROOK INCLUDE_ZSYTRF
#define INCLUDE_ZHETRF_ROOK INCLUDE_ZHETRF

#define INCLUDE_XGETRF INCLUDE_ALL
#define INCLUDE_SGETRF INCLUDE_XGETRF
#define INCLUDE_DGETRF INCLUDE_XGETRF
#define INCLUDE_CGETRF INCLUDE_XGETRF
#define INCLUDE_ZGETRF INCLUDE_XGETRF

#define INCLUDE_XGBTRF INCLUDE_ALL
#define INCLUDE_SGBTRF INCLUDE_XGBTRF
#define INCLUDE_DGBTRF INCLUDE_XGBTRF
#define INCLUDE_CGBTRF INCLUDE_XGBTRF
#define INCLUDE_ZGBTRF INCLUDE_XGBTRF

#define INCLUDE_XTRSYL INCLUDE_ALL
#define INCLUDE_STRSYL INCLUDE_XTRSYL
#define INCLUDE_DTRSYL INCLUDE_XTRSYL
#define INCLUDE_CTRSYL INCLUDE_XTRSYL
#define INCLUDE_ZTRSYL INCLUDE_XTRSYL

#define INCLUDE_XTGSYL INCLUDE_ALL
#define INCLUDE_STGSYL INCLUDE_XTGSYL
#define INCLUDE_DTGSYL INCLUDE_XTGSYL
#define INCLUDE_CTGSYL INCLUDE_XTGSYL
#define INCLUDE_ZTGSYL INCLUDE_XTGSYL

#define INCLUDE_XGEMMT 0
#define INCLUDE_SGEMMT INCLUDE_XGEMMT
#define INCLUDE_DGEMMT INCLUDE_XGEMMT
#define INCLUDE_CGEMMT INCLUDE_XGEMMT
#define INCLUDE_ZGEMMT INCLUDE_XGEMMT


/////////////////////
// crossover sizes //
/////////////////////

// default crossover size
#define CROSSOVER 24

// individual crossover sizes
#define CROSSOVER_XLAUUM CROSSOVER
#define CROSSOVER_SLAUUM CROSSOVER_XLAUUM
#define CROSSOVER_DLAUUM CROSSOVER_XLAUUM
#define CROSSOVER_CLAUUM CROSSOVER_XLAUUM
#define CROSSOVER_ZLAUUM CROSSOVER_XLAUUM

#define CROSSOVER_XSYGST CROSSOVER
#define CROSSOVER_SSYGST CROSSOVER_XSYGST
#define CROSSOVER_DSYGST CROSSOVER_XSYGST
#define CROSSOVER_CHEGST CROSSOVER_XSYGST
#define CROSSOVER_ZHEGST CROSSOVER_XSYGST

#define CROSSOVER_XTRTRI CROSSOVER
#define CROSSOVER_STRTRI CROSSOVER_XTRTRI
#define CROSSOVER_DTRTRI CROSSOVER_XTRTRI
#define CROSSOVER_CTRTRI CROSSOVER_XTRTRI
#define CROSSOVER_ZTRTRI CROSSOVER_XTRTRI

#define CROSSOVER_XPOTRF CROSSOVER
#define CROSSOVER_SPOTRF CROSSOVER_XPOTRF
#define CROSSOVER_DPOTRF CROSSOVER_XPOTRF
#define CROSSOVER_CPOTRF CROSSOVER_XPOTRF
#define CROSSOVER_ZPOTRF CROSSOVER_XPOTRF

#define CROSSOVER_XPBTRF CROSSOVER
#define CROSSOVER_SPBTRF CROSSOVER_XPBTRF
#define CROSSOVER_DPBTRF CROSSOVER_XPBTRF
#define CROSSOVER_CPBTRF CROSSOVER_XPBTRF
#define CROSSOVER_ZPBTRF CROSSOVER_XPBTRF

#define CROSSOVER_XSYTRF CROSSOVER
#define CROSSOVER_SSYTRF CROSSOVER_XSYTRF
#define CROSSOVER_DSYTRF CROSSOVER_XSYTRF
#define CROSSOVER_CSYTRF CROSSOVER_XSYTRF
#define CROSSOVER_CHETRF CROSSOVER_XSYTRF
#define CROSSOVER_ZSYTRF CROSSOVER_XSYTRF
#define CROSSOVER_ZHETRF CROSSOVER_XSYTRF
#define CROSSOVER_SSYTRF_ROOK CROSSOVER_SSYTRF
#define CROSSOVER_DSYTRF_ROOK CROSSOVER_DSYTRF
#define CROSSOVER_CSYTRF_ROOK CROSSOVER_CSYTRF
#define CROSSOVER_CHETRF_ROOK CROSSOVER_CHETRF
#define CROSSOVER_ZSYTRF_ROOK CROSSOVER_ZSYTRF
#define CROSSOVER_ZHETRF_ROOK CROSSOVER_ZHETRF

#define CROSSOVER_XGETRF CROSSOVER
#define CROSSOVER_SGETRF CROSSOVER_XGETRF
#define CROSSOVER_DGETRF CROSSOVER_XGETRF
#define CROSSOVER_CGETRF CROSSOVER_XGETRF
#define CROSSOVER_ZGETRF CROSSOVER_XGETRF

#define CROSSOVER_XGBTRF CROSSOVER
#define CROSSOVER_SGBTRF CROSSOVER_XGBTRF
#define CROSSOVER_DGBTRF CROSSOVER_XGBTRF
#define CROSSOVER_CGBTRF CROSSOVER_XGBTRF
#define CROSSOVER_ZGBTRF CROSSOVER_XGBTRF

#define CROSSOVER_XTRSYL CROSSOVER
#define CROSSOVER_STRSYL CROSSOVER_XTRSYL
#define CROSSOVER_DTRSYL CROSSOVER_XTRSYL
#define CROSSOVER_CTRSYL CROSSOVER_XTRSYL
#define CROSSOVER_ZTRSYL CROSSOVER_XTRSYL

#define CROSSOVER_XTGSYL CROSSOVER
#define CROSSOVER_STGSYL CROSSOVER_XTGSYL
#define CROSSOVER_DTGSYL CROSSOVER_XTGSYL
#define CROSSOVER_CTGSYL CROSSOVER_XTGSYL
#define CROSSOVER_ZTGSYL CROSSOVER_XTGSYL

// sytrf helper routine
#define CROSSOVER_XGEMMT CROSSOVER_XSYTRF
#define CROSSOVER_SGEMMT CROSSOVER_XGEMMT
#define CROSSOVER_DGEMMT CROSSOVER_XGEMMT
#define CROSSOVER_CGEMMT CROSSOVER_XGEMMT
#define CROSSOVER_ZGEMMT CROSSOVER_XGEMMT

#endif /* RELAPACK_CONFIG_H */