About the NO_WARMUP and NO_AFFINITY options in Makefile.rule #439

Closed
jgpallero opened this issue Aug 26, 2014 · 4 comments

@jgpallero

Hello:

Inspecting OpenBLAS 0.2.11 and the older 0.2.8, I've noticed that the NO_WARMUP and NO_AFFINITY options in Makefile.rule do not have the same values. In OpenBLAS 0.2.8 both are commented out in Makefile.rule:

    # NO_WARMUP = 1
    # NO_AFFINITY = 1

but in 0.2.11 they are enabled:

    NO_WARMUP = 1
    NO_AFFINITY = 1

I remember some comments on the mailing list, or here in the issues section, about performance differences between versions, and I also remember someone saying that this could be adjusted using these variables.

Why are these variables uncommented in Makefile.rule in recent OpenBLAS versions?

@wernsaar
Contributor

@jgpallero
there are no simple answers to your questions.
AFFINITY may be useful on older NUMA machines, but it will seriously slow down newer
machines with BULLDOZER, PILEDRIVER or HASWELL processors, as well as big NUMA machines.
R also has problems if AFFINITY is used in OpenBLAS, so the default is now not to use
affinity. On Linux you can set the affinity with programs like taskset or numactl.
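As an illustration of setting affinity outside OpenBLAS, here is a minimal sketch using Python's Linux-only `os.sched_getaffinity`/`os.sched_setaffinity`; from the shell, `taskset -c 0-3 ./prog` or `numactl --cpunodebind=0 --membind=0 ./prog` do the equivalent at launch. The helper name `pin_to_cpus` is hypothetical and not part of OpenBLAS.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the caller to the given CPU ids (Linux only).

    Hypothetical helper, roughly what `taskset -c` does from the shell.
    CPUs outside the currently allowed set are ignored rather than requested.
    """
    allowed = os.sched_getaffinity(0)  # pid 0: the caller (on Linux, the calling thread)
    wanted = set(cpus) & allowed
    if not wanted:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, wanted)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    first = min(os.sched_getaffinity(0))
    print("now restricted to CPUs:", sorted(pin_to_cpus([first])))
```

On Linux the mask is set for the calling thread and inherited by threads it creates afterwards, so worker threads that a BLAS library has already started keep their previous mask; launching the whole program under taskset or numactl sidesteps that ordering problem.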

WARMUP has no effect, or a negative one, on newer machines with AMD processors.
On machines with SANDYBRIDGE processors, WARMUP may increase performance a little,
but test it on your platform.
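Since the benefit of WARMUP depends on the platform, one way to test it is to time the same workload against a build with `NO_WARMUP = 1` and one without. Below is a minimal timing sketch; `best_of` is a hypothetical helper, and the pure-Python workload is only a stand-in for the BLAS call you actually care about.

```python
import statistics
import time


def best_of(fn, repeats=5, discard=1):
    """Time fn() several times and return (best, median) wall-clock seconds.

    The first `discard` calls are run but not timed, to keep one-off
    costs (page faults, lazy initialisation) out of the measurement.
    """
    for _ in range(discard):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with the BLAS-backed call under test.
    def workload():
        return sum(i * i for i in range(200_000))

    best, median = best_of(workload)
    print(f"best {best:.4f} s, median {median:.4f} s")
```

Run the same script once per build (for example by pointing the dynamic loader at each library in turn) and compare medians across several runs rather than a single number.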

The performance difference is the result of very comprehensive testing, which showed that some
BLAS routines computed wrong results, so we had to fall back to slower but correct routines.
I'm working hard to write benchmark programs, compare the results against MKL,
ACML and ATLAS, and then rewrite such BLAS functions. OpenBLAS v0.2.12 will be faster
than v0.2.11, and v0.2.13 faster than v0.2.12.
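The kind of check that catches a routine returning wrong results can be sketched as comparing an optimized result against a slow reference under a relative tolerance. This is a minimal pure-Python sketch; `gemm_reference`, `max_rel_error` and the tolerance used are illustrative assumptions, not OpenBLAS's own test suite.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Naive C := alpha*A*B + beta*C on lists of lists (slow but simple)."""
    m, k, n = len(a), len(b), len(b[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out


def max_rel_error(got, want):
    """Largest elementwise |got - want|, scaled by the largest |want|."""
    scale = max(abs(x) for row in want for x in row) or 1.0
    diffs = (abs(g - w) for gr, wr in zip(got, want) for g, w in zip(gr, wr))
    return max(diffs) / scale


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[1.0, 1.0], [1.0, 1.0]]
    want = gemm_reference(1.0, a, b, 0.0, c)  # [[19, 22], [43, 50]]
    got = [[19.0, 22.0], [43.0, 50.000001]]  # pretend "optimized" result
    err = max_rel_error(got, want)
    print("PASS" if err < 1e-12 else f"FAIL (relative error {err:.2e})")
```

A fixed tolerance is a simplification; a realistic threshold would grow with the inner dimension k and the machine epsilon of the precision being tested.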

@wernsaar
Contributor

closed this issue

@jgpallero
Author

@wernsaar Thank you very much for your answer. Regarding benchmarks, last year I wrote a small program that runs some tests using (S|D)GEMM, (S|D)TRMM and (S|D)SYRK from BLAS and (S|D)GETRF and (S|D)GEQRF from LAPACK, and creates some plots with matplotlib. The code is here: https://bitbucket.org/jgpallero/pblb
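Benchmarks like these usually report GFLOP/s, which needs a flop count per routine. Below is a minimal sketch using the standard leading-order counts for real arithmetic (lower-order terms dropped, m >= n for the factorizations); the function names are illustrative and are not taken from the pblb code.

```python
def gemm_flops(m, n, k):
    """C := alpha*A*B + beta*C, A m-by-k, B k-by-n: a multiply and an add per term."""
    return 2.0 * m * n * k


def syrk_flops(n, k):
    """C := alpha*A*A^T + beta*C, C n-by-n: only one triangle is computed."""
    return float(n) * n * k


def trmm_flops(m, n):
    """B := alpha*A*B with A m-by-m triangular (left side), B m-by-n."""
    return float(m) * m * n


def getrf_flops(m, n):
    """LU with partial pivoting of an m-by-n matrix, m >= n."""
    return m * n * n - n ** 3 / 3.0


def geqrf_flops(m, n):
    """Householder QR of an m-by-n matrix, m >= n."""
    return 2.0 * m * n * n - 2.0 * n ** 3 / 3.0


def gflops(flops, seconds):
    """Convert a flop count and an elapsed time into GFLOP/s."""
    return flops / seconds / 1e9
```

For a square n-by-n matrix the factorization counts reduce to 2n^3/3 for GETRF and 4n^3/3 for GEQRF, so a 1000 x 1000 x 1000 DGEMM that takes 2 seconds runs at 1 GFLOP/s.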

@wernsaar
Contributor

On 27.08.2014 11:26, jgpallero wrote:

@wernsaar Thank you very much for your response. About the benchmarks, last year I wrote a small program to do some tests using (S|D)GEMM, (S|D)TRMM and (S|D)SYRK from BLAS and (S|D)GETRF and (S|D)GEQRF from Lapack. Some plots are created using matplotlib. The code is here: https://bitbucket.org/jgpallero/pblb


Hi,

Thank you. I will look at your code.
You will find my code in the benchmark directory, and I publish my plots
at http://sourceforge.net/p/slurm-roll/code/HEAD/tree/branches/benchmark/

Best regards
Werner
