forked from BlueBrain/cyme

cyme

Framework to facilitate SIMD programming, without any tedious SIMD intrinsics.

Installation requires boost, doxygen and gcc >= 4.4 (the documentation provides tutorials, etc.). A few variables adjust the Makefile generation with cmake; they can be combined, as in the following examples:

mkdir b
cd b
cmake .. # basic
cmake .. # your machine supports slurm: export SLURM_ACCOUNT beforehand, e.g. export SLURM_ACCOUNT="--account=proj16"
cmake .. -DSLURM_FOUND=false # your machine supports slurm but you do not want to use it
cmake .. -DCOMMON_LIBRARY_TYPE=STATIC # you prefer a static library (boost lib, e.g. on BG/Q)
cmake .. -DCYME_POWER_VMX=true # force the VMX backend to resolve the ambiguity on the BG/Q frontend
make doxygen

Compilation:

 -D__CYME_SIMD_VALUE__=sse -msse4.1 + optional -D__FMA__
 -D__CYME_SIMD_VALUE__=avx -mavx + optional -D__FMA__
 -D__CYME_SIMD_VALUE__=qpx -D__FMA__
 -D__CYME_SIMD_VALUE__=vmx -D__FMA__ -mcpu=power7 -maltivec -mvsx
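
To check which of these defines actually reach a translation unit, a small helper like the one below can be compiled with the same flags. This is only a sketch: visible_simd_macros is a hypothetical name and not part of cyme. The macros __SSE4_1__, __AVX__, __AVX2__, __ALTIVEC__ and __VSX__ are the ones GCC and Clang predefine for the matching -m flags, and other compilers may differ.

```cpp
#include <string>

// Hypothetical helper (not part of cyme): list the SIMD-related macros that
// are visible in this translation unit for the flags given to the compiler.
std::string visible_simd_macros() {
    std::string out;
#if defined(__CYME_SIMD_VALUE__)
    out += "__CYME_SIMD_VALUE__ "; // set by hand with -D__CYME_SIMD_VALUE__=...
#endif
#if defined(__SSE4_1__)
    out += "__SSE4_1__ "; // predefined by GCC/Clang with -msse4.1
#endif
#if defined(__AVX__)
    out += "__AVX__ "; // predefined by GCC/Clang with -mavx
#endif
#if defined(__AVX2__)
    out += "__AVX2__ "; // predefined by GCC/Clang with -mavx2
#endif
#if defined(__FMA__)
    out += "__FMA__ "; // set by hand with -D__FMA__, or by the compiler with -mfma
#endif
#if defined(__ALTIVEC__)
    out += "__ALTIVEC__ "; // predefined by GCC/Clang with -maltivec
#endif
#if defined(__VSX__)
    out += "__VSX__ "; // predefined by GCC/Clang with -mvsx
#endif
    if (out.empty()) out = "(none)";
    return out;
}
```

Printing the returned string from a test program built with each flag set above shows at a glance whether the intended backend define and FMA define are both present.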

Known issues:

  • BG/Q: bug in the Newton-Raphson algorithm (division/sqrt).

DEBUG mode:

  1. The core engine test core_operator_bracket_torture fails in the Newton-Raphson algorithm for the division:

    it_AoS = block_a.begin();
    for(; it_AoS != block_a.end(); ++it_AoS)
        (*it_AoS)[0] = (((((*it_AoS)[1]/(*it_AoS)[2])/(*it_AoS)[3])/(*it_AoS)[4])/(*it_AoS)[5]);

    it_AoSoA_w = block_b.begin();
    it_AoSoA_r = block_b.begin();
    for(; it_AoSoA_r != block_b.end(); ++it_AoSoA_w, ++it_AoSoA_r)
        (*it_AoSoA_w)[0] = (((((*it_AoSoA_r)[1]/(*it_AoSoA_r)[2])/(*it_AoSoA_r)[3])/(*it_AoSoA_r)[4])/(*it_AoSoA_r)[5]);

It fails for an unroll of 2 with double, and only if the number of successive divisions is larger than 3. I checked the corresponding wrappers __mm__rec() and _mm_nfma() and did not find any issue. It works for float, and the test passes if I perform the normal division.

  2. The validity test also fails for an unroll of 2.
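
For context on the division issue in item 1: a Newton-Raphson division replaces a / b by a * (1/b), where 1/b starts from a low-precision estimate and is refined with x <- x * (2 - b * x). The sketch below is scalar, plain C++ and does not use cyme's wrappers. The names nr_reciprocal, chained_nr_division and chained_division are hypothetical, and the single-precision seed only stands in for a hardware reciprocal estimate. It may help to compare a chain of such divisions against the normal division when reproducing the failure.

```cpp
#include <cmath>

// Hypothetical helper (not part of cyme): approximate 1/b, starting from a
// single-precision seed and applying `steps` Newton-Raphson refinements.
// Each refinement roughly squares the relative error of the estimate.
double nr_reciprocal(double b, int steps) {
    // Assumed seed: a float reciprocal, standing in for a hardware estimate.
    double x = static_cast<double>(1.0f / static_cast<float>(b));
    for (int i = 0; i < steps; ++i)
        x = x * (2.0 - b * x); // x <- x * (2 - b * x)
    return x;
}

// Computes a / d[0] / d[1] / ... / d[n-1] with refined reciprocals.
double chained_nr_division(double a, const double* d, int n, int steps) {
    double r = a;
    for (int i = 0; i < n; ++i)
        r *= nr_reciprocal(d[i], steps);
    return r;
}

// Reference: the same chain with the normal (system) division.
double chained_division(double a, const double* d, int n) {
    double r = a;
    for (int i = 0; i < n; ++i)
        r /= d[i];
    return r;
}
```

With a float seed, two refinement steps are enough in this sketch to bring each reciprocal close to double precision, so a chain of four or five divisions should agree with the normal division up to a few rounding errors. A larger discrepancy in the real test would point at the wrappers or the unrolling rather than at the iteration itself.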

RELEASE:

The serial test with an unroll of 4 fails, again for the division. This is very weird, as the operator /= calls the system division and not Newton-Raphson.

Release 1.7.0

  • Optimize the backend AVX to AVX2 if the processor supports it
  • Tune log(x) and exp(x) solver using AVX2
  • Complete and optimize all inequalities <, >, <=, >= for the DAG and the SSE/AVX backends, with corresponding tests
  • Add support for bitwise operators on 32-bit integers (SSE/AVX only), with corresponding tests
  • Optimize operator ==
  • Clang format and typos

Release 1.6.0

  • More support of inequalities for SSE/AVX backend
  • Support masking for inequalities for SSE/AVX backend
  • Better compatibility with Intel compiler
  • Implementation of power(x,y) (node in the DAG + backend), all backends
  • Minor fixes
  • Replace boost with C++11

Release 1.5.0

  • Merge rvec and wvec into vec
  • Fix bugs in the exponential (all backends)
  • Boundary limits for exp, with the new corresponding wrappers for integer manipulation
  • Prepare work for the next release

Release 1.4.0

  • Google Summer of Code merge: work of Kai Langen
  • 2^x, 10^x, log_2 and log_10 solver
  • Backend power7/power8 (with little-endian support)
  • Backend ARM
  • Trigonometric functions for the SSE/AVX/NEON and VMX backends (not MIC, not BG/Q)
  • fabs
  • printing functions

Release 1.3.0

  • New license: LGPL 3.0
  • Open Source Standard
  • Add new elementary function (1/sqrt)
  • Tiny optimization in the tree

Release 1.2.0

  • Add new elementary function (sqrt)
  • Fix bug in log
  • Better DAG construction for operations with scalar operations
  • Better catch of the FMA
  • Simplify the template expression when mixing float/double, implement unity concept
  • Prepare for open source release
  • Improve CMake during the installation: a configuration file is created with the needed defines

Release 1.1.0

  • Add new elementary function (pow/log)
  • Improve the tree creation by distinguishing read/write operators (const / non-const)
  • Better DAG construction for --

Release 1.0.0

  • First release of cyme
