Releases: intel/yask
Version 2.15.09
New features:
- Added the ability to use the memkind library to allocate some grid vars in "pmem" memory devices (build with `pmem=1`).
- Overlapping MPI communications now work when using wave-front tiling and/or temporal block tiling. On by default; turn off with `-no-overlap_comms`.
- MPI between ranks on the same node can now use shared memory to avoid buffer copies. Off by default; turn on with `-use_shm`.
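As a rough command-line sketch of these options (the stencil name, architecture value, and launch script are assumptions for illustration; only `pmem=1`, `-use_shm`, and `-no-overlap_comms` come from the notes above):

```shell
# Build with memkind "pmem" support enabled (stencil/arch values are hypothetical):
make pmem=1 stencil=iso3dfd arch=avx2

# Run two same-node ranks with shared-memory MPI buffers enabled
# and overlapped MPI communications (on by default) turned off:
mpirun -np 2 bin/yask.sh -stencil iso3dfd -arch avx2 -use_shm -no-overlap_comms
```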
Version 2.14.03
Adds a mini-block hierarchy level below blocks and above sub-blocks.
Separates the unit of work for OpenMP threads from the cache-block size:
- Blocks, as before, are units-of-work for top-level OpenMP threads. Blocks are evaluated in parallel in each region.
- Mini-blocks are evaluated sequentially within each block and are typically sized for L2 caches.
By default, mini-blocks are the same size as blocks, so most users will see no difference.
Temporal blocking can be applied to both blocks and mini-blocks; using `-bt` sets both by default.
Also removes loop-grouping parameters because they have not shown performance gains and are confusing to users.
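A minimal run-line sketch of the blocking option (the launch script, stencil, architecture, and depth value are assumptions; only the `-bt` option itself comes from the notes above):

```shell
# Apply a temporal-blocking depth to both blocks and mini-blocks:
bin/yask.sh -stencil iso3dfd -arch avx2 -bt 2
```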
Version 2.13.02
Added temporal tiling at the cache-block level and ability to specify temporal conditions on equations; more stats reported.
Temporal tiling in this version works only up to 3D spatial dims. Use version 2.14.03 for 4D and higher.
Version 2.11.00
Added the ability to overlap MPI communications with computation. Disable with `-no-overlap_comms`.
Version 2.10.02
- Overhauled Makefiles; build with `make YASK_OUTPUT_DIR=dirname` to specify the output location.
- Fixed some bugs with scratch-grids + wave-fronts + MPI.
Version 2.9.0
Improvements to decrease compile time and binary size.
Important change that may require your intervention: examples in `src/stencils` are now in `.cpp` (not `.hpp`) files. Running `git pull` will likely fail if any existing `.hpp` files have been modified.
- If you do not need any of your local changes, just run `git stash`.
- If you have modified any example stencils and wish to keep the changes, commit them to your local repository before running `git pull`.
- If you have any new stencils, just change their suffixes to `.cpp` to make sure they are added to the YASK compiler.
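The commit-and-rename steps above can be sketched in a throwaway repository (the stencil filename is hypothetical; the git commands themselves are standard):

```shell
# Set up a disposable repo that mimics the layout described above:
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
mkdir -p src/stencils
echo "// example stencil" > src/stencils/my_stencil.hpp
git add -A
git -c user.email=you@example.com -c user.name=you commit -qm "local stencil changes"

# Rename a local stencil to .cpp so the YASK compiler build will include it,
# then commit the rename before pulling upstream changes:
git mv src/stencils/my_stencil.hpp src/stencils/my_stencil.cpp
git -c user.email=you@example.com -c user.name=you commit -qm "rename stencil to .cpp"
git ls-files src/stencils
```

Committing before `git pull` (rather than stashing) preserves your local stencil history through the upstream `.hpp` → `.cpp` migration.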
Version 2.8.3
Provides operator overloading for all operations in the YASK compiler API.
- As a result, some return types of new-node operations in the compiler API became more generic. This should not affect Python code or C++ code using `auto` types.
Updated best-known settings on "Skylake" Xeon Scalable processors for several example stencils.
Version 2.7.3
MPI improvements, especially with temporal tiling and/or scratch grids.
Added compiler APIs to create full grid-index expressions.
Version 2.6.2
Added compiler APIs for creating sub-domains and manual dependency graphs.
Several fixes for MPI halo exchanges with sub-domains and/or scratch-grids.
Version 2.5.4
Added ability to specify NUMA node for each grid separately via an API.
Several bug fixes for corner cases such as unaligned data when using MPI and temporal wave-fronts.