Skip to content

Commit

Permalink
Merge pull request E4S-Project#93 from mkstoyanov/alexa_2019_update
Browse files Browse the repository at this point in the history
Alexa 2019 update
  • Loading branch information
maherou authored Nov 4, 2019
2 parents 423de9a + 3008282 commit d358476
Show file tree
Hide file tree
Showing 4 changed files with 151 additions and 36 deletions.
2 changes: 1 addition & 1 deletion ECP-ST-CAR.tex
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,7 @@ \subsection{\mathlibs}
\input{projects/2.3.3-MathLibs/2.3.3.13-CLOVER/2.3.3.13-SLATE}
\newpage
\input{projects/2.3.3-MathLibs/2.3.3.14-ALExa-ForTrilinos/2.3.3.14-ALExa}
\input{projects/2.3.3-MathLibs/2.3.3.14-ALExa-ForTrilinos/2.3.3.14-ForTrilinos}
% \input{projects/2.3.3-MathLibs/2.3.3.14-ALExa-ForTrilinos/2.3.3.14-ForTrilinos} % merged with ALExa
\newpage


Expand Down
185 changes: 150 additions & 35 deletions projects/2.3.3-MathLibs/2.3.3.14-ALExa-ForTrilinos/2.3.3.14-ALExa.tex
Original file line number Diff line number Diff line change
Expand Up @@ -4,19 +4,29 @@ \subsubsection{\stid{3.14} ALExa}
\paragraph{Overview}

The ALExa project ({\sl Accelerated Libraries for Exascale}) focuses on
preparing the DTK and Tasmanian libraries for exascale platforms and
integrating these libraries into ECP applications. These libraries deliver
capabilities identified as needs of ECP applications: (1) the ability to
transfer computed solutions between grids with differing layouts on parallel
accelerated architectures, enabling multiphysics projects to seamlessly
combine results from different computational grids to perform their required
simulations (DTK); and
preparing the ArborX, DTK, Tasmanian, and ForTrilinos libraries for exascale
platforms and integrating these libraries into ECP applications.
These libraries deliver capabilities identified as needs of ECP applications:
%
(2) the ability to construct fast and memory efficient surrogates to large
(1) the ability the perform performance portable spatial searches between
arbitrary sets of distributed geometric objects (ArborX);
%
(2) the ability to transfer computed
solutions between grids with differing layouts on parallel accelerated
architectures, enabling multiphysics projects to seamlessly combine results
from different computational grids to perform their required simulations
(DTK); and
%
(3) the ability to construct fast and memory efficient surrogates to large
scale engineering models with multiple inputs and large number of outputs,
enabling uncertainty quantification (both forward and inverse) as well as
optimization and efficient multi-physics simulations in projects such as
ExaStar (Tasmanian).
ExaStar (Tasmanian); and
%
(4) the ability to automatically interface Fortran-based codes to existing
large and complex C/C++ solftware libraries, such as Trilinos advanced solvers
that can utilize next-generation platforms.


These capabilities are being developed through ongoing interactions with our
ECP application project collaborators to ensure they will satisfy requirements
Expand All @@ -27,6 +37,26 @@ \subsubsection{\stid{3.14} ALExa}
the xSDK4ECP project.


{\bf ArborX}

{\it Purpose:} ArborX is an open-source library designed to provide
performance portable algorithms for geometric search.

{\it Significance:} General geometric search capabilities are needed in a wide
variety of applications including the generation of neighbor lists in
particle-based applications (e.g. molecular dynamics or general N-body
dynamics simulations) and mesh-mesh interactions including contact in
computational mechanics and solution transfer in multiphysics simulations.

{\it Performance portable search capabilities:} Shared memory and GPU
implementations of spatial tree construction; shared memory and GPU
implementations of various spatial tree queries; MPI front-end for
coordinating distributed spatial searches between sets of geometric objects
with different decompositions; communication plan generation based on spatial
search results.

{\it URL:} https://github.com/arborx/ArborX

{\bf DTK} (Data Transfer Kit)

{\it Purpose:} Transfers computed solutions between grids with differing
Expand All @@ -43,13 +73,6 @@ \subsubsection{\stid{3.14} ALExa}
common applications include conjugate heat transfer, fluid structure
interaction, and mesh deformation.

{\it Performance portable search capabilities:} shared memory and GPU
implementations of spatial tree construction; shared memory and GPU
implementations of various spatial tree queries; MPI front-end for
coordinating distributed spatial searches between sets of geometric objects
with different decompositions; communication plan generation based on spatial
search results.

{\it URL:} https://github.com/ORNL-CEES/DataTransferKit


Expand Down Expand Up @@ -79,18 +102,49 @@ \subsubsection{\stid{3.14} ALExa}

{\it URL:} http://tasmanian.ornl.gov


{\bf ForTrilinos} (Fortran for Trilinos)

{\it Purpose:} ForTrilinos provides a seamless pathway for large and complex
Fortran-based codes to access Trilinos. In addition, the developed SWIG/Fortran
allows automatic generation of the interfaces to any C/C++ library for seamless
interaction with Fortran application.

{\it Significance:} The Exascale Computing Project (ECP) requires the successful
transformation and porting of many Fortran application codes in preparation for
ECP platforms. A significant number of these codes rely upon the scalable
solution of linear and nonlinear equations. The Trilinos Project contains
a large and growing collection of solver capabilities that can utilize
next-generation platforms, in particular scalable multicore, manycore,
accelerator and heterogeneous systems. Trilinos is primarily written in C++,
including its user interfaces. While C++ is advantageous for gaining access to
the latest programming environments, it limits Trilinos usage via Fortran.

{\it SWIG capabilities:} The SWIG/Fortran interface generator, based on the
original SWIG, creates the object-oriented Fotran wrapper code that users can
access directly. This language translation will occur in both directions,
allowing, for example, an inversion of control functionality in ForTrilinos
that enables custom extensions of the Trilinos solvers that are Fortran-based.
Once the ForTrilinos project is complete, a functional, extensible suite of
capabilities to access Trilinos on next-generation computing systems will be
provided.

{\it URL:} https://github.com/trilinos/ForTrilinos

\paragraph{Key Challenges}

\indent

{\bf ArborX:} Search procedures to locate neighboring points, mesh cells, or
other geometric objects require tree search methods difficult to optimize on
modern accelerated architectures due to vector lane or thread divergence.

{\bf DTK:} General data transfer between grids of unrelated applications
requires many-to-many communication which is increasingly challenging as
communication to computation ratios are decreasing on successive HPC systems.
Search procedures to locate neighboring points and mesh cells require tree
search methods difficult to optimize on modern accelerated architectures due
to vector lane or thread divergence. Maintaining high accuracy for the
transfer requires careful attention to the mathematical properties of the
interpolation methods and is highly application-specific.
Maintaining high accuracy for the transfer requires careful attention to the
mathematical properties of the interpolation methods and is highly
application-specific.

{\bf Tasmanian:} Complex models usually have significant variability in
execution time for different model inputs, which leads to massive down-time
Expand All @@ -99,13 +153,31 @@ \subsubsection{\stid{3.14} ALExa}
analysis (or multi-physics simulations) requires a massive number of basis
evaluations and many sparse and dense linear operations.

{\bf ForTrilinos:} Developing the interfaces to the C++ libraries that provide
access to cutting-edge research, such as Trilinos, is of significant benefit
to Fortran community. However, such interfaces must be well documented,
sustainable and extensible, which would require significant amount of resources
and investment. This is further complicated by the requirements to support
heterogeneous platforms (e.g., GPUs) and inversion-of-control functionality.
The manual approach to such interfaces has been shown to be unsustainable as it
requires interface developers to have in-depth expertise in multiple languages
and the peculiarities in their interaction on top of the time commitment to
update the interfaces with changes in the library.

\paragraph{Solution Strategy}

\nobreak


\indent

{\bf ArborX:} ArborX builds on a Kokkos+MPI programming model to deploy to all
DOE HPC architectures. Extensive performance engineering has yielded
implementations that are both as performant in serial as state-of-the-art
libraries while also expanding on the capability provided by other libraries
by demonstrating thread scalability on both GPU and multi-core CPU
architectures..

{\bf DTK:} State-of-the-art, mathematically rigorous methods are used in DTK
to preserve accuracy of interpolated solutions. Algorithms are implemented in
a C++ code base with extensive unit testing on multiple platforms. Trilinos
Expand All @@ -119,28 +191,50 @@ \subsubsection{\stid{3.14} ALExa}
SLATE/MAGMA capabilities to ensure performance portability across relevant
platforms.

{\bf ForTrilinos:} Develop a Fortran module for the Simplified Wrapper
Interface Generator (SWIG) utility~\cite{beazley1996swig}. SWIG is used to parse
C++ code and generate wrapper code, and was already used for this purpose for
several dozen target languages, most notably Python. The project took the
approach of adding a Fortran 2003 wrapper generator for SWIG in order to
fulfill many of the critical feature requirements for ForTrilinos.
The developed SWIG/Fortran functionality allowed to proceed with automatic
generation of Fortran interfaces to selected Trilinos libraries. The work is
conducted in phases, with each phase increasing the number of wrapped of
Trilinos packages.

%----------------------------------------

\paragraph{Recent Progress}

\indent

{\bf DTK:} Extensive optimization work has yielded significant performance
improvements on accelerated and heterogeneous architectures. Work with partner
application ExaAM (WBS 2.2.1.05) created a preliminary multiphysics driver
capability for additive manufacturing simulations.
{\bf ArborX:} Extensive optimization work has yielded significant performance
improvements on accelerated and heterogeneous architectures. Recent results on
Summit show effective use of the Power9 hyperthreading capability as well as
excellent performance on the V100 GPU.

\begin{figure}[htb]
\centering
\includegraphics[width=3.0in]{projects/2.3.3-MathLibs/2.3.3.14-ALExa-ForTrilinos/dtk-gpu}
\caption{\label{fig:dtk-gpu}DTK search performance relative to Boost with Intel Xeon E5-2698 and Nvidia P100. 10M points randomly distributed in a unit cube, 1M queries. The time is in seconds. Speedup in bold.}
\centering \includegraphics[width=3.0in]{projects/2.3.3-MathLibs/2.3.3.14-ALExa-ForTrilinos/arborx_summit.png} \caption{\label{fig:arborx-gpu}
ArborX search performance on a single Summit node. One V100 GPU gives
similar performance as the entire Power9 CPU performance on a node.}
\end{figure}

{\bf DTK:} Work with partner application ExaAM (WBS 2.2.1.05) created a
preliminary multiphysics driver capability for additive manufacturing
simulations using a parallel-in-time coupling strategy.

\begin{figure}[htb]
\centering \includegraphics[width=3.0in]{projects/2.3.3-MathLibs/2.3.3.14-ALExa-ForTrilinos/dtk_exaam_pit.png} \caption{\label{fig:dtk-exaam-pit}
New ExaAM parallel-in-time coupling. Each arrow represents a DTK
transfer between different grids at different time points. Thousands
of DTK maps will be used to interpolate and communicate information
between steps. Image courtesy of Matt Bement (ExaAM Team) }
\end{figure}

{\bf Tasmanian:} The infrastructure of Tasmanian has been upgraded to support
the broader ECP focus of the work. GPU acceleration of sparse grid surrogates
has been implemented. Tasmanian recently enabled the ExaStar project to
reduce the size of a large-memory table of neutrino opacities by 100X while
reduce the size of a large-memory table of neutrino opacities by 10X while
still preserving accuracy.

\begin{figure}[htb]
Expand All @@ -149,17 +243,38 @@ \subsubsection{\stid{3.14} ALExa}
\caption{\label{fig:tasmanian-gpu}Tasmanian approximation (right) of neutrino capacities (left).}
\end{figure}

{\bf ForTrilinos:} ForTrilinos was updated to support nonlinear solvers. The
list of backends supported by ForTrilinos was expanded to include serial,
OpenMP and CUDA. A thorough unit testing suite was added.

%----------------------------------------

\paragraph{Next Steps}

\indent

{\bf DTK:} DTK search and communication capabilities will deployed in a new,
lightweight library, ArborX, to provide these ECP investments to a broader
user base.

{\bf Tasmanian:} Work will continue with the development of the asynchronous
construction methods that exploit the native sparse grids DAG hierarchy.
{\bf ArborX:} Continue performance engineering campaign and extend to
distributed search capabilities and communication and deploy in a variety of
applications.

{\bf DTK:} Support parallel-in-time coupling in the ExaAM project for
simulations of metal powder bed additive manufacturing.

{\bf Tasmanian:} Work will continue with the development of mixed precision
algorithms for fast surrogate evaluations and integratinv the capability within
the ExaStar project.

{\bf ForTrilinos:} the next efforts will include
\begin{enumerate}
\item \textbf{Integrate developed capabilities into applications:} E3SM-MMF
is an Earth system model development and simulation project. It relies on
Trilinos for its implicit capabilities. The ForTrilinos project will
integrate the developed nonlinear solver with IoC into E3SM-MMF to provide
path forward to heterogeneous stack.
\item \textbf{Integrate ForTrilinos with upstream Trilinos project:}
ForTrilinos will develop a support model to interact with upstream Trilinos
developers. This will allow a faster integration of new capabilities, and
more robust testing infrastructure.
\end{enumerate}

%----------------------------------------
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit d358476

Please sign in to comment.