Skip to content

Component Overview

Tyler McDaniel edited this page Feb 2, 2021 · 29 revisions

Component Overview/Notes (as of v0.4.0)

These notes are meant to provide a brief summary of ASGarD’s major code components. The goal in writing these is to document the code, but also to provide a place where math/physics professionals can add their insight to improve the team’s overall understanding of how the code works. So, this is not a closed writeup – please use your expertise to add detail and fix misconceptions.

adapt

overview

“Adaptivity” describes ASGarD’s capability to add/remove element grid points as needed to more efficiently and accurately model a solution. Elements responsible for relatively large coefficients in the solution may be “refined” (have child elements added to the grid). Elements whose coefficients are very small may be “coarsened” (removed from the grid). The thresholds for coarsening and refinement are controlled by an input argument to ASGarD (--thresh).

primary classes/functions

  • class distributed_grid

The distributed_grid is composed of the element table (from elements component) and the distribution_plan (distribution component) that maps each MPI rank in the simulation to the elements for which the rank must compute coefficients. In short, this class bundles information about the elements and how they are distributed, and provides functions to adapt the grid given the solution vector with elements’ coefficient values.

The main routines driving this functionality (distributed_grid::refine and distributed_grid::coarsen) refine/coarsen the element grid by adding/deleting elements from its member elements::table, and then redistributing responsibility for the new element grid across ranks using functions in the distribution component.

notes for improvement

  • It is reasonable to encapsulate the element table with the distribution plan, since the adaptivity functions require altering the grid and rebalancing the problem across ranks, but providing const & getters for these members is not ideal. It would be better to provide a stricter interface allowing outside components to access the functionality they need without exposing (for read) the objects themselves.

  • Currently, there is no limit (beyond a level cap) on refinement/coarsening. This introduces two related problems; refinement to the extent that some rank’s portion of the solution vector spills out of RAM (virtually unlimited growth) and coarsening to the extent that fewer elements remain than ranks. We will need to gracefully handle both situations.

basis

overview

Most computations within ASGarD are performed in wavelet space, but the PDEs we want to solve are defined in real space. So, we require a means of transforming vectors/matrices into wavelet space. Most of this functionality is provided in the transformations component, but the basis component handles the operator required for transformations. Originally, this operator was explicitly formed (as a matrix), with application via matrix multiplication as appropriate to transform values. However, because the transform matrix is sparse with large zero regions and a regular pattern of dense blocks, we now form and store only the dense blocks. Functions are provided in the basis component to apply the transform matrix using these blocks, skipping the unnecessary arithmetic large zero regions would introduce.

primary classes/functions

  • function operator_two_scale

Deprecated. Explicitly forms the transform matrix. Uses generate_multi_wavelets, which is conceptually private but exposed in the interface for testing purposes.

  • class wavelet_transform

Constructor forms the dense blocks of the transform matrix. apply methods provide the result of matrix multiplication using these dense blocks.

notes for improvement

The dense blocks required for the transform matrix are relatively small and their formation is not costly. So, rather than constructing one instance of this class and passing it around for transforms as needed, we could create instances as needed at function scope to declutter APIs requiring wavelet_transform parameters.

Both the deprecated operator_two_scale and generate_multi_wavelets functions are lengthy and obscure in places. The C++ developers are not necessarily familiar with the underlying math performed by these functions; this makes optimization or clarification of the MATLAB code that served as a template for these routines a challenge. There is an opportunity for someone with math insight to clarify the code here.

batch

overview

“Kronmult” (Kronecker product of matrices followed by matrix-vector multiplication , Ax=b, where A is a tensor product) is one of the core computational kernels in ASGarD. We often avoid explicit formation of the Kronecker product by instead multiplying each Kronecker operand (in turn) directly to the vector x conceptually reshaped as a small matrix. High dimensional/high resolution problems may require O(10^6) or more small kronmults to move the simulation forward at each timestep.

A prior implementation of the time stepping code performed these many kronmults by constructing dimension-many lists (batches) of matrix-multiplication operands and invoking vendor-supplied batched_gemm routines. The batch component was designed to support constructing, storing, and passing pointer lists for batched_gemm.

The batch component is no longer used to drive explicit timestepping in time_advance; the kronmult component now serves this purpose. However, some functions within the component are used for implicit timestepping, and the “kronmult as batched gemm” technique is employed for wavelet to realspace transforms in the transformations component.

primary classes/functions

  • class batch

Provides storage for fixed-size list of pointers to same-sized matrices for batched_gemm. assign_entry interface stores pointer to fk::matrix with asserts/size checking; assign_raw stores a pointer without such checks. assign_raw was added after performance measurements showed that creating/destroying many matrix views can become costly (mostly due to reference counting using smart pointers in fk::matrix class).

  • class batch_chain

Given the operands for a kronmult (a list of matrices and a vector), as well as some externally allocated temporary workspace, the batch_chain class internally constructs batch members for batched_gemm. The execute member actually performs the multiplications by calling into the lib_dispatch component. Currently, only used for real space transform.

  • function batched_gemm / batched_gemv

Thin layer over lib_dispatch functions with the same name; lib_dispatch accepts BLAS-style arguments, so these functions translate/setup for those calls.

  • function build_system_matrix

The implicit time advance is driven by solving the system matrix. Given the elements::table describing which elements are active, as well as the 1D coefficient matrices from the PDE class, this function explicitly constructs the system matrix.

notes for improvement

This entire component is likely to be removed in the future. build_system_matrix belongs in the time_advance component or elsewhere; the batch_chain/batch classes are only used for the wavelet to real space transforms in transformations. The current implementation for this transform is a proof of concept that does not scale well and is not distributed across nodes.

boundary_conditions

overview

Some problems require the application (by addition) of boundary condition vectors to the solution vector at every timestep. This component a) generates unscaled boundary condition vectors and b) provides functions that scale the boundary condition vectors for the current time. The mathematical/physical significance of these vectors has not generally been understood on the C++ development side, but the above description demonstrates the most salient features of these vectors for C++ development: boundary condition vectors are tied to the current problem and grid (and so must be regenerated after adapting the grid), and must be scaled for time before addition to the solution vector. These features are shared with source vectors, although the implementations are different.

primary classes/functions

  • function make_unscaled_bc_parts

Given a PDE and grid, returns a pair of “unscaled_bc_parts” representing the left and right unscaled boundary condition vectors. The start and stop element integer parameters are used to generate only part of the vectors; these parameters are used to generate only the part of the vector assigned to a rank in the distribution plan.

  • function generate_scaled_bc

Accepts the unscaled boundary conditions and current time as input, and scales the vectors appropriately.

notes for improvement

  • This component has a strange interface – an unscaled boundary condition vector is represented by a vector of vectors of vectors of fk::vectors. This three level nesting is the result of each partial term in each term in each dimension contributing to the solution. However, we already have that exact structure in the PDE class – the PDE contains dimensions and terms that contain partial terms. Rather than existing in an isolated component, each partial term should be responsible for producing its own unscaled part, and the PDE should be responsible for providing the boundary condition vector. Housing source/boundary condition vector storage and generation in the PDE class makes sense logically, would declutter the API, and provide an opportunity to refactor these functions (which could stand a rewrite). The creation and application of source vectors is somewhat similar, and should also probably be consolidated in the PDE class.

coefficients

overview

Each PDE defines dimension-many coefficient matrices (sometimes called operator matrices”) for each partial term. These partial term matrices are chained together by matrix multiplication to form a single coefficient matrix for each term and dimension. These matrices are square, dense, and 2^level * degree in size.

Information from elements::table is used to draw values from these matrices for either 1) building the system matrix for implicit time advance or 2) performing system matrix/state vector multiplication for explicit time advance (or, in future, implicit time advance driven by iterative solve). Specifically, each element in the table contains a series of coordinates that designate, for each dimension, where that element should index into the coefficient matrices.

Currently, all coefficients are stored from construction onward on the GPU (if ASGarD is built to use GPUs). Only time-independent coefficients are supported at present, but we expect to support time-dependent coefficients as well.

primary classes/functions

  • function generate_all_coefficients

Takes an input/output PDE<P> argument and fills its coefficient matrices by generating all partial term coefficient matrices for each and chaining them together by multiplication.

  • function generate_coefficients

Generates one partial term coefficient matrix. Invoked from generate_all_coefficients.

notes for improvement

A PDE’s coefficients (at least, the time-independent ones) should be constructed when the PDE is built (RAII). There is no great reason this is not already done. Aside from that, matrix coefficient generation and update functions could easily (and probably should) be member functions of the PDE class. This rework could take place when the planned overhaul of the PDE class occurs.

Clone this wiki locally