Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit Solver Interface Updates #1230

Closed
5 of 7 tasks
dennisYatunin opened this issue May 6, 2023 · 15 comments
Closed
5 of 7 tasks

Implicit Solver Interface Updates #1230

dennisYatunin opened this issue May 6, 2023 · 15 comments
Assignees
Labels
SDI Software Development Issue

Comments

@dennisYatunin
Copy link
Member

dennisYatunin commented May 6, 2023

Purpose

The interface we use to specify matrices and solve linear systems of equations for the implicit solver needs to be refactored and extended. This interface, which is currently spread across several files in ClimaCore and ClimaAtmos (ClimaCore.jl/src/Operators/stencilcoefs.jl, ClimaCore.jl/src/Operators/pointwisestencil.jl, ClimaCore.jl/src/Operators/operator2stencil.jl, ClimaAtmos.jl/src/tendencies/implicit/wfact.jl, and ClimaAtmos.jl/src/tendencies/implicit/schur_complement_W.jl) is unnecessarily convoluted and involves a large number of hardcoded assumptions, which makes it extremely challenging for new users to experiment with the implicit solver and to add new implicit tendencies. In particular, the next stage of EDMF development will involve adding many implicit tendencies to the atmosphere model, and our approximation of the total implicit tendency's Jacobian matrix will end up with a highly non-trivial sparsity pattern. In order for more than a single developer to be able to understand and implement the implicit solver for the dycore+EDMF, we will first need to make the changes outlined in this SDI. In addition, the land modeling team has been experimenting with their own implicit solver, and these changes will speed up their development and add the new functionality they require.

There is currently a draft PR that contains a detailed sketch of all the proposed changes: #1190

Cost/benefits/risks

The cost/risk is development time. The benefit will be a significantly reduced complexity of implicit solver implementations, both in ClimaAtmos and in ClimaLSM. There will be a simple, well-documented interface to all of the numerical algorithms required by implicit solvers, which will allow user-facing code to be relatively short and easily extensible.

Producers

@dennisYatunin

Components

  • Simplify the interface for band matrix fields and matrix multiplication using fields.
  • Add support for specifying derivatives of two-argument operators and for specifying derivatives of operator boundary conditions.
  • Add a user-friendly representation of block matrices.
  • Add an efficient linear solver for block matrices.
  • Add thorough documentation and unit tests.

Inputs

BandMatrixRow and MultiplyColumnwiseBandMatrixField

The type we currently use to represent an element of a band matrix field is called StencilCoefs, and it is extremely confusing and poorly designed. Upon refactoring, this will become BandMatrixRow, which will have the following improvements:

  • Simple constructors, as well as aliases for common matrix types; e.g., DiagonalMatrixRow(1) and TridiagonalMatrixRow(1, 2, 3)
  • A well-defined set of allowed operations—accessing individual row entries, taking linear combinations of rows, and checking for equality of rows
  • Automatic conversion/promotion for subtypes of BandMatrixRow and LinearAlgebra.UniformScaling, so that different types of matrices can be mixed together; e.g., LinearAlgebra.I / 2 - TridiagonalMatrixRow(1, 2, 3) + 2 * PentadiagonalMatrixRow(1, 2, 3, 4, 5) == PentadiagonalMatrixRow(2, 3, 4.5, 5, 10)

These improvements will also be reflected in fields of BandMatrixRows, which will be aliased as ColumnwiseBandMatrixFields for dispatch. (The alias name is meant to indicate that every column has its own set of BandMatrixRows, which, when taken together, can be interpreted as a band matrix.) So, for example, users will be able to write (@. LinearAlgebra.I / 2 - tridiagonal_matrix_field + 2 * pentadiagonal_matrix_field) == (@. PentadiagonalMatrixRow(field1, field2, field3, field4, field5)).

The operators we currently use for matrix-matrix and matrix-vector multiplication are Operators.ComposeStencils() and Operators.ApplyStencil(), respectively. Again, these are confusing and implemented in a rather roundabout way. Upon refactoring, these will both become , which is an alias for Operators.MultiplyColumnwiseBandMatrixField(). This will allow users to write something like @. matrix_field1 ⋅ matrix_field2 ⋅ field, instead of needing to write @. apply(compose(matrix_field1, matrix_field2), field). In addition, the amount of code used to implement matrix multiplication can be reduced roughly by a factor of 3 (as shown in the sketch), and this simplified code will be easier to update for GPUs in the near future.

FiniteDifferenceOperatorTermsMatrix

When a finite difference operator is applied to a field (@. op(field)), the result is equivalent to multiplying some matrix by that field (@. matrix_field ⋅ field). The operator we currently use to generate this matrix is Operators.Operator2Stencil(op); in order to clarify what this operator is doing, it will be renamed to Operators.FiniteDifferenceOperatorTermsMatrix(op). If op_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(op) and ones_field = ones(axes(field)), users will be able to confirm that (@. op(field)) == (@. op_matrix(ones_field) ⋅ field). As a quirk of our implementation, it is also the case that (@. op_matrix(ones_field) ⋅ field) == (@. op_matrix(field) ⋅ ones_field), which allows us to somewhat simplify expressions involving products with operator matrices.

Aside from the name change, there are two new features that we need to add to FiniteDifferenceOperatorTermsMatrix. First, for EDMF development, we need to add support for multi-argument operators, so that FiniteDifferenceOperatorTermsMatrix(op) will always generate the matrix that corresponds to the last argument of op. For example, given a two-argument operator op (such as WeightedInterpolateF2C or Upwind3rdOrderBiasedProductC2F), users will be able to define op_matrix and confirm that (@. op(field1, field2)) == (@. op_matrix(field1, ones_field2) ⋅ field2) == (@. op_matrix(field1, field2) ⋅ ones_field2).

Second, for land model development, we need to add support for specifying the corner elements of matrices by adding special boundary conditions for FiniteDifferenceOperatorTermsMatrix. The simple expression presented earlier, (@. op(field)) == (@. op_matrix(ones_field) ⋅ field), is only true when op has trivial boundary conditions; i.e., when op is a center-to-face operator with boundary conditions that cause it to return 0 on the top and bottom faces, or when op is a face-to-center operator without any boundary conditions (which means that it uses the values at the top and bottom faces as-is). In general, operators are affine transformations at the boundaries, not linear transformations. This means that, for every op and field, there is some boundary_field that is zero everywhere except at the boundaries, and (@. op(field)) == (@. op_matrix(ones_field) ⋅ field + boundary_field). However, we only use matrices to represent derivatives of operators with respect to their inputs, and, up until now, it has always been the case that boundary_field is a constant that does not depend on field. More specifically, if boundary_matrix_field represents ∂(boundary_field)/∂(field), then ∂(@. op(field))/∂(field) can be expressed as @. op_matrix(ones_field) + boundary_matrix_field. So, if boundary_field is a constant, then boundary_matrix_field is the zero matrix and can be ignored in our computations. This amounts to assuming that the corner elements of our matrices are always zero. Due to new requirements from the land model, we will no longer be able to make this assumption, so we will need to add support for the Operators.SetValue boundary condition for FiniteDifferenceOperatorTermsMatrix, which will allow users to specify nonzero elements from boundary_matrix_field that should be added to @. op_matrix(ones_field).

Additionally, it would be good to refactor how finite difference operators are implemented so that the code for every operator does not need to be duplicated in order to implement the FiniteDifferenceOperatorTermsMatrix for that operator. Currently, every operator implements a method for stencil_interior, stencil_left_boundary, and stencil_right_boundary, and each of these methods returns some value (the value of the operator's result at a particular point in space). The FiniteDifferenceOperatorTermsMatrix for every operator also implements the same three methods, but each of its methods returns a BandMatrixRow (or, rather, a "StencilCoefs") whose entries add up to the value returned by the operator's corresponding method. There are currently almost 700 lines of code required to implement these duplicated methods, and more will need to be added in order to support multi-argument operators. It should be fairly straightforward to refactor things so that:

  • The methods of stencil_interior, stencil_left_boundary, and stencil_right_boundary that get implemented for every operator return a BandMatrixRow.
  • If the operator is wrapped in a FiniteDifferenceOperatorTermsMatrix, this BandMatrixRow gets returned as-is. If matrix boundary conditions are specified, the first or last value of this BandMatrixRow may need to be modified.
  • Otherwise, if it is not wrapped in a FiniteDifferenceOperatorTermsMatrix, the entries of this BandMatrixRow get added together before being returned.

However, this last change would only improve things "under the hood", without any immediate benefit to users. In order to avoid unnecessarily delaying EDMF and land model development, this change will be the last step of this SDI.

ColumnwiseBlockMatrix

The type we currently use to represent block matrices is the SchurComplementW object, which is hardcoded to only work for the dycore in ClimaAtmos. This can be generalized to a ColumnwiseBlockMatrix, which will be a simple dictionary that maps pairs of field names (one name for the row and another name for the column) to their corresponding matrix blocks, each of which can be a ColumnwiseBandMatrixField or a LinearAlgebra.UniformScaling. The ColumnwiseBlockMatrix will be used as follows:

matrix = ColumnwiseBlockMatrix(
    @block_name(c.ρ, c.ρ) => -LinearAlgebra.I,
    @block_name(c.ρ, f.w.components.data.:1) =>
        similar(Y.c, BidiagonalMatrixRow{FT}),
    @block_name(f.w.components.data.:1, c.ρ) =>
        similar(Y.f, BidiagonalMatrixRow{FT}),
    @block_name(f.w.components.data.:1, f.w.components.data.:1) =>
        similar(Y.f, TridiagonalMatrixRow{FT}),
)
matrix[@block_name(f.w.components.data.:1, f.w.components.data.:1)] .=
    -LinearAlgebra.I .+ Δtγ .* <Jacobian matrix block approximation>

The only challenge in implementing the ColumnwiseBlockMatrix is ensuring type stability, particularly when it gets used in a BlockMatrixSystemSolver (see the next section); without type stability, we will have long compilation times and unnecessary allocations. Fortunately, this has already been worked out and tested in the sketch. In particular, the @block_name macro will return a type-stable generalization of what we currently call a property_chain for both the row and column names, and the corresponding row and column fields will be accessed by using a type-stable generalization of Fields.single_field.

BlockMatrixSystemSolver

We are currently solving the linear system of equations specified by the SchurComplementW object by reducing it to a smaller tridiagonal system of equations, solving the reduced problem, and using the reduced problem's solution to compute the original problem's solution. However, this strategy will not work when we add the new implicit EDMF tendencies because the new sparsity pattern of the matrix will not allow us to specify the reduced problem without performing a computationally expensive dense matrix inversion. So, we will need to implement a new algorithm for solving sparse block matrix systems. Per @simonbyrne's advice, this algorithm will work as follows:

  • Permute the rows and columns of the ColumnwiseBlockMatrix so that, instead of blocks that correspond to pairs of variables, we end up with blocks that correspond to pairs of cells.
  • Solve the permuted block matrix system. As long as there are fewer variables than cells, inverting the blocks of the permuted matrix will be less computationally expensive than inverting the blocks of the original matrix.
  • Permute the elements of the solution back to their required form.

To illustrate how this permutation will work, here is a simplified example that illustrates how we would permute the Jacobian of a FieldVector $Y$ with total implicit tendency $Y_t$ that is defined on a single column with two cells, with a field $c$ defined on cell centers and a field $f$ defined on cell faces:

$$\text{Original } \dfrac{\partial Y_t}{\partial Y} = \begin{bmatrix} \dfrac{\partial Y_t.c}{\partial Y.c} & \dfrac{\partial Y_t.c}{\partial Y.f} \\\ \dfrac{\partial Y_t.f}{\partial Y.c} & \dfrac{\partial Y_t.f}{\partial Y.f} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} \dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} \\\ \dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.c[2]} \end{bmatrix} & \begin{bmatrix} \dfrac{\partial Y_t.c[1]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[2.5]} \\\ \dfrac{\partial Y_t.c[2]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[2.5]} \end{bmatrix} \\[2 ex] \begin{bmatrix} \dfrac{\partial Y_t.f[0.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.c[2]} \\\ \dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} \\\ \dfrac{\partial Y_t.f[2.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.c[2]} \end{bmatrix} & \begin{bmatrix} \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[2.5]} \\\ \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[2.5]} \\\ \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[2.5]} \end{bmatrix} \end{bmatrix}$$ $$\text{Permuted } \dfrac{\partial Y_t}{\partial Y} = \begin{bmatrix} \dfrac{\partial(\text{cell 1})_t}{\partial(\text{cell 1})} & \dfrac{\partial(\text{cell 1})_t}{\partial(\text{cell 2})} \\\ \dfrac{\partial(\text{cell 2})_t}{\partial(\text{cell 1})} & \dfrac{\partial(\text{cell 2})_t}{\partial(\text{cell 2})} \end{bmatrix} = \qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad$$ $$\qquad = \begin{cases} \begin{bmatrix} \begin{bmatrix} \dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[0.5]} \\\ \dfrac{\partial Y_t.f[0.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[0.5]} \end{bmatrix} & \begin{bmatrix} \dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} \\\ \dfrac{\partial Y_t.f[0.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[1.5]} \end{bmatrix} \\[2 ex] \begin{bmatrix} \dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[0.5]} \\\ \dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[0.5]} \end{bmatrix} & \begin{bmatrix} \dfrac{\partial Y_t.c[2]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} \\\ \dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]} \end{bmatrix} \end{bmatrix} & \text{ if } \dfrac{\partial Y_t.\square}{\partial Y.f[2.5]} = \dfrac{\partial Y_t.f[2.5]}{\partial Y.\square} = C\delta_{\square, f[2.5]} \\[5 ex] \begin{bmatrix} \begin{bmatrix} \dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} \\\ \dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]} \end{bmatrix} & \begin{bmatrix} \dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[2.5]} \\\ \dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[2.5]} \end{bmatrix} \\[2 ex] \begin{bmatrix} \dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} \\\ \dfrac{\partial Y_t.f[2.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[1.5]} \end{bmatrix} & \begin{bmatrix} \dfrac{\partial Y_t.c[2]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[2.5]} \\\ \dfrac{\partial Y_t.f[2.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[2.5]} \end{bmatrix} \end{bmatrix} & \text{ if } \dfrac{\partial Y_t.\square}{\partial Y.f[0.5]} = \dfrac{\partial Y_t.f[0.5]}{\partial Y.\square} = C\delta_{\square, f[0.5]} \end{cases}$$

As this example illustrates, the permutation requires us to drop all of the matrix elements that correspond to the top or bottom cell face from the linear solve, which we can do as long as the only nonzero matrix element for that cell face is the "identity element". In ClimaAtmos, we know that this will always be the case for the top cell face; in the example above, the "identity element" for this cell face is $\partial Y_t.f[2.5]/\partial Y.f[2.5]$. In our code, the value of $\partial Y_t.f[2.5]/\partial Y.f[2.5]$ (or, rather, the value of -1 + Δtγ * ∂(Yₜ.f.w.components.data.:1[2.5])/∂(Y.f.w.components.data.:1[2.5]), and the corresponding EDMF updraft velocity terms) will typically be $-1$, so, using the somewhat hand-wavy notation from above, we will typically have that $\partial Y_t.\square/\partial Y.f[2.5] = \partial Y_t.f[2.5]/\partial Y.\square = -\delta_{\square, f[2.5]}$. In ClimaLSM, there are currently no prognostic variables defined on cell faces, but, if there were, this would be the case for the bottom cell face, rather than the top cell face. In general, as long as we deal with the nonzero identity element separately, we can drop all of the matrix elements related to one cell face from the linear solve. There are also two other types of variables whose corresponding matrix elements we will be able to drop from the linear solve (because the only nonzero elements will be the identity elements): variables that do not have a nonzero implicit tendency, like Y.c.uₕ in ClimaAtmos, and variables that do not lie on cell centers or on cell faces, like those related to the river model in ClimaLSM.

After the matrix is permuted, the new linear system will be solved using a band matrix solver. Both the dycore and the land model will require a tridiagonal solver, and EDMF may also require a pentadiagonal solver. If $N$ denotes the number of cells and $V$ denotes the number of variables, then the band matrix solver will be applied to a matrix of $N \times N$ blocks, where each block is itself a $V \times V$ matrix. To simplify our initial implementation, we can treat each block as a dense matrix, which we will represent using a StaticArrays.SMatrix. Although this means that we will need to perform dense matrix inversions, these shouldn't be too expensive as long as $V$ is relatively small. If we were to instead modify the current "Schur complement solve" algorithm for EDMF, we would end up needing to perform a dense matrix inversion on a large block matrix, where each block is one of the original $N \times N$ blocks that represent pairs of variables. Since it will generally be the case that $N &gt; V$, this means that the new "permuted band matrix solve" algorithm will be significantly more performant for EDMF than the current algorithm.

Unfortunately, it will not be possible to eliminate dense matrix inversions from the new algorithm altogether by specializing on the sparsity structure of the $V \times V$ blocks. Even though each $V \times V$ block in ClimaAtmos will be an arrowhead matrix (both for the dycore and for EDMF), the band matrix solver will need to evaluate linear combinations of products of the blocks and their inverses, which will not have any particularly nice sparsity structure. Specifically, the inverse of an arrowhead matrix is a diagonal-plus-rank-one (DPR1) matrix, and all of the following matrix-matrix products are neither arrowhead nor DPR1 matrices: arrowhead times arrowhead, DPR1 times DPR1, and arrowhead times DPR1 (unless they are inverses of each other). This means that the new algorithm is likely to be slower than the current one for the dycore, since the current one does not involve any dense matrix inversions. So, in addition to implementing the new algorithm for BlockMatrixSystemSolver, we will also need to port over the algorithm currently implemented in ClimaAtmos.jl/src/tendencies/implicit/schur_complement_W.jl. This will require updating the current algorithm to use the new interface for BandMatrixRow and MultiplyColumnwiseBandMatrixField, and it will also require allowing the BlockMatrixSystemSolver to determine which of the two algorithms it should use based on the type of the ColumnwiseBlockMatrix given to it.

Task breakdown

  • Make proper climacore MatrixFields API #1350
    PR: Add new MatrixFields module, along with unit tests and performance tests #1326
    Estimated Completion Date: July 10th
    • Refactor BandMatrixRow and MultiplyColumnwiseBandMatrixField. Ensure that these objects have good docstrings.
    • Set up a thorough suite of unit tests for BandMatrixRow and MultiplyColumnwiseBandMatrixField in CI. These tests should ensure correctness, GPU compatibility, and type stability (i.e., no unexpected allocations). The tests should also be helpful for performance analysis.
  • Rename and extend FiniteDifferenceOperatorTermsMatrix. Add a helpful docstring, and add more unit tests to the test suite.
    PR: Add operator_matrix to MatrixFields, along with tests and docs #1399
    Estimated Completion Date: July 31st
  • Implement ColumnwiseBlockMatrix and BlockMatrixSystemSolver, but only with the current "Schur complement solve" algorithm from ClimaAtmos. Add docstrings and unit tests.
    PR: Add FieldMatrix and linear solvers #1436
    Estimated Completion Date: July 25th
  • Update ClimaAtmos to use the new interface. Verify that the results are unchanged and that the performance is not worsened.
    PR: Update to new implicit solver interface ClimaAtmos.jl#2044
    Estimated Completion Date: August 4th
  • Add the new "permuted band matrix solve" algorithm to BlockMatrixSystemSolver. Expand the docstring and add unit tests.
    Skipping this in favor of JFNK with clever preconditioning.
    Skipping this in favor of the SchurComplementReductionSolve algorithm in conjunction with a BlockArrowheadSchurComplementPreconditioner.
    PR: Add matrix-free iterative solver and fix bugs in FieldNameSet #1551
  • Modify the ClimaAtmos implicit solver to handle implicit EDMF tendencies. Confirm that we get the expected results.
    Estimated Completion Date: TBD
  • When there is some available time, refactor finite difference operators and their corresponding matrices to remove unnecessarily duplicated code.
    Estimated Completion Date: TBD

Reviewers

@dennisYatunin dennisYatunin added the SDI Software Development Issue label May 6, 2023
@dennisYatunin dennisYatunin self-assigned this May 6, 2023
@kmdeck
Copy link
Member

kmdeck commented May 12, 2023

If we imagine a system with n prognostic variables, and denote the tendency due to the boundary term as t_{bc}, t_{bc} is a vector of length n. Then, the derivative is an object like ∂t_{bc,j}∂Y_i, where i and j range from 1 to n, and Y_i is (for us) the prognostic state defined on cell centers at the top (or bottom) layer.

Two questions

  • will we be supplying ∂t_{bc,j}∂Y_i or something different (e.g. derivatives of boundary conditions with respect to state)? Based on what is written, we set values of ∂(boundary_field)/∂(field), and I think boundary_field is the tendency contribution, but want to double check.
  • will there be any assumption like ∂t_{bc,j}∂Y_i \propto \delta_{i,j}? Or could we set nonzero values of the tendency of one variable with respect to another?

@charleskawczynski
Copy link
Member

Can we:

  • Write tests for this sketch that compare against a very simple Fortran-style array implementation, for a few core examples?
  • Write additional unit tests that you believe are important to ensure the correctness, including edge and corner cases (and so that others can exercise this code for, e.g., performance analysis)

@tapios
Copy link
Contributor

tapios commented May 12, 2023

This describes the current issues well and clearly. The description of the solution could use detail (though the draft PR helps).

For us to be able to track progress, could you please add milestones expected completion dates (roughly weekly milestones)?

It'll also be important for @sriharshakandala, @charleskawczynski, and @simonbyrne to review, especially with regard to GPU-friendliness of the proposed solutions.

Wherever this refactoring lands, it'll be important to keep the current Schur complement formulation around (e.g., for convection resolving simulations). It may be nice to allow this as an option that lives within this refactoring, rather than outside of it.

@charleskawczynski
Copy link
Member

Also, please do not leave the simple array-based implementation test as an after thought-- this will be helpful to evaluate the design before it's fully implemented, and it will be too late to help reviewers judge the quality of the design if it's the very last thing that is added. cc @dennisYatunin

@dennisYatunin
Copy link
Member Author

dennisYatunin commented May 16, 2023

@kmdeck @juliasloan25 After discussing the matter with @simonbyrne, it looks like we will be able to set the derivatives of operator boundary conditions without adding any new operators (and without any assumptions like ∂t_{bc,j}∂Y_i \propto \delta_{i,j}). However, we will no longer be able to attach boundary conditions to face-to-center operators like DivergenceF2C, and we will instead need to ensure that the arguments of these operators either have well-defined boundary values or are wrapped in a SetBoundaryOperator. This is because, if we allow operators to overwrite the face values of their inputs, then we also need to allow the corresponding operator matrices to overwrite the values of matrices/vectors that they get multiplied by, but there is no good way to distinguish a matrix that overwrites values during multiplication from one that does not overwrite values (since every value stored in a Field must have the same type, we cannot simply attach additional type information to indicate how values should be handled during multiplication).

Here are some concrete examples:

  • First, a relatively simple example. Suppose that flux³ is a field of scalars on the faces of three cells, and that top_flux³ is another scalar. Suppose that we define
    div = Operators.DivergenceF2C()
    set_boundary_flux = Operators.SetBoundaryOperator(; top = Operators.SetValue(CT3(top_flux³)))
    flux = set_boundary_flux.(CT3.(flux³))
    
    CT3 is short for Contravariant3Vector, so flux is a field that represents a vector-valued flux through each cell face, where the flux through the top cell face is overwritten from CT3(flux³[3.5]) to CT3(top_flux³). If J[n] denotes the Jacobian of the metric tensor at index n, we can express div.(flux) as $$\begin{gather} \text{div}.(\text{flux}) = \text{div}.\begin{pmatrix} \text{CT3}(\text{flux³}[0.5]) \\\ \text{CT3}(\text{flux³}[1.5]) \\\ \text{CT3}(\text{flux³}[2.5]) \\\ \text{CT3}(\text{top\_flux³}) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{J[1]}\ (J[1.5]\ \text{flux³}[1.5] - J[0.5]\ \text{flux³}[0.5]) \\\ \dfrac{1}{J[2]}\ (J[2.5]\ \text{flux³}[2.5] - J[1.5]\ \text{flux³}[1.5]) \\\ \dfrac{1}{J[3]}\ (J[3.5]\ \text{top\_flux³} - J[2.5]\ \text{flux³}[2.5]) \end{pmatrix} \end{gather}$$ Now, suppose that $$\frac{\partial\text{top\_flux³}}{\partial\text{flux³}[n]} = \begin{cases} \text{top\_flux³\_deriv} & n = 3.5 \\\ 0 & n < 3.5 \end{cases}$$ This means that the derivative of div.(flux) with respect to flux³ is the matrix $$\begin{align} \dfrac{\partial\text{div}.(\text{flux})}{\partial\text{flux³}} &= \begin{pmatrix} \dfrac{\partial\text{div}.(\text{flux})[1]}{\partial\text{flux³}[0.5]} & \dfrac{\partial\text{div}.(\text{flux})[1]}{\partial\text{flux³}[1.5]} & \dfrac{\partial\text{div}.(\text{flux})[1]}{\partial\text{flux³}[2.5]} & \dfrac{\partial\text{div}.(\text{flux})[1]}{\partial\text{flux³}[3.5]} \\\ \dfrac{\partial\text{div}.(\text{flux})[2]}{\partial\text{flux³}[0.5]} & \dfrac{\partial\text{div}.(\text{flux})[2]}{\partial\text{flux³}[1.5]} & \dfrac{\partial\text{div}.(\text{flux})[2]}{\partial\text{flux³}[2.5]} & \dfrac{\partial\text{div}.(\text{flux})[2]}{\partial\text{flux³}[3.5]} \\\ \dfrac{\partial\text{div}.(\text{flux})[3]}{\partial\text{flux³}[0.5]} & \dfrac{\partial\text{div}.(\text{flux})[3]}{\partial\text{flux³}[1.5]} & \dfrac{\partial\text{div}.(\text{flux})[3]}{\partial\text{flux³}[2.5]} & \dfrac{\partial\text{div}.(\text{flux})[3]}{\partial\text{flux³}[3.5]} \end{pmatrix} = \\[1em] &= \begin{pmatrix} -\dfrac{J[0.5]}{J[1]} & \dfrac{J[1.5]}{J[1]} & 0 & 0 \\\ 0 & -\dfrac{J[1.5]}{J[2]} & \dfrac{J[2.5]}{J[2]} & 0 \\\ 0 & 0 & -\dfrac{J[2.5]}{J[3]} & \dfrac{J[3.5]}{J[3]}\ \text{top\_flux³\_deriv} \end{pmatrix} \end{align}$$ After this SDI is implemented, the preferred way to specify this matrix will be div_matrix.(∂flux∂flux³), where
    div_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(div)
    set_boundary_flux_deriv = Operators.SetBoundaryOperator(;
        top = Operators.SetValue(CT3(top_flux³_deriv)),
    )
    ∂flux∂flux³ = set_boundary_flux_deriv.(CT3.(one.(flux³)))
    
    The field ∂flux∂flux³ represents the derivative of each vector-valued flux with respect to the corresponding scalar in flux³: $$\text{∂flux∂flux³} = \begin{pmatrix} \dfrac{\partial\text{flux}[0.5]}{\partial\text{flux³}[0.5]} \\\ \dfrac{\partial\text{flux}[1.5]}{\partial\text{flux³}[1.5]} \\\ \dfrac{\partial\text{flux}[2.5]}{\partial\text{flux³}[2.5]} \\\ \dfrac{\partial\text{flux}[3.5]}{\partial\text{flux³}[3.5]} \end{pmatrix} = \begin{pmatrix} \text{CT3}(1) \\\ \text{CT3}(1) \\\ \text{CT3}(1) \\\ \text{CT3}(\text{top\_flux³\_deriv}) \end{pmatrix}$$
  • Now, an example that more closely matches your code (modulo some minus signs). Suppose that θ is a field of scalars on the centers of three cells, and that h, K, Fb, and Ft are all differentiable functions from scalars to scalars. In addition, suppose that
    interp = Operators.InterpolateC2F()
    grad = Operators.GradientC2F()
    set_boundary_fluxes = Operators.SetBoundaryOperator(;
        bottom = Operators.SetValue(C3(∂x³∂ξ³[0.5] Fb(θ[1]))),
        top = Operators.SetValue(C3(∂x³∂ξ³[3.5] Ft(θ[3]))),
    )
    div  = Operators.DivergenceF2C()
    
    The values of ∂x³∂ξ³ are used to convert the boundary fluxes Fb(θ[1]) and Ft(θ[3]) from physical units (m/s if the values of θ are dimensionless), to "covariant units" (m^2/s if the values of θ are dimensionless). This makes it possible to specify the boundary fluxes as Covariant3Vectors, which are normal to the boundary faces. We can also project the boundary fluxes onto the ξ³-axis to turn them into Contravariant3Vectors, which are the native inputs of the div operator: $$\begin{align} \text{CT3}(\text{C3}(∂x³∂ξ³[0.5]\ Fb[1])) &= \text{CT3}(\text{C3}(1))\ ∂x³∂ξ³[0.5]\ Fb[1] = \\\ &= \text{CT3}(g³³[0.5])\ ∂x³∂ξ³[0.5]\ Fb[1] = \\\ &= \text{CT3}(g³³[0.5]\ ∂x³∂ξ³[0.5]\ Fb[1]) \\\ \text{CT3}(\text{C3}(∂x³∂ξ³[3.5]\ Ft[3])) &= \text{CT3}(\text{C3}(1))\ ∂x³∂ξ³[3.5]\ Ft[3] = \\\ &= \text{CT3}(g³³[3.5])\ ∂x³∂ξ³[3.5]\ Ft[3] = \\\ &= \text{CT3}(g³³[3.5]\ ∂x³∂ξ³[3.5]\ Ft[3]) \end{align}$$ To simplify our notation, we will be using the following shorthands: $$\begin{align} h[n] &= h(θ[n]) \\\ K[n] &= K(θ[n]) \\\ Fb[n] &= Fb(θ[n]) \\\ Ft[n] &= Ft(θ[n]) \\\ h'[n] &= h'(θ[n]) \\\ K'[n] &= K'(θ[n]) \\\ Fb'[n] &= Fb'(θ[n]) \\\ Ft'[n] &= Ft'(θ[n]) \\\ \end{align}$$ We can express the tendency of θ as $$\begin{align} θₜ &= \text{div}.(\text{set\_boundary\_fluxes}.(\text{interp}.(K.(θ))\ .*\ \text{grad}.(h.(θ)))) = \\\ &= \text{div}.\left(\text{set\_boundary\_fluxes}.\left( \text{interp}.\begin{pmatrix} K[1] \\ K[2] \\ K[3] \end{pmatrix} \ .*\ \text{grad}.\begin{pmatrix} h[1] \\ h[2] \\ h[3] \end{pmatrix} \right)\right) = \\[1em] &= \text{div}.\left(\text{set\_boundary\_fluxes}.\left( \begin{pmatrix} \text{undefined} \\\ \dfrac{1}{2}\ (K[1] + K[2]) \\\ \dfrac{1}{2}\ (K[2] + K[3]) \\\ \text{undefined} \end{pmatrix}\ .*\ \begin{pmatrix} \text{undefined} \\\ \text{C3}(1)\ (h[2] - h[1]) \\\ \text{C3}(1)\ (h[3] - h[2]) \\\ \text{undefined} \end{pmatrix} \right)\right) = \\[1em] &= \text{div}.\left(\text{set\_boundary\_fluxes}.\left( \begin{pmatrix} \text{undefined} \\\ \dfrac{1}{2}\ (K[1] + K[2]) \\\ \dfrac{1}{2}\ (K[2] + K[3]) \\\ \text{undefined} \end{pmatrix}\ .*\ \begin{pmatrix} \text{undefined} \\\ \text{CT3}(g³³[1.5])\ (h[2] - h[1]) \\\ \text{CT3}(g³³[2.5])\ (h[3] - h[2]) \\\ \text{undefined} \end{pmatrix} \right)\right) = \\[1em] &= \text{div}.\left(\text{set\_boundary\_fluxes}. \begin{pmatrix} \text{undefined} \\\ \text{CT3}\left(\dfrac{1}{2}\ g³³[1.5]\ (K[1] + K[2])\ (h[2] - h[1])\right) \\\ \text{CT3}\left(\dfrac{1}{2}\ g³³[2.5]\ (K[2] + K[3])\ (h[3] - h[2])\right) \\\ \text{undefined} \end{pmatrix} \right) = \\[1em] &= \text{div}. \begin{pmatrix} \text{CT3}(g³³[0.5]\ ∂x³∂ξ³[0.5]\ Fb[1]) \\\ \text{CT3}\left(\dfrac{1}{2}\ g³³[1.5]\ (K[1] + K[2])\ (h[2] - h[1])\right) \\\ \text{CT3}\left(\dfrac{1}{2}\ g³³[2.5]\ (K[2] + K[3])\ (h[3] - h[2])\right) \\\ \text{CT3}(g³³[3.5]\ ∂x³∂ξ³[3.5]\ Ft[3]) \end{pmatrix} = \\[1em] &= \begin{pmatrix} \dfrac{1}{J[1]}\ \begin{pmatrix} \dfrac{1}{2}\ J[1.5]\ g³³[1.5]\ (K[1] + K[2])\ (h[2] - h[1]) -{} \\\ J[0.5]\ g³³[0.5]\ ∂x³∂ξ³[0.5]\ Fb[1] \end{pmatrix} \\\ \dfrac{1}{J[2]}\ \begin{pmatrix} \dfrac{1}{2}\ J[2.5]\ g³³[2.5]\ (K[2] + K[3])\ (h[3] - h[2]) -{} \\\ \dfrac{1}{2}\ J[1.5]\ g³³[1.5]\ (K[1] + K[2])\ (h[2] - h[1]) \end{pmatrix} \\\ \dfrac{1}{J[3]}\ \begin{pmatrix} J[3.5]\ g³³[3.5]\ ∂x³∂ξ³[3.5]\ Ft[3] -{} \\\ \dfrac{1}{2}\ J[2.5]\ g³³[2.5]\ (K[2] + K[3])\ (h[3] - h[2]) \end{pmatrix} \end{pmatrix} \end{align}$$ The derivative of the tendency with respect to θ is the matrix $$\begin{align} \dfrac{\partial θₜ}{\partial θ} &= \begin{pmatrix} \dfrac{\partial θₜ[1]}{\partial θ[1]} & \dfrac{\partial θₜ[1]}{\partial θ[2]} & \dfrac{\partial θₜ[1]}{\partial θ[3]} \\\ \dfrac{\partial θₜ[2]}{\partial θ[1]} & \dfrac{\partial θₜ[2]}{\partial θ[2]} & \dfrac{\partial θₜ[2]}{\partial θ[3]} \\\ \dfrac{\partial θₜ[3]}{\partial θ[1]} & \dfrac{\partial θₜ[3]}{\partial θ[2]} & \dfrac{\partial θₜ[3]}{\partial θ[3]} \end{pmatrix} = \\[1em] &= \tiny\begin{pmatrix} \begin{matrix} \dfrac{J[1.5]\ g³³[1.5]}{2J[1]}\begin{pmatrix}K'[1]\ (h[2] - h[1]) -{} \\ (K[1] + K[2])\ h'[1]\end{pmatrix} -{} \\\ \dfrac{J[0.5]}{J[1]}\ g³³[0.5]\ ∂x³∂ξ³[0.5]\ Fb'[1] \end{matrix} & \dfrac{J[1.5]\ g³³[1.5]}{2J[1]}\begin{pmatrix}K'[2]\ (h[2] - h[1]) +{} \\ (K[1] + K[2])\ h'[2]\end{pmatrix} & 0 \\[1em] \dfrac{J[1.5]\ g³³[1.5]}{2J[2]}\begin{pmatrix}K'[1]\ (h[2] - h[1]) -{} \\ (K[1] + K[2])\ h'[1]\end{pmatrix} & \begin{matrix} \dfrac{J[2.5]\ g³³[2.5]}{2J[2]}\begin{pmatrix}K'[2]\ (h[3] - h[2]) -{} \\ (K[2] + K[3])\ h'[2]\end{pmatrix} -{} \\\ \dfrac{J[1.5]\ g³³[1.5]}{2J[2]}\begin{pmatrix}K'[2]\ (h[2] - h[1]) +{} \\ (K[1] + K[2])] h'[2]\end{pmatrix} \end{matrix} & \dfrac{J[2.5]\ g³³[2.5]}{2J[2]}\begin{pmatrix}K'[3]\ (h[3] - h[2]) +{} \\ (K[2] + K[3])\ h'[3]\end{pmatrix} \\[1em] 0 & \dfrac{J[2.5]\ g³³[2.5]}{2J[3]}\begin{pmatrix}K'[2]\ (h[3] - h[2]) -{} \\ (K[2] + K[3])\ h'[2]\end{pmatrix} & \begin{matrix} \dfrac{J[3.5]}{J[3]}\ g³³[3.5]\ ∂x³∂ξ³[3.5]\ Ft'[3] -{} \\\ \dfrac{J[2.5]\ g³³[2.5]}{2J[3]}\begin{pmatrix}K'[3]\ (h[3] - h[2]) +{} \\ (K[2] + K[3])\ h'[3]\end{pmatrix} \end{matrix} \end{pmatrix} \end{align}$$ Now, suppose that
    unit_CT3_field = CT3.(ones(Spaces.FaceExtrudedFiniteDifferenceSpace(axes(θ))))
    interp_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(interp)
    grad_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(grad)
    set_matrix_boundary_rows = Operators.SetBoundaryOperator(;
        bottom = Operators.SetValue(UpperBidiagonalMatrixRow(C3(∂x³∂ξ³[0.5] Fb'(θ[1])))),
        top = Operators.SetValue(LowerBidiagonalMatrixRow(C3(∂x³∂ξ³[3.5] Ft'(θ[3])))),
    )
    div_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(div)
    
    After this SDI is implemented, the preferred way to specify the derivative matrix will be
    div_matrix.(unit_CT3_field) .⋅ set_matrix_boundary_rows(
        interp_matrix.(K'.(θ)) .* grad.(h.(θ)) .+
        interp.(K.(θ)) .* grad_matrix.(h'.(θ))
    )
    
    The first term in this matrix-matrix multiplication is $$\text{div\_matrix}.(\text{unit\_CT3\_field}) = \begin{pmatrix} -\dfrac{J[0.5]}{J[1]} & \dfrac{J[1.5]}{J[1]} & 0 & 0 \\\ 0 & -\dfrac{J[1.5]}{J[2]} & \dfrac{J[2.5]}{J[2]} & 0 \\\ 0 & 0 & -\dfrac{J[2.5]}{J[3]} & \dfrac{J[3.5]}{J[3]} \end{pmatrix}$$ The second term roughly corresponds to the matrix $$\begin{gather} \text{set\_matrix\_boundary\_rows}.\begin{pmatrix} \text{interp\_matrix}.(K'.(θ))\ .*\ \text{grad}.(h.(θ))\ .+ \\\ \text{interp}.(K.(θ))\ .*\ \text{grad\_matrix}.(h'.(θ)) \end{pmatrix} =\\[1 em] = \text{set\_matrix\_boundary\_rows}.\tiny\begin{pmatrix} \begin{pmatrix} \text{undefined} & \text{undefined} & \text{undefined} \\\ \dfrac{1}{2}\ K'[1] & \dfrac{1}{2}\ K'[2] & 0 \\\ 0 & \dfrac{1}{2}\ K'[2] & \dfrac{1}{2}\ K'[3] \\\ \text{undefined} & \text{undefined} & \text{undefined} \end{pmatrix}\ .*\ \begin{pmatrix} \text{undefined} \\\ g³³[1.5]\ (h[2] - h[1]) \\\ g³³[2.5]\ (h[3] - h[2]) \\\ \text{undefined} \end{pmatrix}\ .+ \\[1em] \begin{pmatrix} \text{undefined} \\\ \dfrac{1}{2}\ (K[1] + K[2]) \\\ \dfrac{1}{2}\ (K[2] + K[3]) \\\ \text{undefined} \end{pmatrix}\ .*\ \begin{pmatrix} \text{undefined} & \text{undefined} & \text{undefined} \\\ -g³³[1.5]\ h'[1] & g³³[1.5]\ h'[2] & 0 \\\ 0 & -g³³[2.5]\ h'[2] & g³³[2.5]\ h'[3] \\\ \text{undefined} & \text{undefined} & \text{undefined} \end{pmatrix} \end{pmatrix} \normalsize=\\[1 em] = \text{set\_matrix\_boundary\_rows}.\tiny\begin{pmatrix} \begin{pmatrix} \text{undefined} & \text{undefined} & \text{undefined} \\\ \dfrac{g³³[1.5]}{2}\ K'[1]\ (h[2] - h[1]) & \dfrac{g³³[1.5]}{2}\ K'[2]\ (h[2] - h[1]) & 0 \\\ 0 & \dfrac{g³³[2.5]}{2}\ K'[2]\ (h[3] - h[2]) & \dfrac{g³³[2.5]}{2}\ K'[3]\ (h[3] - h[2]) \\\ \text{undefined} & \text{undefined} & \text{undefined} \end{pmatrix}\ .+ \\[1em] \begin{pmatrix} \text{undefined} & \text{undefined} & \text{undefined} \\\ -\dfrac{g³³[1.5]}{2}\ (K[1] + K[2])\ h'[1] & \dfrac{g³³[1.5]}{2}\ (K[1] + K[2])\ h'[2] & 0 \\\ 0 & -\dfrac{g³³[2.5]}{2}\ (K[2] + K[3])\ h'[2] & \dfrac{g³³[2.5]}{2}\ (K[2] + K[3])\ h'[3] \\\ \text{undefined} & \text{undefined} & \text{undefined} \end{pmatrix} \end{pmatrix} \normalsize= \end{gather}$$ $$\begin{gather} = \text{set\_matrix\_boundary\_rows}.\tiny\begin{pmatrix} \text{undefined} & \text{undefined} & \text{undefined} \\\ \dfrac{g³³[1.5]}{2}\begin{pmatrix}K'[1]\ (h[2] - h[1]) -{} \\ (K[1] + K[2])\ h'[1]\end{pmatrix} & \dfrac{g³³[1.5]}{2}\begin{pmatrix}K'[2]\ (h[2] - h[1]) +{} \\ (K[1] + K[2])\ h'[2]\end{pmatrix} & 0 \\\ 0 & \dfrac{g³³[2.5]}{2}\begin{pmatrix}K'[2]\ (h[3] - h[2]) -{} \\ (K[2] + K[3])\ h'[2]\end{pmatrix} & \dfrac{g³³[2.5]}{2}\begin{pmatrix}K'[3]\ (h[3] - h[2]) +{} \\ (K[2] + K[3])\ h'[3]\end{pmatrix} \\\ \text{undefined} & \text{undefined} & \text{undefined} \end{pmatrix} \normalsize= \\[1em] = \tiny\begin{pmatrix} g³³[0.5]\ ∂x³∂ξ³[0.5]\ Fb'(θ[1]) & 0 & 0 \\\ \dfrac{g³³[1.5]}{2}\begin{pmatrix}K'[1]\ (h[2] - h[1]) -{} \\ (K[1] + K[2])\ h'[1]\end{pmatrix} & \dfrac{g³³[1.5]}{2}\begin{pmatrix}K'[2]\ (h[2] - h[1]) +{} \\ (K[1] + K[2])\ h'[2]\end{pmatrix} & 0 \\\ 0 & \dfrac{g³³[2.5]}{2}\begin{pmatrix}K'[2]\ (h[3] - h[2]) -{} \\ (K[2] + K[3])\ h'[2]\end{pmatrix} & \dfrac{g³³[2.5]}{2}\begin{pmatrix}K'[3]\ (h[3] - h[2]) +{} \\ (K[2] + K[3])\ h'[3]\end{pmatrix} \\\ 0 & 0 & g³³[3.5]\ ∂x³∂ξ³[3.5]\ Ft'(θ[3]) \end{pmatrix} \end{gather}$$
  • Finally, we can also rewrite the previous example without using SetBoundaryOperator. We can do this by moving the boundary conditions from set_boundary_fluxes to interp and grad, and by moving the corresponding matrix boundary conditions from set_matrix_boundary_rows to interp_matrix and grad_matrix. For example, we could set
    interp = Operators.InterpolateC2F(;
        bottom = Operators.SetValue(1),
        top = Operators.SetValue(1),
    )
    grad = Operators.GradientC2F(;
        bottom = Operators.SetGradient(C3(∂x³∂ξ³[0.5] Fb(θ[1]))),
        top = Operators.SetGradient(C3(∂x³∂ξ³[3.5] Ft(θ[3]))),
    )
    
    We could then obtain the same tendency of θ by evaluating $$θₜ = \text{div}.(\text{interp}.(K.(θ))\ .*\ \text{grad}.(h.(θ)))$$ The preferred way to express the derivative matrix would then be
    div_matrix.(unit_CT3_field) .⋅ (
        interp_matrix.(K'.(θ)) .* grad.(h.(θ)) .+
        interp.(K.(θ)) .* grad_matrix.(h'.(θ))
    )
    
    In this case, the operator matrices would be
    interp_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(
        interp;
        bottom = Operators.SetValue(UpperBidiagonalMatrixRow(0)),
        top = Operators.SetValue(LowerBidiagonalMatrixRow(0)),
    )
    grad_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(
        grad;
        bottom = Operators.SetValue(UpperBidiagonalMatrixRow(C3(∂x³∂ξ³[0.5] Fb'(θ[1])))),
        top = Operators.SetValue(LowerBidiagonalMatrixRow(C3(∂x³∂ξ³[3.5] Ft'(θ[3])))),
    )
    
    The second term in the matrix-matrix multiplication would then be $$\begin{gather} \text{interp\_matrix}.(K'.(θ))\ .*\ \text{grad}.(h.(θ))\ .+\ \text{interp}.(K.(θ))\ .*\ \text{grad\_matrix}.(h'.(θ)) =\\[1 em] = \tiny\begin{matrix} \begin{pmatrix} 0 & 0 & 0 \\\ \dfrac{1}{2}\ K'[1] & \dfrac{1}{2}\ K'[2] & 0 \\\ 0 & \dfrac{1}{2}\ K'[2] & \dfrac{1}{2}\ K'[3] \\\ 0 & 0 & 0 \end{pmatrix}\ .*\ \begin{pmatrix} g³³[0.5]\ ∂x³∂ξ³[0.5]\ Fb(θ[1]) \\\ g³³[1.5]\ (h[2] - h[1]) \\\ g³³[2.5]\ (h[3] - h[2]) \\\ g³³[3.5]\ ∂x³∂ξ³[3.5]\ Ft(θ[3]) \end{pmatrix}\ .+ \\[1em] \begin{pmatrix} 1 \\\ \dfrac{1}{2}\ (K[1] + K[2]) \\\ \dfrac{1}{2}\ (K[2] + K[3]) \\\ 1 \end{pmatrix}\ .*\ \begin{pmatrix} g³³[0.5]\ ∂x³∂ξ³[0.5]\ Fb'(θ[1]) & 0 & 0 \\\ -g³³[1.5]\ h'[1] & g³³[1.5]\ h'[2] & 0 \\\ 0 & -g³³[2.5]\ h'[2] & g³³[2.5]\ h'[3] \\\ 0 & 0 & g³³[3.5]\ ∂x³∂ξ³[3.5]\ Ft'(θ[3]) \end{pmatrix} \end{matrix} \end{gather}$$ In order to simplify things, we will have operator matrices contain 0's at the boundaries by default, so the boundary conditions for interp_matrix will not actually need to be specified.

@dennisYatunin
Copy link
Member Author

@tapios @charleskawczynski I have made the following updates to the SDI based on your comments:

  • Split the tasks into more incremental milestones, each of which will take at most a week (if there are no unforeseen issues).
  • Add a specific task for porting over the current "Schur complement solve" from ClimaAtmos, and modify the SDI to explain why this is necessary.
  • Add a specific task for setting up a test suite that will monitor correctness and performance.

After a lengthy discussion with Simon and Sriharsha today, it has become clear that the "permuted band matrix solve" algorithm will require a fair bit of work to ensure that it is performant and GPU-compatible. However, all of the tasks that come before the implementation of this new algorithm (up to testing the interface in ClimaAtmos) should be good as-is. Per the revised time estimates, these tasks should take 3--4 weeks, which should be enough time for us to settle on a decent implementation of the new algorithm. If I start later this week, this means that the interface will be tested in ClimaAtmos by around this time in June.

@dennisYatunin
Copy link
Member Author

@simonbyrne @sriharshakandala Here is a list of all the matrix sparsity patterns that BlockMatrixSystemSolver will need to support in the near future. For each case, I will specify the sparsity pattern of $J = \partial Y_t/\partial Y$, where $Y_t$ is the implicit tendency of $Y$, though the actual matrix required by the implicit solver will be $W = -I + \Delta t \gamma J$, where $I$ is the identity matrix and $\Delta t \gamma$ is a scalar. I will also use the following shorthands:

  • $\mathbb{D}$ refers to a diagonal matrix field
  • $\mathbb{B}$ refers to a bidiagonal matrix field
  • $\mathbb{T}$ refers to a tridiagonal matrix field
  • $\mathbb{Q}$ refers to a quaddiagonal matrix field
  • $\mathbb{P}$ refers to a pentadiagonal matrix field
  • $u_3\_\text{data}$ refers to $u_3.\text{components}.\text{data}.{:}1$

For ClimaLSM, we only need to support the sparsity pattern of the matrix specified in the comment above,

$$J = \begin{pmatrix} \dfrac{\partial Y_t.c.\theta}{\partial Y.c.\theta} \end{pmatrix} = \begin{pmatrix} \mathbb{T} \end{pmatrix}$$

In the future, ClimaLSM will also include the prognostic variable $Y.c.\psi$, but this is still several months away.

For the ClimaAtmos dycore, the sparsity pattern of $J$ with $T$ tracers (and no FCT, just upwinding or central differencing) is

$$J = \begin{pmatrix} \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho}{\partial Y.f.u_3\_\text{data}} \\\ \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.f.u_3\_\text{data}} \\\ \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.f.u_3\_\text{data}} \\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\\ \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.f.u_3\_\text{data}} \\\ \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.f.u_3\_\text{data}} \end{pmatrix} =$$ $$= \begin{pmatrix} \mathbb{T}/\mathbb{P} & 0 & 0 & \cdots & 0 & \mathbb{B} \\\ \mathbb{T}/\mathbb{P} & \mathbb{T}/\mathbb{P} & 0/\mathbb{T}/\mathbb{P} & \cdots & 0/\mathbb{T}/\mathbb{P} & \mathbb{Q} \\\ \mathbb{T}/\mathbb{P} & 0 & \mathbb{T}/\mathbb{P} & \cdots & 0 & \mathbb{B} \\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\\ \mathbb{T}/\mathbb{P} & 0 & 0 & \cdots & \mathbb{T}/\mathbb{P} & \mathbb{B} \\\ \mathbb{B} & \mathbb{B} & 0/\mathbb{B} & \cdots & 0/\mathbb{B} & \mathbb{T} \end{pmatrix}$$

However, instead of using the exact value of $J$, we approximate it so that its sparsity pattern simplifies to

$$J \approx \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\\ 0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\\ 0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\\ 0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\\ \mathbb{B} & \mathbb{B} & 0 & \cdots & 0 & \mathbb{T} \end{pmatrix}$$

For AMIP, we only need to support $T = 1$, with $\chi = q_{tot}$. After AMIP, though, we will need to support $T \gg 1$, with $q_{liq}$, $q_{ice}$, $q_{rai}$, $q_{sno}$, and every aerosol represented by a prognostic variable. So, BlockMatrixSystemSolver needs to perform well for the dycore with large values of $T$. At the moment, the best algorithm we have for this is the "Schur complement solve".

When we add EDMF to ClimaAtmos, we end up with a significantly more complicated Jacobian. We are not entirely certain of how sparse we can afford to make our approximation of $J$, but our best guess at the moment (with only one draft in in the EDMF model, which corresponds to only one sub-grid-scale copy of the grid-scale state) is

$$\begin{gather} J = \begin{pmatrix} \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} & \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}} \\\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{GS State}} & \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}} \end{pmatrix},\ \text{where}\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{GS State}} \approx 0, \\[1 em] \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} \approx \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\\ 0 & \mathbb{T} & 0 & \cdots & 0 & \mathbb{B} \\\ 0 & 0 & \mathbb{T} & \cdots & 0 & \mathbb{B} \\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\\ 0 & 0 & 0 & \cdots & \mathbb{T} & \mathbb{B} \\\ \mathbb{B} & \mathbb{B} & 0 & \cdots & 0 & \mathbb{T} \end{pmatrix},\ \text{and}\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}} \approx \begin{pmatrix} \mathbb{D} & 0 & 0 & \cdots & 0 & \mathbb{B} \\\ \mathbb{D} & \mathbb{D} & 0 & \cdots & 0 & \mathbb{B} \\\ \mathbb{D} & 0 & \mathbb{D} & \cdots & 0 & \mathbb{B} \\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\\ \mathbb{D} & 0 & 0 & \cdots & \mathbb{D} & \mathbb{B} \\\ \mathbb{B} & 0 & 0 & \cdots & 0 & \mathbb{T} \end{pmatrix} \end{gather}$$

Since we plan to approximate the bottom-left matrix block as 0, our strategy for solving the linear problem $W\ \Delta Y = b$ will be to first solve a linear problem that corresponds to the bottom-right block, and to then solve another linear problem that corresponds to the top-left block. That is to say,

$$\begin{gather} W\ \Delta Y = b \implies \\[1 em] \begin{pmatrix} -I + \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} & \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}} \\\ 0 & -I + \Delta t\gamma\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}} \end{pmatrix} \begin{pmatrix} \Delta\text{ GS State} \\ \Delta\text{ SGS State} \end{pmatrix} = \begin{pmatrix} \text{GS RHS} \\ \text{SGS RHS} \end{pmatrix} \implies \\[1 em] \left(-I + \Delta t\gamma\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}}\right)\ \Delta\text{ SGS State} = \text{SGS RHS}\ \text{and} \\\ \left(-I + \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}}\right)\ \Delta\text{ GS State} = \text{GS RHS} - \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}}\ \Delta\text{ SGS State} \end{gather}$$

This means that our implicit solver will first compute changes at the sub-grid-scale ($\Delta\text{ SGS State}$), and then it will compute changes at the grid-scale ($\Delta\text{ GS State}$). We do not yet know how we will approximate the top-right matrix block of the Jacobian, so we do not know what its sparsity structure will be. Fortunately, this doesn't actually matter, since that block will only be used for computing a matrix-vector product, rather than for solving a linear problem.

In the near future, we only plan to use EDMF with $T = 1$ or $T \approx 1$, since there is still a fair bit of work that needs to be done before we are ready to test it with large numbers of tracers. So, we can afford to have BlockMatrixSystemSolver scale poorly with $T$ when we use EDMF, which justifies using the "permuted matrix solve" algorithm proposed in this SDI.

@tapios
Copy link
Contributor

tapios commented May 20, 2023

I appreciate your thoroughness, @dennisYatunin!

One important simplification, though: In the dycore, we can treat all tracers fully explicitly, so T=0 is fine, or, if you want to include moisture, T=1 (but even that is not strictly necessary). We only need to treat the fast waves (acoustic and gravity) implicitly, and tracers play no role in them.

@dennisYatunin
Copy link
Member Author

dennisYatunin commented May 22, 2023

@tapios Thanks! We actually stopped making that assumption in ClimaAtmos sometime last year. I think the intent was to ensure that all forms of moisture get treated roughly the same way as energy in the implicit solve, but neither Daniel nor Zhaoyi nor I remember how significant the resulting timestep increase was. Should we revert to making that approximation, or add a flag to toggle between the two options? In either case, the "Schur complement solve" scales very well with $T$, so whether or not the vertical advection of tracers is treated implicitly should not have a significant impact on performance.

@tapios
Copy link
Contributor

tapios commented May 23, 2023

@dennisYatunin It's ok to treat the moisture tracers (q_t, q_l, q_i) implicitly and in the same way as energy. But no need to do the same for other tracers.

@sriharshakandala
Copy link
Member

sriharshakandala commented May 26, 2023

@dennisYatunin :
Is there a reason we are deviating from the more traditional band storage format (https://www.ibm.com/docs/en/essl/6.2?topic=representation-blas-general-band-storage-mode) for storing band matrices, which essentially stores the bands?
We can use an offset array and eliminate storing the "unused data" for off-diagonal bands!

@dennisYatunin
Copy link
Member Author

dennisYatunin commented May 26, 2023

@sriharshakandala We are indeed storing the matrices in a traditional band storage format, except we are using "row-major" storage instead of "column-major" storage (the link you sent uses "column-major" storage). We are storing the matrices row-by-row in order to encode them as Fields, and we only store the nonzero entries in each row. The top and bottom rows are exceptions to this; since they have fewer nonzero entries than the interior rows, we need to pad them with several "dummy entries" that do not lie in the matrix (these are equivalent to the *s in the link you sent). Also, we want our matrices to be Fields so that we can include them in Field broadcasts, which makes it much easier to specify Jacobian blocks.

@sriharshakandala
Copy link
Member

Thanks, @dennisYatunin. For the quaddiagonal and pentadiagonal matrices, I am assuming we are either losing them in the Jacobian matrix approximation or the bands are located contiguously!

@dennisYatunin
Copy link
Member Author

@tapios Time estimates have been updated!

@tapios
Copy link
Contributor

tapios commented Jul 7, 2023

Thanks, @dennisYatunin. Looking good!

bors bot added a commit that referenced this issue Jul 21, 2023
1326: Add new MatrixFields module, along with unit tests and performance tests r=dennisYatunin a=dennisYatunin

# Purpose

This PR encompasses the first two tasks of the implicit interface updates [SDI](#1230). That is, it refactors `StencilCoefs` into `BandMatrixRow` and `ApplyStencil`/`ComposeStencil` into `MultiplyColumnwiseBandMatrixField`, and it also adds a comprehensive set of tests. In the process of refactoring, the code for low-level matrix operations has been simplified and useful new functionality has been added. In order to avoid conflicts with pre-existing code, the new data structures and functions have been placed in a separate module called `MatrixFields`; this module can be removed once we are ready to replace all of the old code.

## Content

### API improvements
- Matrix field rows now have simple constructors (e.g., `DiagonalMatrixRow` and `TridiagonalMatrixRow`) that can be used to construct matrix fields on the fly in broadcast expressions.
- Matrix fields are now automatically promoted to common types for basic arithmetic operations. This involves promoting the entries in the rows, as well as padding rows with zeros when rows have unequal numbers of entries. In addition, `LinearAlgebra.I` (or, rather, `LinearAlgebra.UniformScaling`) is automatically promoted to a `DiagonalMatrixRow` when used with `BandMatrixRow`s. So, for example, it is now possible to evaluate `DiagonalMartrixRow(1) - 2 * TridiagonalMatrixRow(3, 4, 5) - I / 6`.
- Matrix fields are now fully integrated with `RecursiveApply`, which allows a single matrix field to store multiple matrices by using `Tuple`s or `NamedTuple`s as entries instead of scalars. This is achieved by using the new `rzero` function instead of `zero`, and also by making `+`, `-`, `*`, and `/` call `radd`, `rsub`, `rmul`, and `rdiv` when used on `BandMatrixRow`s. This change will make it possible to, e.g., store the derivatives of all tracer tendencies with respect to vertical velocity as a single matrix field in `ClimaAtmos`, which will lead to simpler code and better performance.
- All matrix-matrix and matrix-vector multiplications can now be handled using the `⋅` operator, which is an alias for `MultiplyColumnwiseBandMatrixField()`. The implementation of this operator is much simpler than the implementation of `ApplyStencil`/`ComposeStencil`, and it is effectively localized to the single function `multiply_matrix_at_index`. In order to avoid inference failures for complex matrix field broadcast expressions, the recursion limit for this function needed to be disabled. Unfortunately, just as with the pre-existing code, inlining this function with ``@propagate_inbounds`` drastically increased compilation time, and it also caused inference failures for particularly complicated matrix broadcast expressions. Although not using ``@propagate_inbounds`` causes a 3--4x slowdown in the evaluation of matrix broadcast expressions, it avoids a 10--100x slowdown in their compilation.
- The matrix of the divergence operator can now be properly represented as a matrix of covectors, rather than a matrix of scalars. Since the divergence operator turns vectors into scalars, multiplying its matrix by a matrix/vector of vectors should return a matrix/vector of scalars. However, this is not as simple as just multiplying a covector by a vector and returning a scalar, since the vector may need to be projected onto the dual axis of the covector before the multiplication. So, matrix-matrix and matrix-vector multiplication is now implemented using the `rmul_with_projection` function instead of `rmul`. In general, this function works just like `rmul`, but, when the first argument is a covector (an `AdjointAxisVector`) and the second is a vector or higher-order tensor (an `AxisTensor`), it uses the local geometry to project the second argument before the multiplication. In the future, we may want to generalize this to higher-order cotensors.
- The `MatrixFields` module also includes functions for converting matrix fields to more conventional arrays that are compatible with `LinearAlgebra.jl`. The function `column_field2array` converts a matrix field defined on a column space to a `BandedMatrix` from `BandedMatrices.jl`, and it also converts a regular field defined on a column space to a `Vector`. The function `field2arrays` works for fields defined on multiple columns by calling `column_field2array` on each column. 
- Matrix fields now get displayed in the REPL as actual matrices. Specifically, when a matrix field contains `Number` entries, its first column is passed to `column_field2array_view`, which is identical to `column_field2array`, except that it allocates less memory by returning a view of the field's underlying data. Although `column_field2array` and `column_field2array_view` work for matrix fields with `Tuple`/`NamedTuple` entries, the `show` method for a `BandedMatrix` crashes when displaying such entries. In addition, this `show` method results in somewhat unreadable output for other struct entries, like vectors and covectors. So, all non-`Number` matrix fields still get displayed in the REPL like regular fields.

### Related ClimaCore Changes
- To ensure that promotion and conversion of `BandMatrixRow` entries works correctly for nested `Tuple`/`NamedTuple` entries, the `rpromote_type` and `rconvert` functions have been added to RecursiveApply. The first function uses `rmaptype` to recursively call `promote_type`, and the second function uses `rmap` and `rzero` to make a more type-stable version of Julia's built-in `convert` function.
- To support manipulating matrices of covectors (and, in the future, matrices of higher-order cotensors), all basic arithmetic operations have been defined for `AdjointAxisTensor`. This includes `+`, `-`, `*`, `/`, `\`, and `==`, as well as the `zero` function. The new methods undo the adjoint and fall back to the methods for `AxisTensor`.
- To simplify the setup of the non-scalar matrix tests, `map` has been generalized from only working for a single `Field` to working for multiple `Field`s.

### Tests
- All of the following test files check for correctness, allocations, and type instabilities.
- The file `test/MatrixFields/band_matrix_row.jl` tests low-level operations with `BandMatrixRow`s, ensuring that constructors, conversions, arithmetic operations, and combinations with `LinearAlgebra.I` work for numbers and nested `Tuple`/`NamedTuple` values.
- The file `test/MatrixFields/rmul_with_projection.jl` tests `rmul_with_projection` for all currently supported combinations of scalars, vectors, tensors, and covectors, as well as several combinations of nested `Tuple`/`NamedTuple` values.
- The file `test/MatrixFields/field2array.jl` checks that `column_field2array`, `column_field2array_view`, and `field2arrays` work as expected for very simple matrix fields.
- The file `test/MatrixFields/matrix_field_broadcasting.jl` tests a wide range of scalar and non-scalar matrix field broadcast expressions. It compares each scalar matrix field broadcast expression against an equivalent implementation using `field2arrays` and LinearAlgebra's `mul!`, making sure that the matrix field implementation is roughly as performant as the array implementation. It also compares each non-scalar matrix field broadcast expression (e.g., an expression with matrices of covectors or `NamedTuple`s) against an equivalent scalar matrix field implementation.
- I have also checked that the new implementation is at least as performant as the pre-existing implementation for those cases that the pre-existing API can support. For example, to compare the two implementations for the "diagonal matrix times bi-diagonal matrix times tri-diagonal matrix times quad-diagonal matrix times vector" test case, run the following code after `test/MatrixFields/matrix_field_broadcasting.jl`:
    ```
    using ClimaCore: MatrixFields, Operators
    function compare_to_old_implementation()
        ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec, ᶠvec = random_test_fields(Float64)
        result = `@.` ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec
        inputs = (ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec)
        func!(result, ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec) =
            `@.` result = ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec
        time, allocs = `@benchmark` call_func(func!, result, inputs)
        
        to_old_version(row::BMR) where {BMR <: MatrixFields.BandMatrixRow} =
            Operators.StencilCoefs{MatrixFields.outer_diagonals(BMR)...}(row.entries)
        apply = Operators.ApplyStencil()
        compose = Operators.ComposeStencils()
        ᶜᶜmat_old = to_old_version.(ᶜᶜmat)
        ᶜᶠmat_old = to_old_version.(ᶜᶠmat)
        ᶠᶠmat_old = to_old_version.(ᶠᶠmat)
        ᶠᶜmat_old = to_old_version.(ᶠᶜmat)
        result_old = `@.` apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec)
        inputs_old = (ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec)
        func!_old(result_old, ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec) =
            `@.` result_old = apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec)
        time_old, allocs_old = `@benchmark` call_func(func!_old, result_old, inputs_old)

        `@assert` allocs == allocs_old == 0

        `@info` "New implementation time: $time s"
        `@info` "Old implementation time: $time_old s"
    end
    compare_to_old_implementation()
    ```
    This code prints out the following:
    ```
    [ Info: New implementation time: 1.5848e-5 s
    [ Info: Old implementation time: 1.5283e-5 s
    ```

Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Jul 21, 2023
1326: Add new MatrixFields module, along with unit tests and performance tests r=dennisYatunin a=dennisYatunin

# Purpose

This PR encompasses the first two tasks of the implicit interface updates [SDI](#1230). That is, it refactors `StencilCoefs` into `BandMatrixRow` and `ApplyStencil`/`ComposeStencil` into `MultiplyColumnwiseBandMatrixField`, and it also adds a comprehensive set of tests. In the process of refactoring, the code for low-level matrix operations has been simplified and useful new functionality has been added. In order to avoid conflicts with pre-existing code, the new data structures and functions have been placed in a separate module called `MatrixFields`; this module can be removed once we are ready to replace all of the old code.

## Content

### API improvements
- Matrix field rows now have simple constructors (e.g., `DiagonalMatrixRow` and `TridiagonalMatrixRow`) that can be used to construct matrix fields on the fly in broadcast expressions.
- Matrix fields are now automatically promoted to common types for basic arithmetic operations. This involves promoting the entries in the rows, as well as padding rows with zeros when rows have unequal numbers of entries. In addition, `LinearAlgebra.I` (or, rather, `LinearAlgebra.UniformScaling`) is automatically promoted to a `DiagonalMatrixRow` when used with `BandMatrixRow`s. So, for example, it is now possible to evaluate `DiagonalMartrixRow(1) - 2 * TridiagonalMatrixRow(3, 4, 5) - I / 6`.
- Matrix fields are now fully integrated with `RecursiveApply`, which allows a single matrix field to store multiple matrices by using `Tuple`s or `NamedTuple`s as entries instead of scalars. This is achieved by using the new `rzero` function instead of `zero`, and also by making `+`, `-`, `*`, and `/` call `radd`, `rsub`, `rmul`, and `rdiv` when used on `BandMatrixRow`s. This change will make it possible to, e.g., store the derivatives of all tracer tendencies with respect to vertical velocity as a single matrix field in `ClimaAtmos`, which will lead to simpler code and better performance.
- All matrix-matrix and matrix-vector multiplications can now be handled using the `⋅` operator, which is an alias for `MultiplyColumnwiseBandMatrixField()`. The implementation of this operator is much simpler than the implementation of `ApplyStencil`/`ComposeStencil`, and it is effectively localized to the single function `multiply_matrix_at_index`. In order to avoid inference failures for complex matrix field broadcast expressions, the recursion limit for this function needed to be disabled. Unfortunately, just as with the pre-existing code, inlining this function with ``@propagate_inbounds`` drastically increased compilation time, and it also caused inference failures for particularly complicated matrix broadcast expressions. Although not using ``@propagate_inbounds`` causes a 3--4x slowdown in the evaluation of matrix broadcast expressions, it avoids a 10--100x slowdown in their compilation.
- The matrix of the divergence operator can now be properly represented as a matrix of covectors, rather than a matrix of scalars. Since the divergence operator turns vectors into scalars, multiplying its matrix by a matrix/vector of vectors should return a matrix/vector of scalars. However, this is not as simple as just multiplying a covector by a vector and returning a scalar, since the vector may need to be projected onto the dual axis of the covector before the multiplication. So, matrix-matrix and matrix-vector multiplication is now implemented using the `rmul_with_projection` function instead of `rmul`. In general, this function works just like `rmul`, but, when the first argument is a covector (an `AdjointAxisVector`) and the second is a vector or higher-order tensor (an `AxisTensor`), it uses the local geometry to project the second argument before the multiplication. In the future, we may want to generalize this to higher-order cotensors.
- The `MatrixFields` module also includes functions for converting matrix fields to more conventional arrays that are compatible with `LinearAlgebra.jl`. The function `column_field2array` converts a matrix field defined on a column space to a `BandedMatrix` from `BandedMatrices.jl`, and it also converts a regular field defined on a column space to a `Vector`. The function `field2arrays` works for fields defined on multiple columns by calling `column_field2array` on each column. 
- Matrix fields now get displayed in the REPL as actual matrices. Specifically, when a matrix field contains `Number` entries, its first column is passed to `column_field2array_view`, which is identical to `column_field2array`, except that it allocates less memory by returning a view of the field's underlying data. Although `column_field2array` and `column_field2array_view` work for matrix fields with `Tuple`/`NamedTuple` entries, the `show` method for a `BandedMatrix` crashes when displaying such entries. In addition, this `show` method results in somewhat unreadable output for other struct entries, like vectors and covectors. So, all non-`Number` matrix fields still get displayed in the REPL like regular fields.

### Related ClimaCore Changes
- To ensure that promotion and conversion of `BandMatrixRow` entries works correctly for nested `Tuple`/`NamedTuple` entries, the `rpromote_type` and `rconvert` functions have been added to RecursiveApply. The first function uses `rmaptype` to recursively call `promote_type`, and the second function uses `rmap` and `rzero` to make a more type-stable version of Julia's built-in `convert` function.
- To support manipulating matrices of covectors (and, in the future, matrices of higher-order cotensors), all basic arithmetic operations have been defined for `AdjointAxisTensor`. This includes `+`, `-`, `*`, `/`, `\`, and `==`, as well as the `zero` function. The new methods undo the adjoint and fall back to the methods for `AxisTensor`.
- To simplify the setup of the non-scalar matrix tests, `map` has been generalized from only working for a single `Field` to working for multiple `Field`s.

### Tests
- All of the following test files check for correctness, allocations, and type instabilities.
- The file `test/MatrixFields/band_matrix_row.jl` tests low-level operations with `BandMatrixRow`s, ensuring that constructors, conversions, arithmetic operations, and combinations with `LinearAlgebra.I` work for numbers and nested `Tuple`/`NamedTuple` values.
- The file `test/MatrixFields/rmul_with_projection.jl` tests `rmul_with_projection` for all currently supported combinations of scalars, vectors, tensors, and covectors, as well as several combinations of nested `Tuple`/`NamedTuple` values.
- The file `test/MatrixFields/field2array.jl` checks that `column_field2array`, `column_field2array_view`, and `field2arrays` work as expected for very simple matrix fields.
- The file `test/MatrixFields/matrix_field_broadcasting.jl` tests a wide range of scalar and non-scalar matrix field broadcast expressions. It compares each scalar matrix field broadcast expression against an equivalent implementation using `field2arrays` and LinearAlgebra's `mul!`, making sure that the matrix field implementation is roughly as performant as the array implementation. It also compares each non-scalar matrix field broadcast expression (e.g., an expression with matrices of covectors or `NamedTuple`s) against an equivalent scalar matrix field implementation.
- I have also checked that the new implementation is at least as performant as the pre-existing implementation for those cases that the pre-existing API can support. For example, to compare the two implementations for the "diagonal matrix times bi-diagonal matrix times tri-diagonal matrix times quad-diagonal matrix times vector" test case, run the following code after `test/MatrixFields/matrix_field_broadcasting.jl`:
    ```
    using ClimaCore: MatrixFields, Operators
    function compare_to_old_implementation()
        ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec, ᶠvec = random_test_fields(Float64)
        result = `@.` ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec
        inputs = (ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec)
        func!(result, ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec) =
            `@.` result = ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec
        time, allocs = `@benchmark` call_func(func!, result, inputs)
        
        to_old_version(row::BMR) where {BMR <: MatrixFields.BandMatrixRow} =
            Operators.StencilCoefs{MatrixFields.outer_diagonals(BMR)...}(row.entries)
        apply = Operators.ApplyStencil()
        compose = Operators.ComposeStencils()
        ᶜᶜmat_old = to_old_version.(ᶜᶜmat)
        ᶜᶠmat_old = to_old_version.(ᶜᶠmat)
        ᶠᶠmat_old = to_old_version.(ᶠᶠmat)
        ᶠᶜmat_old = to_old_version.(ᶠᶜmat)
        result_old = `@.` apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec)
        inputs_old = (ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec)
        func!_old(result_old, ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec) =
            `@.` result_old = apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec)
        time_old, allocs_old = `@benchmark` call_func(func!_old, result_old, inputs_old)

        `@assert` allocs == allocs_old == 0

        `@info` "New implementation time: $time s"
        `@info` "Old implementation time: $time_old s"
    end
    compare_to_old_implementation()
    ```
    This code prints out the following:
    ```
    [ Info: New implementation time: 1.5848e-5 s
    [ Info: Old implementation time: 1.5283e-5 s
    ```

Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Aug 14, 2023
1399: Add operator_matrix to MatrixFields, along with tests and docs r=dennisYatunin a=dennisYatunin

## Purpose

Second PR of #1230. Refactors what is currently in `src/operators/operator2stencil.jl`.

## Content

- Adds the function `operator_matrix(op)`, which will replace `Operator2Stencil(op)`. This function has a detailed docstring and throws descriptive errors for operators without well-defined operator matrices.
- Improves the usability of operator matrices. With this new interface, users will no longer need to specify intermediate fields to compute operator matrices. For example, with our pre-existing code, the matrix of the interpolation operator (a bidiagonal matrix whose entries are `-1/2` and `+1/2`) needs to be computed with ``@.` matrix_field = interp_matrix(ones_field)`, where `ones_field` is a field filled with the number `1` that is used to infer the space and entry type of the matrix. Now, this matrix can just be computed with ``@.` matrix_field = interp_matrix()`. In order to make this work, `interp_matrix` is now defined as a "lazy operator". When a broadcast expression containing lazy operators is evaluated, each lazy operator is replaced with an actual operator, and it is given one or more fields as input arguments. In this case, `interp_matrix` is given the local geometry field as an input argument, and this field is used to infer the space and entry type of the operator matrix.
- This usability improvement slightly changes the computation of derivative matrices. With our pre-existing code, ``@.` op(func(field))` is equivalent to ``@.` op_matrix(ones_field) ⋅ func(field)`, and the derivative of this expression with respect to `field` can be specified as ``@.` op_matrix(func'(field))`, where `func'` is the derivative of the point-wise function `func`. With this new interface, ``@.` op(func(field))` is equivalent to ``@.` op_matrix() ⋅ func(field)`, and the derivative can be specified as ``@.` op_matrix() ⋅ DiagonalMatrixRow(func'(field))`. Similarly, the derivative of ``@.` op2(func2(op1(func1(field))))` with respect to `field` is `op2_matrix(func2'(op1(func1(field)))) ⋅ op1_matrix(func1'(field))` with our pre-existing code and `op2_matrix() ⋅ DiagonalMatrixRow(func2'(op1(func1(field)))) ⋅ op1_matrix() ⋅ DiagonalMatrixRow(func1'(field))` with the new interface. Although the new interface leads to longer derivative expressions, those expressions are more similar to how the chain rule is usually written out, and they can be debugged/analyzed more incrementally.
- Adds support for computing operator matrices of multi-argument operators. For example, if `op` is the `Upwind3rdOrderBiasedProductC2F` operator, then ``@.` op(velocity_field, tracer_field)` is equivalent to ``@.` op_matrix(velocity_field) ⋅ tracer_field`. The implementation is similar to that of single-argument operators, except that it does not require the use of "lazy operators" (since there is already a field being passed to the operator matrix, the local geometry field can be obtained from that field during the evaluation of `Base.Broadcast.broadcasted`).
- Adds support for computing operator matrices of operators with `Extrapolate` boundary conditions. These boundary conditions cause the matrices to have larger bandwidths than other boundary conditions.
- Tests the `operator_matrix` function with every valid combination of finite difference operators and boundary conditions. The tests check for correctness, type stability, and lack of allocations. The tests are run on both CPUs and GPUs. In addition, the tests print out how the performance of ``@.` op_matrix() ⋅ field` compares to the performance of ``@.` op(field)`; the two expressions are similarly fast on GPUs (between `-70%` and `+40%` relative change in speed), though the operator matrix expressions tend to be slower on CPUs.
- Tests a few more complicated broadcast expressions involving products and linear combinations of operator matrices. These tests indicate that operator matrices are similarly performant to regular matrix fields.
- Modifies `test_field_broadcast` so that it also tests whether `get_result` generates the same result as `set_result!`.
- Modifies the `*` method for `BandMatrixRow` so that matrix fields can be scaled by vectors/covectors in addition to numbers. This simplifies a few of the complicated broadcast tests.
- Fixes a typo (`Geometery`) in the method of `stencil_left_boundary` for `GradientF2C`.
- Adds a missing method to `rpromote_type` that was preventing empty matrix rows from being constructed.
- Modifies the `Base.Broadcast.broadcasted` method for `FiniteDifferenceOperator` and `SpectralElementOperator` so that lazy operators can work correctly (the original versions of these methods would always overwrite the `LazyOperatorStyle` with `StencilStyle` or `SpectralStyle`, respectively).
- Unfortunately, this PR also adds 2 method invalidations. These are due to a new definition of `broadcasted` for lazy operators:
    ```
    Base.Broadcast.broadcasted(::LazyOperatorStyle, f::F, args...) where {F} =
        LazyOperatorBroadcasted(f, args)
    ```
    As explained in an accompanying code comment, removing this method requires modifying several other method definitions, and one of these modifications adds 11 invalidations. So, there doesn't seem to be a good way to avoid these 2 invalidations.

Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Oct 10, 2023
1436: Add FieldMatrix and linear solvers r=dennisYatunin a=dennisYatunin

## Purpose

Third PR of #1230. Adds an interface for specifying block matrices of matrix fields and solving linear systems with these matrices. This will replace what is currently in `ClimaAtmos.jl/src/prognostic_equations/implicit/schur_complement_W.jl`, generalizing it for implicit diffusion and implicit EDMF.

## Content

### Main Changes

- Add the `FieldName` struct, which is a singleton type that represents a chain of `getproperty` calls
  - Add the ``@name`` macro for constructing `FieldName`s, which checks whether its input expression is a syntactically valid chain of `getproperty` calls before calling the default constructor.
  - A `name` can be used to access a property or sub-property of an object `x` by calling `get_field(x, name)`.
  - An `internal_name` can be appended onto another `name` in order to access a property or sub-property of `get_field(x, name)`.
- Add the `FieldNameDict` struct, which maps each key in a set of `FieldVectorKeys` or `FieldMatrixKeys` (see below) to a `Field` or some other object.
  - There are currently four subtypes of `FieldNameDict`:
    - `FieldMatrix` (the only user-facing subtype), which maps `NTuple{2, FieldName}`s to `ColumnwiseBandMatrixField`s or multiples of `LinearAlgebra.I`
    - `FieldVectorView`, which maps `FieldName`s to `Field`s; this is used to wrap a `FieldVector` so that it can be used in conjunction with a `FieldMatrix`
    - `FieldVectorViewBroadcasted` and `FieldMatrixBroadcasted`, each of which can store unevaluated `Base.AbstractBroadcasted` objects, in addition to what `FieldVectorView` and `FieldMatrix` can already store
  - Supports standard `AbstractDict` functions like `keys` and `pairs`.
  - An individual block of a `FieldNameDict` can be accessed by calling `dict[key]`, and a range of blocks can be accessed by calling `dict[set]`, where `set` is a `FieldNameSet`.
  - Given a `FieldMatrix` `A`, a similar matrix that only contains identity matrix blocks can be constructed with `one(A)`.
  - `FieldNameDict`s can be used in broadcast expressions, which support the following operations:
    - `+`, `-`, or `*`, where each input is either a `FieldNameDict` or a `FieldVector`
    - `inv`, where the input is a diagonal `FieldMatrix`
  - The new methods for `Base.Broadcast.broadcasted` construct chains of `Field` broadcast expressions from `FieldNameDict` broadcast expressions on the fly, somewhat similarly to how broadcasting works for ClimaCore operators.
- Add the `FieldMatrixSolver` struct, which solves an equation of the form `A * x = b`, where `A` is a `FieldMatrix` and where `x` and `b` are `FieldVector`s.
  - Add the `field_matrix_solve!` function, which works just like `ldiv!(x, A, b)`, except that it also takes a `FieldMatrixSolver` as its first argument.
  - Add four `FieldMatrixSolverAlgorithm`s, which can be nested inside of each other to build up more specialized algorithms:
    - `BlockDiagonalSolve`, which runs a "single field solver" for each block of the block diagonal matrix `A`; the single field solver can handle the four types of blocks:
      - Multiples of `LinearAlgebra.I`
      - Diagonal `ColumnwiseBandMatrixField`s
      - Tri-diagonal `ColumnwiseBandMatrixField`s (implementation of the Thomas algorithm)
      - Penta-diagonal `ColumnwiseBandMatrixField`s (implementation of the PTRANS-I algorithm)
    - `BlockLowerTriangularSolve`, which uses forward substitution to solve the equation for a block lower triangular matrix `A`
    - `SchurComplementSolve`, which generalizes what is currently in ClimaAtmos's `schur_complement_W.jl` file to any block matrix `A` with a diagonal block in the top-left corner
    - `ApproximateFactorizationSolve`, which lets us use "operator splitting" to approximately solve the equation for a diagonally dominant block matrix `A`
- Add documentation for how to specify a `FieldMatrix` and use it in a linear solver, along with internal documentation for the new `FieldName`-based infrastructure.
- Add unit tests for correctness, type stability, and allocations, and run them on both CPUs and GPUs through CI.
  - Test each single field solver on both a cell-center and a cell-face field.
  - Test each `FieldMatrixSolverAlgorithm` on block diagonal, block lower triangular, and block dense matrices.
  - Test solvers with identical structures to what we will use in ClimaAtmos for the following examples:
    - Dry dycore with implicit acoustic waves
    - Dry dycore with implicit acoustic waves and diffusion
    - Dry dycore + prognostic EDMF with implicit acoustic waves and SGS fluxes
    - Moist dycore + prognostic EDMF + tracers with implicit acoustic waves and SGS fluxes

### Internal Chagnes

- Add a collection of "unrolled functions", whose return values can be inferred during compilation if their input values are all singleton types.
  - These are all implemented as combinations of `unrolled_zip`, `unrolled_map`, and `unrolled_foldl`.
  - Several of these need to have their recursion limits disabled for the unit tests to be type stable.
- Add the `FieldNameTree` struct, which stores every `FieldName` that can be used to access `x` with `get_field(x, name)`.
  - A `name` can be checked for validity by calling `has_subtree_at_name(tree, name)`.
  - The children of `name` (the `FieldName`s that can be used to access the properties of `get_field(x, name)`) can be obtained by calling `child_names(name, tree)`.
- Add the `FieldNameSet` struct, which stores a set of `FieldVectorKeys` (each of which is a `FieldName`) or a set of `FieldMatrixKeys` (each of which is an `NTuple{2, FieldName}`).
  - Roughly equivalent to the built-in `KeySet` for `AbstractDict`s, but specialized for `FieldNameDict`s.
  - Supports standard `AbstractSet` functions like `union` and `setdiff`, as well as custom functions like `set_complement` and `matrix_product_keys`.
  - Handles overlaps between `FieldName`s (that is, situations where one property of `x` lies inside another property of `x`) by storing a `FieldNameTree` that contains all available `FieldName`s.
- Disable the recursion limits for several functions used to manipulate `FieldName`s, `FieldNameTree`s, and `FieldNameSet`s, as this is necessary in order for the unit tests to be type stable.
- Remove the methods for `RecursiveApply.rmul` that specialize on `Number`, which is also necessary in order for the unit tests to be type stable.
  - These methods are no longer required, now that #1454 has been merged in.
- Add support for calling `inv` on `BandMatrixRow`s.
- Qualify the use of `CUDA.`@allowscalar`,` per Charlie's suggestion.
- Fix some type instabilities in `matrix_field_test_utils.jl`.
- Remove an unused variable name in `operator_matrices.jl`.


Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Oct 10, 2023
1436: Add FieldMatrix and linear solvers r=dennisYatunin a=dennisYatunin

## Purpose

Third PR of #1230. Adds an interface for specifying block matrices of matrix fields and solving linear systems with these matrices. This will replace what is currently in `ClimaAtmos.jl/src/prognostic_equations/implicit/schur_complement_W.jl`, generalizing it for implicit diffusion and implicit EDMF.

## Content

### Main Changes

- Add the `FieldName` struct, which is a singleton type that represents a chain of `getproperty` calls
  - Add the ``@name`` macro for constructing `FieldName`s, which checks whether its input expression is a syntactically valid chain of `getproperty` calls before calling the default constructor.
  - A `name` can be used to access a property or sub-property of an object `x` by calling `get_field(x, name)`.
  - An `internal_name` can be appended onto another `name` in order to access a property or sub-property of `get_field(x, name)`.
- Add the `FieldNameDict` struct, which maps each key in a set of `FieldVectorKeys` or `FieldMatrixKeys` (see below) to a `Field` or some other object.
  - There are currently four subtypes of `FieldNameDict`:
    - `FieldMatrix` (the only user-facing subtype), which maps `NTuple{2, FieldName}`s to `ColumnwiseBandMatrixField`s or multiples of `LinearAlgebra.I`
    - `FieldVectorView`, which maps `FieldName`s to `Field`s; this is used to wrap a `FieldVector` so that it can be used in conjunction with a `FieldMatrix`
    - `FieldVectorViewBroadcasted` and `FieldMatrixBroadcasted`, each of which can store unevaluated `Base.AbstractBroadcasted` objects, in addition to what `FieldVectorView` and `FieldMatrix` can already store
  - Supports standard `AbstractDict` functions like `keys` and `pairs`.
  - An individual block of a `FieldNameDict` can be accessed by calling `dict[key]`, and a range of blocks can be accessed by calling `dict[set]`, where `set` is a `FieldNameSet`.
  - Given a `FieldMatrix` `A`, a similar matrix that only contains identity matrix blocks can be constructed with `one(A)`.
  - `FieldNameDict`s can be used in broadcast expressions, which support the following operations:
    - `+`, `-`, or `*`, where each input is either a `FieldNameDict` or a `FieldVector`
    - `inv`, where the input is a diagonal `FieldMatrix`
  - The new methods for `Base.Broadcast.broadcasted` construct chains of `Field` broadcast expressions from `FieldNameDict` broadcast expressions on the fly, somewhat similarly to how broadcasting works for ClimaCore operators.
- Add the `FieldMatrixSolver` struct, which solves an equation of the form `A * x = b`, where `A` is a `FieldMatrix` and where `x` and `b` are `FieldVector`s.
  - Add the `field_matrix_solve!` function, which works just like `ldiv!(x, A, b)`, except that it also takes a `FieldMatrixSolver` as its first argument.
  - Add four `FieldMatrixSolverAlgorithm`s, which can be nested inside of each other to build up more specialized algorithms:
    - `BlockDiagonalSolve`, which runs a "single field solver" for each block of the block diagonal matrix `A`; the single field solver can handle the four types of blocks:
      - Multiples of `LinearAlgebra.I`
      - Diagonal `ColumnwiseBandMatrixField`s
      - Tri-diagonal `ColumnwiseBandMatrixField`s (implementation of the Thomas algorithm)
      - Penta-diagonal `ColumnwiseBandMatrixField`s (implementation of the PTRANS-I algorithm)
    - `BlockLowerTriangularSolve`, which uses forward substitution to solve the equation for a block lower triangular matrix `A`
    - `SchurComplementSolve`, which generalizes what is currently in ClimaAtmos's `schur_complement_W.jl` file to any block matrix `A` with a diagonal block in the top-left corner
    - `ApproximateFactorizationSolve`, which lets us use "operator splitting" to approximately solve the equation for a diagonally dominant block matrix `A`
- Add documentation for how to specify a `FieldMatrix` and use it in a linear solver, along with internal documentation for the new `FieldName`-based infrastructure.
- Add unit tests for correctness, type stability, and allocations, and run them on both CPUs and GPUs through CI.
  - Test each single field solver on both a cell-center and a cell-face field.
  - Test each `FieldMatrixSolverAlgorithm` on block diagonal, block lower triangular, and block dense matrices.
  - Test solvers with identical structures to what we will use in ClimaAtmos for the following examples:
    - Dry dycore with implicit acoustic waves
    - Dry dycore with implicit acoustic waves and diffusion
    - Dry dycore + prognostic EDMF with implicit acoustic waves and SGS fluxes
    - Moist dycore + prognostic EDMF + tracers with implicit acoustic waves and SGS fluxes

### Internal Chagnes

- Add a collection of "unrolled functions", whose return values can be inferred during compilation if their input values are all singleton types.
  - These are all implemented as combinations of `unrolled_zip`, `unrolled_map`, and `unrolled_foldl`.
  - Several of these need to have their recursion limits disabled for the unit tests to be type stable.
- Add the `FieldNameTree` struct, which stores every `FieldName` that can be used to access `x` with `get_field(x, name)`.
  - A `name` can be checked for validity by calling `has_subtree_at_name(tree, name)`.
  - The children of `name` (the `FieldName`s that can be used to access the properties of `get_field(x, name)`) can be obtained by calling `child_names(name, tree)`.
- Add the `FieldNameSet` struct, which stores a set of `FieldVectorKeys` (each of which is a `FieldName`) or a set of `FieldMatrixKeys` (each of which is an `NTuple{2, FieldName}`).
  - Roughly equivalent to the built-in `KeySet` for `AbstractDict`s, but specialized for `FieldNameDict`s.
  - Supports standard `AbstractSet` functions like `union` and `setdiff`, as well as custom functions like `set_complement` and `matrix_product_keys`.
  - Handles overlaps between `FieldName`s (that is, situations where one property of `x` lies inside another property of `x`) by storing a `FieldNameTree` that contains all available `FieldName`s.
- Disable the recursion limits for several functions used to manipulate `FieldName`s, `FieldNameTree`s, and `FieldNameSet`s, as this is necessary in order for the unit tests to be type stable.
- Remove the methods for `RecursiveApply.rmul` that specialize on `Number`, which is also necessary in order for the unit tests to be type stable.
  - These methods are no longer required, now that #1454 has been merged in.
- Add support for calling `inv` on `BandMatrixRow`s.
- Qualify the use of `CUDA.`@allowscalar`,` per Charlie's suggestion.
- Fix some type instabilities in `matrix_field_test_utils.jl`.
- Remove an unused variable name in `operator_matrices.jl`.


Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Oct 10, 2023
1436: Add FieldMatrix and linear solvers r=dennisYatunin a=dennisYatunin

## Purpose

Third PR of #1230. Adds an interface for specifying block matrices of matrix fields and solving linear systems with these matrices. This will replace what is currently in `ClimaAtmos.jl/src/prognostic_equations/implicit/schur_complement_W.jl`, generalizing it for implicit diffusion and implicit EDMF.

## Content

### Main Changes

- Add the `FieldName` struct, which is a singleton type that represents a chain of `getproperty` calls
  - Add the ``@name`` macro for constructing `FieldName`s, which checks whether its input expression is a syntactically valid chain of `getproperty` calls before calling the default constructor.
  - A `name` can be used to access a property or sub-property of an object `x` by calling `get_field(x, name)`.
  - An `internal_name` can be appended onto another `name` in order to access a property or sub-property of `get_field(x, name)`.
- Add the `FieldNameDict` struct, which maps each key in a set of `FieldVectorKeys` or `FieldMatrixKeys` (see below) to a `Field` or some other object.
  - There are currently four subtypes of `FieldNameDict`:
    - `FieldMatrix` (the only user-facing subtype), which maps `NTuple{2, FieldName}`s to `ColumnwiseBandMatrixField`s or multiples of `LinearAlgebra.I`
    - `FieldVectorView`, which maps `FieldName`s to `Field`s; this is used to wrap a `FieldVector` so that it can be used in conjunction with a `FieldMatrix`
    - `FieldVectorViewBroadcasted` and `FieldMatrixBroadcasted`, each of which can store unevaluated `Base.AbstractBroadcasted` objects, in addition to what `FieldVectorView` and `FieldMatrix` can already store
  - Supports standard `AbstractDict` functions like `keys` and `pairs`.
  - An individual block of a `FieldNameDict` can be accessed by calling `dict[key]`, and a range of blocks can be accessed by calling `dict[set]`, where `set` is a `FieldNameSet`.
  - Given a `FieldMatrix` `A`, a similar matrix that only contains identity matrix blocks can be constructed with `one(A)`.
  - `FieldNameDict`s can be used in broadcast expressions, which support the following operations:
    - `+`, `-`, or `*`, where each input is either a `FieldNameDict` or a `FieldVector`
    - `inv`, where the input is a diagonal `FieldMatrix`
  - The new methods for `Base.Broadcast.broadcasted` construct chains of `Field` broadcast expressions from `FieldNameDict` broadcast expressions on the fly, somewhat similarly to how broadcasting works for ClimaCore operators.
- Add the `FieldMatrixSolver` struct, which solves an equation of the form `A * x = b`, where `A` is a `FieldMatrix` and where `x` and `b` are `FieldVector`s.
  - Add the `field_matrix_solve!` function, which works just like `ldiv!(x, A, b)`, except that it also takes a `FieldMatrixSolver` as its first argument.
  - Add four `FieldMatrixSolverAlgorithm`s, which can be nested inside of each other to build up more specialized algorithms:
    - `BlockDiagonalSolve`, which runs a "single field solver" for each block of the block diagonal matrix `A`; the single field solver can handle the four types of blocks:
      - Multiples of `LinearAlgebra.I`
      - Diagonal `ColumnwiseBandMatrixField`s
      - Tri-diagonal `ColumnwiseBandMatrixField`s (implementation of the Thomas algorithm)
      - Penta-diagonal `ColumnwiseBandMatrixField`s (implementation of the PTRANS-I algorithm)
    - `BlockLowerTriangularSolve`, which uses forward substitution to solve the equation for a block lower triangular matrix `A`
    - `SchurComplementSolve`, which generalizes what is currently in ClimaAtmos's `schur_complement_W.jl` file to any block matrix `A` with a diagonal block in the top-left corner
    - `ApproximateFactorizationSolve`, which lets us use "operator splitting" to approximately solve the equation for a diagonally dominant block matrix `A`
- Add documentation for how to specify a `FieldMatrix` and use it in a linear solver, along with internal documentation for the new `FieldName`-based infrastructure.
- Add unit tests for correctness, type stability, and allocations, and run them on both CPUs and GPUs through CI.
  - Test each single field solver on both a cell-center and a cell-face field.
  - Test each `FieldMatrixSolverAlgorithm` on block diagonal, block lower triangular, and block dense matrices.
  - Test solvers with identical structures to what we will use in ClimaAtmos for the following examples:
    - Dry dycore with implicit acoustic waves
    - Dry dycore with implicit acoustic waves and diffusion
    - Dry dycore + prognostic EDMF with implicit acoustic waves and SGS fluxes
    - Moist dycore + prognostic EDMF + tracers with implicit acoustic waves and SGS fluxes

### Internal Chagnes

- Add a collection of "unrolled functions", whose return values can be inferred during compilation if their input values are all singleton types.
  - These are all implemented as combinations of `unrolled_zip`, `unrolled_map`, and `unrolled_foldl`.
  - Several of these need to have their recursion limits disabled for the unit tests to be type stable.
- Add the `FieldNameTree` struct, which stores every `FieldName` that can be used to access `x` with `get_field(x, name)`.
  - A `name` can be checked for validity by calling `has_subtree_at_name(tree, name)`.
  - The children of `name` (the `FieldName`s that can be used to access the properties of `get_field(x, name)`) can be obtained by calling `child_names(name, tree)`.
- Add the `FieldNameSet` struct, which stores a set of `FieldVectorKeys` (each of which is a `FieldName`) or a set of `FieldMatrixKeys` (each of which is an `NTuple{2, FieldName}`).
  - Roughly equivalent to the built-in `KeySet` for `AbstractDict`s, but specialized for `FieldNameDict`s.
  - Supports standard `AbstractSet` functions like `union` and `setdiff`, as well as custom functions like `set_complement` and `matrix_product_keys`.
  - Handles overlaps between `FieldName`s (that is, situations where one property of `x` lies inside another property of `x`) by storing a `FieldNameTree` that contains all available `FieldName`s.
- Disable the recursion limits for several functions used to manipulate `FieldName`s, `FieldNameTree`s, and `FieldNameSet`s, as this is necessary in order for the unit tests to be type stable.
- Remove the methods for `RecursiveApply.rmul` that specialize on `Number`, which is also necessary in order for the unit tests to be type stable.
  - These methods are no longer required, now that #1454 has been merged in.
- Add support for calling `inv` on `BandMatrixRow`s.
- Qualify the use of `CUDA.`@allowscalar`,` per Charlie's suggestion.
- Fix some type instabilities in `matrix_field_test_utils.jl`.
- Remove an unused variable name in `operator_matrices.jl`.


Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Oct 10, 2023
1436: Add FieldMatrix and linear solvers r=dennisYatunin a=dennisYatunin

## Purpose

Third PR of #1230. Adds an interface for specifying block matrices of matrix fields and solving linear systems with these matrices. This will replace what is currently in `ClimaAtmos.jl/src/prognostic_equations/implicit/schur_complement_W.jl`, generalizing it for implicit diffusion and implicit EDMF.

## Content

### Main Changes

- Add the `FieldName` struct, which is a singleton type that represents a chain of `getproperty` calls
  - Add the ``@name`` macro for constructing `FieldName`s, which checks whether its input expression is a syntactically valid chain of `getproperty` calls before calling the default constructor.
  - A `name` can be used to access a property or sub-property of an object `x` by calling `get_field(x, name)`.
  - An `internal_name` can be appended onto another `name` in order to access a property or sub-property of `get_field(x, name)`.
- Add the `FieldNameDict` struct, which maps each key in a set of `FieldVectorKeys` or `FieldMatrixKeys` (see below) to a `Field` or some other object.
  - There are currently four subtypes of `FieldNameDict`:
    - `FieldMatrix` (the only user-facing subtype), which maps `NTuple{2, FieldName}`s to `ColumnwiseBandMatrixField`s or multiples of `LinearAlgebra.I`
    - `FieldVectorView`, which maps `FieldName`s to `Field`s; this is used to wrap a `FieldVector` so that it can be used in conjunction with a `FieldMatrix`
    - `FieldVectorViewBroadcasted` and `FieldMatrixBroadcasted`, each of which can store unevaluated `Base.AbstractBroadcasted` objects, in addition to what `FieldVectorView` and `FieldMatrix` can already store
  - Supports standard `AbstractDict` functions like `keys` and `pairs`.
  - An individual block of a `FieldNameDict` can be accessed by calling `dict[key]`, and a range of blocks can be accessed by calling `dict[set]`, where `set` is a `FieldNameSet`.
  - Given a `FieldMatrix` `A`, a similar matrix that only contains identity matrix blocks can be constructed with `one(A)`.
  - `FieldNameDict`s can be used in broadcast expressions, which support the following operations:
    - `+`, `-`, or `*`, where each input is either a `FieldNameDict` or a `FieldVector`
    - `inv`, where the input is a diagonal `FieldMatrix`
  - The new methods for `Base.Broadcast.broadcasted` construct chains of `Field` broadcast expressions from `FieldNameDict` broadcast expressions on the fly, somewhat similarly to how broadcasting works for ClimaCore operators.
- Add the `FieldMatrixSolver` struct, which solves an equation of the form `A * x = b`, where `A` is a `FieldMatrix` and where `x` and `b` are `FieldVector`s.
  - Add the `field_matrix_solve!` function, which works just like `ldiv!(x, A, b)`, except that it also takes a `FieldMatrixSolver` as its first argument.
  - Add four `FieldMatrixSolverAlgorithm`s, which can be nested inside of each other to build up more specialized algorithms:
    - `BlockDiagonalSolve`, which runs a "single field solver" for each block of the block diagonal matrix `A`; the single field solver can handle the four types of blocks:
      - Multiples of `LinearAlgebra.I`
      - Diagonal `ColumnwiseBandMatrixField`s
      - Tri-diagonal `ColumnwiseBandMatrixField`s (implementation of the Thomas algorithm)
      - Penta-diagonal `ColumnwiseBandMatrixField`s (implementation of the PTRANS-I algorithm)
    - `BlockLowerTriangularSolve`, which uses forward substitution to solve the equation for a block lower triangular matrix `A`
    - `SchurComplementSolve`, which generalizes what is currently in ClimaAtmos's `schur_complement_W.jl` file to any block matrix `A` with a diagonal block in the top-left corner
    - `ApproximateFactorizationSolve`, which lets us use "operator splitting" to approximately solve the equation for a diagonally dominant block matrix `A`
- Add documentation for how to specify a `FieldMatrix` and use it in a linear solver, along with internal documentation for the new `FieldName`-based infrastructure.
- Add unit tests for correctness, type stability, and allocations, and run them on both CPUs and GPUs through CI.
  - Test each single field solver on both a cell-center and a cell-face field.
  - Test each `FieldMatrixSolverAlgorithm` on block diagonal, block lower triangular, and block dense matrices.
  - Test solvers with identical structures to what we will use in ClimaAtmos for the following examples:
    - Dry dycore with implicit acoustic waves
    - Dry dycore with implicit acoustic waves and diffusion
    - Dry dycore + prognostic EDMF with implicit acoustic waves and SGS fluxes
    - Moist dycore + prognostic EDMF + tracers with implicit acoustic waves and SGS fluxes

### Internal Chagnes

- Add a collection of "unrolled functions", whose return values can be inferred during compilation if their input values are all singleton types.
  - These are all implemented as combinations of `unrolled_zip`, `unrolled_map`, and `unrolled_foldl`.
  - Several of these need to have their recursion limits disabled for the unit tests to be type stable.
- Add the `FieldNameTree` struct, which stores every `FieldName` that can be used to access `x` with `get_field(x, name)`.
  - A `name` can be checked for validity by calling `has_subtree_at_name(tree, name)`.
  - The children of `name` (the `FieldName`s that can be used to access the properties of `get_field(x, name)`) can be obtained by calling `child_names(name, tree)`.
- Add the `FieldNameSet` struct, which stores a set of `FieldVectorKeys` (each of which is a `FieldName`) or a set of `FieldMatrixKeys` (each of which is an `NTuple{2, FieldName}`).
  - Roughly equivalent to the built-in `KeySet` for `AbstractDict`s, but specialized for `FieldNameDict`s.
  - Supports standard `AbstractSet` functions like `union` and `setdiff`, as well as custom functions like `set_complement` and `matrix_product_keys`.
  - Handles overlaps between `FieldName`s (that is, situations where one property of `x` lies inside another property of `x`) by storing a `FieldNameTree` that contains all available `FieldName`s.
- Disable the recursion limits for several functions used to manipulate `FieldName`s, `FieldNameTree`s, and `FieldNameSet`s, as this is necessary in order for the unit tests to be type stable.
- Remove the methods for `RecursiveApply.rmul` that specialize on `Number`, which is also necessary in order for the unit tests to be type stable.
  - These methods are no longer required, now that #1454 has been merged in.
- Add support for calling `inv` on `BandMatrixRow`s.
- Qualify the use of `CUDA.`@allowscalar`,` per Charlie's suggestion.
- Fix some type instabilities in `matrix_field_test_utils.jl`.
- Remove an unused variable name in `operator_matrices.jl`.


Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Oct 10, 2023
1436: Add FieldMatrix and linear solvers r=dennisYatunin a=dennisYatunin

## Purpose

Third PR of #1230. Adds an interface for specifying block matrices of matrix fields and solving linear systems with these matrices. This will replace what is currently in `ClimaAtmos.jl/src/prognostic_equations/implicit/schur_complement_W.jl`, generalizing it for implicit diffusion and implicit EDMF.

## Content

### Main Changes

- Add the `FieldName` struct, which is a singleton type that represents a chain of `getproperty` calls
  - Add the ``@name`` macro for constructing `FieldName`s, which checks whether its input expression is a syntactically valid chain of `getproperty` calls before calling the default constructor.
  - A `name` can be used to access a property or sub-property of an object `x` by calling `get_field(x, name)`.
  - An `internal_name` can be appended onto another `name` in order to access a property or sub-property of `get_field(x, name)`.
- Add the `FieldNameDict` struct, which maps each key in a set of `FieldVectorKeys` or `FieldMatrixKeys` (see below) to a `Field` or some other object.
  - There are currently four subtypes of `FieldNameDict`:
    - `FieldMatrix` (the only user-facing subtype), which maps `NTuple{2, FieldName}`s to `ColumnwiseBandMatrixField`s or multiples of `LinearAlgebra.I`
    - `FieldVectorView`, which maps `FieldName`s to `Field`s; this is used to wrap a `FieldVector` so that it can be used in conjunction with a `FieldMatrix`
    - `FieldVectorViewBroadcasted` and `FieldMatrixBroadcasted`, each of which can store unevaluated `Base.AbstractBroadcasted` objects, in addition to what `FieldVectorView` and `FieldMatrix` can already store
  - Supports standard `AbstractDict` functions like `keys` and `pairs`.
  - An individual block of a `FieldNameDict` can be accessed by calling `dict[key]`, and a range of blocks can be accessed by calling `dict[set]`, where `set` is a `FieldNameSet`.
  - Given a `FieldMatrix` `A`, a similar matrix that only contains identity matrix blocks can be constructed with `one(A)`.
  - `FieldNameDict`s can be used in broadcast expressions, which support the following operations:
    - `+`, `-`, or `*`, where each input is either a `FieldNameDict` or a `FieldVector`
    - `inv`, where the input is a diagonal `FieldMatrix`
  - The new methods for `Base.Broadcast.broadcasted` construct chains of `Field` broadcast expressions from `FieldNameDict` broadcast expressions on the fly, somewhat similarly to how broadcasting works for ClimaCore operators.
- Add the `FieldMatrixSolver` struct, which solves an equation of the form `A * x = b`, where `A` is a `FieldMatrix` and where `x` and `b` are `FieldVector`s.
  - Add the `field_matrix_solve!` function, which works just like `ldiv!(x, A, b)`, except that it also takes a `FieldMatrixSolver` as its first argument.
  - Add four `FieldMatrixSolverAlgorithm`s, which can be nested inside of each other to build up more specialized algorithms:
    - `BlockDiagonalSolve`, which runs a "single field solver" for each block of the block diagonal matrix `A`; the single field solver can handle the four types of blocks:
      - Multiples of `LinearAlgebra.I`
      - Diagonal `ColumnwiseBandMatrixField`s
      - Tri-diagonal `ColumnwiseBandMatrixField`s (implementation of the Thomas algorithm)
      - Penta-diagonal `ColumnwiseBandMatrixField`s (implementation of the PTRANS-I algorithm)
    - `BlockLowerTriangularSolve`, which uses forward substitution to solve the equation for a block lower triangular matrix `A`
    - `SchurComplementSolve`, which generalizes what is currently in ClimaAtmos's `schur_complement_W.jl` file to any block matrix `A` with a diagonal block in the top-left corner
    - `ApproximateFactorizationSolve`, which lets us use "operator splitting" to approximately solve the equation for a diagonally dominant block matrix `A`
- Add documentation for how to specify a `FieldMatrix` and use it in a linear solver, along with internal documentation for the new `FieldName`-based infrastructure.
- Add unit tests for correctness, type stability, and allocations, and run them on both CPUs and GPUs through CI.
  - Test each single field solver on both a cell-center and a cell-face field.
  - Test each `FieldMatrixSolverAlgorithm` on block diagonal, block lower triangular, and block dense matrices.
  - Test solvers with identical structures to what we will use in ClimaAtmos for the following examples:
    - Dry dycore with implicit acoustic waves
    - Dry dycore with implicit acoustic waves and diffusion
    - Dry dycore + prognostic EDMF with implicit acoustic waves and SGS fluxes
    - Moist dycore + prognostic EDMF + tracers with implicit acoustic waves and SGS fluxes

### Internal Chagnes

- Add a collection of "unrolled functions", whose return values can be inferred during compilation if their input values are all singleton types.
  - These are all implemented as combinations of `unrolled_zip`, `unrolled_map`, and `unrolled_foldl`.
  - Several of these need to have their recursion limits disabled for the unit tests to be type stable.
- Add the `FieldNameTree` struct, which stores every `FieldName` that can be used to access `x` with `get_field(x, name)`.
  - A `name` can be checked for validity by calling `has_subtree_at_name(tree, name)`.
  - The children of `name` (the `FieldName`s that can be used to access the properties of `get_field(x, name)`) can be obtained by calling `child_names(name, tree)`.
- Add the `FieldNameSet` struct, which stores a set of `FieldVectorKeys` (each of which is a `FieldName`) or a set of `FieldMatrixKeys` (each of which is an `NTuple{2, FieldName}`).
  - Roughly equivalent to the built-in `KeySet` for `AbstractDict`s, but specialized for `FieldNameDict`s.
  - Supports standard `AbstractSet` functions like `union` and `setdiff`, as well as custom functions like `set_complement` and `matrix_product_keys`.
  - Handles overlaps between `FieldName`s (that is, situations where one property of `x` lies inside another property of `x`) by storing a `FieldNameTree` that contains all available `FieldName`s.
- Disable the recursion limits for several functions used to manipulate `FieldName`s, `FieldNameTree`s, and `FieldNameSet`s, as this is necessary in order for the unit tests to be type stable.
- Remove the methods for `RecursiveApply.rmul` that specialize on `Number`, which is also necessary in order for the unit tests to be type stable.
  - These methods are no longer required, now that #1454 has been merged in.
- Add support for calling `inv` on `BandMatrixRow`s.
- Qualify the use of `CUDA.`@allowscalar`,` per Charlie's suggestion.
- Fix some type instabilities in `matrix_field_test_utils.jl`.
- Remove an unused variable name in `operator_matrices.jl`.


Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Oct 10, 2023
1436: Add FieldMatrix and linear solvers r=dennisYatunin a=dennisYatunin

## Purpose

Third PR of #1230. Adds an interface for specifying block matrices of matrix fields and solving linear systems with these matrices. This will replace what is currently in `ClimaAtmos.jl/src/prognostic_equations/implicit/schur_complement_W.jl`, generalizing it for implicit diffusion and implicit EDMF.

## Content

### Main Changes

- Add the `FieldName` struct, which is a singleton type that represents a chain of `getproperty` calls
  - Add the ``@name`` macro for constructing `FieldName`s, which checks whether its input expression is a syntactically valid chain of `getproperty` calls before calling the default constructor.
  - A `name` can be used to access a property or sub-property of an object `x` by calling `get_field(x, name)`.
  - An `internal_name` can be appended onto another `name` in order to access a property or sub-property of `get_field(x, name)`.
- Add the `FieldNameDict` struct, which maps each key in a set of `FieldVectorKeys` or `FieldMatrixKeys` (see below) to a `Field` or some other object.
  - There are currently four subtypes of `FieldNameDict`:
    - `FieldMatrix` (the only user-facing subtype), which maps `NTuple{2, FieldName}`s to `ColumnwiseBandMatrixField`s or multiples of `LinearAlgebra.I`
    - `FieldVectorView`, which maps `FieldName`s to `Field`s; this is used to wrap a `FieldVector` so that it can be used in conjunction with a `FieldMatrix`
    - `FieldVectorViewBroadcasted` and `FieldMatrixBroadcasted`, each of which can store unevaluated `Base.AbstractBroadcasted` objects, in addition to what `FieldVectorView` and `FieldMatrix` can already store
  - Supports standard `AbstractDict` functions like `keys` and `pairs`.
  - An individual block of a `FieldNameDict` can be accessed by calling `dict[key]`, and a range of blocks can be accessed by calling `dict[set]`, where `set` is a `FieldNameSet`.
  - Given a `FieldMatrix` `A`, a similar matrix that only contains identity matrix blocks can be constructed with `one(A)`.
  - `FieldNameDict`s can be used in broadcast expressions, which support the following operations:
    - `+`, `-`, or `*`, where each input is either a `FieldNameDict` or a `FieldVector`
    - `inv`, where the input is a diagonal `FieldMatrix`
  - The new methods for `Base.Broadcast.broadcasted` construct chains of `Field` broadcast expressions from `FieldNameDict` broadcast expressions on the fly, somewhat similarly to how broadcasting works for ClimaCore operators.
- Add the `FieldMatrixSolver` struct, which solves an equation of the form `A * x = b`, where `A` is a `FieldMatrix` and where `x` and `b` are `FieldVector`s.
  - Add the `field_matrix_solve!` function, which works just like `ldiv!(x, A, b)`, except that it also takes a `FieldMatrixSolver` as its first argument.
  - Add four `FieldMatrixSolverAlgorithm`s, which can be nested inside of each other to build up more specialized algorithms:
    - `BlockDiagonalSolve`, which runs a "single field solver" for each block of the block diagonal matrix `A`; the single field solver can handle the four types of blocks:
      - Multiples of `LinearAlgebra.I`
      - Diagonal `ColumnwiseBandMatrixField`s
      - Tri-diagonal `ColumnwiseBandMatrixField`s (implementation of the Thomas algorithm)
      - Penta-diagonal `ColumnwiseBandMatrixField`s (implementation of the PTRANS-I algorithm)
    - `BlockLowerTriangularSolve`, which uses forward substitution to solve the equation for a block lower triangular matrix `A`
    - `SchurComplementSolve`, which generalizes what is currently in ClimaAtmos's `schur_complement_W.jl` file to any block matrix `A` with a diagonal block in the top-left corner
    - `ApproximateFactorizationSolve`, which lets us use "operator splitting" to approximately solve the equation for a diagonally dominant block matrix `A`
- Add documentation for how to specify a `FieldMatrix` and use it in a linear solver, along with internal documentation for the new `FieldName`-based infrastructure.
- Add unit tests for correctness, type stability, and allocations, and run them on both CPUs and GPUs through CI.
  - Test each single field solver on both a cell-center and a cell-face field.
  - Test each `FieldMatrixSolverAlgorithm` on block diagonal, block lower triangular, and block dense matrices.
  - Test solvers with identical structures to what we will use in ClimaAtmos for the following examples:
    - Dry dycore with implicit acoustic waves
    - Dry dycore with implicit acoustic waves and diffusion
    - Dry dycore + prognostic EDMF with implicit acoustic waves and SGS fluxes
    - Moist dycore + prognostic EDMF + tracers with implicit acoustic waves and SGS fluxes

### Internal Chagnes

- Add a collection of "unrolled functions", whose return values can be inferred during compilation if their input values are all singleton types.
  - These are all implemented as combinations of `unrolled_zip`, `unrolled_map`, and `unrolled_foldl`.
  - Several of these need to have their recursion limits disabled for the unit tests to be type stable.
- Add the `FieldNameTree` struct, which stores every `FieldName` that can be used to access `x` with `get_field(x, name)`.
  - A `name` can be checked for validity by calling `has_subtree_at_name(tree, name)`.
  - The children of `name` (the `FieldName`s that can be used to access the properties of `get_field(x, name)`) can be obtained by calling `child_names(name, tree)`.
- Add the `FieldNameSet` struct, which stores a set of `FieldVectorKeys` (each of which is a `FieldName`) or a set of `FieldMatrixKeys` (each of which is an `NTuple{2, FieldName}`).
  - Roughly equivalent to the built-in `KeySet` for `AbstractDict`s, but specialized for `FieldNameDict`s.
  - Supports standard `AbstractSet` functions like `union` and `setdiff`, as well as custom functions like `set_complement` and `matrix_product_keys`.
  - Handles overlaps between `FieldName`s (that is, situations where one property of `x` lies inside another property of `x`) by storing a `FieldNameTree` that contains all available `FieldName`s.
- Disable the recursion limits for several functions used to manipulate `FieldName`s, `FieldNameTree`s, and `FieldNameSet`s, as this is necessary in order for the unit tests to be type stable.
- Remove the methods for `RecursiveApply.rmul` that specialize on `Number`, which is also necessary in order for the unit tests to be type stable.
  - These methods are no longer required, now that #1454 has been merged in.
- Add support for calling `inv` on `BandMatrixRow`s.
- Qualify the use of `CUDA.`@allowscalar`,` per Charlie's suggestion.
- Fix some type instabilities in `matrix_field_test_utils.jl`.
- Remove an unused variable name in `operator_matrices.jl`.


Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Oct 10, 2023
1436: Add FieldMatrix and linear solvers r=dennisYatunin a=dennisYatunin

## Purpose

Third PR of #1230. Adds an interface for specifying block matrices of matrix fields and solving linear systems with these matrices. This will replace what is currently in `ClimaAtmos.jl/src/prognostic_equations/implicit/schur_complement_W.jl`, generalizing it for implicit diffusion and implicit EDMF.

## Content

### Main Changes

- Add the `FieldName` struct, which is a singleton type that represents a chain of `getproperty` calls
  - Add the ``@name`` macro for constructing `FieldName`s, which checks whether its input expression is a syntactically valid chain of `getproperty` calls before calling the default constructor.
  - A `name` can be used to access a property or sub-property of an object `x` by calling `get_field(x, name)`.
  - An `internal_name` can be appended onto another `name` in order to access a property or sub-property of `get_field(x, name)`.
- Add the `FieldNameDict` struct, which maps each key in a set of `FieldVectorKeys` or `FieldMatrixKeys` (see below) to a `Field` or some other object.
  - There are currently four subtypes of `FieldNameDict`:
    - `FieldMatrix` (the only user-facing subtype), which maps `NTuple{2, FieldName}`s to `ColumnwiseBandMatrixField`s or multiples of `LinearAlgebra.I`
    - `FieldVectorView`, which maps `FieldName`s to `Field`s; this is used to wrap a `FieldVector` so that it can be used in conjunction with a `FieldMatrix`
    - `FieldVectorViewBroadcasted` and `FieldMatrixBroadcasted`, each of which can store unevaluated `Base.AbstractBroadcasted` objects, in addition to what `FieldVectorView` and `FieldMatrix` can already store
  - Supports standard `AbstractDict` functions like `keys` and `pairs`.
  - An individual block of a `FieldNameDict` can be accessed by calling `dict[key]`, and a range of blocks can be accessed by calling `dict[set]`, where `set` is a `FieldNameSet`.
  - Given a `FieldMatrix` `A`, a similar matrix that only contains identity matrix blocks can be constructed with `one(A)`.
  - `FieldNameDict`s can be used in broadcast expressions, which support the following operations:
    - `+`, `-`, or `*`, where each input is either a `FieldNameDict` or a `FieldVector`
    - `inv`, where the input is a diagonal `FieldMatrix`
  - The new methods for `Base.Broadcast.broadcasted` construct chains of `Field` broadcast expressions from `FieldNameDict` broadcast expressions on the fly, somewhat similarly to how broadcasting works for ClimaCore operators.
- Add the `FieldMatrixSolver` struct, which solves an equation of the form `A * x = b`, where `A` is a `FieldMatrix` and where `x` and `b` are `FieldVector`s.
  - Add the `field_matrix_solve!` function, which works just like `ldiv!(x, A, b)`, except that it also takes a `FieldMatrixSolver` as its first argument.
  - Add four `FieldMatrixSolverAlgorithm`s, which can be nested inside of each other to build up more specialized algorithms:
    - `BlockDiagonalSolve`, which runs a "single field solver" for each block of the block diagonal matrix `A`; the single field solver can handle the four types of blocks:
      - Multiples of `LinearAlgebra.I`
      - Diagonal `ColumnwiseBandMatrixField`s
      - Tri-diagonal `ColumnwiseBandMatrixField`s (implementation of the Thomas algorithm)
      - Penta-diagonal `ColumnwiseBandMatrixField`s (implementation of the PTRANS-I algorithm)
    - `BlockLowerTriangularSolve`, which uses forward substitution to solve the equation for a block lower triangular matrix `A`
    - `SchurComplementSolve`, which generalizes what is currently in ClimaAtmos's `schur_complement_W.jl` file to any block matrix `A` with a diagonal block in the top-left corner
    - `ApproximateFactorizationSolve`, which lets us use "operator splitting" to approximately solve the equation for a diagonally dominant block matrix `A`
- Add documentation for how to specify a `FieldMatrix` and use it in a linear solver, along with internal documentation for the new `FieldName`-based infrastructure.
- Add unit tests for correctness, type stability, and allocations, and run them on both CPUs and GPUs through CI.
  - Test each single field solver on both a cell-center and a cell-face field.
  - Test each `FieldMatrixSolverAlgorithm` on block diagonal, block lower triangular, and block dense matrices.
  - Test solvers with identical structures to what we will use in ClimaAtmos for the following examples:
    - Dry dycore with implicit acoustic waves
    - Dry dycore with implicit acoustic waves and diffusion
    - Dry dycore + prognostic EDMF with implicit acoustic waves and SGS fluxes
    - Moist dycore + prognostic EDMF + tracers with implicit acoustic waves and SGS fluxes

### Internal Chagnes

- Add a collection of "unrolled functions", whose return values can be inferred during compilation if their input values are all singleton types.
  - These are all implemented as combinations of `unrolled_zip`, `unrolled_map`, and `unrolled_foldl`.
  - Several of these need to have their recursion limits disabled for the unit tests to be type stable.
- Add the `FieldNameTree` struct, which stores every `FieldName` that can be used to access `x` with `get_field(x, name)`.
  - A `name` can be checked for validity by calling `has_subtree_at_name(tree, name)`.
  - The children of `name` (the `FieldName`s that can be used to access the properties of `get_field(x, name)`) can be obtained by calling `child_names(name, tree)`.
- Add the `FieldNameSet` struct, which stores a set of `FieldVectorKeys` (each of which is a `FieldName`) or a set of `FieldMatrixKeys` (each of which is an `NTuple{2, FieldName}`).
  - Roughly equivalent to the built-in `KeySet` for `AbstractDict`s, but specialized for `FieldNameDict`s.
  - Supports standard `AbstractSet` functions like `union` and `setdiff`, as well as custom functions like `set_complement` and `matrix_product_keys`.
  - Handles overlaps between `FieldName`s (that is, situations where one property of `x` lies inside another property of `x`) by storing a `FieldNameTree` that contains all available `FieldName`s.
- Disable the recursion limits for several functions used to manipulate `FieldName`s, `FieldNameTree`s, and `FieldNameSet`s, as this is necessary in order for the unit tests to be type stable.
- Remove the methods for `RecursiveApply.rmul` that specialize on `Number`, which is also necessary in order for the unit tests to be type stable.
  - These methods are no longer required, now that #1454 has been merged in.
- Add support for calling `inv` on `BandMatrixRow`s.
- Qualify the use of `CUDA.`@allowscalar`,` per Charlie's suggestion.
- Fix some type instabilities in `matrix_field_test_utils.jl`.
- Remove an unused variable name in `operator_matrices.jl`.


Co-authored-by: Dennis Yatunin <[email protected]>
bors bot added a commit that referenced this issue Oct 11, 2023
1436: Add FieldMatrix and linear solvers r=dennisYatunin a=dennisYatunin

## Purpose

Third PR of #1230. Adds an interface for specifying block matrices of matrix fields and solving linear systems with these matrices. This will replace what is currently in `ClimaAtmos.jl/src/prognostic_equations/implicit/schur_complement_W.jl`, generalizing it for implicit diffusion and implicit EDMF.

## Content

### Main Changes

- Add the `FieldName` struct, which is a singleton type that represents a chain of `getproperty` calls
  - Add the ``@name`` macro for constructing `FieldName`s, which checks whether its input expression is a syntactically valid chain of `getproperty` calls before calling the default constructor.
  - A `name` can be used to access a property or sub-property of an object `x` by calling `get_field(x, name)`.
  - An `internal_name` can be appended onto another `name` in order to access a property or sub-property of `get_field(x, name)`.
- Add the `FieldNameDict` struct, which maps each key in a set of `FieldVectorKeys` or `FieldMatrixKeys` (see below) to a `Field` or some other object.
  - There are currently four subtypes of `FieldNameDict`:
    - `FieldMatrix` (the only user-facing subtype), which maps `NTuple{2, FieldName}`s to `ColumnwiseBandMatrixField`s or multiples of `LinearAlgebra.I`
    - `FieldVectorView`, which maps `FieldName`s to `Field`s; this is used to wrap a `FieldVector` so that it can be used in conjunction with a `FieldMatrix`
    - `FieldVectorViewBroadcasted` and `FieldMatrixBroadcasted`, each of which can store unevaluated `Base.AbstractBroadcasted` objects, in addition to what `FieldVectorView` and `FieldMatrix` can already store
  - Supports standard `AbstractDict` functions like `keys` and `pairs`.
  - An individual block of a `FieldNameDict` can be accessed by calling `dict[key]`, and a range of blocks can be accessed by calling `dict[set]`, where `set` is a `FieldNameSet`.
  - Given a `FieldMatrix` `A`, a similar matrix that only contains identity matrix blocks can be constructed with `one(A)`.
  - `FieldNameDict`s can be used in broadcast expressions, which support the following operations:
    - `+`, `-`, or `*`, where each input is either a `FieldNameDict` or a `FieldVector`
    - `inv`, where the input is a diagonal `FieldMatrix`
  - The new methods for `Base.Broadcast.broadcasted` construct chains of `Field` broadcast expressions from `FieldNameDict` broadcast expressions on the fly, somewhat similarly to how broadcasting works for ClimaCore operators.
- Add the `FieldMatrixSolver` struct, which solves an equation of the form `A * x = b`, where `A` is a `FieldMatrix` and where `x` and `b` are `FieldVector`s.
  - Add the `field_matrix_solve!` function, which works just like `ldiv!(x, A, b)`, except that it also takes a `FieldMatrixSolver` as its first argument.
  - Add four `FieldMatrixSolverAlgorithm`s, which can be nested inside of each other to build up more specialized algorithms:
    - `BlockDiagonalSolve`, which runs a "single field solver" for each block of the block diagonal matrix `A`; the single field solver can handle the four types of blocks:
      - Multiples of `LinearAlgebra.I`
      - Diagonal `ColumnwiseBandMatrixField`s
      - Tri-diagonal `ColumnwiseBandMatrixField`s (implementation of the Thomas algorithm)
      - Penta-diagonal `ColumnwiseBandMatrixField`s (implementation of the PTRANS-I algorithm)
    - `BlockLowerTriangularSolve`, which uses forward substitution to solve the equation for a block lower triangular matrix `A`
    - `SchurComplementSolve`, which generalizes what is currently in ClimaAtmos's `schur_complement_W.jl` file to any block matrix `A` with a diagonal block in the top-left corner
    - `ApproximateFactorizationSolve`, which lets us use "operator splitting" to approximately solve the equation for a diagonally dominant block matrix `A`
- Add documentation for how to specify a `FieldMatrix` and use it in a linear solver, along with internal documentation for the new `FieldName`-based infrastructure.
- Add unit tests for correctness, type stability, and allocations, and run them on both CPUs and GPUs through CI.
  - Test each single field solver on both a cell-center and a cell-face field.
  - Test each `FieldMatrixSolverAlgorithm` on block diagonal, block lower triangular, and block dense matrices.
  - Test solvers with identical structures to what we will use in ClimaAtmos for the following examples:
    - Dry dycore with implicit acoustic waves
    - Dry dycore with implicit acoustic waves and diffusion
    - Dry dycore + prognostic EDMF with implicit acoustic waves and SGS fluxes
    - Moist dycore + prognostic EDMF + tracers with implicit acoustic waves and SGS fluxes

### Internal Chagnes

- Add a collection of "unrolled functions", whose return values can be inferred during compilation if their input values are all singleton types.
  - These are all implemented as combinations of `unrolled_zip`, `unrolled_map`, and `unrolled_foldl`.
  - Several of these need to have their recursion limits disabled for the unit tests to be type stable.
- Add the `FieldNameTree` struct, which stores every `FieldName` that can be used to access `x` with `get_field(x, name)`.
  - A `name` can be checked for validity by calling `has_subtree_at_name(tree, name)`.
  - The children of `name` (the `FieldName`s that can be used to access the properties of `get_field(x, name)`) can be obtained by calling `child_names(name, tree)`.
- Add the `FieldNameSet` struct, which stores a set of `FieldVectorKeys` (each of which is a `FieldName`) or a set of `FieldMatrixKeys` (each of which is an `NTuple{2, FieldName}`).
  - Roughly equivalent to the built-in `KeySet` for `AbstractDict`s, but specialized for `FieldNameDict`s.
  - Supports standard `AbstractSet` functions like `union` and `setdiff`, as well as custom functions like `set_complement` and `matrix_product_keys`.
  - Handles overlaps between `FieldName`s (that is, situations where one property of `x` lies inside another property of `x`) by storing a `FieldNameTree` that contains all available `FieldName`s.
- Disable the recursion limits for several functions used to manipulate `FieldName`s, `FieldNameTree`s, and `FieldNameSet`s, as this is necessary in order for the unit tests to be type stable.
- Remove the methods for `RecursiveApply.rmul` that specialize on `Number`, which is also necessary in order for the unit tests to be type stable.
  - These methods are no longer required, now that #1454 has been merged in.
- Add support for calling `inv` on `BandMatrixRow`s.
- Qualify the use of `CUDA.`@allowscalar`,` per Charlie's suggestion.
- Fix some type instabilities in `matrix_field_test_utils.jl`.
- Remove an unused variable name in `operator_matrices.jl`.


Co-authored-by: Dennis Yatunin <[email protected]>
@szy21 szy21 closed this as completed Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SDI Software Development Issue
Projects
None yet
Development

No branches or pull requests

7 participants