Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1326: Add new MatrixFields module, along with unit tests and performance tests r=dennisYatunin a=dennisYatunin # Purpose This PR encompasses the first two tasks of the implicit interface updates [SDI](#1230). That is, it refactors `StencilCoefs` into `BandMatrixRow` and `ApplyStencil`/`ComposeStencil` into `MultiplyColumnwiseBandMatrixField`, and it also adds a comprehensive set of tests. In the process of refactoring, the code for low-level matrix operations has been simplified and useful new functionality has been added. In order to avoid conflicts with pre-existing code, the new data structures and functions have been placed in a separate module called `MatrixFields`; this module can be removed once we are ready to replace all of the old code. ## Content ### API improvements - Matrix field rows now have simple constructors (e.g., `DiagonalMatrixRow` and `TridiagonalMatrixRow`) that can be used to construct matrix fields on the fly in broadcast expressions. - Matrix fields are now automatically promoted to common types for basic arithmetic operations. This involves promoting the entries in the rows, as well as padding rows with zeros when rows have unequal numbers of entries. In addition, `LinearAlgebra.I` (or, rather, `LinearAlgebra.UniformScaling`) is automatically promoted to a `DiagonalMatrixRow` when used with `BandMatrixRow`s. So, for example, it is now possible to evaluate `DiagonalMartrixRow(1) - 2 * TridiagonalMatrixRow(3, 4, 5) - I / 6`. - Matrix fields are now fully integrated with `RecursiveApply`, which allows a single matrix field to store multiple matrices by using `Tuple`s or `NamedTuple`s as entries instead of scalars. This is achieved by using the new `rzero` function instead of `zero`, and also by making `+`, `-`, `*`, and `/` call `radd`, `rsub`, `rmul`, and `rdiv` when used on `BandMatrixRow`s. This change will make it possible to, e.g., store the derivatives of all tracer tendencies with respect to vertical velocity as a single matrix field in `ClimaAtmos`, which will lead to simpler code and better performance. - All matrix-matrix and matrix-vector multiplications can now be handled using the `⋅` operator, which is an alias for `MultiplyColumnwiseBandMatrixField()`. The implementation of this operator is much simpler than the implementation of `ApplyStencil`/`ComposeStencil`, and it is effectively localized to the single function `multiply_matrix_at_index`. In order to avoid inference failures for complex matrix field broadcast expressions, the recursion limit for this function needed to be disabled. Unfortunately, just as with the pre-existing code, inlining this function with ``@propagate_inbounds`` drastically increased compilation time, and it also caused inference failures for particularly complicated matrix broadcast expressions. Although not using ``@propagate_inbounds`` causes a 3--4x slowdown in the evaluation of matrix broadcast expressions, it avoids a 10--100x slowdown in their compilation. - The matrix of the divergence operator can now be properly represented as a matrix of covectors, rather than a matrix of scalars. Since the divergence operator turns vectors into scalars, multiplying its matrix by a matrix/vector of vectors should return a matrix/vector of scalars. However, this is not as simple as just multiplying a covector by a vector and returning a scalar, since the vector may need to be projected onto the dual axis of the covector before the multiplication. So, matrix-matrix and matrix-vector multiplication is now implemented using the `rmul_with_projection` function instead of `rmul`. In general, this function works just like `rmul`, but, when the first argument is a covector (an `AdjointAxisVector`) and the second is a vector or higher-order tensor (an `AxisTensor`), it uses the local geometry to project the second argument before the multiplication. In the future, we may want to generalize this to higher-order cotensors. - The `MatrixFields` module also includes functions for converting matrix fields to more conventional arrays that are compatible with `LinearAlgebra.jl`. The function `column_field2array` converts a matrix field defined on a column space to a `BandedMatrix` from `BandedMatrices.jl`, and it also converts a regular field defined on a column space to a `Vector`. The function `field2arrays` works for fields defined on multiple columns by calling `column_field2array` on each column. - Matrix fields now get displayed in the REPL as actual matrices. Specifically, when a matrix field contains `Number` entries, its first column is passed to `column_field2array_view`, which is identical to `column_field2array`, except that it allocates less memory by returning a view of the field's underlying data. Although `column_field2array` and `column_field2array_view` work for matrix fields with `Tuple`/`NamedTuple` entries, the `show` method for a `BandedMatrix` crashes when displaying such entries. In addition, this `show` method results in somewhat unreadable output for other struct entries, like vectors and covectors. So, all non-`Number` matrix fields still get displayed in the REPL like regular fields. ### Related ClimaCore Changes - To ensure that promotion and conversion of `BandMatrixRow` entries works correctly for nested `Tuple`/`NamedTuple` entries, the `rpromote_type` and `rconvert` functions have been added to RecursiveApply. The first function uses `rmaptype` to recursively call `promote_type`, and the second function uses `rmap` and `rzero` to make a more type-stable version of Julia's built-in `convert` function. - To support manipulating matrices of covectors (and, in the future, matrices of higher-order cotensors), all basic arithmetic operations have been defined for `AdjointAxisTensor`. This includes `+`, `-`, `*`, `/`, `\`, and `==`, as well as the `zero` function. The new methods undo the adjoint and fall back to the methods for `AxisTensor`. - To simplify the setup of the non-scalar matrix tests, `map` has been generalized from only working for a single `Field` to working for multiple `Field`s. ### Tests - All of the following test files check for correctness, allocations, and type instabilities. - The file `test/MatrixFields/band_matrix_row.jl` tests low-level operations with `BandMatrixRow`s, ensuring that constructors, conversions, arithmetic operations, and combinations with `LinearAlgebra.I` work for numbers and nested `Tuple`/`NamedTuple` values. - The file `test/MatrixFields/rmul_with_projection.jl` tests `rmul_with_projection` for all currently supported combinations of scalars, vectors, tensors, and covectors, as well as several combinations of nested `Tuple`/`NamedTuple` values. - The file `test/MatrixFields/field2array.jl` checks that `column_field2array`, `column_field2array_view`, and `field2arrays` work as expected for very simple matrix fields. - The file `test/MatrixFields/matrix_field_broadcasting.jl` tests a wide range of scalar and non-scalar matrix field broadcast expressions. It compares each scalar matrix field broadcast expression against an equivalent implementation using `field2arrays` and LinearAlgebra's `mul!`, making sure that the matrix field implementation is roughly as performant as the array implementation. It also compares each non-scalar matrix field broadcast expression (e.g., an expression with matrices of covectors or `NamedTuple`s) against an equivalent scalar matrix field implementation. - I have also checked that the new implementation is at least as performant as the pre-existing implementation for those cases that the pre-existing API can support. For example, to compare the two implementations for the "diagonal matrix times bi-diagonal matrix times tri-diagonal matrix times quad-diagonal matrix times vector" test case, run the following code after `test/MatrixFields/matrix_field_broadcasting.jl`: ``` using ClimaCore: MatrixFields, Operators function compare_to_old_implementation() ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec, ᶠvec = random_test_fields(Float64) result = `@.` ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec inputs = (ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec) func!(result, ᶜᶜmat, ᶜᶠmat, ᶠᶠmat, ᶠᶜmat, ᶜvec) = `@.` result = ᶜᶜmat ⋅ ᶜᶠmat ⋅ ᶠᶠmat ⋅ ᶠᶜmat ⋅ ᶜvec time, allocs = `@benchmark` call_func(func!, result, inputs) to_old_version(row::BMR) where {BMR <: MatrixFields.BandMatrixRow} = Operators.StencilCoefs{MatrixFields.outer_diagonals(BMR)...}(row.entries) apply = Operators.ApplyStencil() compose = Operators.ComposeStencils() ᶜᶜmat_old = to_old_version.(ᶜᶜmat) ᶜᶠmat_old = to_old_version.(ᶜᶠmat) ᶠᶠmat_old = to_old_version.(ᶠᶠmat) ᶠᶜmat_old = to_old_version.(ᶠᶜmat) result_old = `@.` apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec) inputs_old = (ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec) func!_old(result_old, ᶜᶜmat_old, ᶜᶠmat_old, ᶠᶠmat_old, ᶠᶜmat_old, ᶜvec) = `@.` result_old = apply(compose(compose(compose(ᶜᶜmat_old, ᶜᶠmat_old), ᶠᶠmat_old), ᶠᶜmat_old), ᶜvec) time_old, allocs_old = `@benchmark` call_func(func!_old, result_old, inputs_old) `@assert` allocs == allocs_old == 0 `@info` "New implementation time: $time s" `@info` "Old implementation time: $time_old s" end compare_to_old_implementation() ``` This code prints out the following: ``` [ Info: New implementation time: 1.5848e-5 s [ Info: Old implementation time: 1.5283e-5 s ``` Co-authored-by: Dennis Yatunin <[email protected]>
- Loading branch information