
Misc tasks (in the lack of a better name) #3

Closed
amartinhuertas opened this issue Apr 28, 2020 · 5 comments

@amartinhuertas commented Apr 28, 2020

I would suggest the following tasks for @amartinhuertas:

  • Create a new version of the assembly process that makes use of a GridPortion based on the owned cells (thus generating the owned vefs of lower dimension) and therefore builds a sub-assembled local matrix, which can be used for DD methods. We could provide a flag when creating the distributed FE space to decide which one to use, or even compare both implementations (adding a comm in the sub-assembled one); see the hypothetical sketch after this list. I think he can learn many things about Gridap in this process (possibly related, to some extent, to issue Implement localize function to remove ghost cells of a given triangulation #2).

  • Create a new version of the current approach for multifield that only does two comms, and extend the sub-assembled approach + comm to this case too.
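A purely hypothetical sketch of the GridPortion idea in the first task (the lid_to_owner field and the overall wiring are assumptions for illustration, not existing GridapDistributed code):

# Sketch only: for each part, build a GridPortion that keeps only the owned cells,
# so that integration and assembly over it yield a sub-assembled local matrix
# (to be completed later with a comm in the sub-assembled/DD variant).
# Assumes the cell index set exposes a lid_to_owner vector.
owned_grids = DistributedData(model) do part, (model,gids)
  grid = get_grid(model)
  owned_cells = findall(gids.lid_to_owner .== part)
  GridPortion(grid,owned_cells)
end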

@fverdugo commented

I have implemented the function remove_ghost_cells in order to remove ghost cells from a given distributed triangulation. I have also implemented include_ghost_cells, which reverts a previous removal of ghost cells.

I have implemented them only for volume triangulations, since it was 4 lines of code and I needed them to have a first working driver. @amartinhuertas, you can try to extend the functions to boundary and skeleton triangulations if you wish.
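For what it is worth, a minimal sketch of what remove_ghost_cells could look like for a volume triangulation, assuming the cell index set stores a lid_to_owner vector and that a TriangulationPortion-like view is used to restrict to the selected cells (illustrative only, not necessarily the actual implementation):

# Sketch only: keep the cells of trian owned by part, assuming
# gids.lid_to_owner gives the owner part of each local cell.
function remove_ghost_cells(trian::Triangulation,part::Integer,gids)
  owned_cells = findall(gids.lid_to_owner .== part)
  TriangulationPortion(trian,owned_cells)
end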

@amartinhuertas amartinhuertas self-assigned this May 2, 2020
@amartinhuertas commented

Hi @fverdugo, @santiagobadia,

I have finished a first prototype of the distributed (still sequential, though) algorithm for integration + assembly restricted to owned cells. For now, the code implementing this algorithm is provided in a separate source file that has not yet been integrated into the software architecture of GridapDistributed.jl (for reasons made clear below). In particular, you will find it in the following file:

https://github.com/gridap/GridapDistributed.jl/blob/distributed_integration_restricted_to_owned_cells/distributed_integration_restricted_to_owned_cells.jl

I had to implement the algorithm in a separate file because I could not (or, perhaps, did not find out how to) accommodate it (a 6-step sketch is available here) into the current design pattern chosen for the assemble_matrix_and_vector + DistributedAssembler type-dispatch combination, available here.

Thus, I propose that the central topic of our next meeting, @fverdugo, be to go over this file and to determine whether it can actually be accommodated into the software architecture of GridapDistributed.jl, and if not, how the latter should be adapted to the requirements of this code.

Other issues to be discussed (they will be made clear during the meeting; they are hard to explain here):

  • How to determine the communication pattern within assemble_vector without having access to the raw entries (before local sub-assembly) of the vector.


@fverdugo commented

@amartinhuertas I would say that the integration + assembly restricted to owned cells can be implemented in the current framework, and it can be done in a few lines of code (it only requires writing a very simple specialization of AssemblyStrategy and using the ghost-cell filter that we already have in place).

Probably we had two different things in mind when talking about this topic. We can clarify them in the next meeting.

@fverdugo commented

Sketch of the code I was thinking of (discussed during the meeting with @amartinhuertas on 2020-05-25).

With this we control how and where (i.e., in which part) the nonzero entries of each row are computed. How to assemble a distributed PETSc matrix is the next step.

struct OwnedCellsStrategy <: AssemblyStrategy
  part::Int
  gids::IndexSet
end

function Gridap.FESpaces.row_map(a::OwnedCellsStrategy,row)
  a.gids.lid_to_gid[row]
end

function Gridap.FESpaces.col_map(a::OwnedCellsStrategy,col)
  a.gids.lid_to_gid[col]
end

function Gridap.FESpaces.row_mask(a::OwnedCellsStrategy,row)
  true
end

function Gridap.FESpaces.col_mask(a::OwnedCellsStrategy,col)
  true
end

function remove_ghost_cells_when_needed(strategy::AssemblyStrategy,trian::Triangulation)
  @abstractmethod
end

function remove_ghost_cells_when_needed(strategy::RowsComputedLocally,trian::Triangulation)
  trian
end

function remove_ghost_cells_when_needed(strategy::OwnedCellsStrategy,trian::Triangulation)
  remove_ghost_cells(trian,strategy.part,strategy.gids)
end

function Triangulation(strategy::AssemblyStrategy,model::DiscreteModel,args...)
  trian = Triangulation(model,args...)
  remove_ghost_cells_when_needed(strategy,trian)
end

function BoundaryTriangulation(strategy::AssemblyStrategy,model::DiscreteModel,args...)
  trian = BoundaryTriangulation(model,args...)
  remove_ghost_cells_when_needed(strategy,trian)
end

function SkeletonTriangulation(strategy::AssemblyStrategy,model::DiscreteModel,args...)
  trian = SkeletonTriangulation(model,args...)
  remove_ghost_cells_when_needed(strategy,trian)
end

function OwnedCellsStrategy(V::DistributedFESpace)
  dgids = V.gids
  strategies = DistributedData(dgids) do part, gids
    OwnedCellsStrategy(part,gids)
  end
  DistributedAssemblyStrategy(strategies)
end

module DistributedPoissonDGTests

using Test
using Gridap
using Gridap.FESpaces
using GridapDistributed
using SparseArrays

# Select matrix and vector types for discrete problem
# Note that here we use serial vectors and matrices
# but the assembly is distributed
T = Float64
vector_type = Vector{T}
matrix_type = SparseMatrixCSC{T,Int}

# Manufactured solution
u(x) = x[1]*(x[1]-1)*x[2]*(x[2]-1)
f(x) = - Δ(u)(x)
ud(x) = zero(x[1])

# Discretization
subdomains = (2,2)
domain = (0,1,0,1)
cells = (4,4)
comm = SequentialCommunicator(subdomains)
model = CartesianDiscreteModel(comm,subdomains,domain,cells)
const h = (domain[2]-domain[1]) / cells[1]

# FE Spaces
order = 2
V = FESpace(
  vector_type, valuetype=Float64, reffe=:Lagrangian, order=order,
  model=model, conformity=:L2)

U = TrialFESpace(V)

# Choose parallel assembly strategy
strategy = OwnedCellsStrategy(V)

# Terms in the weak form
terms = DistributedData(model,strategy) do part, (model,gids), strategy

  γ = 10
  degree = 2*order

  trian = Triangulation(strategy,model)
  btrian = BoundaryTriangulation(strategy,model)
  strian = SkeletonTriangulation(strategy,model)

  quad = CellQuadrature(trian,degree)
  bquad = CellQuadrature(btrian,degree)
  squad = CellQuadrature(strian,degree)
  
  bn = get_normal_vector(btrian)
  sn = get_normal_vector(strian)

  a(u,v) = inner(∇(v),∇(u))
  l(v) = v*f
  t_Ω = AffineFETerm(a,l,trian,quad)

  a_Γd(u,v) = (γ/h)*v*u - v*(bn*∇(u)) - (bn*∇(v))*u
  l_Γd(v) = (γ/h)*v*ud - (bn*∇(v))*ud
  t_Γd = AffineFETerm(a_Γd,l_Γd,btrian,bquad)

  a_Γ(u,v) = (γ/h)*jump(v*sn)*jump(u*sn) - jump(v*sn)*mean(∇(u)) - mean(∇(v))*jump(u*sn)
  t_Γ = LinearFETerm(a_Γ,strian,squad)

  (t_Ω,t_Γ,t_Γd)
end


# Assembly
assem = SparseMatrixAssembler(matrix_type, vector_type, U, V, strategy)
op = AffineFEOperator(assem,terms)

# FE solution
uh = solve(op)

# Error norms and print solution
sums = DistributedData(model,uh) do part, (model,gids), uh

  trian = Triangulation(model)

  owned_trian = remove_ghost_cells(trian,part,gids)
  owned_quad = CellQuadrature(owned_trian,2*order)
  owned_uh = restrict(uh,owned_trian)

  writevtk(owned_trian,"results_$part",cellfields=["uh"=>owned_uh])

  e = u - owned_uh

  l2(u) = u*u
  h1(u) = u*u + ∇(u)*∇(u)

  e_l2 = sum(integrate(l2(e),owned_trian,owned_quad))
  e_h1 = sum(integrate(h1(e),owned_trian,owned_quad))

  e_l2, e_h1
end

e_l2_h1 = gather(sums)

e_l2 = sum(map(i->i[1],e_l2_h1))
e_h1 = sum(map(i->i[2],e_l2_h1))

tol = 1.0e-9
@test e_l2 < tol
@test e_h1 < tol

end # module

@fverdugo commented

@amartinhuertas I attach here the other code snippet from today's meeting (how to incorporate a PETSc matrix into the current framework).

struct PetscMat
 # All the required fields here
end

function PetscMat(Alocal,gids_rows,gids_cols,complete_cols)
  # All initialization work here
end

function allocate_coo_vectors(::Type{<:PetscMat},dn)
  DistributedData(dn) do part, n
    I = zeros(Int,n)
    J = zeros(Int,n)
    V = zeros(Float32,n)
    (I,J,V)
  end
end

function sparse_from_coo(
  ::Type{<:PetscMat},dIJV,dgids_rows,dgids_cols)

  I,J,V = dIJV.part

  gids_rows = dgids_rows.part
  gids_cols = dgids_cols.part

  # convert to lids (perhaps not needed if we already assemble with lids)
  _I = [ gids_rows.gid_to_lid[i] for i in I ]
  _J = [ gids_cols.gid_to_lid[i] for i in J ]

  # Compress I,J,V (Sub-assembled matrix)
  nlrows = length(gids_rows.lid_to_gid)
  nlcols = length(gids_cols.lid_to_gid)
  Alocal = sparse_from_coo(SparseMatrixCSR{0,Float64,Int32},_I,_J,V,nlrows,nlcols)

  # This will complete with comms the nnz pattern
  # and eventually call Petsc.Assembly if needed (e.g., when constraints are present)
  complete_cols = false
  PetscMat(Alocal,gids_rows,gids_cols,complete_cols)
end

amartinhuertas added a commit that referenced this issue Jun 24, 2020
MPIPETScCommunicator:

1. The current algorithm does not compute, before the actual numerical assembly, the (full) sparsity pattern of the rows which are locally owned by each processor. Instead, we preallocate some storage for the matrix (with a rough upper estimate of the number of nonzero elements per row), and entries are dynamically introduced into the matrix; see [here](https://github.com/gridap/GridapDistributed.jl/blob/f595e3ebb45c7777a25dded4a205b9a80195c7a9/src/MPIPETScDistributedAssemblersInterfaces.jl#L25) and [here](https://github.com/gridap/GridapDistributed.jl/blob/f595e3ebb45c7777a25dded4a205b9a80195c7a9/src/MPIPETScDistributedAssemblersInterfaces.jl#L35). Our experience with FEMPAR reveals that this has a number of practical performance and memory implications that we want to avoid in the final version of the algorithm (see also issue #9). A hypothetical sketch of this dynamic-insertion strategy is given after point 2 below.

2. As a complementary side note to 1., let me note (so that we do not forget it, and can reuse it later) that: (1) we discussed [here](#3 (comment)) a possible way to accommodate the final/performant algorithm such that it sticks to the current structure of `assembly_matrix_vector`; (2) I already wrote an implementation of this algorithm, but it does not stick to the current structure of `assembly_matrix_vector`, see [here](#3 (comment)) for more details; (3) in the most general case (non-conforming meshes, hanging-DoF constraints), the final/performant algorithm requires performing an ad-hoc communication among nearest neighbours in order to set up the sparsity pattern of the locally owned rows, see [here](https://github.com/gridap/GridapDistributed.jl/blob/distributed_integration_restricted_to_owned_cells/distributed_integration_restricted_to_owned_cells.jl#L275) for more details. At first glance, I do not see how this can be implemented given the limitations of `MPIPETScDistributedVector` mentioned above.
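For the record, a purely hypothetical sketch of the dynamic-insertion strategy described in point 1 (the helpers create_dynamic_matrix and add_entry! below are placeholders, not actual GridapDistributed or PETSc API):

# Sketch only: preallocate each locally owned row with a rough upper estimate of
# its number of nonzeros and insert entries one by one during numerical assembly,
# instead of computing the exact sparsity pattern beforehand.
function assemble_dynamically(I,J,V,n_owned_rows)
  max_nnz_per_row = 32                                      # rough, uniform upper estimate (assumption)
  A = create_dynamic_matrix(n_owned_rows,max_nnz_per_row)   # hypothetical preallocation helper
  for (i,j,v) in zip(I,J,V)
    add_entry!(A,i,j,v)  # dynamic insertion; exceeding the estimate triggers costly reallocation
  end
  A
end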