Adaptivity + Stateful Materials + Repartitioning #4532
Comments
Found a test that is now tripping an assert 100% of the time when running on certain numbers of processors. stateful_adaptivity_test.i fails with n_cpus = 3: This assertion was only recently added as a sanity check here:
Just to be clear: this is failing in the PR... right? Or do you mean that it is failing all the time (like with the current HEAD)?
This is only failing with the PR. I thought it was HEAD but I was mistaken.
@roystgnr - where would I need to be poking around in libMesh to see how to transfer stateful material properties between processes upon repartitioning? Is there a pattern that you could point me toward that I could follow - e.g. like how elements are sent between processes?
Vector data is transferred as part of System::project_vector(), a much more complex operation that probably isn't going to be analogous to anything you want to do.
Nodes and elements are copied in MeshCommunication::redistribute() and then (if now unnecessary on their origin processor) deleted in MeshCommunication::delete_remote_elements(). That code is probably both a half-decent example to follow and a good place to hook in to the system. If you add a ghosting functor to the mesh which overloads GhostingFunctor::redistribute() and GhostingFunctor::delete_remote_elements() then it'll get called right where you need it to be.
I think mesh_communication.C is decently commented code right now, but if you disagree don't hesitate to send me questions.
For the copy operation, we're using send_packed_range() etc. methods, which are necessary if your data is variable-length or overkill if not. To get up and running with those you could look at the unit test in tests/parallel/packed_range_test.C or at the Node and Elem implementations in src/parallel/parallel_{node,elem}.C. @permcody helped me make that framework more useful and is pretty familiar with it too now if you need somebody nearby to ask when I'm not online.
We currently index stateful data based on Elem pointers... that needs to change to indexing based on … All of the stateful data is already serializable and can be sent as …
After the adaptivity system has done its thing you can go over the local active elements and look for any that have missing stateful data. There'll then need to be a broadcast where all procs ask for stateful data they are missing. Everyone can receive the lists and then look through their own stateful data (based on …
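The request/response exchange described above can be sketched without MPI by modelling each processor as a plain struct. This is only an illustration under stated assumptions: `Rank`, `Blob`, `missing_ids()` and `exchange_missing()` are hypothetical names, not MOOSE or libMesh API, and the "broadcast" is simulated by a loop over an in-memory vector of ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using ElemId = std::uint64_t;  // hypothetical stable element key
using Blob = std::string;      // stand-in for serialized stateful properties

// Hypothetical model of one processor after repartitioning.
struct Rank {
  std::set<ElemId> active_local;    // active elements this rank now owns
  std::map<ElemId, Blob> stateful;  // stateful data this rank currently holds
};

// Step 1: find local active elements that have no stateful data.
std::vector<ElemId> missing_ids(const Rank & r) {
  std::vector<ElemId> out;
  for (ElemId id : r.active_local)
    if (r.stateful.count(id) == 0)
      out.push_back(id);
  return out;
}

// Steps 2 and 3: every rank publishes its request list (the simulated
// broadcast), and any other rank holding the data answers.
void exchange_missing(std::vector<Rank> & ranks) {
  std::vector<std::vector<ElemId>> requests;
  for (const Rank & r : ranks)
    requests.push_back(missing_ids(r));

  for (std::size_t asker = 0; asker < ranks.size(); ++asker)
    for (ElemId id : requests[asker])
      for (std::size_t holder = 0; holder < ranks.size(); ++holder) {
        if (holder == asker)
          continue;
        auto it = ranks[holder].stateful.find(id);
        if (it != ranks[holder].stateful.end()) {
          ranks[asker].stateful[id] = it->second;  // simulated send/receive
          assert(ranks[asker].stateful.count(id) == 1);
          break;
        }
      }
}
```

In a real run the request lists and replies would travel through the parallel communication layer rather than shared memory; the sketch only shows the bookkeeping each processor would do.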
Otherwise we hit a "no Stateful Materials with Adaptivity and DistributedMesh" warning, presumably due to issue idaholab#4532.
There are a number of places in the code and in input files where partitioning can be disabled, and just reporting "Partitioner: metis" is misleading if we're not actually using the partitioner. Refs idaholab#4532, for which I'm currently stripping the third layer of partitioning-disabling workaround off a test case...
This is enough to solve idaholab#4532 for me. With these changes we should no longer be automatically using workarounds instead. I'll fix up the regression tests where we were manually using workarounds too.
* Improvements to mesh redistribution
  * `ReplicatedMesh` and serialized `DistributedMesh` objects now also call `GhostingFunctor::redistribute()` callbacks (enabling communication of distributed data like MOOSE stateful materials on top of replicated mesh data).
  * GhostingFunctor code now supports (in deprecated builds) less strict behavior from subclasses.
  * Redundant calls to `redistribute()` in special cases have been combined.
  * `scatter_constraints()` now uses a more optimal data structure.
  * `send_coarse_ghosts()` and `redistribute()` now use the NBX parallel synchronization algorithm. These had been the last two distributed algorithms in libMesh using older less-scalable MPI techniques.
* Bug fix: in some use cases (including MOOSE applications using mesh refinement) libMesh could fail to properly synchronize changing nodeset definitions.
* Side boundary ids can now be set on child elements, not just coarse mesh elements, allowing for adaptive refinement of sidesets.
* Clearer error messages are now printed when a `parallel_only` assertion fails.
* `subdomain_id` has been added to output when printing `Elem` info.
* `send_list` data is now properly prepared in all use cases. This fixes Geometric MultiGrid compatibility with PETSc 3.18, and may give slight performance improvements elsewhere.
* A `System::has_constraint_object()` query API has been added.
* Bug fixes for msys2 / Windows builds, TIMPI `set_union` of maps with inconsistent values assigned to the same key, packed-range communication of pairs with fixed-size data and padding bytes.

Refs #000

This is also a prerequisite to the fixes for idaholab#4532
We're on a new subdomain now; it *doesn't* necessarily have the properties the old one did. This fixes assertion failures in idaholab#23603 code for me - hopefully with this we *really* have closed idaholab#4532
Right now we can do any two of these three things (adaptivity, stateful materials, repartitioning), but not all three together. We need to be able to marshal and communicate our material properties when the mesh is repartitioned. Our current idea is to rekey our material property database on libMesh's "unique_id" instead of `Elem` pointers.
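As a rough illustration of why rekeying helps, the sketch below stores properties under a stable integer id rather than an object address, so the data is still found after an element object has been destroyed and recreated (as happens when an element moves between processors). The `Elem` struct and `PropertyStore` class here are toy stand-ins invented for this example, not the libMesh or MOOSE types.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Toy stand-in for an element; only the stable id matters for this sketch.
struct Elem {
  std::uint64_t unique_id;
};

using Props = std::vector<double>;  // stand-in for one element's stateful data

// Hypothetical property database keyed on the stable id, not on Elem*.
class PropertyStore {
public:
  Props & at(const Elem & e) { return _by_id[e.unique_id]; }
  bool has(const Elem & e) const { return _by_id.count(e.unique_id) != 0; }

private:
  std::unordered_map<std::uint64_t, Props> _by_id;
};

// Simulates an element being deleted and rebuilt at a different address.
// A pointer-keyed map would lose the association; the id-keyed one keeps it.
bool survives_recreation() {
  PropertyStore store;
  auto original = std::make_unique<Elem>(Elem{42});
  store.at(*original) = {1.0, 2.0};
  original.reset();  // the old object, and therefore its address, is gone

  auto recreated = std::make_unique<Elem>(Elem{42});
  assert(store.has(*recreated));
  return store.at(*recreated)[1] == 2.0;
}
```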
Note: Material data types are templated so users can use any type they want. `dataLoad()` and `dataStore()` must be implemented for every type used as a Material property.
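To show what a store/load pair for a user-defined property type might look like, here is a minimal standalone sketch. The function names follow the note above, but the signatures are simplified assumptions for this example and may not match MOOSE's actual declarations; `DamageState` and `round_trip()` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Simplified, assumed signatures: write or read one double as raw bytes.
inline void dataStore(std::ostream & s, const double & v) {
  s.write(reinterpret_cast<const char *>(&v), sizeof(v));
}
inline void dataLoad(std::istream & s, double & v) {
  s.read(reinterpret_cast<char *>(&v), sizeof(v));
}

// Hypothetical user-defined material property type.
struct DamageState {
  double damage;
  std::vector<double> history;  // variable length, so the size is stored too
};

inline void dataStore(std::ostream & s, const DamageState & d) {
  dataStore(s, d.damage);
  const std::size_t n = d.history.size();
  s.write(reinterpret_cast<const char *>(&n), sizeof(n));
  for (const double & h : d.history)
    dataStore(s, h);
}

inline void dataLoad(std::istream & s, DamageState & d) {
  dataLoad(s, d.damage);
  std::size_t n = 0;
  s.read(reinterpret_cast<char *>(&n), sizeof(n));
  d.history.resize(n);
  for (double & h : d.history)
    dataLoad(s, h);
}

// Stores a value into an in-memory buffer and loads it back.
inline DamageState round_trip(const DamageState & in) {
  std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
  dataStore(buf, in);
  DamageState out{};
  dataLoad(buf, out);
  assert(buf.good());
  return out;
}
```

The same byte stream could then be handed to whatever communication routine moves data between processors, which is why every property type needs both functions.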