Integration branch (Uniform refine earlier) #4638
Conversation
Results of testing 6f28272 using moose_PR_pre_check recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11350
Results of testing 6f28272 using moose_PR_test recipe: Failed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11351
Results of testing 6f28272 using moose_PR_test_dbg recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11352
@@ -312,6 +309,9 @@ ActionWarehouse::executeAllActions()
 void
 ActionWarehouse::executeActionsWithAction(const std::string & task)
 {
+  // Set the current task name
+  _current_task = task;
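The hunk above sets the task name before the action fires. As a minimal, self-contained sketch of the stale-task hazard it fixes (names like `ToyWarehouse` are illustrative stand-ins, not the MOOSE API):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for ActionWarehouse (hypothetical, not the real class).
struct ToyWarehouse
{
  std::string _current_task;

  // Mirrors executeAllActions(): iterates tasks, updating _current_task each time.
  void executeAllActions(const std::vector<std::string> & tasks,
                         const std::function<void(const ToyWarehouse &)> & action)
  {
    for (const auto & task : tasks)
    {
      _current_task = task; // kept in sync on the normal path
      action(*this);
    }
  }

  // Mirrors executeActionsWithAction(): without the fix, _current_task would be
  // whatever a previous loop left behind when an action is fired manually.
  void executeActionsWithAction(const std::string & task,
                                const std::function<void(const ToyWarehouse &)> & action)
  {
    _current_task = task; // the fix: keep the task name current
    action(*this);
  }
};

// Fire a single action manually and report which task name it observed.
std::string runManualTask(const std::string & task)
{
  ToyWarehouse wh;
  wh._current_task = "old_task"; // pretend a previous loop left this behind
  std::string seen;
  wh.executeActionsWithAction(task, [&](const ToyWarehouse & w) { seen = w._current_task; });
  return seen;
}
```

Without the assignment, the manually fired action would observe `"old_task"` instead of the task it was fired for.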
This is a good change. Was it actually required to make all of this work?
Note: I'm not criticizing, just curious...
Well, it was, in order to clean up MooseApp.C, where I added the new "uniform_refine_mesh" and had it fire an existing Action manually. There's no place in the code where we're hitting this potential bug yet, but now it's fixed.
Cool: that was definitely a bug waiting to happen
Well I'm currently stumped. This PR simply does not work with the stateful adaptivity system in parallel and I don't fully understand why. When running on multiple processors I receive this assert after the first time step. Yes, the simulation starts and runs just fine through the first time step and dies during adaptivity.
Is there a problem with doing the refinement up front BEFORE adding the equation systems? Is there information being lost by adding the equation system to a refined mesh that already has active children? Stack traces from the two processes:
@friedmud, @roystgnr - Can you think of any call I'm missing?
Actually even that doesn't make sense. This PR passes the other 1000+ tests, many of which have uniform refinement turned on. Maybe there's a problem with the way we build our Refinement and Coarsening maps that's confusing the equation systems object. I don't think we trigger that code AND the early refinement in the same test. Hmm...
When is the mesh partitioned?
Now I want to put an assert in at line 80085 somewhere.
What this means is that the mesh has a node in it which the current... The problem should be independent of EquationSystems. I'm not sure...
How about doing several levels of refinement at once? I removed this logic since the equation systems don't exist at the point where I'm doing the uniform refinements. If that isn't it, I'll dig into https://github.com/idaholab/moose/blob/devel/framework/src/mesh/MooseMesh.C#L1378
OK - just to recheck for my own sanity: I changed the logic temporarily to force the uniform refinement in the original location (i.e. in FEProblem::initialSetup()), which is way late and after all systems have been set up. The test ran just fine, without any assertions or other errors. Simply moving the refinement to an early spot is causing this error.
I believe doing several levels of refinement at once should be supported if-and-only-if there are no solutions being projected from the old to the new mesh. Typically we do have solutions to be projected, so even when we have multiple refinement levels to do (e.g. when performing a fine restart from a very coarse solution) we typically do the refinement one level at a time. But "start with a coarse mesh, then do uniformly_refine(N>1), then build systems on it and do further AMR" is such a common use case that I'd be astonished if it had regressed. Unless the independent meshes are sharing nodes (which would have been too huge a bug to have gone uncaught to begin with), using multiple meshes shouldn't create any confusion about which processors own what. How do I replicate the current failure case? ./run_tests --something_or_other? How small (n_elem) a failure case can you boil this down to?
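The "one level at a time when solutions are being projected" idea above can be sketched with a toy piecewise-constant solution. This is purely illustrative (not libMesh code); the "projection" here is just children inheriting their parent's value:

```cpp
#include <cassert>
#include <vector>

// Toy 1-D "mesh": a piecewise-constant solution, one value per element.
// Each parent element splits into two children that inherit its value --
// the (trivial) old-to-new projection step for piecewise constants.
std::vector<double> refineOneLevel(const std::vector<double> & soln)
{
  std::vector<double> out;
  out.reserve(soln.size() * 2);
  for (double v : soln)
  {
    out.push_back(v);
    out.push_back(v);
  }
  return out;
}

// N levels applied one at a time, so a projection happens at every level,
// mirroring the fine-restart-from-coarse workflow described above.
std::vector<double> uniformlyRefine(std::vector<double> soln, unsigned int levels)
{
  for (unsigned int l = 0; l < levels; ++l)
    soln = refineOneLevel(soln);
  return soln;
}
```

Two levels starting from two elements yields eight elements, with each child carrying its ancestor's value.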
Thanks - I don't think there's any issue with several levels of refinement or anything else you proposed. The MOOSE and application test suites run fine with this change, including in parallel. The issue with this one test is something much more stupid 😄 Hopefully @friedmud has an idea to shed some light on the situation.
I'm stumped too. I mean - what does this have to do with stateful material properties?!? Or can you get this to happen without that now? |
I believe you need stateful materials + adaptivity + uniform refinement to...
On Wed Feb 04 2015 at 1:42:23 PM Derek Gaston [email protected]
Looking at this now |
Cool, if you need something let me know. I've been doing a lot of this work on one of the hpcbuild boxes. I just used X-forwarding and -start_in_debugger to trip the error. It's 100% reproducible on more than one processor. One more thing. Just for fun I commented out the buildRefinementandCoarsening() maps call in FEProblem::initialSetup(). I STILL hit this error after the first timestep but before it would need those maps to do the material property copies.
Ok - thanks for the heads up on that. I'm going to go hardcore... Derek
On Thu Feb 05 2015 at 10:26:29 AM EST Cody Permann [email protected]
Other things I tried:
@roystgnr and @jwpeterson - @friedmud discovered that the issue had to do with our use of... I hardcoded the skip partitioning call right after the uniform refinement calls and this test case ran. The problem is that we only want to disable partitioning for the very specific case spelled out in #4532. Now that we are doing these refinements first, I don't have all of the information available to make that decision, so I'm at a point where I'm trying to understand the specific issue of why this doesn't work, to figure out the right place. Current MOOSE workflow looks like this (simplified):
This PR moves step 9 to step 3.5, which breaks due to the skip partitioning in 6. I already mentioned that if I hard-code 6 at 3.6 it fixes this case. What I don't understand is why this is an issue. Clearly we are getting away with preparing the mesh in 2 and 4 without the skip partitioning flag now. In a different attempt to fix the issue I attempted to add yet another...
Wait! Even more interesting is that I successfully put step 6 at 4.1 and it worked. That leaves only step 5 in the middle, so there's something else going on, possibly with setting up the equation system. Still looking...
On Thu, Feb 5, 2015 at 10:44 AM, Cody Permann [email protected]
I'm not sure I understood all that, but what I think you are saying is that uniform refining before preparing the mesh (while skipping partitioning) breaks stateful material properties in MOOSE, while uniform refining...
Don't look at it as "breaks stateful materials", instead I'm theorizing that it's actually breakage due to specific ordering of libMesh calls. I'm getting a libMesh assertion which I posted way up above before it gets to any of the material prolongation/restriction logic. |
Yes - this isn't actually a "stateful material properties" problem at all. Like Cody says: there is some order dependence to uniform refine, I suspect that basically no one else in the world ever uses On Thu Feb 05 2015 at 1:05:41 PM EST Cody Permann [email protected]
Whoa - I lied. Turns out that, due to multiple registration, my disabling of the partitioning was happening earlier than I thought. I have found out that I have to disable partitioning before uniform refinement. The old workflow did that, since the uniform refinement happened very late. It appears to still be a requirement. I don't have an easy solution to this problem unless I do something really hacky like disabling it and then turning it back on later (yuck). While I can snoop most pieces of information earlier if need be, we don't have a good way of snooping information about stateful material properties since they are added programmatically when Materials are added to the system.
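One way to make the "disable it and turn it back on later" approach less error-prone is an RAII guard that restores the flag automatically. A hypothetical sketch (`ToyMesh` and the flag name are assumptions, not the libMesh/MOOSE API):

```cpp
#include <cassert>

// Toy mesh with a partitioning flag (hypothetical, not the real MeshBase).
struct ToyMesh
{
  bool skip_partitioning = false;
};

// RAII guard: disables partitioning on construction, restores the previous
// value on destruction, so the flag cannot be left flipped by an early return.
class SkipPartitioningGuard
{
public:
  explicit SkipPartitioningGuard(ToyMesh & mesh)
    : _mesh(mesh), _old_value(mesh.skip_partitioning)
  {
    _mesh.skip_partitioning = true;
  }
  ~SkipPartitioningGuard() { _mesh.skip_partitioning = _old_value; }

private:
  ToyMesh & _mesh;
  bool _old_value;
};

// Perform "refinement" under the guard and report whether partitioning
// was skipped while the work was in flight.
bool refinedWithSkip(ToyMesh & mesh)
{
  SkipPartitioningGuard guard(mesh);
  // ... uniform refinement would happen here, with partitioning disabled ...
  return mesh.skip_partitioning;
}
```

The flag is true only for the duration of the refinement scope and reverts afterward, which is the "turn it back on later" behavior without the hand-rolled bookkeeping.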
I modified libMesh adaptivity ex2 to skip partitioning and to do the refinements before building the equation system, and it ran just fine on multiple processors, so there's still something in MOOSE that's causing this issue, not libMesh.
I don't know - I wasn't able to trigger it in other MOOSE tests either...
I got it! Takes four processors in the libMesh example. It won't fail on 2 or 3... |
Force-pushed c1fd3ae to e099710
Results of testing e099710 using moose_PR_pre_check recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11870
Results of testing e099710 using moose_PR_test recipe: Failed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11871
Results of testing e099710 using moose_PR_app_tests recipe: Failed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11873
Force-pushed e099710 to 0bfe906
Results of testing 0bfe906 using moose_PR_pre_check recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11885
Results of testing 0bfe906 using moose_PR_app_tests recipe: Failed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11888
Results of testing 0bfe906 using moose_PR_test recipe: Failed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11886
Results of testing e099710 using moose_PR_test_dbg recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11872
Results of testing 0bfe906 using moose_PR_test_dbg recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11887
Force-pushed 0bfe906 to 4a9c2e7
Results of testing 4a9c2e7 using moose_PR_pre_check recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11904
Results of testing 4a9c2e7 using moose_PR_app_tests recipe: Failed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11907
Results of testing 4a9c2e7 using moose_PR_test recipe: Failed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11905
…ing FEProblem::initialSetup() when possible closes #4584
…rt from the complete mesh after modifiers and refinements
Force-pushed 4a9c2e7 to 7cc9faa
Results of testing 7cc9faa using moose_PR_pre_check recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11910
Results of testing 7cc9faa using moose_PR_test recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11911
Results of testing 7cc9faa using moose_PR_app_tests recipe: Failed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11913
Results of testing 4a9c2e7 using moose_PR_test_dbg recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11906
Results of testing 7cc9faa using moose_PR_test_dbg recipe: Passed on: linux-gnu View the results here: https://www.moosebuild.com/view_job/11912
This PR moves the uniform refinement steps up into the mesh setup stage. The original code path remains for the case where restart and uniform refinements are both needed.
Important note: The definition of what uniform_refinement means has changed for oversampled meshes. Previously that number meant the number of refinements relative to the original mesh; now it means the number of refinements to apply after the mesh has been set up, which already includes the initial uniform refinement steps.
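The semantics change can be stated as a small arithmetic sketch (function and parameter names are illustrative, not the MOOSE API):

```cpp
#include <cassert>

// Old meaning: the oversample refinement count was measured from the
// original, unrefined mesh.
unsigned int oldTotalLevels(unsigned int oversample_refinements)
{
  return oversample_refinements;
}

// New meaning: the count is applied on top of the already-set-up mesh,
// which has had the initial uniform refinement steps applied, so the
// totals add.
unsigned int newTotalLevels(unsigned int initial_uniform_refinements,
                            unsigned int oversample_refinements)
{
  return initial_uniform_refinements + oversample_refinements;
}
```

For example, with 2 initial uniform refinements, an oversample setting of 3 previously produced a mesh 3 levels above the original; under the new meaning it produces a mesh 5 levels above the original.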
Includes a small bug fix for setting _current_task when executing Actions individually.