CSM init endless cycle MWE fail case #754
Comments
Also added as a recent new test in IIF, testBasicTreeInit.jl.
Another basic endless cycle fail case (after #459):
using IncrementalInference
fg = generateCanonicalFG_lineStep(5;
    poseEvery=1,
    landmarkEvery=5,
    posePriorsAt=[0,2],
    sightDistance=4,
    solverParams=SolverParams(algorithms=[:default, :parametric]))
getSolverParams(fg).graphinit = false
getSolverParams(fg).treeinit = true
getSolverParams(fg).limititers = 50
smtasks = Task[]
tree, smt, hist = solveTree!(fg; smtasks=smtasks, verbose=true, timeout=50, recordcliqs=ls(fg));
For debugging, the packages were at DFG v0.10.5 and IIF at 3a41031. I'm going to debug a little by adding:
mkpath(getLogPath(fg))
fid = open(joinLogPath(fg,"csmVerbose.log"), "w")
# solveTree!(...; verbosefid=fid, verbose=true, ...)
tree, smt, hist = solveTree!(fg; smtasks=smtasks, verbose=true, verbosefid=fid, timeout=50, recordcliqs=ls(fg));
# after the solve has finished, close the verbose log
close(fid)
# reconstruct the logical CSM history from the recorded logs
open(joinLogPath(fg, "csmLogicalReconstructMax.log"),"w") do io
    IIF.reconstructCSMHistoryLogical(getLogPath(fg), fid=io)
end
Here it is: |
So this is the loop on (1):
While Children First question for me is why does |
We can also look at the log on (1): |
Okay, so the log from (1) shows something else. It says the endless cycle occurs with:
┌ Info: 23:22:02.713 | 1---1| x2 @ null | 4e, blockUntilChildrenHaveStatus_StateMachine, maybe wait cliq=2, child status=upsolved.
└ @ IncrementalInference /home/dehann/.julia/dev/IncrementalInference/src/CliqStateMachine.jl:63
┌ Info: 23:22:02.713 | 1---1| x2 @ null | 4e, blockUntilChildrenHaveStatus_StateMachine, maybe wait cliq=3, child status=needdownmsg.
└ @ IncrementalInference /home/dehann/.julia/dev/IncrementalInference/src/CliqStateMachine.jl:63
┌ Info: 23:22:02.721 | 1---1| x2 @ null | 4b, trafficRedirectConsolidate459_StateMachine, cliqst=null
└ @ IncrementalInference /home/dehann/.julia/dev/IncrementalInference/src/CliqStateMachine.jl:63
Note the runaway cycle happens within a few milliseconds after 23:22:02.713. The next place to look is back in the Logical.log at what happened with (3) between
The jump in global sequential step 40 to 95 is the part of interest. I'm checking out |
Yup, here is the call in (3) that sets off the cycle in (1), i.e. the last step from any neighboring CSMs:
IncrementalInference.jl/src/CliqStateMachine.jl Line 1150 in 3a41031
So now we can look at how to resolve the process inside only CSM (1) with the pre-knowledge from (2) and (3)... |
So we need a way to redirect the loop (#754 (comment)) towards one of the
Let's start with IncrementalInference.jl/src/CliqStateMachine.jl Lines 740 to 751 in 3a41031
Yup, that looks pretty good. The question now is which of the cycle members (#754 (comment)) should divert out in this case. Best is to read all the code in those 4 cycle functions and see which part is closest to the current case... |
okay, more information -- so clique 3 during "good case" waits to go from Good case:
Bad case:
So why in the bad case does |
CSM log on clique 3, Good case:
Bad case
So the difference is in the solve order, good: [2, 3, 5]; and bad [3, 2, 5] |
IncrementalInference.jl/src/InferDimensionUtils.jl Lines 380 to 407 in d410dcc
With the most troubling part separated out:
Which probably means that the priority for this is on issue #910.
My current view of the problem is that the joint synchronization, with solvableDims passed over a channel (directly from siblings) while message/status information is passed down from the parent, is producing a deadlock condition. Solving #910 is the best-suited route to a deterministic solution that removes any possibility of this deadlock occurring.
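To make the kind of circular wait described above concrete, here is a toy sketch in plain Julia. It is not IncrementalInference code: `run_pair`, `publish_first`, and the two channels are hypothetical names chosen for illustration, and the two tasks only stand in for two cliques that each want the other's status before publishing their own. The real CSM interaction involves more parties (siblings plus the parent), so treat this as an analogy under those assumptions.

```julia
# Toy model only: all names here are hypothetical, not IncrementalInference APIs.
# Two tasks stand in for two cliques exchanging a status over bounded Channels.

function run_pair(; publish_first::Bool, timeout::Float64=0.5)
    a_to_b = Channel{Symbol}(1)
    b_to_a = Channel{Symbol}(1)

    # "Clique A" always waits for B's status before publishing its own.
    ta = @async begin
        take!(b_to_a)
        put!(a_to_b, :upsolved)
    end

    # "Clique B" either publishes first (breaking the circular wait)
    # or also waits first (both sides block on take!).
    tb = @async begin
        if publish_first
            put!(b_to_a, :needdownmsg)
            take!(a_to_b)
        else
            take!(a_to_b)
            put!(b_to_a, :upsolved)
        end
    end

    # Poll until both tasks are done or the timeout elapses.
    result = timedwait(() -> istaskdone(ta) && istaskdone(tb), timeout)

    # Closing the channels releases any task still blocked in take!.
    close(a_to_b)
    close(b_to_a)
    return result   # :ok or :timed_out
end

println(run_pair(publish_first=false))
println(run_pair(publish_first=true))
```

With `publish_first=false` neither task can make progress, so the call times out; with `publish_first=true` one side commits its status before waiting and both tasks finish. A deterministic ordering of who publishes first is the property that removes the wait cycle in this toy.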
Workaround for the current first compile test failure in IIF on current master, in the hex test in testBasicTreeInit.jl: I'm locally using a delay hack while working on #910:
injectDelayBefore = [6=>(determineCliqNeedDownMsg_StateMachine=>10);]
See the hack in code here: IncrementalInference.jl/test/testBasicTreeInit.jl Lines 53 to 58 in e3c0b52
Ah, not great: this issue has been re-introduced with #958. See example here:
We should add that test to RoME to help catch it in the future. |
Perhaps I should add this to tree init tests?
IncrementalInference.jl/test/testExpXstroke.jl Lines 27 to 49 in 362e2fa
It is the same structure as
EDIT: see #959
Obsolete with resolution of #855 and decision to only use xstroke-take for final consolidation of CSMs. |
Found an easy fail case on CSM, which relates to an existing test:
IncrementalInference.jl/test/testBasicTreeInit.jl Line 9 in 03d231d
MWE
Cliques 5 and 6 cycle endlessly if the following clique solve sequence is used: