Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crashes by undoing/redoing nodes deletion #11021

Merged
merged 7 commits into from
Aug 19, 2020

Conversation

mmisol
Copy link
Collaborator

@mmisol mmisol commented Aug 18, 2020

Purpose

Fixes two crashes whie doing this at a high pace:

One crash happened in HomeWorkspaceViewModel, when postprocessing the
nodes after execution. These changes were done from the scheduler
thread, which when timed just right conflicted with changes done from
the UI thread, by the undo and redo operations.

Another one happened in ChangeSetComputer, when trying to compute the
delta AST. The cause was an optimization that attempted to skip tasks
based on containment comparison but disrespecting causality. The path
taken was to prevent skipping tasks altogether, but some more elaborate
optimization can probably be put in place.

Declarations

Check these if you believe they are true

  • The codebase is in a better state after this PR
  • Is documented according to the standards
  • The level of testing this PR includes is appropriate
  • User facing strings, if any, are extracted into *.resx files
  • All tests pass using the self-service CI.
  • Snapshot of UI changes, if any.
  • Changes to the API follow Semantic Versioning and are documented in the API Changes document.
  • This PR modifies some build requirements and the readme is updated

Reviewers

@aparajit-pratap @mjkkirschner

FYIs

@DynamoDS/dynamo

Fixes two crashes whie doing this at a high pace:

One crash happened in HomeWorkspaceViewModel, when postprocessing the
nodes after execution. These changes were done from the scheduler
thread, which when timed just right conflicted with chnages done from
the UI thread, by the undo and redo operations.

Another one happened in ChnageSetComputer, when trying to compute the
delta AST. The cause was an optimization that attempted to skip tasks
based on containment comparison but disrespecting causality. The path
taken was to prevent skipping tasks altogether, but some more elaborate
optimization can probably be put in place.
protected override TaskMergeInstruction CanMergeWithCore(AsyncTask otherTask)
{
var theOtherTask = otherTask as UpdateGraphAsyncTask;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change may have significant performance impacts - we should investigate the original reason for the implementation - my memory is that when moving a slider it kicked off hundreds of unnecessary runs and made dynamo unresponsive - if this is related then theres an argument that the crash is actually acceptable...?

Copy link
Collaborator Author

@mmisol mmisol Aug 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange that was fixed by doing this instead of throttling the requests created by the slider 🤔

Still, maybe we could restore a more elaborate merge strategy that accounted for this case in particular. What kind of graphSyncData would we expect? Do we have a test case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heres part of the discussion I recall:
#4276

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and a related github issue:
#4229

Copy link
Collaborator Author

@mmisol mmisol Aug 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delay seems to work just fine according to my manual testing. Is there any special validation you would like to make beyond what's shown in that issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned if there are other scenarios (other than the slider) where we are not dropping redundant tasks that are quickly accumulating as a result of this change thus hitting performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't control the UI that other package authors may build - delay for the slider is great, but it does not cover the case generally, though I guess we could document the idea somewhere to throttle their controls?

For example I am thinking of controls like:
image

and

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted. For those cases having a strategy to merge/discard would be useful. I'll try to come up with something.

@mmisol
Copy link
Collaborator Author

mmisol commented Aug 19, 2020

Manually tested the slider without the Delay in the binding, to try to replicate what a faulty slider would do. The new code prevents that issue, while it also doesn't cause the crashes related to undoing/redoing. Please give it another look @mjkkirschner

@aparajit-pratap
Copy link
Contributor

Looks good to me.

@mjkkirschner
Copy link
Member

@mmisol I really like this explanation - thanks, and solution looks good.

  • I'm curious about the change to dispatch to the UI thread though - can you forsee any issues if the scheduler thread is the UI thread? (like in Revit?)

  • Did you mean to remove the delay, or keep it just to be safe?

@mmisol
Copy link
Collaborator Author

mmisol commented Aug 19, 2020

@mjkkirschner I can give it a try in Revit to be sure.

I kept the Delay but maybe I should remove it. Transitions look "smoother" when it's not there.

@mmisol
Copy link
Collaborator Author

mmisol commented Aug 19, 2020

@mjkkirschner Worked fine in Revit. I'll wait for https://master-5.jenkins.autodesk.com/job/Dynamo/job/DynamoSelfServe/job/pullRequestValidation/1315/ to make sure this is not causing anything weird in the CI.

In the end I removed the delay on the slider to keep it smooth like before.

@mmisol
Copy link
Collaborator Author

mmisol commented Aug 19, 2020

In the end I had to use lock instead of dispatching to the UI. Too many corner cases. If you have a minute please take another look @mjkkirschner

The last build I ran is https://master-5.jenkins.autodesk.com/job/Dynamo/job/DynamoSelfServe/job/pullRequestValidation/1319/

NodeViewModel nodeViewModel;
lock (Nodes)
{
nodeViewModel = Nodes.First(x => x.NodeLogic == node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is safest, but is there any harm in scanning the collection outside the lock?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is you can run into errors by enumerating over it while it's being modified. That's the initial crash I found.

@mjkkirschner
Copy link
Member

mjkkirschner commented Aug 19, 2020

Thanks @mmisol - though this makes me wonder how rigorous we should be in applying locks or other mutual exclusion mechanisms in Dynamo - especially in the view - in the Revit context this rarely matters, but now that more integrations are using multiple threads this will become more necessary.

@mmisol mmisol merged commit 4440c2d into DynamoDS:master Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants