Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED 2282]Investigate the root cause of nodes being undefined in Safari #310

Merged
merged 4 commits into from
Nov 27, 2020

Conversation

studioswong
Copy link
Contributor

@studioswong studioswong commented Nov 19, 2020

Description

Related ticket: https://jira.quantumblack.com/browse/KED-2272

This ticket is mainly investigate the root cause of the bug that caused the app to break un safari. The bug is first reported during regression testing of the latest release, and can be reproduced by changing pipelines and hovering over the nodes on the graph.

Development notes

While the fix is a very simple fix, the bulk of the work lies in the investigation of the problem behind the current setup of the flowchart component in relation to how safari works under the hood. ( note: this bug does not appear on chrome).

On trying to reproduce the bug, One thing I notice is that the bug is triggered not by changing pipelines, but by simply hovering over the graph on any selected pipeline - the bug will still be triggered by continuously hovering over the nodes on the graph while remaining on the same pipeline. ( one thing to note: this bug happens on a 'spontaneous' manner and does not happen on every try which suggests the root cause is very likely related to how safari works underneath the hood)

The main cause of the bug are undefined data within allNodes in the drawNodes function ( see below screenshot)
image

In investigating the the reason behind the undefined element, one thing to note is that this error is trigged within the if statement that checks against 5 changing props: nodes, nodesActive, nodeSelected, centralNode and linkedNodes. ( code ref: if (changed('nodes', 'nodeActive', 'nodeSelected', 'centralNode', 'linkedNodes'))) . Further investigation leads to the initial discovery that this bug is only triggered under when there are changes in the following 3 props: nodeActive, centralNode and linkedNodes i.e this bug is only triggered purely by the 'hover' action over the nodes on the graph.

In logging out and comparing the linkedNodes props as well as allNodes field ( see below), it surfaces the main root cause of the problem - only the nodes that exists in the linkedNodes are persisted in memory under allNodes (i.e nodes of the currently selected pipeline), while the nodes from other pipelines are overwritten by undefined.

log of linkedNodes values:
image

log of allNodes ( which is array of all ref instances of all nodes across all pipelines):
image

This suggests a huge difference in the garbage collector between chrome and Safari, in that safari's garbage collector might have interpreted the nodes of non-selected pipeline as 'non-reachable' objects and had cleared out the information of those nodes within the allNodes array, hence those fields being re-assigned as undefined in memory. ( further details can be further dig down through the rabbit hole of browser tech as safari most likely adopts the ' tracing GC' approach: https://en.wikipedia.org/wiki/Tracing_garbage_collection )

The above can be fixed by ensuring that this.el.nodes gets properly reassigned the entire set of nodes whenever there are any changes in the nodeActive prop (i.e whenever there is hovering of the nodes) so that allNodes would be freshly propagated with all nodes data rather than relying on the data in memory. This is done so by adding in an extra condition to include the change in the nodeActive prop as part of the condition in the assignment of this.el.nodes within drawNodes.

Further regression tests have been performed to ensure that this bug does not happen while it does not break anything else in the app. There is no impact on app performance as well, although it might be worth to further test / consider situations where there are 50+ pipelines.

This also suggest that we could perhaps consider refactoring the drawNodes function so that, instead of taking in nodes across pipelines under this.el.nodes, we would only reference the existing set of nodes under the selected pipeline within the drawNodes function.

I have kept the last patch fix (the filter of underfined items within the definition of allNodes) as a further check to ensure that this bug would not happen, but theoretically this is not needed with the current implementation of the nodes reassignment.

Thanks to @bru5 for discovering the bug and all your help and discussion on this issue!

QA notes

To test this feature, you can also take out the patch fix ( the filter operation under allNodes assignment on line 134.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Legal notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

Copy link
Contributor

@richardwestenra richardwestenra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly unable to replicate the original bug today so I can't tell if this is working or not. However assuming this will fix the issue, should we also revert the change made in #306? @bru5 what do you think? Just wondering whether it's now unnecessary, or whether it's still a worthwhile safeguard.

@bru5
Copy link
Contributor

bru5 commented Nov 25, 2020

Thanks for investigating this and providing the detailed report, very helpful in understanding what's going on here.

My feeling on this is that for now we might best just continue to use the .filter(node => Boolean(node)) approach, rather than also adding this on top. Maybe even change that to .filter(node => typeof node !== 'undefined') for more clarity. Well that's assuming it was sufficient to avoid the issue?

  • it avoids running selectors and data on every node click
  • it's a bit more explicit and general without needing as much comment
  • there may be other reasons upstream that will cause the same failure now or later on

If it's the case that there are nodes coming through from non-visible pipelines, then this definitely sounds like an upstream problem that @richardwestenra you might be worth taking a look at?

Let me know your thoughts, do you think the existing fix is enough to handle this and as a safeguard?

@studioswong
Copy link
Contributor Author

studioswong commented Nov 26, 2020

Thanks @richardwestenra and @bru5 for the review. Here is my view and summary of the possible options after this investigation and our discussion:

Option 1: simply keep current filter patch fix as the only fix

imo that is sufficient as a fix given that the undefined nodes that are filtered out are not the nodes that are currently shown on the graph ( those are the nodes that are not within the selected pipeline), while on reselecting a new pipeline the if(changed(nodes)) statement will be called and this.el.nodes will be flushed with a new set of nodes again.

Pros: simplest implementation without any changes in performance
Cons: This is a patch fix whereby the undefined data still exists

Option 2: Implement this fix to ensure no memory leak in Safari

This is the solution that ultimately fixes this problem of memory leak as it ensures this.el.nodes within drawNodes gets assigned with the entire set of data whenever the node gets toggled.

Pros: There are no more instances of undefined data with it being the root cause to eliminate the problem.
Cons: Yes, given that we are assigning this.el.nodes on every instance of toggleNodes, I would imagine it would have an impact on performance, however, my regression test on with the demo dataset does not seem to introduce any performance issues.

Option 3: Implement the fix and keep the patch fix

This is the option that this PR implemented - the fix to implement the root cause + the patch fix as a double check

Is this problem caused by data upstream?
On checking out the props passed into the drawNodes function when the error occurs, the props of all Nodes are intact ( as in there are no undefined nodes.) Moreover, if this is a problem that is caused by data upstream, it will likely happen cause an error across all browsers as well, which isn't the case here.

Hence from what I see, probably not - this is likely just caused by the garbage collector of Safari.

@studioswong studioswong merged commit a1fd13c into main Nov 27, 2020
@studioswong studioswong deleted the 2272-investigate-and-fix-root-cause-safari-bug branch November 27, 2020 19:57
@richardwestenra richardwestenra mentioned this pull request Dec 16, 2020
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants