Graph duplicate node cleansing tool #17885
Comments
Original comment by @markharwood: A similar requirement is to use the text labels of selected nodes as a tokenized query that matches similar nodes not currently in the workspace. Using index patterns that span more than one index, I have used this feature to connect people, companies, and addresses in the Panama Papers to similar entities in an OFAC sanctions list. This provides a tool for linking entities from different datasets. Ideally, any grouping actions the user takes to merge entities visually could optionally be preserved as an "alias" definition that the UI could use as a reference, to benefit other users or repeat visits to the same datasets.
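For illustration only, the "labels as a tokenized query" idea could be sketched as an Elasticsearch `bool` query with one `match` clause per selected label. This is a minimal sketch, not the request the Graph UI actually sends; the function name `build_similar_nodes_query` and the field name `name` are hypothetical.

```python
def build_similar_nodes_query(labels, field="name"):
    """Build a search body that matches documents similar to any selected label.

    `field` is a hypothetical analyzed text field; a `match` query runs each
    label through that field's analyzer, so the label is tokenized before
    matching.
    """
    should = [{"match": {field: {"query": label}}} for label in labels]
    return {
        "query": {
            "bool": {
                "should": should,
                # At least one of the label clauses has to match.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    # Invented labels standing in for nodes selected in the workspace.
    body = build_similar_nodes_query(["Acme Holdings Ltd", "J. Smith"])
    print(body)
```

Running the resulting body against an index pattern that spans several indices is what would let matches come back from a different dataset than the one the selected nodes came from.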
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)
Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)
Closing this because it's not planned to be resolved in the foreseeable future. It will be tracked in our Icebox and will be re-opened if our priorities change. Feel free to re-open if you think it should be melted sooner.
Original comment by @markharwood:
In datasets like the Panama Papers, the issue of noisy duplicate data raises its head and is a major pain.
Consider the near-duplicate names in this real example:
!LINK REDACTED
To assist end users, a simple Levenshtein edit distance computed on the labels typically used in a graph can suggest candidates for grouping. This process would run at the click of a new "link similar" button. The suggestions can be added as dotted links between related vertices, which also has the effect of pulling those vertices closer to each other in the diagram. The end user could act on the suggested links by using existing tools to select and group vertices, or hit the undo button to remove the suggestions.
I had this implementation working to good effect in a demo using SwissLeaks data (a precursor to the Panama Papers).
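As a rough sketch of the suggestion step (not the demo implementation mentioned above), the following computes pairwise Levenshtein distances over vertex labels and returns the pairs close enough to draw a dotted link between. The function names, the `max_distance` threshold of 2, and the lower-casing and whitespace normalisation are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def suggest_links(labels, max_distance=2):
    """Return (label_a, label_b, distance) for label pairs within max_distance.

    Labels are lower-cased and whitespace-normalised before comparison;
    that normalisation is an assumption, not something the issue specifies.
    """
    normalised = {label: " ".join(label.lower().split()) for label in labels}
    suggestions = []
    for a, b in combinations(labels, 2):
        distance = levenshtein(normalised[a], normalised[b])
        if distance <= max_distance:
            suggestions.append((a, b, distance))
    return suggestions


if __name__ == "__main__":
    # Invented labels, not taken from any of the datasets mentioned.
    labels = ["Acme Holdings Ltd", "ACME Holdings Ltd.", "Bluewater Trust"]
    for a, b, distance in suggest_links(labels):
        print(f"{a!r} <-> {b!r} (distance {distance})")
```

Comparing every pair is quadratic in the number of labels, which seems acceptable for the handful of vertices in a workspace but would need blocking or indexing for anything larger.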