I'm trying to find a way to compute the delta between two versions of a JSON tree with the smallest possible size (that is, a delta able to split, move, and match nodes).
I first tried to implement an algorithm from scratch using the Slate operations that I collect each time the editor is updated, and then optimizing them so that only the relevant operations remain at the end. Combining/merging text operations is pretty easy, but once node operations are involved, the task becomes a real nightmare.
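For the text side, merging can be as simple as coalescing consecutive `insert_text` operations where each one continues exactly where the previous one ended. A minimal sketch, assuming operations shaped like Slate's `insert_text` (`path`, `offset`, `text`); `samePath` and `mergeInsertText` are hypothetical helpers, and every other operation type is passed through untouched, which is precisely where the node-level difficulty remains.

```javascript
// Hypothetical helper: compare two Slate-style paths element by element.
function samePath(a, b) {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Hypothetical helper: coalesce consecutive insert_text ops on the same text
// node when the second one starts right where the first one's text ends.
// Node operations (split_node, move_node, ...) are left as they are.
function mergeInsertText(ops) {
  const out = [];
  for (const op of ops) {
    const prev = out[out.length - 1];
    if (
      prev &&
      prev.type === "insert_text" &&
      op.type === "insert_text" &&
      samePath(prev.path, op.path) &&
      op.offset === prev.offset + prev.text.length
    ) {
      // Replace the previous op with a merged copy instead of mutating it.
      out[out.length - 1] = { ...prev, text: prev.text + op.text };
    } else {
      out.push(op);
    }
  }
  return out;
}

const ops = [
  { type: "insert_text", path: [0, 2], offset: 14, text: "Hel" },
  { type: "insert_text", path: [0, 2], offset: 17, text: "lo World " },
  { type: "split_node", path: [0, 0], position: 26, properties: {} },
];
console.log(mergeInsertText(ops));
```

Removals, and inserts typed out of order, need similar but messier rules, and none of this helps once nodes are split or moved.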
I then came across Google's diff-match-patch, which works well on flat plain text to find out what has been moved or replaced, but, as explained, it does not work well for structured data.
Continuing my research, I looked at JSON diffing libraries, but they do not try to detect moved or split nodes, so the result is "remove everything, insert everything".
Then I found interesting work on structured data in the homologous XML format, and realized that this is an NP-hard problem (at least once moves are allowed) with a lot of academic research on the subject and many libraries with their own pros and cons (XMLDiff, XyDiff, graphtage, ...).
My issue is that I have not yet found a library that fits my needs and produces simple changes that could be displayed as redlining.
Simple JSON example from a document tree:
const initial = [
  {
    "type": "p",
    "children": [
      { "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin " },
      { "text": "efficitur", "bold": true },
      { "text": " sit amet sem a feugiat. Sed finibus eleifend elit a ultrices. Curabitur ac massa at mauris mollis varius." }
    ]
  }
];

const final = [
  {
    "type": "p",
    "children": [
      { "text": "Lorem ipsum dolor sit " },
      { "text": "amet", "bold": true },
      { "text": ", consectetur Hello World adipiscing elit. Proin " },
      { "text": "efficitur", "bold": true },
      { "text": " sit amet sem. Sed finibus " },
      { "text": "eleifend", "underline": true },
      { "text": " elit a ultrices. Curabitur ac massa at mauris mollis varius." }
    ]
  }
];

// Resulting deltas
const deltas = [
  { "op": "split", "path": [0, 0], "position": 26 },
  { "op": "split", "path": [0, 0], "position": 22 },
  { "op": "addProps", "path": [0, 1], "key": "bold", "value": true }
  ...
];
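To relate the expected `split` positions to the data, it helps to view each paragraph as a single string whose leaves start and end at character offsets. A sketch, assuming Slate-style leaves where every property other than `text` is a mark; `leafRuns` is a hypothetical helper. On `final`, the first two boundaries fall at offsets 22 and 26, matching the two `position` values in the expected deltas (both splits land inside the prefix `Lorem ipsum dolor sit amet`, which is identical in both versions).

```javascript
// Hypothetical helper: flatten one block into [start, end) runs over its
// concatenated text. Every leaf property other than `text` is treated as a mark.
function leafRuns(block) {
  const runs = [];
  let offset = 0;
  for (const { text, ...marks } of block.children) {
    runs.push({ start: offset, end: offset + text.length, marks });
    offset += text.length;
  }
  return runs;
}

// The `final` document from the example above.
const final = [
  {
    type: "p",
    children: [
      { text: "Lorem ipsum dolor sit " },
      { text: "amet", bold: true },
      { text: ", consectetur Hello World adipiscing elit. Proin " },
      { text: "efficitur", bold: true },
      { text: " sit amet sem. Sed finibus " },
      { text: "eleifend", underline: true },
      { text: " elit a ultrices. Curabitur ac massa at mauris mollis varius." },
    ],
  },
];

for (const { start, end, marks } of leafRuns(final[0])) {
  console.log(start, end, JSON.stringify(marks));
}
```

Comparing such run boundaries between the two versions only works where the surrounding text is unchanged; as soon as text is inserted or deleted (like `Hello World`), the offsets must first be aligned by a text diff.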
The same example transposed to XML:
# initial
<document><node type="p"><text>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin </text><text bold="true">efficitur</text><text> sit amet sem a feugiat. Sed finibus eleifend elit a ultrices. Curabitur ac massa at mauris mollis varius.</text></node></document>
# final
<document><node type="p"><text>Lorem ipsum dolor sit </text><text bold="true">amet</text><text>, consectetur Hello World adipiscing elit. Proin </text><text bold="true">efficitur</text><text> sit amet sem. Sed finibus </text><text underline="true">eleifend</text><text> elit a ultrices. Curabitur ac massa at mauris mollis varius.</text></node></document>
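The transposition itself is mechanical. A sketch, assuming the mapping used above (element nodes become `<node>`, leaves become `<text>`, and every non-`children`/non-`text` property becomes an attribute); `escapeXml`, `toXml` and `documentToXml` are hypothetical helpers.

```javascript
// Hypothetical helper: escape the characters that are unsafe in XML text/attributes.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn an object's properties into ` key="value"` attribute strings.
function attrs(props) {
  return Object.entries(props)
    .map(([k, v]) => ` ${k}="${escapeXml(v)}"`)
    .join("");
}

// Leaves (objects with a string `text`) become <text>, other nodes become <node>.
function toXml(node) {
  if (typeof node.text === "string") {
    const { text, ...marks } = node;
    return `<text${attrs(marks)}>${escapeXml(text)}</text>`;
  }
  const { children = [], ...props } = node;
  return `<node${attrs(props)}>${children.map(toXml).join("")}</node>`;
}

function documentToXml(nodes) {
  return `<document>${nodes.map(toXml).join("")}</document>`;
}

const sample = [{ type: "p", children: [{ text: "Lorem " }, { text: "efficitur", bold: true }] }];
console.log(documentToXml(sample));
```

Keeping this mapping one-to-one makes it possible to translate an XML delta's paths back to Slate paths afterwards.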
Using XyDiff (a very old library), the result replaces everything:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<xy:unit_delta xmlns:xy="urn:schemas-xydiff:xydelta">
<xy:t fromXidMap="(9-14;3-4;15-20;7-8|21)">
<xy:d par="7" pos="3" xm="(5-6)">
<text> sit amet sem a feugiat. Sed finibus eleifend elit a ultrices. Curabitur ac massa at mauris mollis varius.</text>
</xy:d>
<xy:d par="7" pos="1" xm="(1-2)">
<text>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin </text>
</xy:d>
<xy:i par="7" pos="1" xm="(9-10)">
<text>Lorem ipsum dolor sit </text>
</xy:i>
<xy:i par="7" pos="2" xm="(11-12)">
<text bold="true">amet</text>
</xy:i>
<xy:i par="7" pos="3" xm="(13-14)">
<text>, consectetur Hello World adipiscing elit. Proin </text>
</xy:i>
<xy:i par="7" pos="5" xm="(15-16)">
<text> sit amet sem. Sed finibus </text>
</xy:i>
<xy:i par="7" pos="6" xm="(17-18)">
<text underline="true">eleifend</text>
</xy:i>
<xy:i par="7" pos="7" xm="(19-20)">
<text> elit a ultrices. Curabitur ac massa at mauris mollis varius.</text>
</xy:i>
</xy:t>
</xy:unit_delta>
Using graphtage, both the JSON and the XML versions loop infinitely:
...
WARNING:graphtage.bounds:The most recent call to <function FixedLengthSequenceEdit.tighten_bounds at 0x7f5756c38790> on <graphtage.sequences.FixedLengthSequenceEdit object at 0x7f5730e519d0> returned bounds [191, 191] when the previous bounds were [212, 457]
WARNING:graphtage.bounds:The most recent call to <function FixedLengthSequenceEdit.tighten_bounds at 0x7f5756c38790> on <graphtage.sequences.FixedLengthSequenceEdit object at 0x7f5730e519d0> returned bounds [191, 191] when the previous bounds were [212, 457]
Diffing: 31%|███████████████████████████████████████████████████▊ | 111/356 [00:02<00:04, 156.74it/s]
How can I improve the diffing results? Would preprocessing the data in some way help? Are there other libraries with a better-suited algorithm?
In the end, I found that this topic is covered by many studies and a lot of academic research, but I did not find anything that suits my case.
I would appreciate some help figuring out this complex challenge.
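One possible preprocessing, for changes that stay inside a single block: flatten each paragraph into characters that carry their leaf's marks, diff the two character sequences, and report kept characters whose marks changed as `format` segments. The sketch below swaps in a plain LCS dynamic program (O(n·m) memory and time, acceptable for a paragraph but not a whole document); it is not diff-match-patch's algorithm nor a tree diff, and it does not detect moves. `toChars` and `diffBlocks` are hypothetical helpers, and treating every non-`text` leaf property as a mark is an assumption.

```javascript
// Flatten a block into characters, each tagged with a canonical string of its
// leaf's marks (keys sorted so {a, b} and {b, a} compare equal).
function toChars(block) {
  const chars = [];
  for (const { text, ...marks } of block.children) {
    const key = JSON.stringify(Object.keys(marks).sort().map((k) => [k, marks[k]]));
    for (const ch of Array.from(text)) chars.push({ ch, marks: key });
  }
  return chars;
}

// Character-level LCS diff grouped into equal/insert/delete/format segments.
function diffBlocks(a, b) {
  const A = toChars(a);
  const B = toChars(b);
  const n = A.length;
  const m = B.length;
  // lcs[i][j] = length of the LCS of A[i..] and B[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = A[i].ch === B[j].ch
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const segments = [];
  // Append a character, extending the last segment when op and marks match.
  const push = (op, ch, marks) => {
    const last = segments[segments.length - 1];
    if (last && last.op === op && last.marks === marks) last.text += ch;
    else segments.push({ op, text: ch, marks });
  };
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (A[i].ch === B[j].ch) {
      // Same character kept; flag it when only its formatting changed.
      push(A[i].marks === B[j].marks ? "equal" : "format", B[j].ch, B[j].marks);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push("delete", A[i].ch, A[i].marks);
      i++;
    } else {
      push("insert", B[j].ch, B[j].marks);
      j++;
    }
  }
  for (; i < n; i++) push("delete", A[i].ch, A[i].marks);
  for (; j < m; j++) push("insert", B[j].ch, B[j].marks);
  return segments;
}

const before = { children: [{ text: "sit amet, con" }] };
const after = { children: [{ text: "sit " }, { text: "amet", bold: true }, { text: ", con" }] };
console.log(diffBlocks(before, after));
```

The segments map directly onto redlining (strike-through for `delete`, underline for `insert`, a highlight for `format`); turning them back into `split`/`addProps` operations is then a matter of cutting leaves at segment boundaries.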
Note that graphtage does work when using another sample.