Merging ops/changes #488

wmertens · 2022-05-04T09:03:16Z

wmertens
May 4, 2022

I was thinking about the wastefulness of ops history, and how ops could be merged for efficiency, and its implication on interactivity. Writing down my thoughts, and apologies if/when I use the wrong terms or notations. I only saw the CRDT Hard Parts video.

For example, if you're moving an item in a list and then moving it again, you could merge those two ops into a single move. And when inserting letters, you could merge those ops into "insert string" (especially if you represent text as a tree of paragraphs of words and space strings).

By merging ops, you compress the history, thereby reducing sync and storage overhead. However, if you already sent out a change (group of ops), you can't just send the merged ops instead. So you would have to wait a bit for possible merges before sending changes. This hurts interactivity.

So could there be a way to have both history compression and good interactivity?

Suppose you define some rules about op merging, e.g. two move ops of the same item can be merged as a single move to the ultimate target. On node A you'd have ops MoveA1 and MoveA2, and they would be merged by removing MoveA1 and replacing MoveA2 with MoveA2' (which is probably just MoveA2 in this case).

So a node's op history would be missing some op versions.

One approach would be to send merged ops to all participants, and then they would have to undo previous ops and apply the merged op.

But if all participants have the same rules, they can apply the same merges and then merged ops don't need to be sent to current participants. This even works if the participants have different version of the rules, because the remote op version numbers remain the same, and their merged results should have the same effect (except for bugs).

So then while editing interactively, you'd send out ops as soon as you have them. Any ops you didn't send yet you can merge, and after sending you can merge them with already-sent ops as well.

pvh · 2022-05-04T20:06:52Z

pvh
May 4, 2022
Maintainer

There are quite a few different reasons for wanting to compress or elide history. You might want to reduce the storage overhead of a busy document, or you might want to avoid sharing the password you accidentally pasted into the document.

Automerge is designed to emphasize the accurate preservation of the history of a document. The storage format already uses a technique called RLE (Run Length Encoding) to do just what you suggest. A run of characters inserted at the same point will appear on-disk as a string. This makes a huge difference. The overhead of our reference large text document's edit history went from ~300x overhead to a mere 1.2x. This doesn't solve the case of, say, a counter that is updated a billion times and all you want to know is the current state. Solving that is possible, but there are other CRDTs out there which are better-suited to that use-case and Automerge's development is historically focused around collaborative authoring of documents which tend to have moderate size and only a few editors.

As for elision of history for privacy or security reasons, it's an ongoing discussion. I think we have a theory about how you could do it (I often refer to this as a "shortcut commit") but to the best of my knowledge it isn't anyone's current priority to solve. I think the next big changes to Automerge are more likely to be switching over to the Rust implementation as the "main" codebase and exploring things like branches and attribution. For my own part, I'm working on a repository object with pluggable storage & networking to simplify adoption into various application contexts.

So on the whole, I guess what I'd say is that I don't think ops history is wasteful. I think it's an incredibly valuable and underappreciated asset. I never check out a git repository and think "boy I wish I didn't have the history of this code". Being able to see the how and the who behind a source tree is something many software developers use every day. I think every application should get that capability, and that Automerge is the way to get it.

Still, as I said above, we've heard from plenty of folks who have examples of use-cases (say, contract negotiation) where you might not want to show "how the sausage is made" and I'm sure those features will arrive eventually so please do continue to share your thoughts on why you want it or how you imagine it might work.

0 replies

wmertens · 2022-05-13T12:08:36Z

wmertens
May 13, 2022
Author

Points taken. I'd say that even if you have good compression, that data still needs to be processed, so less data is still better.

I was thinking of an equivalency relation, where a set of ops can be replaced with a smaller set of ops. Then if the group theory constraints still hold, it's a vaild substitution (even if it throws away data).

So, a series of single item insertions in a list can be substituted with a single insertion of a list of items at the version of the first insertion. Other ops will only be using this node's positions, so the substitution is valid.

If you maintain only a single cursor position of a node, and have an op that sets a cursor position, and you don't care about it's history at all, you could just delete all historical cursor positions and only retain the last one. No other ops on would become invalid, so this is a valid substitution.

Now suppose you also have a chat channel, you might want to keep the last cursor position just before each chat message op. Still valid.

When you only want to keep x time of history, and you set the constraint that no node can provide ops older than x, you can take the state of the document at x time ago including tombstones, and replace all older ops with a single op that loads the state. Still valid.

When you have undo/redo of ops, you could replace an op+undo with its tombstone. If other ops don't use the actual value of the original op but only its position, the substitution is valid. A node could even keep track of last synced ops and remove such tombstoned ops entirely before sending them out.

Is this workable?

1 reply

ept May 16, 2022
Maintainer

There are other discussions of this topic at #435 and at automerge/automerge#348 (see Orion's gist).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merging ops/changes #488

{{title}}

Replies: 2 comments 1 reply

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Merging ops/changes #488

wmertens May 4, 2022

Replies: 2 comments · 1 reply

pvh May 4, 2022 Maintainer

wmertens May 13, 2022 Author

ept May 16, 2022 Maintainer

wmertens
May 4, 2022

Replies: 2 comments 1 reply

pvh
May 4, 2022
Maintainer

wmertens
May 13, 2022
Author

ept May 16, 2022
Maintainer