Operational Transformation (OT) Protocol for Collaborative Working #276

nelsonic · 2020-06-19T17:09:23Z

In order to sync a list of items between 2+ devices (either belonging to the same person or when shared/collaborating with another person) we need to have an efficient way of transmitting the changes that have been made. We don't want to send the whole list each time as it's both inefficient and error prone as conflicts can occur. The last thing we want is two people to be editing the same item on a list and for one to overwrite the changes made by the other.

This is a well understood problem and has a name "Operational Transform".
Each operation that is performed is described by a set of keywords that can be sent between clients to keep them all in sync. see: https://en.wikipedia.org/wiki/Operational_transformation

For a quick (basic) introduction to how OT works, take a look at the Quill.js rich text editor.
Quill uses the Delta OT format to describe the changes that are made in an editor.
The delta is a List (Array) of "ops" that can be sent over the network to other connected clients and "replayed" to bring the two clients into sync.
While reading the Delta spec and researching how Quill.js works in #275 (comment) we realised that Quill is designed for Text Editing not for tasks. There is a feature request in their backlog: slab/quill#759 but it's not yet been implemented. And that got us thinking:

What if we need to extend Delta to do a lot more things, how do we do that?
Should we still use Delta even though we are going to heavily modify it to include timestamps, item.id (so we know which item was edited), person.id (to see who is editing what) and some sort of checksum (e.g. CID) to avoid spoofing.

## Google/Apache Wave

Google/Apache Wave https://en.wikipedia.org/wiki/Apache_Wave was a software framework for real-time collaborative editing online. It was way ahead of it's time and most people didn't "get" it.

Watch the Google Overview from 2009:

https://youtu.be/p6pgxLaDdQw

All of the innovative features that were available in Google Wave were made possible by OT.

While Google Wave was discontinued in 2012 https://killedbygoogle.com 🙄
the underlying collaborative editing features are still available in Google Docs https://en.wikipedia.org/wiki/Google_Docs#Editing
11 years later, inlining "gadgets" in the rich UI is now being copied by Microsoft Fluid:

https://youtu.be/cy1Tyr93Sw4

The Google/Apache Wave (over XMPP) Spec is still available:
https://svn.apache.org/repos/asf/incubator/wave/whitepapers/federation/wavespec.html

Todo

Investigate the available OT formats and see which one can work for us.
The format we select (or create) needs to be extensible so that we can add features to our app and not be limited by the existing OT vocabulary.

I expect that reading the Google Wave Spec and researching other OT specs will take me around a day. I will leave comments on this thread with my progress.

The text was updated successfully, but these errors were encountered:

nelsonic · 2022-10-21T08:57:25Z

@LuchoTurtle this is how we are going to store & sync actions between devices/people in our App. ♻️
Please do some reading/research into it and capture what you learn. 🙏

LuchoTurtle · 2022-12-15T18:23:01Z

I know this is more "low-level" than what was mentioned on dwyl/technology-stack#106. After all, we are going to make use of libraries (perhaps https://github.com/izelnakri/paper_trail ?) that use OT "under the hood".
I, however, had never heard of OT before in my life and it's crazy to know that this algorithm/methodology/technology is still being used 11 years later by Google Docs.

Here's a quick rundown of what I captured.

The Operational part of OT

The first part is operation. The basic philosophy is that you can have a document and it can be translated into a series of functions - operation. For example, here are two examples of operations:

Operation A
[insert("hello"), retain(1), insert("world")] -> "hello world"

Operation B
"hello world" -> [delete(1), insert("H"), retain(5), delete(1), insert("W"), retain(5) -> "Hello World"

retain means you move a space/index forward. In the end, you have retain(5) it's important because to make use of these operations, they have to start/end at the same index.

It's important because once we have this rule, we can combine operations together.

$OperationA \cdot OperationB = Hello World$

The Transformation part of OT

Perhaps the most interesting part of OT. What Google uses in Wave and Google Docs seems to be the following formula.

$xform(a,b) = (a', b')$, where $b' \circ a \equiv a' \circ b$

Looking at this is hard to understand. To see this graphically makes much more sense.

Imagine if we start from a starting state and two users are changing a document. The document has the word AT written.
Now imagine if Person 1 deletes "T" (operation $a$) and person 2 adds "H" (operation $b$) at the beginning of the string, like so.

If we merely applied each operation of each user after the other, we would get a result that is not valid.

This is where the formula comes in handy. We are able, from $a$ and $b$, to create an equivalent $a'$ and $b'$.
We pass $a$ and $b$ and transform in the function so it gives out $a'$ and $b'$. We use these to yield a valid result, like so!

This a simple way of explaining what's going on.

How does Google implement this?

So this is how Google does it. They try to minimize server usage, so they basically push all the work onto the client side/browser.
Basically, when a user logs into Google Docs, the browser keeps track of not your changes but the changes on the server-side. Google servers are there just to keep track of changes and notify all browsers. They are there just to instill order.

Let's give an example. This is taken from https://www.youtube.com/watch?v=u2_yccaHbQk&ab_channel=CocoaheadsTaipei.

Imagine if we have three browsers and each one is working on a document and they all make a change at the same time.

On the Google server, it prioritizes one operation. Let's imagine it prioritizes operation $a$.

After logging this operation, it will send this information to all the browsers.

So browser A knows his operation was taken by the server and knows it's in sync. The other browsers B and C will work on applying that change on the client-side.

Now Google server decides to continue and add operation $b$. It sends the confirmation to all the browsers again.

Browser B will merge the operation sent from the server.

And browser B is now in sync.
This operation keeps repeating for each operation.

So Google Server just creates the order of the operations and updates all the clients with the order of the operations.

On the other hand, the client-side only sends one instruction at a time and receives back the operation it sent to the server - it knows that it's time to converge.

This is a simplistic view of this. This is the type of versioning that occurs in real-time document collaboration and can be opted for other use-cases.

Why did Google use a client/server picture?

By their own words, they purposely use this because it allows them to scour a large number of clients without complicating the system.

By having a single source of truth (server), if users crash, they know where to fetch the timeline of actions and stay consistent with other clients. By forcing the client to wait for the server to acknowledge, it allows the server to keep a single history of operations without having to keep mirror of the client state for each client that is connected to it. This is much better for scalability.

Takeaways

Of course, this is much more complex when it is applied in real-life scenarios and by Google. But having this simplistic view helps me understand how the inner workings...work!

There are many libraries that allow doing OT. As it was pointed out in this issue, Quill uses Delta to OT in their live-editor.
There is also an implementation in Elixir that is also actively being used in production for 5 years.

It's great having this understanding because personally, I didn't know how Google Docs made it possible. 😄

nelsonic · 2022-12-15T21:12:26Z

Great that you've gone on a journey of discovery/learning. thank you for capturing it. 👌

nelsonic mentioned this issue Jul 23, 2022

Replayable Action Log! dwyl/mvp#112

Closed

nelsonic self-assigned this Aug 8, 2022

nelsonic removed their assignment Aug 8, 2022

nelsonic added the priority-1 Highest priority issue. This is costing us money every minute that passes. label Aug 8, 2022

nelsonic mentioned this issue Aug 19, 2022

Y.js dwyl/technology-stack#101

Open

nelsonic mentioned this issue Oct 4, 2022

SPIKE: PaperTrail ? dwyl/technology-stack#106

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Operational Transformation (OT) Protocol for Collaborative Working #276

Operational Transformation (OT) Protocol for Collaborative Working #276

nelsonic commented Jun 19, 2020

nelsonic commented Oct 21, 2022

LuchoTurtle commented Dec 15, 2022

nelsonic commented Dec 15, 2022

Operational Transformation (OT) Protocol for Collaborative Working #276

Operational Transformation (OT) Protocol for Collaborative Working #276

Comments

nelsonic commented Jun 19, 2020

Todo

nelsonic commented Oct 21, 2022

LuchoTurtle commented Dec 15, 2022

The Operational part of OT

The Transformation part of OT

How does Google implement this?

Why did Google use a client/server picture?

Takeaways

nelsonic commented Dec 15, 2022