Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operational Transformation (OT) Protocol for Collaborative Working #276

Open
2 tasks
nelsonic opened this issue Jun 19, 2020 · 3 comments
Open
2 tasks

Operational Transformation (OT) Protocol for Collaborative Working #276

nelsonic opened this issue Jun 19, 2020 · 3 comments
Labels
discuss Share your constructive thoughts on how to make progress with this issue epic A feature idea that is large enough to require a sprint (5 days) or more and has smaller sub-issues. priority-1 Highest priority issue. This is costing us money every minute that passes. T1d Time Estimate 1 Day technical A technical issue that requires understanding of the code, infrastructure or dependencies

Comments

@nelsonic
Copy link
Member

In order to sync a list of items between 2+ devices (either belonging to the same person or when shared/collaborating with another person) we need to have an efficient way of transmitting the changes that have been made. We don't want to send the whole list each time as it's both inefficient and error prone as conflicts can occur. The last thing we want is two people to be editing the same item on a list and for one to overwrite the changes made by the other.

This is a well understood problem and has a name "Operational Transform".
Each operation that is performed is described by a set of keywords that can be sent between clients to keep them all in sync. see: https://en.wikipedia.org/wiki/Operational_transformation

For a quick (basic) introduction to how OT works, take a look at the Quill.js rich text editor.
Quill uses the Delta OT format to describe the changes that are made in an editor.
The delta is a List (Array) of "ops" that can be sent over the network to other connected clients and "replayed" to bring the two clients into sync.
While reading the Delta spec and researching how Quill.js works in #275 (comment) we realised that Quill is designed for Text Editing not for tasks. There is a feature request in their backlog: slab/quill#759 but it's not yet been implemented. And that got us thinking:

  • What if we need to extend Delta to do a lot more things, how do we do that?
  • Should we still use Delta even though we are going to heavily modify it to include timestamps, item.id (so we know which item was edited), person.id (to see who is editing what) and some sort of checksum (e.g. CID) to avoid spoofing.

## Google/Apache Wave

Google/Apache Wave https://en.wikipedia.org/wiki/Apache_Wave was a software framework for real-time collaborative editing online. It was way ahead of it's time and most people didn't "get" it.

Watch the Google Overview from 2009:
image
https://youtu.be/p6pgxLaDdQw

All of the innovative features that were available in Google Wave were made possible by OT.

While Google Wave was discontinued in 2012 https://killedbygoogle.com 🙄
the underlying collaborative editing features are still available in Google Docs https://en.wikipedia.org/wiki/Google_Docs#Editing
11 years later, inlining "gadgets" in the rich UI is now being copied by Microsoft Fluid:
image
https://youtu.be/cy1Tyr93Sw4

The Google/Apache Wave (over XMPP) Spec is still available:
https://svn.apache.org/repos/asf/incubator/wave/whitepapers/federation/wavespec.html

Todo

  • Investigate the available OT formats and see which one can work for us.
  • The format we select (or create) needs to be extensible so that we can add features to our app and not be limited by the existing OT vocabulary.

I expect that reading the Google Wave Spec and researching other OT specs will take me around a day. I will leave comments on this thread with my progress.

@nelsonic nelsonic added discuss Share your constructive thoughts on how to make progress with this issue technical A technical issue that requires understanding of the code, infrastructure or dependencies T1d Time Estimate 1 Day epic A feature idea that is large enough to require a sprint (5 days) or more and has smaller sub-issues. labels Jun 19, 2020
@nelsonic nelsonic self-assigned this Aug 8, 2022
@nelsonic nelsonic removed their assignment Aug 8, 2022
@nelsonic nelsonic added the priority-1 Highest priority issue. This is costing us money every minute that passes. label Aug 8, 2022
@nelsonic
Copy link
Member Author

@LuchoTurtle this is how we are going to store & sync actions between devices/people in our App. ♻️
Please do some reading/research into it and capture what you learn. 🙏

@LuchoTurtle
Copy link
Member

I know this is more "low-level" than what was mentioned on dwyl/technology-stack#106. After all, we are going to make use of libraries (perhaps https://github.com/izelnakri/paper_trail ?) that use OT "under the hood".
I, however, had never heard of OT before in my life and it's crazy to know that this algorithm/methodology/technology is still being used 11 years later by Google Docs.

Here's a quick rundown of what I captured.

The Operational part of OT

The first part is operation. The basic philosophy is that you can have a document and it can be translated into a series of functions - operation. For example, here are two examples of operations:

Operation A
[insert("hello"), retain(1), insert("world")] -> "hello world"

Operation B
"hello world" -> [delete(1), insert("H"), retain(5), delete(1), insert("W"), retain(5) -> "Hello World"

retain means you move a space/index forward. In the end, you have retain(5) it's important because to make use of these operations, they have to start/end at the same index.

It's important because once we have this rule, we can combine operations together.

$OperationA \cdot OperationB = Hello World$

The Transformation part of OT

Perhaps the most interesting part of OT. What Google uses in Wave and Google Docs seems to be the following formula.

$xform(a,b) = (a', b')$, where $b' \circ a \equiv a' \circ b$

Looking at this is hard to understand. To see this graphically makes much more sense.

Imagine if we start from a starting state and two users are changing a document. The document has the word AT written.
Now imagine if Person 1 deletes "T" (operation $a$) and person 2 adds "H" (operation $b$) at the beginning of the string, like so.

image

If we merely applied each operation of each user after the other, we would get a result that is not valid.

image

This is where the formula comes in handy. We are able, from $a$ and $b$, to create an equivalent $a'$ and $b'$.
We pass $a$ and $b$ and transform in the function so it gives out $a'$ and $b'$. We use these to yield a valid result, like so!

image

This a simple way of explaining what's going on.

How does Google implement this?

So this is how Google does it. They try to minimize server usage, so they basically push all the work onto the client side/browser.
Basically, when a user logs into Google Docs, the browser keeps track of not your changes but the changes on the server-side. Google servers are there just to keep track of changes and notify all browsers. They are there just to instill order.

Let's give an example. This is taken from https://www.youtube.com/watch?v=u2_yccaHbQk&ab_channel=CocoaheadsTaipei.

Imagine if we have three browsers and each one is working on a document and they all make a change at the same time.

image

On the Google server, it prioritizes one operation. Let's imagine it prioritizes operation $a$.

image

After logging this operation, it will send this information to all the browsers.

So browser A knows his operation was taken by the server and knows it's in sync. The other browsers B and C will work on applying that change on the client-side.

image

Now Google server decides to continue and add operation $b$. It sends the confirmation to all the browsers again.

image

Browser B will merge the operation sent from the server.

image

And browser B is now in sync.
This operation keeps repeating for each operation.

So Google Server just creates the order of the operations and updates all the clients with the order of the operations.

On the other hand, the client-side only sends one instruction at a time and receives back the operation it sent to the server - it knows that it's time to converge.

This is a simplistic view of this. This is the type of versioning that occurs in real-time document collaboration and can be opted for other use-cases.

Why did Google use a client/server picture?

By their own words, they purposely use this because it allows them to scour a large number of clients without complicating the system.

image

By having a single source of truth (server), if users crash, they know where to fetch the timeline of actions and stay consistent with other clients. By forcing the client to wait for the server to acknowledge, it allows the server to keep a single history of operations without having to keep mirror of the client state for each client that is connected to it. This is much better for scalability.

image

Takeaways

Of course, this is much more complex when it is applied in real-life scenarios and by Google. But having this simplistic view helps me understand how the inner workings...work!

There are many libraries that allow doing OT. As it was pointed out in this issue, Quill uses Delta to OT in their live-editor.
There is also an implementation in Elixir that is also actively being used in production for 5 years.

It's great having this understanding because personally, I didn't know how Google Docs made it possible. 😄

@nelsonic
Copy link
Member Author

Great that you've gone on a journey of discovery/learning. thank you for capturing it. 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Share your constructive thoughts on how to make progress with this issue epic A feature idea that is large enough to require a sprint (5 days) or more and has smaller sub-issues. priority-1 Highest priority issue. This is costing us money every minute that passes. T1d Time Estimate 1 Day technical A technical issue that requires understanding of the code, infrastructure or dependencies
Projects
None yet
Development

No branches or pull requests

2 participants