Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A better diffing algorithm #1

Closed
yoshuawuyts opened this issue Jul 7, 2016 · 6 comments
Closed

A better diffing algorithm #1

yoshuawuyts opened this issue Jul 7, 2016 · 6 comments

Comments

@yoshuawuyts
Copy link
Member

We could turn the DOM into a Merkle tree, but because each Node is already unique (hurray for objects), the tree itself is already a tree of hashes. The goal of this is to provide functionality similar to React's .shouldComponentUpdate() but without explicitly having to create manual checks, so that any tree of valid DOM nodes can be parsed and diffed efficiently (hurray for the DOM as the lowest common denominator!). A few rules should suffice I reckon:

rules for creating new trees

  1. When a node in the new tree is updated, traverse the tree upwards and set an incremental "edited=" label; this acts as a signal if that part of the tree should be traversed for further updates
  2. Thunk as many components as possible. When creating nodes if oldState === newState return the existing node so [object Object] === [object Object] will return true.

rules for diffing trees

  1. If a node in the new tree !== node in the old tree, update the node in the old tree
  2. If a node in the new tree has a later "edited=" label than the new tree, or no label at all, update the node in the old tree - don't forget to copy over the "edited" label. Then continue traversing the tree

Trees are generally only a few layers of nesting, with loads of elements living in parallel. Traversing upwards in a tree should be generally inexpensive, so the "edited=" label can be applied with little cost. Because JS Objects are unique, and the label provides guarantees about the state of the child nodes, we can stop traversing nodes early and essentially gain the result of React's .shouldComponentUpdate().

caveats

  • no idea how this works with widgets; if widgets don't rely on external state this should be sweet. If they do - I'm not sure
  • this might not work with webworker diffing, as when stringifying properties on the nodes might not be transferred (plain properties, not DOM node attributes). But if we're sending regular objects it might work - but then we get the problem of how to label nodes, as an incremental clock may not work - shared state between workers is introduced and... yeah, that would def make it more complicated hah
  • walking svg nodes is hard. Not specifically related to this algorithm, but just want to throw it out there. I've got not idea how to walk svg haha

So yeah, that's the idea. morphdom can already be improved in size and speed a bit by removing the event hooks; but with this we might be able to take things a bit further. Thanks! ✨

@timwis
Copy link
Member

timwis commented Jul 7, 2016

So I'm still quite new to dom diffing, so forgive me if I'm way off, but it strikes me that the idea you proposed would require memoizing the elements, which would require they be created by a function. One thing that's nice about morphdom is that you can just apply it to any node in the dom; you don't have to use it as part of a framework or have a "view."

And maybe it's totally fine to require memoization/functions, and well worth the benefit. But I'm wondering if the idea of comparing "node hashes" could work even without memoization. What if there were a way to serialize a node tree and compare it to another serialized node tree? So <div>Foo</div> and <div>Foo</div> would always be === even if they are completely different objects.

Following that, if they're not equal, what if you traversed the tree downwards until you found the one that is equal, or reached the end of the tree, and started doing the normal diffing process upwards.

My guess is that this is how morphdom works except instead of comparing (1) tagName, (2) attributes, (3) children/textContent separately, this would speed up the process of finding the differences by serializing and comparing. Of course if the cost of serializing outweighs the cost of those multiple comparisons, then it's bunk.

As a very rough proof of concept, here's a demo that uses .outerHTML as a means of serializing.

const html = require('bel')

function update (target, source, parent) {
  console.log('comparing', target, source)
  if (!target) {
    console.log('appending')
    parent.appendChild(source)
  } else if (!source) {
    console.log('removing')
    parent.removeChild(target)
  } else if (target.outerHTML === source.outerHTML) {
    console.log('equal')
    return target
  } else if (target.children.length || source.children.length) {
    console.log('has children')
    for (let i = 0; i < source.children.length || i < target.children.length; i++) {
      console.log('recursing', i)
      update(target.children[i], source.children[i], target)
    }
  } else {
    console.log('replacing')
    // replaceChild would be better, but it removes source from
    // its parent tree, breaking the for loop above
    target.outerHTML = source.outerHTML
  }
}

const el1 = html`<ul><li>Foo</li><li>Bar</li></ul>`
const el2 = html`<ul><li>Foo</li><li>BAZ</li><li>Sandwich</li></ul>`

document.body.appendChild(el1)
setTimeout(() => update(el1, el2), 500)

Anything there worth investigating you think?

@yoshuawuyts
Copy link
Member Author

So I'm still quite new to dom diffing, so forgive me if I'm way off, but it strikes me that the idea you proposed would require memoizing the elements

No no, they could generally be created by whatever - on a first pass we could label all elements and continue from there ✨ - If interacting with a framework that also performs node updates, rather than creating completely new nodes, we might wrap those in a little thunk function that gets the result and copies the properties over to a new node whenever new data is passed, just to preserve the semantics. You're right this does introduce some light incompatibilities - but thunking as a form of compat for other vdom implementations is a reasonable tradeoff I reckon. But I'll def give this a bit more thought - thank you.


what if you traversed the tree downwards

Traversing trees downwards means you'll potentially hit all elements in a tree; on average half the elements even, whereas traversing upwards is generally O(1) I think.


finding the differences by serializing and comparing

So what I'm thinking here is that instead of serializing and comparing, we simply assume that a new DOM node is created when new data is passed - that's why between equality checks (replaced with new node) and update tags (old node was simply updated) we should be able to determine if that part of the tree should be traversed.


I really like your demo; def worth investigating more I reckon - bit tired right now so will give this more thought. Heaps thanks for replying!

@trueadm
Copy link

trueadm commented Jul 8, 2016

@timwis how do you diff DOM nodes that have internal state that won't appear with outerHTML or other methods? Take two input elements. If someone has entered some text into one of the input elements (its value property state changes), they both have the same surface markup still. It get's going to even more complicated when you try and use custom elements.

@timwis
Copy link
Member

timwis commented Jul 8, 2016

Good point @trueadm - I was using outerHTML as an example but I was hoping there'd be a way to hash/serialize a DOM tree including its attributes.

@trueadm
Copy link

trueadm commented Jul 8, 2016

@timwis there isn't one I'm afraid. It was one of the ideas I explored in the early days of creating Inferno.

@yoshuawuyts
Copy link
Member Author

Yeah, so the whole edited= label stuff feels off. I think thunking is where it's at. Anyway, I've done a full rewrite of the engine; seems to be working alright. Pretty sure it'll be mad fast too, but yeahhhh. Missing features, but I think we could close this issue for now. Quite stoked about it 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants