Skip to content
This repository has been archived by the owner on Jan 19, 2023. It is now read-only.

Element reference type benchmark #16

Closed

Conversation

Britefury
Copy link
Collaborator

Implemented a benchmark for measuring the performance of various types of element reference. The benchmark can be run with cargo bench, although you need to comment and uncomment various blocks of code to try the different reference types. A generics based approach was attempted, but a variety of implementations of it were all failed by the borrow/lifetime checker.

To save time, here are the results:

Box<TElement>: ~49,940,000 ns (Base case; boxed trait)
Box<Box<TElement>>: ~53,300,000 ns (1 additional pointer indirection)
RefCell<Box<TElement>>: ~62,260,000 ns (interior mutability)
Rc<Box<TElement>>: ~62,100,000 ns (multiple ownership, mutate via Rc::get_mut)
Rc<RefCell<Box<TElement>>>: ~76,880,000 ns (everything)

These results are approximately reflected in the War and Peace demo, where layout with the Box<TElement> reference type takes ~133ms, and ~200ms with the Rc<RefCell<Box<TElement>>> reference type; approximately an increase of 50%.

@ltratt
Copy link
Member

ltratt commented Oct 12, 2015

Interesting stuff. [I'm going to try the inline experiment again; it failed on Friday, but I'm not clear why.] I wonder if we want to merge this or leave it in a branch though? I presume the idea is that we use this to inform us which approach we think best suited to our situation?

@Britefury
Copy link
Collaborator Author

Thats pretty much it.
Given the above overhead figures, I am thinking that unsafe is the way forward. shudder

@ltratt
Copy link
Member

ltratt commented Oct 12, 2015

Overhead relative to...?

@Britefury
Copy link
Collaborator Author

It seems that Rc<RefCell<...>> adds about 50% overhead, at least to layout, in addition to adding problems further down the line when it comes to mutating the element tree. I am currently looking into an unsafe based solution that will have good performance and less API problems.
The DOM tree library ((forest)[https://crates.io/crates/forest]) that we discussed on Friday would be quite restrictive in that it only supports XML style trees. While in theory we could adopt this approach, I think that looking into something closer to what we want is worthwhile.

@ltratt
Copy link
Member

ltratt commented Oct 12, 2015

OK... although what's the problem with RefCell mutation? [Let's pretend performance isn't an issue, at least at this point.]

@Britefury
Copy link
Collaborator Author

Now that I have thought about this in more detail, I have reached the conclusion that there are some wrinkles to address, but I think it can be done.

(BTW, I have sort of used this reply as a scratch space to flesh out and develop my ideas, so pardon the text wall)

The forest library declares many of the methods on the NodeRef class rather than the Node class. NodeRef has a single field; an Rc<RefCell<Node>>. This way, self is a wrapped reference, as are the arguments. Its possible, but there will likely be wrinkles and additional work along the way.

One issue is the fact that the NodeRef will only implement the interface of the abstract base node type, rather than the interface of more specialised types. This can be overcome by having each specialised node type have its own trait, with a specialised trait retrieval method (e.g. get_fraction_node_interface()) that returns Option<FractionNodeInterfaceTrait> in the abstract type. Not ideal, but I was heading in this direction anyway. Not sure I can see another way.
Normally this would run into problems with parent <--> child relationships; FractionNodeInterfaceTrait could define a method add_numerator_child that sets a specific child of that element. All is fine, until we need to call child.set_parent(self); self is a &FractionNodeInterfaceTrait but we need a NodeRef. We cannot try to re-wrap self; even if we use unsafe to get around the ownership rules, we get two wrappers who are unaware of each others ref counts.
A possible solution is to have NodeRef implement a get_wrapped_fraction_node_interface that returns Option<FractionNodeRefInterfaceTrait> (a similar behaviour specific trait accessor method) where the methods of FractionNodeRefInterfaceTrait do take some sort of NodeRef as self. This could address the reciprocal parent <--> child relationship problem that was giving me grief on Friday afternoon. It would need to be done for each element that has parent <--> child relationships take a form that cannot be represented as a vector of children. Its not the prettiest solution, but it may work.

All this of course has the Rc<RefCell<...>> overhead. The other option is to wrap Box<...> in something lightweight that uses unsafe to get around all of that.

@ltratt
Copy link
Member

ltratt commented Oct 12, 2015

The crux of the matter, I think, is that each Node x would be able to return me an Rc pointing to x?

@Britefury
Copy link
Collaborator Author

Exactly.
The solution presented in the forest library is to define most of your methods on NodeRef rather than Node. This way, the object that you deal with has an Rc<RefCell<Box<NodeTrait>>> field and therefore the Rc can be accessed as a field of self, rather than defining methods on NodeTrait where self is a &NodeTrait from which its owning Rc is inaccessible. The additional development overhead comes in where you have to decide which methods belong on NodeTrait - mostly for performance reasons; you don't want to pay the Rc and RefCell overheads unless there is some definite benefit - and which belong on NodeRef.

My next step is to try to implement this and see how things turn out.

@ltratt
Copy link
Member

ltratt commented Oct 13, 2015

Can I suggest we make the Rc<RefCell<...>> be a method, not a field? It might give us some useful flexibility later.

Separately, what's the advantage of Rc<RefCell<Box<...>>> vs. Rc<RefCell<...>>?

@Britefury
Copy link
Collaborator Author

Box is used when you want polymorphism. You want a node reference to be able to refer to a variety of node types, all of which implement some base trait, e.g. NodeTrait. Box allows you to own an object by some abstract type, e.g. Box<NodeTrait>. Without Box you need to specify the exact struct that is being used, e.g. TextNode. This causes problems when nodes need to refer to one another. You want a node to have child nodes of any valid node type; they must refer to objects of type NodeTrait. Borrowing a reference to the child and putting it in the parent is tricky, since that either consumes your one mutable reference or requires an immutable reference - thereby limiting what methods the parent node can call - and preventing you from borrowing any other references. As a consequence, we must use RefCell to get interior mutability to get around this problem. RefCell however requires an exact type; you cannot have RefCell<NodeTrait>, therefore we insert a Box in there to get what we need. Rc of course gets multiple ownership, but like RefCell cannot contain a trait, so box is needed here too.

Make it a method? I don't quite understand....

@ltratt
Copy link
Member

ltratt commented Oct 13, 2015

OK -- makes sense.

What I mean is that instead of saying x.ref_cell do something like x.ref_cell().

@Britefury
Copy link
Collaborator Author

I already do this mostly; my existing solution is approximately:

struct NodeRef {
   x: Rc<RefCell<Box<NodeTrait>>>   // internal field type
}

// alias the borrow types, as they can vary on the type of `x`, e.g. Ref/RefMut
// when using RefCell, &Box<NodeTrait> when using something simpler
type NodeBorrow = Ref<Box<NodeTrait>>;
type NodeBorrowMut = RefMut<Box<NodeTrait>>; 

impl NodeRef {
   // methods for acquiring borrows
   fn get(&self) -> NodeBorrow {
      return self.x.borrow();
   }

   fn get_mut(&mut self) -> NodeBorrowMut {
      return self.x.borrow_mut();
   }
}

@ltratt
Copy link
Member

ltratt commented Oct 13, 2015

I see. Makes sense.

Geoffrey French added 2 commits October 13, 2015 12:12
Borrow type aliases renamed.
…at would be more representative of a practical implementation.
@Britefury
Copy link
Collaborator Author

Sadly, this is turning out to be trickier than first expected.
The current (sort of) plan is to have an abstract NodeTrait that is the base type for all nodes. We need to wrap it in Box for ownership since its a trait not a struct and therefore must be heap allocated, a RefCell for interior mutability and Rc for multiple ownership. After construction, a node is wrapped within these. Whenever a node is referred to by other nodes, etc, it must be in this form. We declare the NodeRef type that contains a field that is a Rc<RefCell<Box<NodeTrait>>> for convenience. Nodes that have children must use this reference type. Conversely, nodes must refer to their parents in the same way. This ensures that nodes can call up and down the tree - mutably if necessary - in order to pass events around, notify parents of changes to schedule redraws and layouts, etc.

Constructing a tree based on the above is fine. When a node is constructed, it is immediately wrapped in a NodeRef. It must be referred to be cloning this reference from now on. When we make a node, we must pass a reference to its parent node, so that it knows who its parent is. We can pass this to the constructor.

Problems arise when we want to mutate the tree structure after its constructed. Lets say we have a branch node, e.g. ColumnNode that defines a method called set_children(&mut self, children: Vec<NodeRef>. It can copy the contents of the children parameter into its appropriate parameter. Problems arise however when we want to ensure that the child nodes have reciprocal parent references. The ColumnNode::set_children method has a self parameter whose type is &mut ColumnNode. We need it in NodeRef form. Even if we used unsafe to bypass Rust to create a new NodeRef wrapper, we now have two wrappers containing the same instance, each unaware of each other's reference counts, etc. We really need self to refer to the NodeRef wrapper, not the ColumnNode.

This leads us to moving all tree structure mutation methods out of the node type - e.g. ColumnNode - into the node reference types. This way, when invoking set_children, self is a NodeRef which is what must be passed to child nodes set_parent method in order to maintain the reciprocal parent->child link.

We then hit another snag; not all nodes have children and those that do may not structure their children in the same way. Leaf nodes - e.g. text nodes for displaying tokens - don't have any children. A column node would need a list of children in a Vec, while a table node would want its children in a 2D array, possibly augmented with additional information about cell layout. This indicates that the the tree structure mutation methods differ according to node type. As a result, we cannot just add a set_children method to NodeRef and have done with it; such a structure would only be valid for nodes that represent their children as a list and would not suffice for other node types.

A solution to this problem is to break our node reference types into a type hierarchy in much the same way we do with nodes. In fact, the node reference type hierarchy would mirror that of the node type hierarchy to some extent. We leave the non-tree-structure methods in the node types and have the tree-structure methods in the node reference types. As a consequence, TextNodeRef has no tree structure methods, ColumnNodeRef defines a set_children method that takes a Vec while TableNodeRef could define set_row(), set_column(), set_cell() etc. Each of these methods would retrieve a reference to the underlying node, modify its children, then pass itself - the node reference that is - to the set_parent method on the child nodes.

Having a node reference type hierarchy indicates that we must now have a base node reference trait; NodeRefTrait. All node references should implement this. When a node reference is created it must be represented as a Box<NodeRefTrait>, so that we have one abstract type for representing node references, irrespective of their concrete type.

Unfortunately, this means quite a lot of wrappers. Each node reference is referred to by Box<NodeRefTrait>. Each node reference will contain a field which is a Rc<RefCell<Box<NodeTrait>>> that owns the actual node:

// Abstract node type
trait NodeTrait {
    fn set_parent(&mut self, parent: Option<Box<NodeRefTrait>>);
    // various functions common to all elements...
}

// Concrete node type
struct ColumnNode {
    // children and paren
    children: Vec<Box<NodeRefTrait>>,
    parent: Option<Box<NodeRefTrait>>
    // other fields...
}

impl ColumnNode {
    fn new(parent: Option<Box<NodeRefTrait>>) -> ColumnNode {
        return ColumnNode{parent: parent, children: Vec::new()};
    }
    // ColumnNode specific stuff...
}

impl NodeTrait for ColumnNode {
    fn set_parent(&mut self, parent: Option<Box<NodeRefTrait>>) {
        self.parent = parent;
    }
}


// Abstract node reference type
trait NodeRefTrait {
   fn get(&self) -> Ref<Box<NodeTrait>>;   // borrow ref to the node
   fn get_mut(&mut self) -> RefMut<Box<NodeTrait>>;   /// borrow mut ref to node
}

// Concerete node reference type
struct ColumnNodeRef {
    // node wrapped in box, refcell, rc
    wrapped_node: Rc<RefCell<Box<NodeTrait>>>
}

impl ColumnNodeRef {
    fn new(node: ColumnNode) -> Box<NodeRefTrait> {
        return Box::new(ColumnNodeRef{wrapped_node: node});
    }

    // tree structure mutation methods:
    fn set_children(&self, children: Vec<Box<NodeRefTrait>>) {
        let abstract_node: RefMut<Box<NodeTrait>> = self.get_mut();
        // we really need the concrete ColumnNode here, so we use some as-of-yet untested
        // unsafe magic to open up the RefMut and the Box
        let node: &mut ColumnNode = unsafe{ transmute(abstract_node.as_unsafe_cell().get().into_raw()) };
        // AFAICT, we are not going to get away without using unsafe, whatever approach we take

        // clear parents of existing children
        for child in node.children {
            child.set_parent(None);
        }
        node.children = children;
        // set parents of new children
        for child in node.children {
            child.set_parent(Some(self));   // assume set_parent takes an Option<&NodeRefTrait> parameter
        }
    }
}

impl NodeRefTrait for ColumnNodeRef {
    fn get(&self) ->  Ref<Box<NodeTrait>> {
        return self.wrapped_node.borrow();
    }

    fn get_mut(&self) ->  RefMut<Box<NodeTrait>> {
        return self.wrapped_node.borrow_mut();
    }
}

Now this is all fine and good. The extra Box around the NodeRefTrait adds another pointer indirection and probably another allocation (one alloc for the node itself, one for the Rc shared box and another for the reference box), but the overhead is probably manageable. Things look a little worse however when we consider that the borrow methods (get and get_mut) defined by the NodeRefTrait now involve a virtual call (AFAICT; the drop in performance in the benchmark certainly suggests this). This virtual call each time we ask a node reference for a reference to its contained node seems to add quite a bit of overhead. It would seem that when performing layout, all these extra indirections add an overhead of almost 100%.

@ltratt
Copy link
Member

ltratt commented Oct 13, 2015

I must admit, I no longer know why we split up Node from NodeRef.

@ltratt
Copy link
Member

ltratt commented Oct 13, 2015

Can we not try this stuff in the context of our simpler tree repo example? I wonder if we are again going into a little too much depth a little too soon.

@ltratt
Copy link
Member

ltratt commented Oct 14, 2015

OK, I've been thinking about this a bit more, and I'm wondering if we can simplify things to something like:

type WrappedNode = Rc<RefCell<Box<Node>>>;

trait Node {
    fn get_wrapped(&self) -> WrappedNode;
    fn set_parent(&mut self, parent: Node);
}

struct ColumnNode {
    self_wrapped : Option<WrappedNode>,
    parent: Option<WrappedNode>
}

impl ColumnNode {
  fn new(parent: Rc<RefCell<Box<Node>>>) {
    let mut s = ColumnNode { self_wrapped: None, parent: parent };
    s.self_wrapped = Rc::new(RefCell::new(s));
    s
  }
}

impl Node for ColumnNode {
  fn set_parent(&mut self, parent: Option<WrappedNode>) {
    self.parent = parent;
  }
}

What I'm imagining is each Node being able to give a Wrapped version of itself. The bit I'm a little unsure about is whether the approach I've taken on ColumnNode::new would compile. It's probably not difficult to imagine a variant which will work (at worse, unsafe would do the trick).

If we can do something like this, then I can always guarantee to get a reference to the right Rc/RefCell for any given Node.

@Britefury
Copy link
Collaborator Author

My new solution:

Rc can also take on the responsibilities of Box, in that Rc<SomeTrait> will suffice; no need for Rc<Box<SomeTrait>>, as per this PR.

Given that the new ref type is Rc<Node>, this leaves the question of how do we get mutability? We could use RefCell, but we need to bring Box back, resulting in an additional allocation. I have a better idea for getting 'interior mutability'. Enter my very evil solution:

    pub fn muzzle<'a, T: ?Sized>(x: &'a T) -> &'a mut T {
        return unsafe{transmute(x)};
    }

Basically, it transmutes an immutable ref into a mutable one. It is to be used sparingly, and never with types that cross a thread boundary. Asides that however, it addresses the problem quite nicely. The muzzle function allows us to get mutable refs by forcing it. It also gets us around the borrow checker when we need it.

@ltratt
Copy link
Member

ltratt commented Oct 15, 2015

Is that a static method? Doesn't it also mean that I can now get multiple mutable aliases to a given run-time object?

@Britefury
Copy link
Collaborator Author

Its just a function. You could use it to do such things, yes.

@ltratt
Copy link
Member

ltratt commented Oct 15, 2015

OK, then we need to think very carefully about this, because it allows us to break many of Rust's rules unknowingly.

@ltratt
Copy link
Member

ltratt commented Oct 15, 2015

Let's set up some tests where we deliberately alias pointers and see what happens. I think I will need some convincing that we're not going to blow our feet off at some random point.

@ltratt
Copy link
Member

ltratt commented Feb 22, 2016

I think this PR can be closed?

@ltratt ltratt closed this Feb 22, 2016
@ltratt
Copy link
Member

ltratt commented Feb 22, 2016

Oops, didn't mean to close it quite yet. Geoff, what are your thoughts?

@ltratt ltratt reopened this Feb 22, 2016
@Britefury
Copy link
Collaborator Author

Yes, we can close it. This was resolved late December when the tree structure stuff was figured out.

@Britefury Britefury closed this Feb 22, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants