Evaluate semantics of state modifications #66
I should add that copy-on-write has some advantages too. First, it makes shared-memory parallelism really easy because threads won't trip over each other by making concurrent model changes. Second, it could also be implemented using a base class to maintain a conditional dependency structure on model parameters, so the code footprint might not be large. Third, copy time is likely to be small relative to peeling (but what about other models that don't involve peeling?). Computers have gotten pretty good at allocating & freeing memory chunks since I set out on my coding journey many moons ago.
FWIW, Bio++ has some bits to deal with the event listener approach: It still suffers the graph traversal problem you mention... On Fri, Dec 14, 2012 at 6:01 AM, Aaron Darling [email protected]:
I didn't see that, thanks! It actually looks workable and might save us from reinventing the wheel. The graph traversal seems unavoidable to me in any of the three situations I outlined above; it's more a question of how burdensome the traversal is. I started to sketch out what an implementation of copy-on-dependency-write would look like in code. It's a funky twist on the usual C++ clone(). Here's an example I came up with for a node class:
we would probably want to use shared_ptr and/or weak_ptr instead of new but I didn't work that out for this sketch... This function would be outlined in a base class of all model parameters, something like:
EDIT: there were a few problems with the above code...
Here we go:
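The code blocks from the original comments appear to have been lost in this copy of the thread. As a hedged reconstruction of the idea described above and below, a copy-on-dependency-write base class might look something like the following. The names Parameter, depcopy, dependency_clone, and invalid_to_new come from the discussion; the bodies and the Node members are guesses, not the project's actual code.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Base class for all model parameters. depcopy() clones a dependency
// through a shared map so that a parameter reachable along several
// dependency paths is copied exactly once.
class Parameter {
public:
    virtual ~Parameter() = default;

    // Maps invalidated originals to their fresh copies.
    using CloneMap =
        std::unordered_map<const Parameter*, std::shared_ptr<Parameter>>;

    // Derived classes implement this: copy yourself, then depcopy()
    // every parameter you conditionally depend on.
    virtual std::shared_ptr<Parameter>
    dependency_clone(CloneMap& invalid_to_new) const = 0;

    // Base-class helper: reuse an existing copy when `p` was already
    // cloned along another dependency path.
    static std::shared_ptr<Parameter> depcopy(const std::shared_ptr<Parameter>& p,
                                              CloneMap& invalid_to_new) {
        if (!p) return nullptr;
        auto it = invalid_to_new.find(p.get());
        if (it != invalid_to_new.end()) return it->second;  // already copied
        auto fresh = p->dependency_clone(invalid_to_new);
        invalid_to_new.emplace(p.get(), fresh);
        return fresh;
    }
};

// Example derived parameter: an internal node with two child dependencies.
class Node : public Parameter {
public:
    Node(double bl, std::shared_ptr<Parameter> l, std::shared_ptr<Parameter> r)
        : branch_length(bl), left(std::move(l)), right(std::move(r)) {}

    std::shared_ptr<Parameter> dependency_clone(CloneMap& m) const override {
        auto n = std::make_shared<Node>(branch_length, nullptr, nullptr);
        n->left = depcopy(left, m);   // the "two calls to depcopy()"
        n->right = depcopy(right, m);
        return n;
    }

    double branch_length;
    std::shared_ptr<Parameter> left, right;
};
```

The map is what keeps a shared dependency (e.g. a rate parameter hanging off many nodes) from being duplicated once per referencing node during the traversal.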
Derived parameter classes, e.g. Node, will only need to implement the dependency_clone(invalid_to_new_map) function. We might actually break this up into two parts: one that does the clone, and something like "depcopy_parts" which makes the two calls to depcopy(). This would make further subclassing of parameters possible.
I'm not sure I totally follow - would ...
Yes. We could also create a hybrid of the versioning and EventListener approaches wherein each parameter has an integer version that gets bumped whenever it's notified of a change in a parameter it depends on.
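A minimal sketch of that hybrid, with illustrative names (VersionedParam, set_changed) that are not from the project: a write pushes a notification through the dependency graph, bumping an integer version on every affected parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hybrid of versioning and EventListener: changes are pushed eagerly
// through the dependency graph, incrementing a version counter on every
// parameter that transitively depends on the changed one.
class VersionedParam {
public:
    // Register `dependent` to be notified when this parameter changes.
    void add_dependent(VersionedParam* dependent) {
        dependents_.push_back(dependent);
    }

    // Call after writing this parameter's value in place.
    void set_changed() { notify(); }

    std::uint64_t version() const { return version_; }

private:
    void notify() {
        ++version_;
        for (auto* d : dependents_) d->notify();  // eager graph traversal
    }

    std::uint64_t version_ = 0;
    std::vector<VersionedParam*> dependents_;
};
```

Note that this still pays the eager traversal on every write (and a parameter reachable along two paths gets bumped twice, which is harmless for cache invalidation since only version inequality matters), which is what motivates the lazy variant described in the issue body.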
When the model state is modified by an MCMC move or SMC extension, the affected components can either be stored in a newly allocated object or remain in-place but have values change. The former approach follows "copy-on-write" semantics, the latter does not.
In copy-on-write, summaries and metadata on an object can be keyed on the object's memory address and will remain valid for the lifetime of the object. This is how Online_calculator's log likelihood cache currently works (* see below).
One major disadvantage of the copy-on-write approach becomes apparent when considering continuous rate parameters in phylogenetic models. Large parts of a phylogenetic model are conditionally dependent on single continuous parameters such as the mutation rate. When the mutation rate changes, the LL at a forest node becomes invalid. Because the LLs are keyed on object memory address, preserving the functionality of this cache approach would require us to copy not just on a write, but on a write to any model parameter for which a conditional dependency exists. For large trees such copying might become expensive (though possibly still cheap relative to calculating log likelihoods), not to mention tedious: why should one have to write and invoke code for a whole forest traversal & copy just to change one value? I will call this approach copy-on-dependency-write.
Before we start stacking in model parameters, it seems prudent to evaluate whether we want to continue with the copy-on-dependency-write approach.
There are some alternatives. If we want to allow unfettered write access to model parameters, we might actively notify objects that have registered an interest in that model parameter. This is the Java-esque EventListener approach. The likelihood cache might register as a listener for nodes, which might also listen to the rate. When the rate changes the nodes would hear about it and have to propagate a message to the likelihood cache, which could then delete any cached likelihood.
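To make the EventListener idea concrete, here is a hedged sketch with hypothetical names (Subject, LikelihoodCache, notify_changed — none of these are from the current codebase): the cache registers a callback that erases its entry for a node whenever a parameter the node depends on announces a change.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <vector>

// Anything interested in a parameter registers a callback; the parameter
// fires all callbacks whenever it is written in place.
class Subject {
public:
    void add_listener(std::function<void()> cb) {
        listeners_.push_back(std::move(cb));
    }
    void notify_changed() {
        for (auto& cb : listeners_) cb();  // eager dependency-graph walk
    }

private:
    std::vector<std::function<void()>> listeners_;
};

// Cached log likelihoods keyed on node address; entries are deleted when
// a listened-to parameter changes.
struct LikelihoodCache {
    std::unordered_map<const void*, double> ll;
    void invalidate(const void* node) { ll.erase(node); }
};
```

Usage: a rate Subject would hold one listener per dependent node (or one per cached entry), so every rate change immediately clears the stale likelihoods, at the cost of traversing the listener graph on each write.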
Another possibility is some kind of generic versioning system (e.g. a monotonically increasing integer) for objects & their dependencies. In this approach, each model parameter would contain a "revision" object which consists of an integer version number and also pointers to version objects for any model parameters on which a conditional dependency exists, along with the most recent version of those dependencies observed by the present object. Every time an object changes, the version number is incremented. Cached values such as the LL could still be keyed on object address, but the value stored would be both log likelihood and the associated object version. When deciding whether the cached likelihood is valid, the calculator can request the current version number of an object and compare it to the version of the cached likelihood. Upon a request for its object version, the model parameter will in turn request the version of any dependency variables and compare that version to what it has recorded as the most recently observed version of that dependency variable. If the dependency is newer, the version is incremented. I will call this approach lazy parameter versioning.
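The revision object described above might be sketched as follows. This is a minimal illustration under assumed names (Revision, bump, current_version), not a committed design: each revision stores its own counter plus, for each dependency, a pointer to that dependency's revision and the version most recently observed.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Lazy parameter versioning: versions propagate only when queried, so
// many parameters can change without any dependency-graph traversal.
class Revision {
public:
    // Record a conditional dependency on another parameter's revision,
    // remembering the version we have seen so far.
    void add_dependency(Revision* dep) {
        deps_.emplace_back(dep, dep->current_version());
    }

    // Call whenever the parameter's own value is written.
    void bump() { ++version_; }

    // Lazily recompute: if any dependency has advanced since we last
    // looked, advance our own version too.
    std::uint64_t current_version() {
        for (auto& [dep, seen] : deps_) {
            std::uint64_t now = dep->current_version();
            if (now != seen) {
                seen = now;
                ++version_;
            }
        }
        return version_;
    }

private:
    std::uint64_t version_ = 0;
    std::vector<std::pair<Revision*, std::uint64_t>> deps_;
};
```

A cache entry would then store (object address, log likelihood, version); the entry is valid iff the stored version equals the object's current_version() at lookup time.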
As compared to the EventListener approach, lazy parameter versioning has the advantage that many parts of the model can be updated without forcing a traversal of the conditional dependency graph for each and every change.
There are surely other ways we could handle this. Does anybody have other suggestions? In the absence of other opinions I would be inclined to go for lazy parameter versioning. In either the EventListener or lazy parameter versioning approaches, we could implement this as a base class for all model components that contains a versioning system and a means to register the dependencies among objects.
(*) In the current code we mostly follow copy-on-write for branch length changes to nodes, except that Eb_bl_proposer can use Online_calculator::invalidate() to clear a cached likelihood entry.