Thinking about improving data updates #647

monfera · 2016-06-15T11:39:19Z

tl;dr
There's more and more code that couples plotting logic with incremental change propagation; see, e.g., everything going on in Plotly.restyle. It would be good to discuss ways to improve the situation. Hand-written propagation code tends to become a tangle, so a small, simple library focused on change propagation, e.g. MobX, would be worth looking into.

Plotting turns a stream of user intent into a stream of side effects such as DOM updates

Plotting can be conceived of as a black box:

input streams are plot specifications, typically the payloads in Plotly.plot, Plotly.restyle, Plotly.relayout and animation-inducing user calls, as well as DOM events such as window.resize and mousedown
output is a stream of side-effecting operations, e.g. DOM mutations, WebGL API calls, and sometimes event callbacks
currently, some output is provided by encouraging users to read directly from internal object state; this is something to move away from by providing a query API and/or event callbacks with meaningful data, so I'll ignore it here

The use of 'stream' highlights the fact that user pointer operations, restyle/relayout, animation etc. generally make plotting a temporal process, rather than something that can be modeled as a function from some input JSON to an output SVG, even if some uses are as simple as this special case.

Plotting logic is a directed acyclic graph of computation nodes

We have multiple pieces of input (e.g. data[0].x) on the input side and DOM-mutating calls as the output. In between, there's complex calculation that can be thought of as a DAG. For example,

the above x vector serves as the basis for calculating a [min, max] domain that will determine the bounds of the X axis
the x vector is also a direct input to scatter point positions, though a scale transform first converts domain values to, e.g., pixel coordinates
for things like the boxplot, there may be various aggregations built on top of the x vector
aesthetics might depend on things like the length of the x vector, e.g. switching the default from a scatterplot to a density plot above some threshold

All such calculations themselves can be input to downstream calculations.
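To make the DAG idea concrete, here is a minimal pull-based sketch. The helpers `input` and `computed` are hypothetical (not plotly.js or MobX APIs), and the fixed 0 to 100 pixel range is an assumption for illustration. Each computed node caches its result and reruns only when a dependency's version changed; a node whose new result equals the old one keeps its version ("early cutoff"), so a new x value inside the current bounds does not rerun the scale.

```javascript
// Hypothetical helpers: `input` holds a value, `computed` derives and caches one.
let clock = 0; // global counter used to stamp node versions

// Shallow equality, enough for [min, max] pairs and flat arrays.
function sameValue(a, b) {
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((v, i) => v === b[i]);
  }
  return a === b;
}

function input(value) {
  const node = {
    value,
    version: ++clock,
    get: () => node.value,
    set(v) {
      node.value = v;
      node.version = ++clock;
    },
  };
  return node;
}

function computed(deps, fn) {
  const node = {
    value: undefined,
    version: 0,
    runs: 0, // how often fn actually ran, for illustration
    seen: null, // dependency versions at the last run
    get() {
      const args = deps.map((d) => d.get()); // bring dependencies up to date first
      const versions = deps.map((d) => d.version);
      const stale = node.seen === null || versions.some((v, i) => v !== node.seen[i]);
      if (stale) {
        node.seen = versions;
        node.runs += 1;
        const next = fn(...args);
        // Early cutoff: an unchanged result keeps its version,
        // so nodes downstream of it are not recomputed.
        if (!sameValue(next, node.value)) {
          node.value = next;
          node.version = ++clock;
        }
      }
      return node.value;
    },
  };
  return node;
}

// x -> [min, max] domain -> scale -> pixel positions
const x = input([1, 5, 3]);
const domain = computed([x], (xs) => [Math.min(...xs), Math.max(...xs)]);
const scale = computed([domain], ([lo, hi]) => (v) => ((v - lo) * 100) / (hi - lo || 1));
const pixels = computed([x, scale], (xs, s) => xs.map((v) => s(v)));

console.log(pixels.get()); // [ 0, 100, 50 ]
x.set([1, 5, 3, 2]); // new value inside the current bounds
console.log(pixels.get()); // [ 0, 100, 50, 25 ]
console.log(scale.runs); // 1 (domain was recomputed but came out unchanged)
```

This sketch is pull-based: nothing is recomputed until a value is read. Push-based alternatives propagate invalidation eagerly instead; either would need evaluating against real plotly.js workloads.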

Plotting needs to be economical

While it would be possible to make a single function whose inputs are {domRoot, userIntentHistory}, it's impractical: response times with a naive implementation would be too high (keeping userIntentHistory itself would have only a modest size impact). There's no way to recompute everything from scratch and still expect a 60FPS frame rate when rotating a WebGL plot or animating something.

This means that there needs to be some kind of caching, and therefore state management. The sole purpose of maintaining state is caching (besides this, we may retain userIntentHistory to allow time travel, and of course the output streams are linked to calls that modify the DOM).
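A hypothetical sketch of that point: folding the full intent history on every change versus keeping the previous result as a cache and applying only the newest intents. The intent shapes (`extend`, `relayout`) and `applyIntent` are made up for illustration, and `CachedState` assumes the history is append-only (time travel would need snapshots or similar).

```javascript
// Hypothetical intent shapes, for illustration only.
const INITIAL = Object.freeze({ x: [], layout: {} });
let applied = 0; // counts applyIntent calls so the two strategies can be compared

// Pure function: apply one user intent to a plot state, returning a new state.
function applyIntent(state, intent) {
  applied += 1;
  switch (intent.type) {
    case "extend":
      return { ...state, x: state.x.concat(intent.points) };
    case "relayout":
      return { ...state, layout: { ...state.layout, ...intent.layout } };
    default:
      return state;
  }
}

// Naive: rebuild the state from the whole history on every change.
function naiveState(intents) {
  return intents.reduce(applyIntent, INITIAL);
}

// Cached: the previous result is the only state kept (append-only history assumed).
class CachedState {
  constructor() {
    this.state = INITIAL;
    this.seen = 0; // number of intents already folded into this.state
  }
  update(intents) {
    for (; this.seen < intents.length; this.seen += 1) {
      this.state = applyIntent(this.state, intents[this.seen]);
    }
    return this.state;
  }
}

// Simulate 100 streamed updates and count intent applications per strategy.
const intents = [];
const cache = new CachedState();
let naiveCalls = 0;
let cachedCalls = 0;
for (let i = 0; i < 100; i += 1) {
  intents.push({ type: "extend", points: [i] });
  applied = 0;
  naiveState(intents);
  naiveCalls += applied;
  applied = 0;
  cache.update(intents);
  cachedCalls += applied;
}
console.log(naiveCalls, cachedCalls); // 5050 100
```

Both strategies produce the same state; the naive one does quadratic total work over the stream, while the cached one does one application per new intent.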

Means of reducing recomputation costs

Ideally we'd like to

Only recompute what's strictly needed. For example, if I add a new highest value to vector x, it needs to extend the visible X axis domain, provided the axis range is set to automatic. However, if the newly inserted value is inside the bounds, there's no need to recalculate anything that depends only on the [min, max] domain. Sometimes recalculating more than necessary does no harm, because it's fast or speed doesn't matter, but in some cases a specific performance need makes it useful to be fairly granular about recalculation. Solving these performance needs one by one, without a formal change propagation approach, is brittle.
Pick incremental calculation algorithms where that's useful, necessary and easy. For example, a newly arriving X value can directly update the [min, max] bounds, instead of being inserted into the preexisting large vector and rerunning the extent calculation over the whole vector. Similarly, many aggregates, such as mean, variance and standard deviation, can be calculated online as well as in batch.
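Both incremental ideas, sketched under simple assumptions: `RunningExtent` updates [min, max] in O(1) per new value and reports whether the extent actually changed, so dependent work can be skipped; `RunningStats` uses Welford's online algorithm for mean and variance. The class names are hypothetical. Only appends are handled; removing a value that was the current min or max would still require a rescan.

```javascript
// Incrementally maintained [min, max] extent. push() returns true only
// when the extent changed, so axis/scale recalculation can be skipped otherwise.
class RunningExtent {
  constructor() {
    this.min = Infinity;
    this.max = -Infinity;
  }
  push(v) {
    let changed = false;
    if (v < this.min) { this.min = v; changed = true; }
    if (v > this.max) { this.max = v; changed = true; }
    return changed;
  }
}

// Welford's online algorithm: O(1) per value, numerically more stable
// than accumulating a sum and a sum of squares.
class RunningStats {
  constructor() {
    this.n = 0;
    this.mean = 0;
    this.m2 = 0; // sum of squared deviations from the current mean
  }
  push(v) {
    this.n += 1;
    const delta = v - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (v - this.mean); // uses the updated mean
  }
  get populationVariance() {
    return this.n > 0 ? this.m2 / this.n : NaN;
  }
  get variance() { // sample variance (n - 1 denominator)
    return this.n > 1 ? this.m2 / (this.n - 1) : NaN;
  }
  get std() {
    return Math.sqrt(this.variance);
  }
}

const extent = new RunningExtent();
[3, 1, 5].forEach((v) => extent.push(v));
console.log(extent.push(4)); // false: inside [1, 5], nothing downstream to redo
console.log(extent.push(7)); // true: max grows to 7

const stats = new RunningStats();
[2, 4, 4, 4, 5, 5, 7, 9].forEach((v) => stats.push(v));
console.log(stats.mean.toFixed(3), stats.populationVariance.toFixed(3)); // 5.000 4.000
```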

Some possible tools

Handwritten userland JavaScript isn't well suited to managing a dependency graph: given enough nodes and optimization rounds, cache invalidation issues become inevitable, and memory leaks possible. Keeping things consistent and in sync is also a challenge, especially in the presence of asynchronous events. Most importantly, coupling the plot logic aspect with the incremental recalculation aspect makes both aspects hard to decipher, debug and further develop.

Many tools, inspired by Functional Reactive Programming, provide some kind of framework for calculating and propagating values that change over time in response to input. Without endorsing any of these excellent libraries (xstream, most.js etc.), perhaps MobX would feel closest to the current architecture: it gives you objects whose properties act like calculated spreadsheet cells, and it is object-oriented, as @etpinard suggested in the 2.0 wishlist. Investigation would be needed to see how it fits. All these libs are around 10k compressed.

History

We've touched on related topics in the past; a few inspirations:

Wishlist suggestions such as using more OO; not storing data in the DOM; using pure functions; more complete data in callbacks: wishlist for potential breaking changes since v1 #420
Customer-filed PRs for faster (re)calculation and much faster incremental calculation, e.g. on large WebGL meshes
Most of the Plotly.restyle function, whose 500 lines do a large amount of manual work, e.g. https://github.com/plotly/plotly.js/blob/master/src/plot_api/plot_api.js#L1736-L1814 and Color gradient for scatter3d snail trail lines [WIP] #617 (comment)
Previous discussions e.g. on the animation topic


etpinard · 2016-06-15T14:42:51Z

@monfera Thanks for the feedback.

I'd vote 👎 for bringing in any functional reactive programming library.

Like you point out, our update system is in dire need of a refactor, but an in-house solution seems best in terms of scaling and maintenance.

We already have decent building blocks, namely nestedProperty and our attribute declaration system. It shouldn't be that hard to come up with a performant and flexible update framework for our needs.
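A rough sketch of how a dotted-path accessor plus attribute declarations could drive updates. `nestedProp`, `ATTRIBUTE_PASSES` and the pass names `calc`, `plot` and `style` are hypothetical stand-ins, loosely modeled on the idea of nestedProperty and the attribute declaration system rather than on their actual code. Each changed attribute maps to the cheapest pass that fully handles it, a batch needs the most expensive of those, and undeclared attributes fall back to a full `calc` so nothing goes stale.

```javascript
// Update passes ordered from cheapest to most expensive (hypothetical names).
const PASSES = ["style", "plot", "calc"];

// Hypothetical declaration table: the pass each attribute change requires.
const ATTRIBUTE_PASSES = {
  x: "calc",
  y: "calc",
  "line.width": "plot",
  "marker.color": "style",
  opacity: "style",
};

// Minimal get/set accessor for a dotted path such as 'marker.color'.
function nestedProp(obj, path) {
  const parts = path.split(".");
  const last = parts.pop();
  return {
    get() {
      let o = obj;
      for (const p of parts) {
        if (o == null) return undefined;
        o = o[p];
      }
      return o == null ? undefined : o[last];
    },
    set(value) {
      let o = obj;
      for (const p of parts) {
        if (o[p] == null || typeof o[p] !== "object") o[p] = {}; // create missing containers
        o = o[p];
      }
      o[last] = value;
    },
  };
}

// Apply a batch of {path: value} updates to a trace; return the most
// expensive pass needed, or null if nothing actually changed.
function applyUpdates(trace, updates) {
  let needed = -1;
  for (const [path, value] of Object.entries(updates)) {
    const prop = nestedProp(trace, path);
    if (prop.get() === value) continue; // no-op change, no work required
    prop.set(value);
    const pass = ATTRIBUTE_PASSES[path] || "calc"; // unknown: assume the worst
    needed = Math.max(needed, PASSES.indexOf(pass));
  }
  return needed < 0 ? null : PASSES[needed];
}

const trace = { x: [1, 2, 3], y: [4, 5, 6], marker: { color: "red" } };
console.log(applyUpdates(trace, { "marker.color": "blue" })); // style
console.log(applyUpdates(trace, { "marker.color": "green", "line.width": 3 })); // plot
console.log(applyUpdates(trace, { "marker.color": "green" })); // null
```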

mdtusz · 2016-06-15T14:49:45Z

Beat me to it. In any case:

While I agree that we should improve our data model, I'm not sure that using something like MobX is the right choice. Reactive programming is great for UI and for scenarios where updates don't require immense calculation and/or have direct codepaths, but for our uses, I imagine we would quickly find ourselves patching things to fit cases where the data needs to be transformed by A, then B, then A again, before we can render.

What they provide also loses value when the state isn't held in memory: there is plenty of plotly.js state wrapped up in the SVG DOM, so until we separate that out, the transition would be quite rocky.

I hate to reinvent the wheel, but I'm of the opinion that what we may need is closer to a tractor tread. I'd advocate instead for creating a strict pattern for updates that works for us, and it will very likely be more of a puppetmaster pattern.

monfera · 2016-06-15T21:19:40Z

Whether we expect it from our own utilities/patterns or an external library, do we have roughly similar notions in mind about the needs?

allow incremental additions, removals and changes of data points or entire traces, individually or in batches, happening over time
similarly, incremental or grouped changes to configuration or layout aspects
help minimize the compute work/time needed for the resulting updates, partly to respond to changes quickly and partly to avoid disruption, flashing and relayout of the output, and to retain object constancy of axes, lines and points
ensure that data or rendered output does not become stale, i.e. make it hard to forget to channel in some dependency (if Y depends on X and X changed, Y needs to change, unless explicit logic infers that no change is needed or an explicit strategy such as throttling or debouncing is in place to defer or batch work)
make it easy to understand where values came from and what transformations they went through, so that when there's a bug we can find the point at which something suspicious got in

"plenty of plotly.js state wrapped up in SVG dom, so until we separate that out, the transition would be quite rocky"

Things like the existing codebase, test coverage, documentation and examples embody a lot of work already spent and lessons learnt, which is why I'd like to learn more about Puppetmaster (is it this one?). Also, what do you mean by tractor tread in this context?

monfera · 2016-06-15T21:33:08Z

@mdtusz on second thought, you more likely mean Puppet (vs Chef), the idempotence concept etc.

mdtusz · 2016-06-16T17:48:48Z

I wasn't really referring to Chef/Puppet at all; those aren't really relevant here. I just meant a pattern where some section of code is in charge of orchestrating our update operations, albeit in a cleaner and more organized way than we currently do. Perhaps using the term puppetmaster was misleading.
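One possible shape for such an orchestrator, as a hypothetical sketch: entry points only record which stage they need, and a single `flush` runs each required stage once, in order. It assumes a fixed pipeline where running a stage implies running every later one (`calc`, then `plot`, then `style`); in a browser, `flush` might be scheduled with requestAnimationFrame, which would also batch several calls made within one frame. The class and stage names are illustrative, not plotly.js code.

```javascript
// Ordered pipeline stages; running a stage implies running every later one.
const STAGES = ["calc", "plot", "style"];

class UpdateOrchestrator {
  constructor(handlers) {
    this.handlers = handlers; // { calc: fn, plot: fn, style: fn }
    this.earliest = STAGES.length; // index of earliest requested stage; length = nothing pending
  }

  // Called by restyle/relayout-like entry points; does no work by itself.
  request(stage) {
    const i = STAGES.indexOf(stage);
    if (i < 0) throw new Error(`unknown stage: ${stage}`);
    this.earliest = Math.min(this.earliest, i);
  }

  // Run each needed stage exactly once, in order, and reset the pending state.
  flush() {
    const start = this.earliest;
    this.earliest = STAGES.length;
    const ran = [];
    for (let i = start; i < STAGES.length; i += 1) {
      this.handlers[STAGES[i]]();
      ran.push(STAGES[i]);
    }
    return ran;
  }
}

const log = [];
const orch = new UpdateOrchestrator({
  calc: () => log.push("calc"),
  plot: () => log.push("plot"),
  style: () => log.push("style"),
});
orch.request("style");
orch.request("plot");
orch.request("style"); // duplicate requests coalesce
console.log(orch.flush()); // [ 'plot', 'style' ]
console.log(orch.flush()); // []
```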

monfera · 2016-06-18T22:45:11Z

Closing it in favor of #648.

mdtusz added the status: discussion needed label Jun 15, 2016

monfera closed this as completed Jun 18, 2016

backnotprop mentioned this issue in Show/Hide all traces as a single event #968 (closed), Sep 22, 2016

monfera mentioned this issue in Customized Click, Hover, and Selection Styles or Traces #1847 (open), Jul 4, 2017
