An intermediate level between Gadfly & Compose? #680

Open
timholy opened this issue Sep 4, 2015 · 23 comments
Comments

@timholy
Collaborator

timholy commented Sep 4, 2015

I'm thinking about interactivity a fair bit these days. I'm still learning the code base, but I think I understand enough now to ask an architectural question: might we need a layer of "rendering" that lies between the top Gadfly level and Compose?

Here's one example: say I have a histogram with 10^9 points in it, and I want ~10^4 bins. I want to render this to some Gtk canvas that I can resize and zoom in on. If I have the canvas redraw starting from the top "Gadfly" level, it's going to have to re-bin the points, which from a performance perspective will be too awful to contemplate. So alternatively, I render to a Compose.Context, which makes redrawing very fast. However, now when I zoom in on a small region (my monitor does not have 10^4 pixels horizontally, so I can get more information by zooming), I can't update my tick labels because they've already been rendered in the Context.

I've played around with adding tags (Geom.histogram(tag=:myhist)) to Form objects and then searching the tree for tagged objects and updating them in-place. This is clearly a workable strategy. But I wonder if there's a general need for an intermediate "pre-processed" layer? As another example, when plotting a bunch of points in 2d, you might want to put them in a quad-tree so that updates get fast when you zoom in on a small region.

If there is an intermediate layer, it seems it could do things like computing data limits, etc. Basically, any operation whose cost is of a different order than rendering itself could live there.
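To make the quad-tree idea concrete, here is a minimal sketch in Python (chosen for brevity; this is not Gadfly or Compose code). The `QuadTree` class, its fields, and the `capacity`/`max_depth` defaults are all hypothetical names invented for illustration. The tree is built once, and a zoom only has to query the visible rectangle instead of scanning every point.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    # Half-open region [x0, x1) x [y0, y1). All names here are illustrative.
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4      # points a leaf holds before it splits
    depth: int = 0
    max_depth: int = 16    # guard against endless splitting on duplicate points
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, x, y):
        """Insert a point; return False if it lies outside this node."""
        if not (self.x0 <= x < self.x1 and self.y0 <= y < self.y1):
            return False
        if self.children:
            return any(c.insert(x, y) for c in self.children)
        self.points.append((x, y))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()
        return True

    def _split(self):
        """Turn this leaf into four children and redistribute its points."""
        mx = (self.x0 + self.x1) / 2
        my = (self.y0 + self.y1) / 2
        kw = dict(capacity=self.capacity, depth=self.depth + 1,
                  max_depth=self.max_depth)
        self.children = [
            QuadTree(self.x0, self.y0, mx, my, **kw),
            QuadTree(mx, self.y0, self.x1, my, **kw),
            QuadTree(self.x0, my, mx, self.y1, **kw),
            QuadTree(mx, my, self.x1, self.y1, **kw),
        ]
        old, self.points = self.points, []
        for px, py in old:
            any(c.insert(px, py) for c in self.children)

    def query(self, qx0, qy0, qx1, qy1):
        """Return the points inside the half-open rectangle [qx0, qx1) x [qy0, qy1)."""
        if qx1 <= self.x0 or qx0 >= self.x1 or qy1 <= self.y0 or qy0 >= self.y1:
            return []  # no overlap: skip this whole subtree
        found = [(x, y) for x, y in self.points
                 if qx0 <= x < qx1 and qy0 <= y < qy1]
        for c in self.children:
            found.extend(c.query(qx0, qy0, qx1, qy1))
        return found
```

On a zoom, the redraw would call `query` with the visible data rectangle, so subtrees outside the view are never visited.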

@dcjones
Collaborator

dcjones commented Sep 7, 2015

I've thought a little about this. I think partial re-renders might get very complicated, but with the right indexed data structure, subsetting and binning operations could possibly be made cheap enough that the entire plot could be re-rendered efficiently. That could also allow plotting data that's too big to read entirely into memory. One option I've looked at is sqlite's r-tree index, but it might be more efficient to implement something from scratch.

On the gadfly side all that needs to be done is to generalize things like binning, minimum, and maximum which currently assume the input is an array, which seems fairly manageable.

@timholy
Collaborator Author

timholy commented Sep 7, 2015

Looking at it a bit more, I think one fairly easy step in this direction is to split render into render_prepare followed by the stuff that happens starting at the call to render_prepared.
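A rough sketch of what such a split could look like, written in Python for brevity. Only the names `render_prepare` and `render_prepared` come from the comment above; the `Prepared` type, the function signatures, and the bodies are assumptions made for illustration and do not reflect Gadfly's actual implementation. The point is that the data-sized work (here, binning) happens once, while the cheap second stage can be re-run on every resize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prepared:
    # Hypothetical "pre-processed" plot state: data limits plus bin counts.
    lo: float
    hi: float
    counts: tuple


def render_prepare(data, nbins):
    """Expensive stage: one pass over the data, run once per dataset."""
    lo, hi = min(data), max(data)
    width = ((hi - lo) / nbins) or 1.0  # avoid zero width for constant data
    counts = [0] * nbins
    for v in data:
        i = min(int((v - lo) / width), nbins - 1)  # clamp v == hi into last bin
        counts[i] += 1
    return Prepared(lo, hi, tuple(counts))


def render_prepared(prep, width_px, height_px):
    """Cheap stage: cost depends on the number of bins, not on the data size.

    Returns bar rectangles as (x, y, w, h) in pixel coordinates, with y
    measured downward from the top of the canvas.
    """
    top = max(prep.counts) or 1
    bar_w = width_px / len(prep.counts)
    return [
        (i * bar_w, height_px * (1 - c / top), bar_w, height_px * c / top)
        for i, c in enumerate(prep.counts)
    ]
```

A canvas resize handler would then keep the `Prepared` value around and call only `render_prepared` with the new pixel size.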

@protogeezer

I'd like to participate in the discussion by offering an alternative - I've met with Dan a couple of times to discuss the concept of replacing the Cairo based graphics layer with one based on Qt. Qt (vector) graphics would likely speed things up by making it possible to exploit the Qt notion of a graphics view, something completely missing from the Gnome/GTK window model.

Check out the Qt sample code called "40,000 chips." It's in the right ball park to demonstrate what performance could be like with window manager and hardware support for functions like zoom, rotate, and scale. It also demonstrates how the render would be done once and then the resultant qt graphic objects would be repainted as required. Thus the thought that Qt would be an alternative to trying to speed up user interactions with the intermediate layer between Gadfly and Compose.

There are other benefits of using Qt in Julia - one big one would be that Qt has a powerful visual programming paradigm (QML/Quick/Designer).

@lobingera

@protogeezer
You should read around on julia-users and maybe also on Julia-related repositories on GitHub: enabling access to Qt libraries is a major issue with Julia, which provides direct access to C APIs but not to C++ APIs like Qt. Gtk/Cairo is a simple solution that works well cross-OS and can be installed without access to a local C/C++ compiler. Qt has its own build process, C++ APIs and a lot of other 'special' things. And with that it creates some dependencies that are not easy to avoid.

It's not that I disagree with you; Qt might in the long run be the option for everything on plotting/drawing/painting on screen. But for that it must first be made usable from Julia, and this work is outside the scope of Gadfly/Compose.

@protogeezer

@timholy, @Keno
Greetings - I'm here in Seattle this week, and previous plans to meet with Dan seem to have fallen through. I'm eager to contribute to Gadfly but not so much as to undertake a major development effort without some level of agreement that I'm working on something that will ultimately be useful.

The discussions with Dan were about creating a high performance Qt backend for Compose. As I got further into the design process it became more apparent that, to achieve maximum performance and functionality, moving much of Compose into Qt made ever more sense. Similarly, eliminating all remnants of X11 was also necessary.

So, my question is, in the absence of Dan, who can I correspond with about my plans?

@lobingera

@protogeezer

So you already have a working solution on interfacing julia code to Qt5?

@tbreloff
Contributor

tbreloff commented Jan 8, 2016

I started messing around with Cxx and Qt5 a couple months ago. If it helps at all: https://github.com/tbreloff/Qt5.jl

I was able to do some very basic drawing, but it is nowhere close to a usable product. However maybe you want to use some of the paint callback interface.

@lobingera

I think I remember trying to test this, but both Cxx and Qt5 (which assumes a compositing WM) gave me headaches. Let's see what protogeezer has available.

@protogeezer

I have a partial vision of one path to provide a julia graphics infrastructure. I really need to debate the alternatives before I (or anyone that wants to jump onboard) launches off on a pretty big development effort. I'll explain briefly. The ultimate design goal is to figure out what will accomplish what Dan and I talked about as (potential) graphics features for Gadfly.

First, I'm not intending to belly up to providing Qt bindings for Julia - unless there is work being done on the Cxx module. Bindings would require, I think, being able to extend Cxx classes in Julia. Julia isn't an OO language and I haven't seen any indication that is the direction things are going. So, I'm making the assumption that I'm going to start off trying to export a set of C functions back to Compose/3D. @Keno

The top level features we discussed: sufficient throughput to support driving animation directly from a plot, 3D, using MathJax (via the QtWebKit classes) for latex annotation (because Qt has an SVG rendering widget that can display -optionally- SVG output from MathJax), much easier compound layouts, using the Qt file generators rather than the myriad backends that exist now, and, as Tim suggested before, support for large scale plots.

I tried to figure out a way to make things work by tweaking GTK. I just couldn't get there from here. There is too much X11 baggage to navigate, which makes it hard to envision how to exploit the capabilities of the underlying OS (obviously more so on MacOS and Windows). I kept finding that the features were already supported in Qt - and there were lots of examples of how to move time-critical features down to the native operating system. If I'm wrong about that conclusion I'd really like to know.

One of the side effects of Qt is its alternatives for scripting. If handled properly those could turn into an uber-emersion capability.

3D may morph from building on the Qt 3D view classes to the VTK Qt widget - VTK has many volumetric plots that are way too complex to tackle on a whim. In the long run, adding Blender to the mix could be possible for materials modelers since there is work being done to integrate Blender with Qt.

The tests I've done so far have led me to conclude that it is necessary to circumvent the Julia RT to achieve reliably fast throughput for animation. (With 0.5 just write a loop that does a bit of computing and then print a value - on my 4GHz iMac there is an obvious stutter - 1,2,3,4,5,think-think-think, 6,7,8,9,10,think-think-think, etc)

One of the unknowns, at this point, is whether the REPL would need to be a Qt app to play nice with the Qt windows that would be used to present the graphics. If so, then would it make sense to replace libUV with the parallel Qt classes? A big but necessary step?

@Keno
Collaborator

Keno commented Jan 8, 2016

@protogeezer

Thanks, I'll try this to see if it makes it possible to manipulate the C++ objects directly.

@timholy
Collaborator Author

timholy commented Jan 8, 2016

Question: can you rely on Qt for zooming? For example, let's say you're hoping to zoom in on data plotted on a graph. You'd like to zoom in on a small region of the data, but you still want the axes painted on-screen so you can see them (and in fact you want the axis ticks, limits to update to the zoom region).

@tbreloff
Contributor

tbreloff commented Jan 8, 2016

You can access many layers of abstraction with Qt... You could choose to treat it as a replacement for Cairo, or a replacement for Compose, depending on the abstractions you choose. Qt supports a very complete layering of views in a scene, similar in concept to Compose, but you can access raw drawing commands as well. In theory there's nothing you can't do within Qt, but there are a lot of potential design decisions.

@protogeezer

Yes! Tom hit the nail on the head on all fronts. Qt offers many powerful features (in a completely cross platform way) - but there has to be structure to the way it's applied or chaos ensues.

The difference between Cairo and Qt is that Cairo is a vector front end to a pixel based backend - the app has to start at the beginning every time something changes. Qt is more OO so the scheme is that the app adds objects to the graphics view that know how to paint themselves when needed. Zooming in on some data is simple - translate and scale the coordinate system of the widget containing the plotted data, then invoke the repaint operation. No relayout. (Since Qt supports the intermediate layer that started this thread).
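The "translate and scale, then repaint" step can be sketched independently of any toolkit. The Python below is not Qt code; the `View` class and its method names are hypothetical and stand in for whatever view transform the graphics layer provides. Scene objects keep their data coordinates, and a zoom only changes the transform used at paint time.

```python
from dataclasses import dataclass


@dataclass
class View:
    # Maps scene (data) coordinates to screen coordinates:
    #   screen = scene * scale + offset, applied per axis.
    scale: float = 1.0
    tx: float = 0.0
    ty: float = 0.0

    def to_screen(self, x, y):
        return (x * self.scale + self.tx, y * self.scale + self.ty)

    def to_scene(self, sx, sy):
        return ((sx - self.tx) / self.scale, (sy - self.ty) / self.scale)

    def zoom_about(self, sx, sy, factor):
        """Zoom by `factor`, keeping the scene point under (sx, sy) fixed."""
        x, y = self.to_scene(sx, sy)
        self.scale *= factor
        self.tx = sx - x * self.scale
        self.ty = sy - y * self.scale


def repaint(points, view):
    """Stand-in for a paint pass: only the transform is applied, no relayout."""
    return [view.to_screen(x, y) for x, y in points]
```

Note that the plotted objects are untouched by `zoom_about`; only axis ticks and labels would need separate handling, which is the reconfiguring question raised above.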

Qt also makes it possible to implement the sort of reconfiguring scheme Tim asks about. The object hierarchy would be a container of some sort at the top, with children implementing the axis/label objects and a widget for the plotted objects. In terms of the Compose scheme, there would be all the graphics routines that there are now - since nearly all are used for plotting the data content - but there would also be new higher level constructs for layouts of plots, and for the parts of the plots that need to reconfigure themselves based on a set of parameters.

This is where a good design is critical. To make the reconfiguring as fast and accurate as possible, the axis widget would have to exactly reproduce the range and tick marks that Gadfly would.

I'm on the road this week and next. I can do an example of a simple app to demo the concepts after I get home.

The rescaling is completely fluid using Qt on a Mac. Less so using X11 - but I haven't tried it using the new Wayland? or OpenGL based WMs. They aren't supported on the version of parallels I use.

@timholy
Collaborator Author

timholy commented Jan 8, 2016

I've been thinking we should split out the algorithm that decides where to put tick marks into a separate package. Then it would be more modular. Also, I think there's a better algorithm in http://vis.stanford.edu/files/2010-TickLabels-InfoVis.pdf, but I haven't had any time to look into its implementation.
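As an illustration of how small a standalone tick-placement module can be, here is a Python sketch of Heckbert's classic "nice numbers" labeling heuristic. This is a swapped-in, simpler technique: it is neither Gadfly's current algorithm nor the optimization-based method from the linked paper. The function names and the `target` parameter are chosen for this example.

```python
import math


def nice_number(x, round_to_nearest):
    """Return a 'nice' number (1, 2, 5 or 10 times a power of ten) near x > 0."""
    exp = math.floor(math.log10(x))
    frac = x / 10 ** exp  # normalized into [1, 10)
    if round_to_nearest:
        nice = 1.0 if frac < 1.5 else 2.0 if frac < 3 else 5.0 if frac < 7 else 10.0
    else:
        nice = 1.0 if frac <= 1 else 2.0 if frac <= 2 else 5.0 if frac <= 5 else 10.0
    return nice * 10 ** exp


def nice_ticks(lo, hi, target=5):
    """Tick positions covering [lo, hi] with roughly `target` ticks."""
    if not hi > lo:
        raise ValueError("need hi > lo")
    span = nice_number(hi - lo, round_to_nearest=False)
    step = nice_number(span / (target - 1), round_to_nearest=True)
    start = math.floor(lo / step) * step
    stop = math.ceil(hi / step) * step
    n = round((stop - start) / step)
    return [start + i * step for i in range(n + 1)]
```

Because the function depends only on the visible limits, it could be called again with the zoomed range whenever the view changes, for example `nice_ticks(41.0, 43.0)` after zooming into that interval.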

@lobingera

@timholy, did you read all the limitations they list at the end of the paper? And that they actually don't have any good idea how to stop the infinite loop if the constraints of non-overlap are too massive?

I once had a project where I built a plotting facility from scratch (for reasonably fast updates; matplotlib was too slow), and I had the paper on the table, but, like similar ones I have read (I should check my bookmarks), it offers no final solution that covers all tick/label problems.

Still, I agree with you: modularisation in plotting, with dedicated algorithms for dedicated jobs, would make some sense.

@timholy
Collaborator Author

timholy commented Jan 9, 2016

I did not read it thoroughly, just far enough to see that at least certain elements seem to fix some issues that bothered me about Gadfly's current algorithm. I suspected that using "appearance" as a component of their criteria is the most complicated (but potentially rewarding) aspect, and it's not surprising they didn't get it perfect. One could skip that part (Gadfly doesn't currently consider appearance) and perhaps still achieve improvements.

@protogeezer

Tim - it occurs to me we're talking a more complicated path than I first imagined if you literally meant it would be a common occurrence that a julia user would be trying to process 10^9 data points into a histogram. That magnitude of data is going to break julia not just gadfly.

To make that work there would be an intermediate layer but not between compose and gadfly - rather it would be between the file system and julia.

The sorts of (open) systems that handle part of that would be something like Kitware Paraview. Postgresql is a heavy iron dbms that could manage accessing views of huge tables. If that is a "primary" requirement, then some thought could be put into it.

@tbreloff
Contributor

> it occurs to me we're talking a more complicated path than I first imagined if you literally meant it would be a common occurrence that a julia user would be trying to process 10^9 data points into a histogram. That magnitude of data is going to break julia not just gadfly

If you try to load a too-large dataset into memory, when all you care about is a histogram, then yes you have a problem. This is one of those situations where OnlineStats (or maybe AverageShiftedHistograms) would be the appropriate solution. No need for something so heavyweight like a requirement on PostgreSQL. I, however, don't think this necessarily needs to be included in Gadfly. Plots maybe, but probably it should sit among other similar tools to aggregate and visualize large datasets.
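The streaming idea can be sketched without any particular library. The Python below does not use or imitate the OnlineStats API; the `StreamingHistogram` class and its `fit` method are hypothetical names for this example. It assumes the bin range is known up front, and it keeps only the bin counts in memory while the data arrives in chunks.

```python
class StreamingHistogram:
    """Fixed-range histogram updated one chunk at a time (illustrative only)."""

    def __init__(self, lo, hi, nbins):
        if not hi > lo or nbins < 1:
            raise ValueError("need hi > lo and nbins >= 1")
        self.lo = lo
        self.hi = hi
        self.width = (hi - lo) / nbins
        self.counts = [0] * nbins
        self.outside = 0  # values that fall outside [lo, hi)

    def fit(self, chunk):
        """Fold one chunk of values into the counts; the chunk can then be dropped."""
        last = len(self.counts) - 1
        for v in chunk:
            if self.lo <= v < self.hi:
                # min() guards against rounding pushing the index past the end
                self.counts[min(int((v - self.lo) / self.width), last)] += 1
            else:
                self.outside += 1
        return self
```

Each chunk could come from a file reader or a memory-mapped slice, so memory use stays proportional to the number of bins rather than the number of points.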

@timholy
Collaborator Author

timholy commented Jan 11, 2016

I just made that number up, but y'all are a bunch of wimps: I generally like to start my day by performing computations on two or three 100GB mmapped files before breakfast 😄.

More seriously, indeed there are tasks that are just too big to be "interactive." My point is that one might want to do some precomputations and then pass partially-digested results to Gadfly (e.g., pass a histogram taken at much higher resolution than you'd like to plot, and have Gadfly coarsen it appropriately given the current zoom region). All just examples, but you probably get the general point.

I should also say my immediate need was satisfied by #683, but I left this issue open because obviously a lot more could be done.
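The coarsening step described above (a high-resolution histogram reduced to fit the current zoom region) might look roughly like this in Python. This is a sketch under stated assumptions, not Gadfly code and not what #683 implements: the `coarsen` function, its parameters, and the equal-width half-open bins over `[lo, hi)` are all hypothetical.

```python
import math


def coarsen(counts, lo, hi, view_lo, view_hi, max_bars):
    """Merge fine bins so that at most `max_bars` bars cover the visible range.

    `counts` are fine-resolution bin counts over [lo, hi).
    Returns (left_edge, bar_width, bar_counts) for the visible bars.
    """
    n = len(counts)
    width = (hi - lo) / n
    # Indices of the fine bins that overlap the visible range.
    first = max(0, math.floor((view_lo - lo) / width))
    last = min(n, math.ceil((view_hi - lo) / width))
    visible = counts[first:last]
    # Number of fine bins merged into each drawn bar.
    group = max(1, math.ceil(len(visible) / max_bars))
    bars = [sum(visible[i:i + group]) for i in range(0, len(visible), group)]
    return lo + first * width, group * width, bars
```

Zooming in shrinks `visible`, so `group` drops toward 1 and the finer structure of the precomputed histogram becomes visible without touching the raw data again.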

@protogeezer

Tim:

I'm back from my travels, or I would have responded sooner.

(1) Is it possible for you to make one of those he-man sized files available so us wimps can work on our upper-body strength by experiencing the joy of over-the-top use case testing? There are big data features on OS X in particular that may be worth comparing to more open sourced and lighter weight alternatives such as HDF5... for example.

(2) I'm ready to either fork or clone Compose and Gadfly to begin implementing the (proposed) Qt backend. @tkelman, I think - since I can't find the thread right this instant, recommended a "gadfly mirror" to enable updates to gadfly while Dan is otherwise occupied. It would be helpful if that could be done (along with appointing the official committer). Even better, it would be great to work off a fork on the mirror.

@tkelman
Contributor

tkelman commented Jan 20, 2016

I think we only need to go to the length of using a different fork or mirror of any package repositories for cases where no one else has been granted commit access. For Gadfly itself, and probably Compose too, I think work can continue here assuming dcjones trusts the judgement of those who have been granted commit access.

@protogeezer

Fair enough. For my purposes, forking Gadfly and Compose will get things rolling.

7 participants