Rethink approach to grouping in plot phase of Plot #2894

mwaskom · 2022-07-11T00:34:44Z

Currently we can declare that a Mark property should form groups at the property level. The reason we need to do this is that different marks behave differently: e.g., each line has the same values for all its properties and is added separately, while a scatterplot can mix multiple property values in a single artist. But I don't think we have encountered, or will encounter, cases where only a subset of a mark's properties should group, so it is cumbersome to have to set grouping= for every property.

Instead, I think this can be determined within Mark._plot by adding a parameter to the split_gen generator, where the mark passes in the properties that should be grouping.
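A rough sketch of what that could look like, not the actual seaborn internals: the generator below is a hypothetical stand-in for split_gen, and the `grouping_props` parameter name and the `(keys, subset)` return shape are assumptions made for illustration. The mark passes the names of the properties that should group, and only those that are present in the data take part in the split.

```python
import pandas as pd


def split_gen(data, grouping_props):
    """Hypothetical stand-in for the generator a mark would call in _plot.

    Yields (keys, subset) pairs, splitting only on the properties the mark
    asked for. Names and signature are assumptions, not seaborn's API.
    """
    # Properties the mark names but that are not mapped in the data are ignored.
    grouping_vars = [var for var in grouping_props if var in data.columns]

    # A mark that groups on nothing gets all rows at once.
    if not grouping_vars:
        yield {}, data
        return

    for key, subset in data.groupby(grouping_vars, sort=False):
        # Depending on the pandas version, a single-variable key may be a scalar.
        key = key if isinstance(key, tuple) else (key,)
        yield dict(zip(grouping_vars, key)), subset


data = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [1, 3, 2, 4],
    "color": ["a", "a", "b", "b"],
})

# A line-like mark groups on every property it has, so each line is one artist.
line_groups = list(split_gen(data, ["color", "linewidth"]))

# A scatter-like mark passes nothing and can draw all points in a single artist.
dot_groups = list(split_gen(data, []))
```

With this shape, the grouping decision lives in one place per mark rather than on each property declaration.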

This also touches on a broader issue which is that the current grouping is relatively inefficient (e.g. see #2881). Ideally, we would do scaling over all data points and then group, which can be faster. The challenge has been that we no longer have a dataframe after scaling. The main reason is that working with colors as rgba tuples / n x 4 arrays is difficult in the context of a dataframe ... you can stick the tuples in a series, but then it has an object dtype that propagates through to the numpy array and works poorly with matplotlib. A few options would be:

- Implement our own groupby logic on the dict of arrays / lists that we have after scaling
- Store rgba values as separate columns in the dataframe we build while scaling (we could perhaps use a different internal color representation to facilitate things like luminance properties)
- Implement some kind of RGBA extension array that lets a Series hold a 2d data structure (is this possible? I am not sure)
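To make the first option more concrete, here is a minimal sketch of grouping a dict of scaled arrays with NumPy alone. The function name `group_scaled` and the layout of the `scaled` dict are hypothetical; the only thing taken from the discussion above is that colors arrive as an n x 4 RGBA array. Each grouping variable is reshaped to two dimensions so that an RGBA row counts as a single value, and `np.unique(..., axis=0)` assigns the group codes.

```python
import numpy as np


def group_scaled(scaled, grouping_vars):
    """Yield (keys, subset) pairs from a dict of equal-length arrays.

    Hypothetical helper: `scaled` maps variable names to arrays whose first
    axis indexes data points (e.g. "color" may have shape (n, 4)).
    """
    if not grouping_vars:
        yield {}, scaled
        return

    codes = []
    for var in grouping_vars:
        values = np.asarray(scaled[var])
        # Treat each row (a scalar or an RGBA tuple) as one value to group on.
        rows = values.reshape(len(values), -1)
        _, inverse = np.unique(rows, axis=0, return_inverse=True)
        codes.append(inverse.reshape(-1))

    # Combine the per-variable codes into a single group id per data point.
    _, group_ids = np.unique(np.column_stack(codes), axis=0, return_inverse=True)
    group_ids = group_ids.reshape(-1)

    for gid in np.unique(group_ids):
        mask = group_ids == gid
        subset = {name: np.asarray(arr)[mask] for name, arr in scaled.items()}
        keys = {var: subset[var][0] for var in grouping_vars}
        yield keys, subset


scaled = {
    "x": np.array([0.0, 1.0, 2.0, 3.0]),
    "y": np.array([1.0, 3.0, 2.0, 4.0]),
    "color": np.array([
        [1.0, 0.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
    ]),
}

groups = list(group_scaled(scaled, ["color"]))
```

Each subset keeps the color values as a float array of shape (k, 4), so nothing passes through an object-dtype Series before reaching matplotlib. This sketch does not handle empty inputs or object-dtype grouping variables.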

mwaskom added internals objects-plot labels Jul 11, 2022

mwaskom mentioned this issue Jul 18, 2022

Add Est stat and Interval mark to show error bars #2912

Merged
