Currently we declare whether a Mark property should form groups at the property level. We need this because different marks behave differently: e.g. each line has the same values for all its properties and is added separately, while a scatterplot can mix multiple property values in a single artist. But I don't think we have or will encounter cases where only a subset of a mark's properties should group, so it is cumbersome to have to set grouping= for every property.
Instead, I think this can be determined within Mark._plot by adding a parameter to the split_gen generator, where the mark passes in the properties that should be grouping.
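To make the proposal concrete, here is a rough sketch of what passing the grouping properties into the generator could look like. This is not the existing implementation: the standalone `split_gen` function, its `grouping_props` parameter, and the shape of the yielded values are all assumptions made for illustration.

```python
import pandas as pd


def split_gen(data, grouping_props=()):
    """Hypothetical stand-in for the generator: yield (keys, subset) pairs.

    `grouping_props` is the assumed new parameter: the names of the
    properties that the calling mark wants to group by.
    """
    # Only group by properties that are actually present in the data
    by = [prop for prop in grouping_props if prop in data]
    if not by:
        # No grouping requested: hand back everything in one piece
        yield {}, data
        return
    for key, subset in data.groupby(by, sort=False):
        # Older pandas yields a scalar key for a length-1 list; normalize it
        key = key if isinstance(key, tuple) else (key,)
        yield dict(zip(by, key)), subset


data = pd.DataFrame(
    {"x": [0, 1, 2, 3], "y": [1, 2, 3, 4], "color": ["a", "b", "a", "b"]}
)

# A line-like mark would pass its properties, so each group is added separately
line_groups = list(split_gen(data, grouping_props=["color"]))

# A scatter-like mark would pass nothing and get all rows in a single subset
scatter_groups = list(split_gen(data))
```

With this shape, the decision lives in one place per mark (the call inside its plotting method) rather than on every property definition.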
This also touches on a broader issue which is that the current grouping is relatively inefficient (e.g. see #2881). Ideally, we would do scaling over all data points and then group, which can be faster. The challenge has been that we no longer have a dataframe after scaling. The main reason is that working with colors as rgba tuples / n x 4 arrays is difficult in the context of a dataframe ... you can stick the tuples in a series, but then it has an object dtype that propagates through to the numpy array and works poorly with matplotlib. A few options would be:
- Implement our own groupby logic on the dict of arrays / lists that we have after scaling
- Store rgba values as separate columns in the dataframe we build while scaling (we could perhaps use a different internal color representation to facilitate things like luminance properties)
- Implement some kind of RGBA extension array that lets a Series hold a 2d data structure (is this possible? I am not sure)
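As a rough illustration of the first option, grouping can be done directly on a dict of arrays with numpy, treating an n x 4 rgba array as four key columns. The function name `group_scaled` and the dict layout are assumptions for this sketch, and it only handles numeric grouping variables.

```python
import numpy as np


def group_scaled(scaled, grouping_vars):
    """Yield (key, subset) pairs from a dict of equal-length arrays.

    Hypothetical helper: `scaled` maps variable names to 1d arrays or
    n x k arrays (e.g. n x 4 rgba colors). Assumes numeric values only.
    """
    n = len(next(iter(scaled.values())))
    if not grouping_vars:
        yield (), scaled
        return
    # Flatten each grouping variable to n x k and stack them side by side,
    # so an rgba array contributes four key columns
    cols = [np.asarray(scaled[var]).reshape(n, -1) for var in grouping_vars]
    keys = np.column_stack(cols)
    # Unique rows define the groups; `inverse` maps each row to its group
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    for code in range(inverse.max() + 1):
        mask = inverse == code
        subset = {name: np.asarray(vals)[mask] for name, vals in scaled.items()}
        yield tuple(keys[mask][0]), subset


scaled = {
    "x": np.array([0.0, 1.0, 2.0, 3.0]),
    "y": np.array([1.0, 2.0, 3.0, 4.0]),
    "color": np.array(
        [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]], dtype=float
    ),
}

groups = list(group_scaled(scaled, ["color"]))
```

Each subset keeps the color as a float array of shape (m, 4), so it can be handed to matplotlib without passing through an object dtype. Groups come out in sorted key order rather than order of appearance, which may or may not matter for layering.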