Rethink approach to grouping in plot phase of Plot #2894

Open
mwaskom opened this issue Jul 11, 2022 · 0 comments

Currently we can declare that a Mark property should form groups at the property level. The reason we need to do this is that different marks behave differently: e.g. each line has the same values for all its properties and is added separately, while a scatterplot can mix multiple property values in a single artist. But I don't think we have encountered, or will encounter, cases where only a subset of a mark's properties should group, so it is cumbersome to have to set grouping= for every property.

Instead, I think this can be determined within Mark._plot by adding a parameter to the split_gen generator, where the mark passes in the properties that should be grouping.
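A minimal sketch of what that could look like, assuming the data for one subplot is a pandas dataframe with one column per property. The standalone `split_gen` function and its `grouping_props` parameter are hypothetical names for illustration, not the existing seaborn internals:

```python
import pandas as pd


def split_gen(data, grouping_props):
    """Yield (keys, subset) pairs, splitting only on the requested properties.

    Hypothetical stand-in for the generator that a Mark's _plot method receives.
    """
    # Only split on properties that are actually mapped in this plot
    grouping_vars = [p for p in grouping_props if p in data.columns]

    if not grouping_vars:
        # Nothing to split on: the mark can draw everything in one artist
        yield {}, data
        return

    for key, subset in data.groupby(grouping_vars, sort=False):
        # Depending on the pandas version, a single-element list of
        # grouping columns yields either a scalar or a 1-tuple key
        key = key if isinstance(key, tuple) else (key,)
        yield dict(zip(grouping_vars, key)), subset
```

Under this sketch, a line-like mark would pass all of its properties (e.g. `split_gen(df, ["color", "linewidth"])`), while a scatter-like mark would pass an empty list and get the data back as a single group.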

This also touches on a broader issue, which is that the current grouping is relatively inefficient (e.g. see #2881). Ideally, we would do scaling over all data points and then group, which can be faster. The challenge has been that we no longer have a dataframe after scaling. The main reason is that working with colors as rgba tuples / n x 4 arrays is difficult in the context of a dataframe ... you can stick the tuples in a series, but then it has an object dtype that propagates through to the numpy array and works poorly with matplotlib. A few options would be:

  • Implement our own groupby logic on the dict of arrays / lists that we have after scaling
  • Store rgba values as separate columns in the dataframe we build while scaling (we could perhaps use a different internal color representation to facilitate things like luminance properties)
  • Implement some kind of RGBA extension array that lets a Series hold a 2d data structure (is this possible? I am not sure)
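
As a rough illustration of the first option, here is a sketch of grouping directly on a dict of arrays, assuming every value in the dict has the same length along its first axis and that colors are stored as an n x 4 float array. The function name `group_scaled` and the dict layout are assumptions for the example, not existing seaborn code:

```python
import numpy as np


def group_scaled(scaled, grouping_props):
    """Yield (keys, subset) pairs from a dict of equal-length arrays.

    Hypothetical helper; 2d values such as an n x 4 rgba array are
    grouped row-wise, so each distinct color forms one level.
    """
    n = len(next(iter(scaled.values())))
    codes = np.zeros(n, dtype=np.intp)

    for prop in grouping_props:
        values = np.asarray(scaled[prop])
        if values.ndim > 1:
            # Treat each row (e.g. one rgba tuple) as a single value
            _, inverse = np.unique(values, axis=0, return_inverse=True)
        else:
            _, inverse = np.unique(values, return_inverse=True)
        inverse = inverse.reshape(-1)
        # Fold this property's level codes into one combined group code
        codes = codes * (inverse.max() + 1) + inverse

    for code in np.unique(codes):
        mask = codes == code
        subset = {k: np.asarray(v)[mask] for k, v in scaled.items()}
        keys = {p: subset[p][0] for p in grouping_props}
        yield keys, subset
```

This keeps the color data as a plain float array throughout, so each subset can be handed to matplotlib without passing through an object-dtype series. Note that groups come out in sorted order of the combined code, not in order of first appearance.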