Show categorical legend with hvplot's by keyword and datashader. #1238

Draft
wants to merge 2 commits into
base: main

JelmerBot

Currently, using hvplot with datashader and the by keyword does not show a legend. This PR makes the by keyword behave as if only a categorical aggregator had been specified when datashader is used, which ensures that a legend is created.

A small example where the change is effective:

import numpy as np
import pandas as pd
import hvplot
import hvplot.pandas
import datashader as ds
import holoviews.operation.datashader as hd

df = pd.DataFrame(
    np.random.normal(size=(20, 2)),
    columns=['x', 'y']
)
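# Use bracket access for the column: on a Series, .cat is the categorical accessor, so df.cat is easy to misread.
df['cat'] = df['x'].apply(lambda x: 'c0' if x < 0 else 'c1')
df['cat'] = df['cat'].astype('category')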

df.hvplot.points(
    x='x',
    y='y',
    by='cat',
    # aggregator=ds.by('cat'),
    color_key={'c0': '#ff0000', 'c1': '#0000ff'},
    datashade=True,
    dynspread=True,
)

Shows a categorical legend when hvplot's `by='column'` keyword is used with datashader enabled. This now produces the same output as `aggregator=ds.count_cat('column')`.
@maximlt
Member

maximlt commented Jan 28, 2024

Hi @JelmerBot, sorry I took so long to come back to you! And thanks for opening a PR :) I checked the effects of your changes on the code you shared and the results look good at first glance.

Before:
image

With the changes as of b098ea3:
image

However, I noticed that setting subplots=True no longer works after the changes:
image

I need to spend a bit more time understanding the various things at stake here (likely related to more or less recent developments in HoloViews too). Once it's clearer in my mind, I'll suggest how you could keep working on the PR; for instance, it will require some unit tests.

@maximlt maximlt marked this pull request as draft January 28, 2024 21:57
@JelmerBot
Author

Hi @maximlt, thanks for looking at my PR!

I indeed overlooked the subplots option, and the interaction between parameters is more complex than I anticipated. For example, aggregator= overrides the effect of by= when datashader is used, potentially removing any visual difference between the created overlays. Fundamentally, the question is how hvplot's overlays and datashader's categorical shading should be combined. Are they trying to achieve the same thing, or is there a difference? Should both be possible simultaneously, or are overlays in combination with datashader not ideal?

I chose to avoid creating overlays when using datashader, because that was simpler than trying to figure out the legend-creation code. In addition, it leaves all categorical blending to datashader. So, updating the logic to something like this would work:

if self.by and (self.subplots or not self.datashade):
    obj = Dataset(data, self.by+kdims, vdims).to(element, kdims, vdims, self.by, **params)
    if self.subplots:
        obj = obj.layout(sort=False)
    else:
        obj = obj.overlay(sort=False)
else:
    obj = element(data, kdims, vdims, **params)
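
To make the effect of that condition easier to check, here is a minimal sketch that only mirrors the branching. The function name `grouping_mode` and the labels 'layout', 'overlay' and 'single' are made up for illustration; they are not part of hvplot.

```python
def grouping_mode(by, subplots, datashade):
    """Return which branch the proposed condition would take.

    Hypothetical helper: the returned labels are for illustration only
    and do not correspond to hvplot return values.
    """
    if by and (subplots or not datashade):
        # Grouped elements: one panel per group, or one overlay layer per group.
        return 'layout' if subplots else 'overlay'
    # A single element; with datashade=True the categories are left to datashader.
    return 'single'


# Enumerate the combinations for a single grouping column.
for subplots in (False, True):
    for datashade in (False, True):
        mode = grouping_mode(['cat'], subplots, datashade)
        print(f"subplots={subplots!s:5} datashade={datashade!s:5} -> {mode}")
```

The only combination that changes compared to always grouping is by= with datashade=True and subplots=False, which would now build a single element instead of an overlay.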

Another change is needed to make the aggregator= and by= options work together at:

hvplot/hvplot/converter.py

Lines 1318 to 1333 in 95e39e7

if self.by and not self.subplots:
    opts['aggregator'] = reductions.count_cat(self.by[0])
    categorical = True
elif ((isinstance(self.y, list) and len(self.y) > 1 or self.y is None and len(set(self.variables) - set(self.indexes)) > 1) and
        self.kind in ('scatter', 'line', 'area')):
    opts['aggregator'] = reductions.count_cat(self.group_label)
    categorical = True
if self.aggregator:
    import datashader as ds
    agg = self.aggregator
    if isinstance(agg, str) and self._color_dim:
        categorical = agg == 'count_cat'
        agg = getattr(reductions, agg)(self._color_dim)
    else:
        categorical = isinstance(agg, (ds.count_cat, ds.by))
    opts['aggregator'] = agg

For example, by='category' with aggregator=ds.mean('x') could result in ds.by('category', ds.mean('x')). On the other hand, by='category_1' with aggregator=ds.by('category_2') should probably fail, as there is no easy way to make the overlays visually distinct (right?), effectively removing the effect of by='category_1'.
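
A rough sketch of the rule I have in mind follows. The classes `Mean` and `By` are plain stand-ins for reductions such as ds.mean and ds.by so that the snippet runs without datashader. The function `combine` is hypothetical, and returning the aggregator unchanged when both options name the same column is my own assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Mean:
    """Stand-in for a non-categorical reduction such as ds.mean(column)."""
    column: str


@dataclass(frozen=True)
class By:
    """Stand-in for a categorical reduction such as ds.by(cat_column, reduction)."""
    cat_column: str
    reduction: Optional[Mean] = None  # None stands for a plain per-category count


def combine(by: Optional[str], aggregator: Union[Mean, By, None]):
    """Hypothetical rule for merging by= with aggregator= when datashading."""
    if by is None:
        # Nothing to merge: keep whatever the user passed.
        return aggregator
    if aggregator is None:
        # by= alone behaves like a categorical count aggregator.
        return By(by)
    if isinstance(aggregator, By):
        if aggregator.cat_column != by:
            # Two different categorical columns cannot both drive the colors.
            raise ValueError(
                f"by={by!r} conflicts with categorical aggregator on "
                f"{aggregator.cat_column!r}"
            )
        # Assumption: same column in both options, keep the aggregator as given.
        return aggregator
    # Wrap the non-categorical reduction per category.
    return By(by, aggregator)


print(combine('category', Mean('x')))
```

With real datashader objects the last branch would correspond to building ds.by('category', ds.mean('x')) as described above.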

Regarding unit tests for this PR, can you give me some pointers on how to tackle those? Are there already tests for legend creation in hvplot?
