Show categorical legend with hvplot's `by` keyword and datashader. #1238

JelmerBot · 2023-12-24T16:29:21Z

Currently, using hvplot with datashader and the by keyword does not show a legend. This PR makes the by keyword behave as if only a categorical aggregator had been specified when datashader is used, ensuring a legend is created.

A small example where the change is effective:

import numpy as np
import pandas as pd
import hvplot
import hvplot.pandas
import datashader as ds
import holoviews.operation.datashader as hd

# 20 random points with a two-level categorical column derived from the sign of x
df = pd.DataFrame(
    np.random.normal(size=(20, 2)),
    columns=['x', 'y']
)
df['cat'] = df['x'].apply(lambda x: 'c0' if x < 0 else 'c1')
df['cat'] = df['cat'].astype('category')

df.hvplot.points(
    x='x',
    y='y',
    by='cat',
    # aggregator=ds.by('cat'),
    color_key={'c0': '#ff0000', 'c1': '#0000ff'},
    datashade=True,
    dynspread=True,
)

Shows a categorical legend when hvplot's `by='column'` keyword is used with datashader enabled. This now results in the same output as `aggregator=ds.count_cat('column')`.

maximlt · 2024-01-28T21:57:22Z

Hi @JelmerBot, sorry I took so long to come back to you! And thanks for opening a PR :) I checked the effects of your changes on the code you shared and the results look good at first glance.

Before:

With the changes as of b098ea3:

However, I noticed that setting subplots=True no longer works after the changes:

I need to spend a bit more time understanding the various things at play here (likely related to fairly recent developments in HoloViews too). Once it's clearer in my mind, I'll suggest how you could keep working on the PR; for instance, it will need some unit tests.

JelmerBot · 2024-02-05T12:30:13Z

Hi @maximlt, thanks for looking at my PR!

I indeed overlooked the subplots option, and the interaction between parameters is more complex than I anticipated. For example, aggregator= overrides the effect of by= when datashader is used, potentially removing any visual difference between the created overlays. Fundamentally, the question is how hvplot's overlays and datashader's categorical shading should be combined. Are they trying to achieve the same thing or is there a difference? Should both be possible simultaneously, or are overlays in combination with datashader not ideal?

I chose to avoid creating overlays when using datashader, because that was simpler than trying to figure out the legend-creation code. In addition, it leaves all categorical blending to datashader. So, updating the logic to something like this would work:

if self.by and (self.subplots or not self.datashade):
    obj = Dataset(data, self.by+kdims, vdims).to(element, kdims, vdims, self.by, **params)
    if self.subplots:
        obj = obj.layout(sort=False)
    else:
        obj = obj.overlay(sort=False)
else:
    obj = element(data, kdims, vdims, **params)
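To make the branches of that condition easier to check, here is a minimal sketch that reproduces only its boolean logic. `grouping_mode` and its return labels are hypothetical names introduced for illustration; they are not part of hvplot, and the sketch does not build any HoloViews objects.

```python
def grouping_mode(by, subplots, datashade):
    """Return which branch the proposed condition would take.

    Hypothetical helper: it mirrors only the boolean logic of the snippet
    above, not the actual hvplot converter code.
    """
    if by and (subplots or not datashade):
        # Grouped Dataset: one element per category.
        return "layout" if subplots else "overlay"
    # Single element; categorical blending is left to datashader.
    return "single"


# Enumerate every combination for a non-empty `by`.
for subplots in (False, True):
    for datashade in (False, True):
        mode = grouping_mode(["cat"], subplots, datashade)
        print(f"subplots={subplots!s:5} datashade={datashade!s:5} -> {mode}")
```

Under this sketch, the only combination that changes compared with always grouping on by= is datashade=True with subplots=False, which is the case where the legend was missing.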

Another change is needed to make the aggregator= and by= options work together at:

hvplot/hvplot/converter.py

Lines 1318 to 1333 in 95e39e7

    
if self.by and not self.subplots:
    opts['aggregator'] = reductions.count_cat(self.by[0])
    categorical = True
elif ((isinstance(self.y, list) and len(self.y) > 1 or self.y is None and len(set(self.variables) - set(self.indexes)) > 1) and
      self.kind in ('scatter', 'line', 'area')):
    opts['aggregator'] = reductions.count_cat(self.group_label)
    categorical = True
if self.aggregator:
    import datashader as ds
    agg = self.aggregator
    if isinstance(agg, str) and self._color_dim:
        categorical = agg == 'count_cat'
        agg = getattr(reductions, agg)(self._color_dim)
    else:
        categorical = isinstance(agg, (ds.count_cat, ds.by))
    opts['aggregator'] = agg

For example, by='category' with aggregator=ds.mean('x') could result in ds.by('category', ds.mean('x')). On the other hand, by='category_1' with aggregator=ds.by('category_2') should probably fail, as there is no easy way to make the overlays visually distinct (right?), effectively removing the effect of by='category_1'.
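The rule described above could be sketched roughly as follows. To keep the example runnable without datashader installed, `Mean`, `Count` and `By` are hypothetical stand-ins for `ds.mean`, `ds.count` and `ds.by`, and `resolve_aggregator` is an illustrative name, not an existing hvplot function. Accepting a `By` on the same column as by= is an additional assumption, not something decided in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mean:
    """Hypothetical stand-in for a non-categorical reduction such as ds.mean."""
    column: str


@dataclass(frozen=True)
class Count:
    """Hypothetical stand-in for the default ds.count reduction."""


@dataclass(frozen=True)
class By:
    """Hypothetical stand-in for the categorical ds.by reduction."""
    cat_column: str
    reduction: object = Count()


def resolve_aggregator(by, aggregator=None):
    """Combine hvplot-style `by` with an optional aggregator (sketch only)."""
    if not by:
        # No grouping requested: use the aggregator unchanged.
        return aggregator
    if aggregator is None:
        # by= alone behaves like a categorical count.
        return By(by[0])
    if isinstance(aggregator, By):
        if aggregator.cat_column != by[0]:
            # Two different categorical columns cannot both drive the colors.
            raise ValueError(
                f"by={by[0]!r} conflicts with a categorical aggregator "
                f"on {aggregator.cat_column!r}"
            )
        return aggregator
    # Wrap the plain reduction so it is computed per category.
    return By(by[0], aggregator)


print(resolve_aggregator(["category"], Mean("x")))
```

Raising an error for conflicting columns is one option; silently preferring aggregator= (the current behavior described above) would be the alternative.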

Regarding unit tests for this PR, can you give me some pointers how to tackle those? Are there already tests for legend creation in hvplot?

JelmerBot added 2 commits December 24, 2023 16:53

Align hvplot by and datashade by

1241c76

Shows categorical legend when hvplot's `by='column'` keyword is used with datashader enabled. Now results in the same output as `aggregator=ds.count_cat('columns')`.

Also update simple charts

b098ea3

maximlt marked this pull request as draft January 28, 2024 21:57
