[Feature suggestion] Add structured and jittered layout options to categorical scatter plots #3332

Open
joelostblom opened this issue Dec 22, 2018 · 12 comments
Labels
type: enhancement Minor feature or improvement to an existing feature

Comments

@joelostblom
Contributor

Currently, it is possible to create scatter plots with one categorical axis, but the data points are always laid out in a straight line. I think one of the main strengths of categorical scatter plots is that they promote showing all data points instead of (or in addition to) estimates of distribution statistics such as the mean or median. When all points are laid out in one line, this advantage is severely reduced, since there will be significant overlap and many points will be completely obscured.

Addition of random jitter (dot/stripplots) or a structured data point layout ((bee)swarmplots) would allow for the creation of more informative categorical scatter plots. Great examples of these can be found in the seaborn gallery: stripplot, swarmplot

Are these layout options possible to add to Holoviews or would they need to be added to the underlying plotting libraries?

@philippjfr added the "type: enhancement" (Minor feature or improvement to an existing feature) label on Dec 22, 2018
@philippjfr
Member

The Scatter element already has a jitter option, but a swarmplot would be a welcome addition.
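
For readers unfamiliar with the option, jitter conceptually amounts to adding a small random offset along the categorical axis so that points with the same category no longer sit on one line. The following is a minimal, dependency-free sketch of that idea only; it is not the HoloViews or Bokeh implementation, and `jitter_positions`, its `width` and `seed` parameters, and the integer slot assignment are assumptions made for illustration.

```python
import random


def jitter_positions(categories, width=0.5, seed=None):
    """Map each categorical value to a numeric x position plus uniform noise.

    Hypothetical helper for illustration; not part of HoloViews or Bokeh.
    """
    rng = random.Random(seed)
    # Give each distinct category an integer slot, in order of first appearance.
    slots = {}
    for cat in categories:
        slots.setdefault(cat, len(slots))
    # Spread each point uniformly within +/- width / 2 of its category slot.
    return [slots[cat] + rng.uniform(-width / 2, width / 2)
            for cat in categories]


xs = jitter_positions(['GOLD', 'SILVER', 'GOLD', 'BRONZE'], width=0.5, seed=0)
```

Because the offsets are random, overlap is reduced on average but not ruled out, which is the gap a swarm layout is meant to close.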

@joelostblom
Contributor Author

Thanks for the quick response, and sorry for missing that. Would a swarmplot preferably be added as a new element in Holoviews or as a transform function in Bokeh (similar to jitter)?

@philippjfr
Member

Good question. I suspect implementing it as an element would be easier; it would simply follow the same structure as the BoxWhisker and Violin plots.

@ahuang11
Collaborator

Hi @joelostblom, I was wondering if you have plans to make a pull request to implement swarm plots? I made a prototype using pandas, but I think it might have to be translated into numpy if it were implemented in HoloViews.

import numpy as np
import pandas as pd
import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on DataFrames)


def swarm_plot(df, x, y, bin_size=0.2, spread=2, straighten=True):
    """
    Args:
        df (pd.DataFrame)
        x (str) - numeric column; dots are offset around each distinct x value
        y (str)
        bin_size (float) - y-threshold where dots are aggregated into a line
        spread (int) - how far dots are spread apart (not fully functional)
        straighten (bool) - whether each row of dots is rounded to nearest y

    Returns:
        swarm_plot (hv.Scatter)
    """
    x_offset = f'{x}_offset'
    y_bins = f'{y}_bins'

    df_srt = df.sort_values(x)
    # extend the last edge past the maximum so the largest y falls in a bin
    bins = np.arange(df_srt[y].min(), df_srt[y].max() + bin_size, bin_size)
    df_srt[y_bins] = pd.cut(df_srt[y], bins, include_lowest=True)
    # observed=True skips (x, bin) combinations that contain no rows
    df_srt_grp = df_srt.groupby([x, y_bins], observed=True)
    # average gap between consecutive distinct x values
    size_delta = df_srt[x].diff().replace(0, np.nan).mean()
    for (x_value, y_bin), group in df_srt_grp:
        # prepare the maximum range that the dots can be in
        step = size_delta / len(group)
        offset_range = np.arange(-size_delta + step,
                                 size_delta + step,
                                 step * 2)

        # we want to select the percentiles around 50
        # so that dots are all centered around the original x
        # and not all messily scattered around the edges of x
        percentiles = pd.Series(range(50 - len(group),
                                      50 + len(group),
                                      spread))
        percentiles += spread / 2

        # prepare our new x
        offsets = np.percentile(offset_range, percentiles.values)
        df_srt.loc[group.index, x_offset] = (df_srt.loc[group.index, x] +
                                             offsets)

        if straighten:
            df_srt.loc[group.index, y] = y_bin.mid

    # show the original (un-offset) x value on hover
    return df_srt.hvplot.scatter(x_offset, y, hover_cols=[x]
                                 ).options(size=2.5 * spread / 2.,
                                           padding=0.05)

@ahuang11
Collaborator

ahuang11 commented Dec 30, 2018

But it is not very friendly towards stacking with a box-whisker or violin plot, because of the categorical axes (the same applies to datetime axes).

@joelostblom
Contributor Author

@ahuang11 I do not currently plan to make a PR on this. I recommend looking at the seaborn swarmplot implementation and the pybeeswarm package for inspiration. In my testing, the seaborn implementation was a lot faster and scaled much better to larger data sets. The pybeeswarm package has more layout options.
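
As a rough illustration of what a structured (swarm) layout has to compute, here is a minimal pure-Python sketch of a greedy placement: points are visited in ascending order of value, and each takes the offset closest to the centre line that keeps it at least one marker diameter away from every point already placed. This is not the seaborn or pybeeswarm algorithm. `swarm_offsets` and `diameter` are hypothetical names, and the sketch assumes offsets and values are measured in the same units as the marker diameter.

```python
import math


def swarm_offsets(values, diameter=1.0):
    """Return one x offset per value so that circles of `diameter` do not overlap.

    Hypothetical helper for illustration; a greedy O(n^2) layout.
    """
    min_sq = diameter ** 2 * (1 - 1e-9)  # small tolerance for float rounding
    order = sorted(range(len(values)), key=lambda i: values[i])
    placed = []                    # (offset, value) of points already laid out
    offsets = [0.0] * len(values)
    for i in order:
        y = values[i]
        # Only points closer than one diameter vertically can collide.
        neighbours = [(px, py) for px, py in placed if y - py < diameter]
        # Candidates: the centre line, plus the two positions that just
        # touch each neighbour on its left and right side.
        candidates = [0.0]
        for px, py in neighbours:
            dx = math.sqrt(diameter ** 2 - (y - py) ** 2)
            candidates.extend([px - dx, px + dx])
        # Keep candidates that clear every neighbour; take the most central.
        free = [c for c in candidates
                if all((c - px) ** 2 + (y - py) ** 2 >= min_sq
                       for px, py in neighbours)]
        offsets[i] = min(free, key=abs)
        placed.append((offsets[i], y))
    return offsets


offsets = swarm_offsets([1.0, 1.0, 1.0], diameter=1.0)
```

The offsets would then be added to a numeric position for each category before plotting. The quadratic neighbour search is fine for a sketch but would likely need a sorted window or vectorization for large data sets.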

@philippjfr
Member

@ahuang11 with 1.11 now being released I'm hoping to start generalizing the categorical support ASAP, so it might make sense to wait for that before building another plot class.

@ea42gh
Contributor

ea42gh commented Dec 31, 2018

Re categorical axes: support for placing text and annotations would be great.
I currently fake it using axes with custom tick marks.

@aeronth

aeronth commented Mar 5, 2021

Hello Holoviz folks. Any plans for adding beeswarm plots to category plotting as per the awesome plots available in Seaborn?

I am happy to contribute myself if someone could point out where to start in the code. :)

I much prefer Holoviews to most of the other python plotting libraries and it would be huge to have better support for Violin plots with overlaid point clouds, such as this plot:

seaborn-swarmplot-9
https://seaborn.pydata.org/generated/seaborn.swarmplot.html

@MarcSkovMadsen
Collaborator

@jlstevens, would you know how @aeronth could get started?

@mhebrard

Stacking a violin and a jittered scatter works fine:

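import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on DataFrames)
from bokeh.sampledata.sprint import sprint as df

x = 'Medal'
y = 'Time'

# violin per medal category, with extra padding so the jittered dots fit
violin = df.hvplot.violin(y=y, by=x, c=x, padding=0.5, legend=False)
# scatter on the same categorical axis, spread out with random jitter
dots = df.hvplot.scatter(x=x, y=y, c=x, legend=False).opts(jitter=0.5)

# overlay the dots on top of the violins
violin * dots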

@jbednar
Member

jbednar commented Apr 27, 2022

Seems like that ought to be an example in the gallery!
