Duplicate legend entries for histplot #3115

Closed
jedludlow opened this issue Oct 27, 2022 · 2 comments · Fixed by #3116

jedludlow commented Oct 27, 2022

Observed behavior

When creating a histplot with a legend, duplicate labels show up in the legend. It seems to be creating entries in the legend for more than one BarContainer artist.

Expected behavior

A single entry in the legend for the histogram. In my use case, this histogram is part of a collection of overlaid plots on the same axes, and the legend is required so that the various plot artists are clearly labeled.

Minimal example

import numpy as np
import seaborn as sns

rng = np.random.default_rng(43)
data = rng.random(1000)
ax = sns.histplot(data, label="Random Data", stat="density")
ax.legend()

# Multiple artists are shown here
ax.get_legend_handles_labels()

Outputs from minimal example

([<BarContainer object of 1 artists>, <BarContainer object of 11 artists>],
 ['Random Data', 'Random Data'])

Version

Version where behavior is observed: 0.12.1
This behavior was not present in 0.11.2.
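
As a stopgap on an affected version, one option is to collapse repeated labels before building the legend. The following is only a minimal sketch and not the fix that was merged for this issue: `dedupe_legend_entries` is a hypothetical helper name, and it assumes that when a label repeats, the later handle is the one worth keeping (in the output above, the later container is the one with 11 artists). Plain strings stand in for the BarContainer handles so the snippet runs without plotting anything.

```python
def dedupe_legend_entries(handles, labels):
    """Keep one handle per label; for repeated labels the last handle wins.

    Hypothetical helper (not part of seaborn or matplotlib).
    Labels keep the order in which they first appear.
    """
    by_label = {}
    for handle, label in zip(handles, labels):
        by_label[label] = handle  # a later duplicate overwrites the earlier handle
    return list(by_label.values()), list(by_label.keys())


# Stand-ins for the two BarContainer handles shown in the report.
handles = ["container_with_1_artist", "container_with_11_artists"]
labels = ["Random Data", "Random Data"]
unique_handles, unique_labels = dedupe_legend_entries(handles, labels)
```

With a real Axes, the same helper could be applied as `ax.legend(*dedupe_legend_entries(*ax.get_legend_handles_labels()))`, since `ax.legend` accepts a list of handles followed by a list of labels.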


bjsco commented Oct 28, 2022

It appears that there is an issue with how the plots are being created.

In distributions.py, line 558, the call ax = self._get_axes(sub_vars) triggers creation of the plot object. If you inspect the ax object at that point, it has already been initialized with an empty BarContainer object.

e.g.

([<BarContainer object of 1 artists>], ['Random Data'])

Later, in lines 566-582, the kwargs are built to actually draw the bar chart itself. This code ends up adding a second plot to the same axes:

artist_kws = self._artist_kws(
    plot_kws, fill, element, multiple, sub_color, alpha
)

if element == "bars":

    # Use matplotlib bar plotting

    plot_func = ax.bar if self.data_variable == "x" else ax.barh
    artists = plot_func(
        hist["edges"],
        hist["heights"] - bottom,
        hist["widths"],
        bottom,
        align="edge",
        **artist_kws,
    )

If you remove **artist_kws from the plot_func call, only one plot is drawn, though it then ignores all the artist colorings from seaborn.

If you trace down the issue, it has to do with how _oldcore.py is building the axis object with _get_axes

This code in particular

def _get_axes(self, sub_vars):
    """Return an Axes object based on existence of row/col variables."""
    row = sub_vars.get("row", None)
    col = sub_vars.get("col", None)
    if row is not None and col is not None:
        return self.facets.axes_dict[(row, col)]
    elif row is not None:
        return self.facets.axes_dict[row]
    elif col is not None:
        return self.facets.axes_dict[col]
    elif self.ax is None:
        return self.facets.ax
    else:
        return self.ax

This seems to be the issue: the return self.ax triggers the object creation, which places an empty BarContainer in the plot with a label ('Random Data' in the test case above).

It looks like fixing this will require redesign of how the plot object is built? It looks like a lot of work was done in Matplotlib on the function calls that Seaborn is interfacing with in Matplotlib. In version 0.12.1 it appears that _core.py was renamed to _oldcore.py and that the _get_axes() function was substantially rewritten.

I tried poking around to fix the issue, but I had trouble keeping on top of how objects were being passed around to come up with a solution that does not break other types of distribution plots.

mwaskom (Owner) commented Oct 28, 2022

Thanks for looking into this @bjsco but I think you're a little off track there. This was very likely introduced in #2449.
