pairplot fails with hue_order not containing all hue values in seaborn 0.11.1 #2419

Closed
frbourassa opened this issue Jan 6, 2021 · 6 comments · Fixed by #2848

Comments

@frbourassa

frbourassa commented Jan 6, 2021

In seaborn < 0.11, one could plot only a subset of the values in the hue column, by passing a hue_order list containing only the desired values. Points with hue values not in the list were simply not plotted.

import seaborn as sns

iris = sns.load_dataset("iris")
# The hue column contains three different species; here we want to plot two
sns.pairplot(iris, hue="species", hue_order=["setosa", "versicolor"])

This no longer works in 0.11.1. Passing a hue_order list that does not contain some of the values in the hue column raises a long, ugly error traceback. The first exception arises in seaborn/_core.py:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

seaborn version: 0.11.1
matplotlib version: 3.3.2
matplotlib backends: MacOSX, Agg or jupyter notebook inline.

@frbourassa changed the title from "pairplot fails with hue_order does not contain all hue values in seaborn 0.11.1" to "pairplot fails with hue_order not containing all hue values in seaborn 0.11.1" on Jan 6, 2021
@mwaskom
Owner

mwaskom commented Jan 6, 2021

Easiest workaround is probably

hue_order = ["setosa", "versicolor"]
sns.pairplot(iris.query("species in @hue_order"), hue="species", hue_order=hue_order)
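
The same filter can also be written with `isin`. The following is a minimal sketch on a small hand-built frame rather than the iris dataset, so it runs without downloading anything; `subset_to_hue_order` and `toy` are hypothetical names used only for illustration and are not part of seaborn.

```python
import pandas as pd


def subset_to_hue_order(data, hue, hue_order):
    """Keep only the rows whose hue value is listed in hue_order."""
    return data[data[hue].isin(hue_order)]


# Toy stand-in for the iris data (values are illustrative only)
toy = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "setosa"],
    "sepal_length": [5.1, 7.0, 6.3, 4.9],
})

hue_order = ["setosa", "versicolor"]
subset = subset_to_hue_order(toy, "species", hue_order)

# Only the listed species remain in the filtered frame
print(sorted(subset["species"].unique()))
```

The filtered frame can then be passed to `sns.pairplot(subset, hue="species", hue_order=hue_order)` in place of the full data.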

@mwaskom added this to the v0.12.0 milestone on Jan 6, 2021
@mwaskom
Owner

mwaskom commented Apr 14, 2021

Looking into this, it doesn't seem to be a problem with PairGrid per se ... I can replicate it using scatterplot directly:

sns.scatterplot(
    data=iris,
    x="sepal_length", y="sepal_width",
    hue="species", hue_order=["setosa", "versicolor"],
)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/code/seaborn/seaborn/_core.py in _lookup_single(self, key)
    145             # Use a value that's in the original data vector
--> 146             value = self.lookup_table[key]
    147         except KeyError:

KeyError: 'virginica'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/code/seaborn/seaborn/_core.py in _lookup_single(self, key)
    150             try:
--> 151                 normed = self.norm(key)
    152             except TypeError as err:

TypeError: 'NoneType' object is not callable

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-13-da02a893be41> in <module>
----> 1 sns.scatterplot(
      2     data=iris,
      3     x="sepal_length", y="sepal_width",
      4     hue="species", hue_order=["setosa", "versicolor"]
      5 )

~/code/seaborn/seaborn/_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

~/code/seaborn/seaborn/relational.py in scatterplot(x, y, hue, style, size, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, markers, style_order, x_bins, y_bins, units, estimator, ci, n_boot, alpha, x_jitter, y_jitter, legend, ax, **kwargs)
    770     kwargs["color"] = _default_color(ax.scatter, hue, color, kwargs)
    771 
--> 772     p.plot(ax, kwargs)
    773 
    774     return ax

~/code/seaborn/seaborn/relational.py in plot(self, ax, kws)
    574 
    575         if "hue" in self.variables:
--> 576             points.set_facecolors(self._hue_map(data["hue"]))
    577 
    578         if "size" in self.variables:

~/code/seaborn/seaborn/_core.py in __call__(self, key, *args, **kwargs)
     63         """Get the attribute(s) values for the data key."""
     64         if isinstance(key, (list, np.ndarray, pd.Series)):
---> 65             return [self._lookup_single(k, *args, **kwargs) for k in key]
     66         else:
     67             return self._lookup_single(key, *args, **kwargs)

~/code/seaborn/seaborn/_core.py in <listcomp>(.0)
     63         """Get the attribute(s) values for the data key."""
     64         if isinstance(key, (list, np.ndarray, pd.Series)):
---> 65             return [self._lookup_single(k, *args, **kwargs) for k in key]
     66         else:
     67             return self._lookup_single(key, *args, **kwargs)

~/code/seaborn/seaborn/_core.py in _lookup_single(self, key)
    151                 normed = self.norm(key)
    152             except TypeError as err:
--> 153                 if np.isnan(key):
    154                     value = (0, 0, 0, 0)
    155                 else:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
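
The chain of three exceptions can be reproduced outside seaborn. This is a simplified sketch modeled on the `_lookup_single` lines visible in the traceback, not the library's actual code; `lookup_table`, `norm`, and `lookup_single` are stand-ins, the color values are made up, and the final re-raise is an assumption because the traceback cuts off before that branch.

```python
import numpy as np

# Stand-ins for the mapping state when hue_order omits "virginica"
lookup_table = {"setosa": (0.12, 0.47, 0.71), "versicolor": (1.0, 0.5, 0.05)}
norm = None  # categorical mapping, so there is no numeric normalizer


def lookup_single(key):
    try:
        # Step 1: "virginica" is not in the table, so this raises KeyError
        return lookup_table[key]
    except KeyError:
        try:
            # Step 2: norm is None, so calling it raises TypeError
            return norm(key)
        except TypeError:
            # Step 3: np.isnan cannot handle a string and raises TypeError again
            if np.isnan(key):
                return (0, 0, 0, 0)  # RGBA with zero alpha for missing data
            raise  # assumption: the traceback does not show this branch


try:
    lookup_single("virginica")
except TypeError as exc:
    print(type(exc).__name__, "raised for a string key")
```

A NaN key takes the intended path and comes back as a transparent color; only a non-numeric key that is missing from the table reaches the failing `np.isnan` call.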

@mwaskom
Owner

mwaskom commented Apr 14, 2021

Hm, OK ... most functions loop over the various semantic mapping variables using the ordering lists (either what is supplied or what is derived from the input data). But scatterplot works a bit differently in that it only plots once, with the full dataset, and then modifies the scatter elements in the single collection that gets produced to reflect the semantic mappings.

So this is a little tricky ... the internal representation of the data needs to be subset at some point to remove rows that do not appear in the semantic mapping order lists, but it's not immediately obvious to me whether that's something that should happen within the core code (and if so where) or within the specific logic of scatterplot.
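
The difference between the two strategies can be sketched in plain Python. This is only an illustration of the idea described above, not seaborn's internals; every name here (`rows`, `palette`, and the three functions) is hypothetical.

```python
# Each row is (hue value, measurement); the palette only covers hue_order
rows = [("setosa", 5.1), ("versicolor", 7.0), ("virginica", 6.3)]
hue_order = ["setosa", "versicolor"]
palette = {"setosa": "C0", "versicolor": "C1"}


def plot_by_looping(rows, hue_order, palette):
    """One draw call per level in hue_order; unlisted levels are never visited."""
    drawn = []
    for level in hue_order:
        values = [value for hue, value in rows if hue == level]
        drawn.append((level, palette[level], values))
    return drawn


def plot_once_then_color(rows, palette):
    """Draw every row at once, then look up a color for each row."""
    return [palette[hue] for hue, _ in rows]  # KeyError for unlisted levels


def plot_once_with_subset(rows, hue_order, palette):
    """Drop rows outside hue_order first, then color the remaining rows."""
    kept = [(hue, value) for hue, value in rows if hue in hue_order]
    return [palette[hue] for hue, _ in kept]


print(plot_by_looping(rows, hue_order, palette))
print(plot_once_with_subset(rows, hue_order, palette))
```

Looping skips "virginica" implicitly, while the plot-once approach needs an explicit subsetting step somewhere before the color lookup, which is the design question raised in this comment.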

@kurchi1205

Hi, I am a first-time contributor. Can I work on this issue?

@mwaskom
Owner

mwaskom commented Aug 21, 2021

Hi @kurchi1205 thanks for your interest. I think this would be a tough one for a first-timer, because it requires some detailed knowledge of the internals and an architectural decision.

@mwaskom
Owner

mwaskom commented Jun 11, 2022

Closed with #2848
