histplot common normalization ignores weights #2655
Comments
Yes, good catch. seaborn doesn't actually do anything with the weights other than pass them through to matplotlib, but I think you're right that they need to be used by
BTW, I found the stacked bars in your example a little hard to parse; this makes more sense to me as a demonstration of the issue:
import matplotlib.pyplot as plt
import seaborn as sns

f, axs = plt.subplots(2, 2, constrained_layout=True, figsize=(7, 6))
params = dict(x=[1, 2, 3], weights=[1, 2, 2], discrete=True)
sns.histplot(**params, ax=axs.flat[0])
sns.histplot(**params, hue=["a", "b", "b"], ax=axs.flat[1])
sns.histplot(**params, hue=["a", "b", "b"], stat="proportion", ax=axs.flat[2])
sns.histplot(**params, hue=["a", "b", "b"], stat="proportion", common_norm=False, ax=axs.flat[3])
Great, I do something like this:
# added code:
sum_weight = 0.
if common_norm:
    for sub_vars, sub_data in self.iter_data("hue", from_comp_data=True):
        if "weights" in self.variables:
            sum_weight += sub_data["weights"].sum()
# First pass through the data to compute the histograms
for sub_vars, sub_data in self.iter_data("hue", from_comp_data=True):
    ...
    # Apply scaling to normalize across groups (added code)
    if common_norm and weights is None:
        hist *= len(sub_data) / len(all_data)
    elif common_norm:
        hist *= weights.sum() / sum_weight
This seems to work nicely.
I think it would be cleaner to have default weights of 1. Then you can just proceed from there by summing the weights for each group to compute the numerator and denominator of the scaling factor, and you don't need to repeat conditionals in multiple places.
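A minimal sketch of that suggestion, outside of seaborn's internals. The function name `group_scale_factors` and its input layout are hypothetical and chosen only for illustration; the point is that filling in unit weights up front leaves a single code path for the scaling factor.

```python
import numpy as np


def group_scale_factors(groups):
    """Map each group key to its share of the total weight.

    `groups` maps a key to a `(values, weights)` pair (a hypothetical layout).
    `weights` may be None, in which case every observation counts as 1.
    """
    totals = {}
    for key, (values, weights) in groups.items():
        if weights is None:
            weights = np.ones(len(values))  # default: unit weights
        totals[key] = float(np.sum(weights))
    grand_total = sum(totals.values())
    # Numerator: weight in this group. Denominator: weight across all groups.
    return {key: total / grand_total for key, total in totals.items()}


# Same data as the x=[1, 2, 3], hue=["a", "b", "b"] demonstration.
weighted = group_scale_factors({"a": ([1], [1]), "b": ([2, 3], [2, 2])})
unweighted = group_scale_factors({"a": ([1], None), "b": ([2, 3], None)})
print(weighted)
print(unweighted)
```

With unit weights the factor reduces to the familiar `len(sub_data) / len(all_data)`, so the unweighted case needs no separate branch.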
I don't have any preference, this was my quick hack ;)
The diff for fixing this is:
diff --git a/seaborn/distributions.py b/seaborn/distributions.py
index 5f63289..8329807 100644
--- a/seaborn/distributions.py
+++ b/seaborn/distributions.py
@@ -424,6 +424,12 @@ class _DistributionPlotter(VectorPlotter):
                 warn_singular=False,
             )
+        sum_weight = 0.
+        if common_norm:
+            for sub_vars, sub_data in self.iter_data("hue", from_comp_data=True):
+                if "weights" in self.variables:
+                    sum_weight += sub_data["weights"].sum()
+
         # First pass through the data to compute the histograms
         for sub_vars, sub_data in self.iter_data("hue", from_comp_data=True):
@@ -464,12 +470,21 @@ class _DistributionPlotter(VectorPlotter):
             hist = pd.Series(heights, index=index, name="heights")
             # Apply scaling to normalize across groups
-            if common_norm:
+            if common_norm and weights is None:
                 hist *= len(sub_data) / len(all_data)
+            elif common_norm:
+                hist *= weights.sum() / sum_weight
             # Store the finalized histogram data for future plotting
             histograms[key] = hist
It can be altered as needed.
When using weights in histplot for probability plots, it seems seaborn does not take the weights into account when using common_norm.
See this small example:
Which yields.
I would have assumed that weights are taken into account when calculating the probability. From the code:
seaborn/seaborn/distributions.py
Line 480 in 78e9c08
I
has a probability of 20%, i.e. 1/(1+2+2).
The documentation for weights does not mention anything other than stat='count' plots, but it would make sense to take this into account.
In any case, being able to weigh the probability with the weights under a common norm would be useful if histplot worked as intended.
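The arithmetic behind the expected 20% can be checked with plain NumPy. This is a sketch, not seaborn's implementation: it assumes each group's heights are first normalized to sum to 1 and then scaled by the group's share, using the x=[1, 2, 3], weights=[1, 2, 2], hue=["a", "b", "b"] data from the demonstration in the comments. The helper name `common_norm_proportions` is hypothetical.

```python
import numpy as np

x = np.array([1, 2, 3])
weights = np.array([1.0, 2.0, 2.0])
hue = np.array(["a", "b", "b"])
edges = np.array([0.5, 1.5, 2.5, 3.5])  # one bin per discrete value


def common_norm_proportions(scale_by_weight):
    """Per-group bar heights for stat="proportion" with a common norm."""
    result = {}
    for level in ("a", "b"):
        mask = hue == level
        heights, _ = np.histogram(x[mask], bins=edges, weights=weights[mask])
        within_group = heights / heights.sum()  # sums to 1 inside the group
        if scale_by_weight:
            share = weights[mask].sum() / weights.sum()  # weight-based share
        else:
            share = mask.sum() / len(x)  # count-based share
        result[level] = within_group * share
    return result


by_count = common_norm_proportions(scale_by_weight=False)
by_weight = common_norm_proportions(scale_by_weight=True)
print(by_count)
print(by_weight)
```

Scaling by observation counts gives every bar a height of 1/3, while scaling by summed weights gives 0.2 for the bar at x=1 and 0.4 for each of the bars at x=2 and x=3, which matches the 1/(1+2+2) expectation.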