Refactor swarmplot (#2447)
* Refactor swarmplot to match new stripplot (copying too much code)

* Refactor some shared components of swarm and strip plot

* Add dropna logic to iter_data

* Don't try to swarm empty collection

* Fix rst syntax

* Improve dropna logic and copy dataframe to avoid warnings

* Fix bug with jitter on empty category

* Remove original swarmplot code

* Transition catplot with kind='swarm' to new code

* Add more swarmplot tests

* Fix small test issues

* Mark a puzzling pinned test failure as xfail

* Delay log scale query in beeswarm until draw time

* Fix single point jitter

* Added control over the swarmplot warning with warn_thresh

* Update datalim and force autoscale at draw-time

* Update swarmplot API examples

* Refactor common parts of strip and swarm plot tests

* Always update datalim while swarming
mwaskom authored Jan 24, 2021
1 parent 4190c24 commit 80fc0a8
Showing 9 changed files with 717 additions and 610 deletions.
10 changes: 5 additions & 5 deletions doc/docstrings/stripplot.ipynb
@@ -35,7 +35,7 @@
"cell_type": "raw",
"metadata": {},
"source": [
"Assigning a second variable splits the strips of poins to compare categorical levels of that variable:"
"Assigning a second variable splits the strips of points to compare categorical levels of that variable:"
]
},
{
@@ -67,7 +67,7 @@
"cell_type": "raw",
"metadata": {},
"source": [
"Prior to version 0.12, the levels of the categorical variable had different colors. To get the same effect, assign the `hue` variable explicitly:"
"Prior to version 0.12, the levels of the categorical variable had different colors by default. To get the same effect, assign the `hue` variable explicitly:"
]
},
{
@@ -99,7 +99,7 @@
"cell_type": "raw",
"metadata": {},
"source": [
"If the `hue` variable is numeric, it will be mapped with a quantitative palette by default (this was not the case prior to version 0.12):"
"If the `hue` variable is numeric, it will be mapped with a quantitative palette by default (note that this was not the case prior to version 0.12):"
]
},
{
@@ -163,7 +163,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If plotting in wide-form mode, each column of the dataframe will be mapped to both `x` and `hue`:"
"If plotting in wide-form mode, each numeric column of the dataframe will be mapped to both `x` and `hue`:"
]
},
{
@@ -250,7 +250,7 @@
"cell_type": "raw",
"metadata": {},
"source": [
"Further visual customization can be achieved by passing matplotlib keyword arguments:"
"Further visual customization can be achieved by passing keyword arguments for :func:`matplotlib.axes.Axes.scatter`:"
]
},
{
285 changes: 285 additions & 0 deletions doc/docstrings/swarmplot.ipynb
@@ -0,0 +1,285 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide"
]
},
"outputs": [],
"source": [
"import seaborn as sns\n",
"sns.set_theme(style=\"whitegrid\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Assigning a single numeric variable shows its univariate distribution with points adjusted along the other axis such that they don't overlap:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips = sns.load_dataset(\"tips\")\n",
"sns.swarmplot(data=tips, x=\"total_bill\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Assigning a second variable splits the groups of points to compare categorical levels of that variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"total_bill\", y=\"day\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Show vertically-oriented swarms by swapping the assignment of the categorical and numerical variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"day\", y=\"total_bill\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Prior to version 0.12, the levels of the categorical variable had different colors by default. To get the same effect, assign the `hue` variable explicitly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"day\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Or you can assign a distinct variable to `hue` to show a multidimensional relationship:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"sex\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"If the `hue` variable is numeric, it will be mapped with a quantitative palette by default (note that this was not the case prior to version 0.12):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"size\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Use `palette` to control the color mapping, including forcing a categorical mapping by passing the name of a qualitative palette:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"size\", palette=\"deep\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"By default, the different levels of the `hue` variable are intermingled in each swarm, but setting `dodge=True` will split them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"total_bill\", y=\"day\", hue=\"sex\", dodge=True)"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"The \"orientation\" of the plot (defined as the direction along which quantitative relationships are preserved) is usually inferred automatically. But in ambiguous cases, such as when both axis variables are numeric, it can be specified:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"total_bill\", y=\"size\", orient=\"h\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"When the local density of points is too high, they will be forced to overlap in the \"gutters\" of each swarm and a warning will be issued. Decreasing the size of the points can help to avoid this problem:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(data=tips, x=\"total_bill\", y=\"size\", orient=\"h\", size=3)"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"By default, the categorical variable will be mapped to discrete indices with a fixed scale (0, 1, ...), even when it is numeric:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(\n",
" data=tips.query(\"size in [2, 3, 5]\"),\n",
" x=\"total_bill\", y=\"size\", orient=\"h\",\n",
")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"To disable this behavior and use the original scale of the variable, set `fixed_scale=False` (notice how this also changes the order of the variables on the y axis):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(\n",
" data=tips.query(\"size in [2, 3, 5]\"),\n",
" x=\"total_bill\", y=\"size\", orient=\"h\",\n",
" fixed_scale=False,\n",
")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"Further visual customization can be achieved by passing keyword arguments for :func:`matplotlib.axes.Axes.scatter`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.swarmplot(\n",
" data=tips, x=\"total_bill\", y=\"day\", hue=\"time\",\n",
" marker=\"x\", linewidth=1, \n",
")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"To make a plot with multiple facets, it is safer to use :func:`catplot` with `kind=\"swarm\"` than to work with :class:`FacetGrid` directly, because :func:`catplot` will ensure that the categorical and hue variables are properly synchronized in each facet:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.catplot(\n",
" data=tips, kind=\"swarm\",\n",
" x=\"time\", y=\"total_bill\", hue=\"sex\", col=\"day\",\n",
" aspect=.5\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "seaborn-py38-latest",
"language": "python",
"name": "seaborn-py38-latest"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
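
The notebook above notes that the categorical variable is mapped to discrete indices (0, 1, ...) by default, even when it is numeric, and that `fixed_scale=False` keeps the original scale. The idea can be sketched in plain Python as follows. This is an illustration only, not seaborn's implementation: `categorical_positions` is a hypothetical helper, and sorting the levels is an assumption made for the example.

```python
def categorical_positions(values, fixed_scale=True):
    """Return the axis position for each categorical value.

    Hypothetical helper for illustration; not part of seaborn.
    """
    if not fixed_scale:
        # Keep the original numeric values as positions.
        return list(values)
    # Assumption: levels are ordered by sorting the unique values.
    levels = sorted(set(values))
    # Map each level to a discrete index 0, 1, 2, ...
    lookup = {level: index for index, level in enumerate(levels)}
    return [lookup[value] for value in values]


sizes = [2, 3, 5, 3, 2]
fixed = categorical_positions(sizes)
original = categorical_positions(sizes, fixed_scale=False)
print(fixed)
print(original)
```

With the fixed scale, the levels 2, 3, and 5 are evenly spaced; with the original scale, the gap between 3 and 5 is twice the gap between 2 and 3.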
4 changes: 3 additions & 1 deletion doc/releases/v0.12.0.txt
@@ -4,7 +4,7 @@ v0.12.0 (Unreleased)

- |API| |Feature| |Enhancement| TODO (Flesh this out further). Increased flexibility of what can be shown by the internally-calculated errorbars (:pr:`2407`).

- |Fix| |Enhancement| Improved robustness to missing data, including additional support for the `pd.NA` type (:pr:`2417).
- |Fix| |Enhancement| Improved robustness to missing data, including additional support for the `pd.NA` type (:pr:`2417`).

- TODO function specific categorical enhancements, including:

@@ -14,6 +14,8 @@ v0.12.0 (Unreleased)

- In :func:`swarmplot`, the order of the points in each swarm now matches the order in the original dataset; previously they were sorted. This affects only the underlying data stored in the matplotlib artist, not the visual representation (:pr:`2443`).

- In :func:`swarmplot`, the proportion of points that must overlap before issuing a warning can now be controlled with the `warn_thresh` parameter (:pr:`2447`).


- Made `scipy` an optional dependency and added `pip install seaborn[all]` as a method for ensuring the availability of compatible `scipy` and `statsmodels` libraries at install time. This has a few minor implications for existing code, which are explained in the Github pull request (:pr:`2398`).
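
The `warn_thresh` entry above describes a threshold on the proportion of overlapping points. A minimal sketch of that kind of check is given below, in plain Python. It is not seaborn's code: `warn_if_crowded`, its `gutter` argument, the default of `0.05`, and the warning text are all assumptions made for illustration.

```python
import warnings


def warn_if_crowded(offsets, gutter, warn_thresh=0.05):
    """Clip swarm offsets to the gutters and warn if too many were clipped.

    Hypothetical helper for illustration; not part of seaborn.
    """
    # Points pushed past the gutter are pulled back to it, so they overlap there.
    clipped = [max(-gutter, min(gutter, offset)) for offset in offsets]
    n_outside = sum(1 for offset in offsets if abs(offset) > gutter)
    proportion = n_outside / len(offsets) if offsets else 0.0
    # Warn only when the overlapping share exceeds the threshold.
    if proportion > warn_thresh:
        warnings.warn(
            f"{proportion:.1%} of the points cannot be placed without overlap",
            UserWarning,
        )
    return clipped, proportion


# A high threshold keeps this call silent even though 2 of 5 points are clipped.
clipped, proportion = warn_if_crowded(
    [-0.6, -0.2, 0.0, 0.3, 0.7], gutter=0.5, warn_thresh=0.5
)
print(clipped, proportion)
```

Raising the threshold silences the warning for plots where some overlap is acceptable; lowering it makes the check stricter.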
