Edit to categorical data section and the addition of box and whisker plots #32

Merged
merged 2 commits into from
Nov 17, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -48,3 +48,4 @@ _site/
!/preview/html

.DS_Store
+.Rhistory
16 changes: 10 additions & 6 deletions textbook/09/1/Libraries.ipynb
@@ -13,9 +13,13 @@
"id": "a7571e9f",
"metadata": {},
"source": [
-"To create data visualizations, we will need some additional libraries. In Python, libraries are distributed as *packages*. We will be using the packages `matplotlib` and `seaborn`, which are popular libraries that can be used to visualize data from pandas DataFrames.\n",
+"To create data visualizations, we will need to import the necessary libraries and packages. We will be using `seaborn` and `matplotlib` libraries, as well as the `pyplot` package, which are popular tools for visualizing data from `pandas` dataframes. \n",
"\n",
-"For our visualizations, we will be using a `seaborn` style with a white grid background. A list of other styles, as well as documentation for `matplotlib` can be found at the end of this section."
+"The `seaborn` library is multifaceted because it offers many visualization style options. While it can be used to create many types of visualizations, data scientists sometimes use both `seaborn` and `matplotlib` to create graphs and charts that are suited for their visualization needs. Here, we will be using a combination of `matplotlib` and `seaborn` to create our visualizations.\n",
+"\n",
+"A link to documentation for the visualization libraries used can be found at the end of this section.\n",
+"\n",
+"Let's start by importing our libraries:"
]
},
{
@@ -330,8 +334,7 @@
"id": "729aaca8",
"metadata": {},
"source": [
-"The data consist of seven columns or features. (We've set one of these, *Year*, as our row index, leaving six remaining features.)\n",
-" \n",
+"The data consist of an index and six columns. When importing the data, we used the `index_col` argument to set the *Year* column to our index. This will make things easier down the line when we want to extract data for a particular year of interest, rather than thinking of which index corresponds to our year of interest. Information about our data is listed below:\n",
"\n",
"**Year**\n",
": The year of the collected data\n",
@@ -354,7 +357,7 @@
"**USA-USD**\n",
": Amount of money (in billions, USD) spent on the military in the United States\n",
"\n",
-"In the upcoming exercises, we will explore these data via various visualizations. With these visualizations, we can construct a narrative of what the data show and mean."
+"In the upcoming exercises, we will explore these data using various visualizations. With these visualizations, we can construct a narrative of what the data show and mean."
]
},
{
@@ -368,6 +371,7 @@
"\n",
"- <a target=\"_blank\" href=\"https://matplotlib.org/stable/api/matplotlib_configuration_api.html\">Matplotlib documentation</a>\n",
"- <a target=\"_blank\" href=\"https://matplotlib.org/stable/api/pyplot_summary.html\">Pyplot documentation</a>\n",
+"- <a target=\"_blank\" href=\"https://seaborn.pydata.org\">Seaborn documentation</a>\n",
"- <a target=\"_blank\" href=\"https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html\">List of styles for plots in matplotlib</a>\n",
"- <a target=\"_blank\" href=\"https://data.worldbank.org/indicator/MS.MIL.XPND.CD\">World Bank Data on Military Expenditure (in USD - MS.MIL.XPND.CD)</a>\n",
"- <a target=\"_blank\" href=\"https://data.worldbank.org/indicator/MS.MIL.XPND.GD.ZS\">World Bank Data on Military Expenditure (% of GDP - MS.MIL.XPND.GD.ZS)</a>\n",
@@ -393,7 +397,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.10.6"
+"version": "3.9.12"
}
},
"nbformat": 4,
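
Reviewer's sketch for the `index_col` change in `Libraries.ipynb` above: a minimal illustration of why setting *Year* as the index simplifies lookups by year. This is not code from the PR. The in-memory CSV and its values are invented placeholders, and only the `Year` and `USA-USD` column names are taken from the notebook text; the notebook's actual file path and remaining columns are not shown in this diff.

```python
import io

import pandas as pd

# Hypothetical stand-in for the chapter's data file. The numbers are
# placeholders for illustration only, not real expenditure figures.
csv_text = """Year,USA-USD
2018,1.0
2019,2.0
2020,3.0
"""

# index_col="Year" turns the Year column into the row index,
# leaving USA-USD as the only remaining column.
df = pd.read_csv(io.StringIO(csv_text), index_col="Year")

# Label-based lookup: ask for the year directly.
value_2019 = df.loc[2019, "USA-USD"]

# Positional lookup: requires knowing that 2019 sits in row 1.
same_value = df.iloc[1]["USA-USD"]
```

With the year as the index, `df.loc[2019]` reads the same way the question is asked, whereas `df.iloc[1]` depends on remembering the row order.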
530 changes: 366 additions & 164 deletions textbook/09/2/Categorical_Data.ipynb

Large diffs are not rendered by default.

Binary file added textbook/09/2/img/boxandwhisker.png
179 changes: 97 additions & 82 deletions textbook/09/3/Numerical_Data.ipynb

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions textbook/09/data-visualization.md
@@ -1,7 +1,9 @@
# Data Visualization

-The distillation of large and complex data sets into more easily digestible forms is an important component of data science. Through effective data visualization, we can concisely communicate the story that our data is telling. We do this by choosing appropriate visual depictions of our data to accurately represent what the data means.
+<i>Evelyn Campbell, Ph.D.</i>
+
+Distilling large and complex sets of data into more easily digestible forms is an important component of data science. Through effective data visualization, we can concisely communicate the story that our data is telling. We do this by choosing appropriate visual depictions of our data to accurately represent what the data means.

The visual that we choose is dependent on the type of data. Two major data types that can be visualized graphically are **numerical data** and **categorical data**.

-Numerical data is commonly visualized using *histograms*, *scatter plots* and *line graphs*. Categorical data can be depicted using *bar graphs* and *pie charts*. There are a vast number of other methods to visualize these data types, (*e.g.* box plots, cartograms, heatmaps, *etc.*); but, the aforementioned graphs are the most commonly used among data scientists.
+Numerical data is commonly visualized using <i>histograms</i>, <i>scatter plots</i> and <i>line graphs</i>. Categorical data can be depicted using <i>bar graphs</i>, <i>box and whisker plots</i>, and <i>pie charts</i>. There are a vast number of other methods to visualize these data types, (<i>e.g.</i> cartograms, heatmaps, etc.), but the aforementioned graphs are the most commonly used among data scientists.