diff --git a/week4/tutorial/tutorial-04.qmd b/week4/tutorial/tutorial-04.qmd new file mode 100644 index 0000000..a2c4d74 --- /dev/null +++ b/week4/tutorial/tutorial-04.qmd @@ -0,0 +1,235 @@ +--- +title: "Tutorial-04" +author: Kate Saunders +format: + html: + toc: true + css: tutorial.css + embed-resources: true + pdf: + toc: true +editor: visual +--- + +## Visualisation in R + +### Learning Objectives + +- Practice how to create plots in R. + +- This will involve understanding the grammar of graphics and what each of the different layers are. + +- You will learn to create a range of different plots. + +### Preparation + +- Ensure you installed R and RStudio. + +- Ensure you have installed the `ggplot2` package. You can install this package directly, but this package is also part of the `tidyverse` package. + +- Before you get started set yourself up an R project. This will help you to direct R to where your data is installed.\ + +- If you need any help doing the above, refer to the Lecture 1 and Tutorial 1 material. + +- Download the dataset from Moodle, `boston_celtics.csv,` and place it in a folder called `data` within your R Project. + +### Task + +Today you will be creating visualisations in `ggplot2` to analyse sporting statistics from the Boston Celtics NBA basketball team. + +```{r echo = FALSE} +show_solutions = FALSE +``` + +### Exercise 1 + +1. Read your data set into R and follow the Reading Data Checklist from the lecture.\ + \ + We recommend installing the package `here` to help keep things organised when referencing files. + +```{r echo = TRUE, warning = FALSE, message = FALSE} +if(!require(here)){ + install.packages("here") +} +library(here) +``` + +```{r echo = TRUE, warning = FALSE, message = FALSE} +library(tidyverse) +boston_celtics <- read_csv(here("data", "boston_celtics.csv")) +``` + +If you have an trouble run through the check list: +```{r echo = TRUE, eval = FALSE, warning = FALSE, message = FALSE} + +# Check your working directory +getwd() + +# Check your data file is where you think it is +file.exists(here("data","boston_celtics.csv")) + +# Read in your data +library(tidyverse) +boston_celtics <- read_csv(here("data", "boston_celtics.csv")) + +# Look at your data +View(boston_celtics) + +# Look at a summary +summary(boston_celtics) +``` + + +2. Create a scatter plot that shows the `game_date` and `team_scores`. Change the colour of the points using `team_winner` to show if the team won the game. Make the colour of the points `forestgreen`. Focus on the geometry and aesthetic layer for now. + +```{r eval = FALSE} + ggplot(data = ???) + + geom_???(aes(x = ???, y = ???), col = ???) +``` + +```{r echo = show_solutions, eval = show_solutions} +ggplot(data = boston_celtics) + + geom_point(aes(x = game_date, y = team_score), col = "forestgreen") +``` + +3. Colour the points according to if the team won or lost. Use the hex colour codes from the [Boston Celtics team]("https://teamcolorcodes.com/boston-celtics-color-codes/") to set the colour scheme. + +```{r eval = FALSE} + ggplot(data = ???) + + geom_???(aes(x = ???, y = ???, col = ???)) +``` + +```{r echo = show_solutions, eval = show_solutions} +ggplot(data = boston_celtics) + + geom_point(aes(x = game_date, y = team_score, col = team_winner)) + + scale_colour_manual(label = c("Loss", "Win"), values = c("TRUE" = "#007A33", "FALSE" = "#BA9653")) +``` + +4. As there are a lot of points, change the size and the transparency to reduce overplotting. + +```{r eval = FALSE} +ggplot(data = ???) + + geom_???( + aes(x = ???, y = ???, col = ???), + size = ???, alpha = ??? + ) +``` + +```{r echo = show_solutions, eval = show_solutions} +ggplot(data = boston_celtics) + + geom_point(aes(x = game_date, y = team_score, col = team_winner), size = 0.75, alpha = 0.4) + + scale_colour_manual(label = c("Loss", "Win"), values = c("TRUE" = "#007A33", "FALSE" = "#BA9653")) +``` + +5.Look at the [theme options]("https://ggplot2.tidyverse.org/reference/ggtheme.html") and try using a few different ones, before deciding on the theme for your plot. + +```{r eval = FALSE} +ggplot(data = ???) + + geom_???( + aes(x = ???, y = ???, col = ???), + size = ???, alpha = ??? + ) + + theme_???() +``` + +```{r echo = show_solutions, eval = show_solutions} +ggplot(data = boston_celtics) + + geom_point(aes(x = game_date, y = team_score, col = team_winner), size = 0.75, alpha = 0.4) + + scale_colour_manual(label = c("Loss", "Win"), values = c("TRUE" = "#007A33", "FALSE" = "#BA9653")) + + theme_bw() +``` + +5. Add appropriate labels to your plot and edit the legend position. + +```{r eval = FALSE} +ggplot(data = ???) + + geom_???( + aes(x = ???, y = ???, col = ???), + size = ???, alpha = ??? + ) + + theme_???() + + labs(title = ???, subtitle = ???, x = ???, y = ???) +``` + +```{r echo = show_solutions, eval = show_solutions} +ggplot(data = boston_celtics) + + geom_point(aes(x = game_date, y = team_score, col = team_winner), size = 0.75, alpha = 0.4) + + scale_colour_manual(label = c("Loss", "Win"), values = c("TRUE" = "#007A33", "FALSE" = "#BA9653")) + + theme_bw() + + labs(title = "Boston Celtics Team Score", + subtitle = "2021 - Current Season", + x = "Date", + y = "Points") +``` +6. Move the legend on the plot to increase the data-density. *(Note the solutions show a more advanced way to move the legend that the lecture notes.)* + +```{r eval = FALSE} +ggplot(data = ???) + + geom_???( + aes(x = ???, y = ???, col = ???), + size = ???, alpha = ??? + ) + + theme_???() + + labs(title = ???, subtitle = ???, x = ???, y = ???) + theme(legend.position = ???) +``` + +```{r echo = show_solutions, eval = show_solutions} +ggplot(data = boston_celtics) + + geom_point(aes(x = game_date, y = team_score, col = team_winner), size = 0.75, alpha = 0.4) + + scale_colour_manual(label = c("Loss", "Win"), values = c("TRUE" = "#007A33", "FALSE" = "#BA9653")) + + theme_bw() + + labs(title = "Boston Celtics Team Score", + subtitle = "2021 - Current Season", + x = "Date", + y = "Points") + + theme(legend.position = c(0.8, 1.1), legend.title = element_blank(), legend.direction = "horizontal") +``` +### Exercise 2 + +Now we've mastered the basic layers, take some time to creating some other visualisations that are interesting to you! + +* You may like to produce plots looking at the distribution of some of the key variables. For example, `assists`, `rebounds`, `steals` etc. + +```{r echo = show_solutions, eval = show_solutions} +boston_celtics |> + ggplot(aes(x = assists)) + + geom_histogram(aes(y = ..density..)) + + geom_density() + +boston_celtics |> + ggplot(aes(x = steals)) + + geom_histogram(aes(y = ..density..)) + + geom_density() + +boston_celtics |> + ggplot() + + geom_histogram(aes(x = total_rebounds), fill = "blue", alpha = 0.3) + + geom_histogram(aes(x = offensive_rebounds), fill = "green", alpha = 0.3) + + geom_histogram(aes(x = defensive_rebounds), fill = "orange", alpha = 0.3) +``` + +* Maybe you are interested in how the shot percentage varies between `field_goal_pct` (2 pts) and `three_point_field_goal_pct` (3 pts). + +```{r echo = show_solutions, eval = show_solutions} +boston_celtics |> + filter(season == 2025) |> + ggplot() + + geom_point(aes(x = field_goal_pct, y = three_point_field_goal_pct)) +``` + +* Perhaps you want to create a bar chart (`geom_bar`) to see how many times they've played opposing teams this season. + +```{r echo = show_solutions, eval = show_solutions} +boston_celtics |> + filter(season == 2025) |> + ggplot() + + geom_bar(aes(y = opponent_team_name)) +``` +### Finishing Up + +At the end of the tutorial, share your favourite plot to the discussion forum. + +##### Material developed by Dr. Kate Saunders. + +##### © Copyright 2024 Monash University