Week_5/cm010_tibble_joins.Rmd

---
title: "cm010 Exercises: Tibble Joins"
output: 
  html_document:
    keep_md: true
    theme: paper
---

## Requirements

You will need Joey's `singer` R package for this exercise. And to install that, you'll need to install `devtools`. Running this code in your console should do the trick:

```
install.packages("devtools")
devtools::install_github("JoeyBernhardt/singer")
```

Load required packages:

```{r, echo = FALSE, warning = FALSE, message = FALSE}
library(tidyverse)
library(singer)
knitr::opts_chunk$set(fig.width=4, fig.height=3, warning = FALSE, fig.align = "center")
```

<!---The following chunk allows errors when knitting--->

```{r allow errors, echo = FALSE}
knitr::opts_chunk$set(error = TRUE)
```

## Exercise 1: `singer`

The package `singer` comes with two smallish data frames about songs. Let's take a look at them (after minor modifications by renaming and shuffling):

```{r}
(time <- as_tibble(songs) %>% 
   rename(song = title))
```

```{r}
(album <- as_tibble(locations) %>% 
   select(title, everything()) %>% 
   rename(album = release,
          song  = title))
```


1. We really care about the songs in `time`. But, which of those songs do we know its corresponding album?

```{r}
time %>% 
  semi_join(album, by=c("song", "artist_name")) # these are unique identifiers
```

2. Go ahead and add the corresponding albums to the `time` tibble, being sure to preserve rows even if album info is not readily available.

```{r}
time %>% 
 inner_join(album, by=c("song", "artist_name"))
```

3. Which songs do we have "year", but not album info?

```{r}
time %>% 
  inner_join(album, by = "song")
```

4. Which artists are in `time`, but not in `album`?

```{r}
time %>% 
  anti_join(album, by = "artist_name")
```


5. You've come across these two tibbles, and just wish all the info was available in one tibble. What would you do?

```{r}
time %>% 
  full_join(album, by = c("song", "artist_name")) 
```


## Exercise 2: LOTR

Load in the three Lord of the Rings tibbles that we saw last time:

```{r}
fell <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Fellowship_Of_The_Ring.csv")
ttow <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Two_Towers.csv")
retk <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Return_Of_The_King.csv")
```

1. Combine these into a single tibble.

```{r}
bind_rows(fell, ttow,retk)
```

2. Which races are present in "The Fellowship of the Ring" (`fell`), but not in any of the other ones?

```{r}
fell %>% 
  anti_join(ttow, by = "Race") %>% 
  anti_join(retk, by = "Race")
```


## Exercise 3: Set Operations

Let's use three set functions: `intersect`, `union` and `setdiff`. We'll work with two toy tibbles named `y` and `z`, similar to Data Wrangling Cheatsheet

```{r}
(y <-  tibble(x1 = LETTERS[1:3], x2 = 1:3))
```

```{r}
(z <- tibble(x1 = c("B", "C", "D"), x2 = 2:4))
```

1. Rows that appear in both `y` and `z`

```{r}
intersect(y, z)
inner_join(y,z)
```

2. You collected the data in `y` on Day 1, and `z` in Day 2. Make a data set to reflect that.

```{r}
bind_rows(
  mutate(y, day = "Day 1"),
  mutate(z, day = "Day 2")
)
```

3. The rows contained in `z` are bad! Remove those rows from `y`.

```{r}
setdiff(y, z)
anti_join(y, z)
```