diff --git a/_quarto.yml b/_quarto.yml
index 44a4177..8c380c5 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -35,6 +35,8 @@ website:
         text: Software
       - href: teaching/index.qmd
         text: Teaching
+      - href: teaching/courses/2017_lsa/index.qmd
+        text: "LSA 2017 Course"
     right:
       - href: https://jofrhwld.github.io/blog/
         text: Blog
@@ -88,7 +90,11 @@ website:
           href: "research/#2008"
         - text: "2007"
           href: "research/#2007"
-
+    - title: "LSA 2017 Course"
+      contents:
+        - teaching/courses/2017_lsa/index.qmd
+        - auto: teaching/courses/2017_lsa/lectures/
+
 format:
   html:
     theme:
@@ -101,6 +107,7 @@ format:
       - styles/dark.scss
       - styles/styles.scss
     toc: true
-
+    smooth-scroll: true
+
 editor: visual
diff --git a/teaching/courses/2017_lsa/index.qmd b/teaching/courses/2017_lsa/index.qmd
index 6d26a3a..d506454 100644
--- a/teaching/courses/2017_lsa/index.qmd
+++ b/teaching/courses/2017_lsa/index.qmd
@@ -1,5 +1,9 @@
 ---
 title: "LSA 2017 Statistical Modelling with R"
+listing:
+  contents: lectures
+  type: table
+  fields: [image, order, title, reading-time]
 ---
 
 - [Meeting 1: Introduction to R](lectures/Session_1.nb.html)
diff --git a/teaching/courses/2017_lsa/lectures/Session_1.nb.html b/teaching/courses/2017_lsa/lectures/Session_1.nb.html
deleted file mode 100644
index b3e1610..0000000
--- a/teaching/courses/2017_lsa/lectures/Session_1.nb.html
+++ /dev/null
@@ -1,1202 +0,0 @@
-Introduction to R
- - - - - - - - - - - - - - - - -
-
-
-
-
- -
- - - - - - - - -
-

Hellos

-

Welcome to Statistical Modelling with R. If there is one thing to remember from this course, it is that your analysis workflow should look something like this:

-
-
- - -
-
-
-
-
-

The process of learning R and Modelling

-

These are some of the core areas I figure are necessary to getting good at statistical modelling in R:

-
  1. Using R (and RStudio) well
  2. Feeling comfortable and fluid reorganizing and summarizing data
  3. Visualizing Data
  4. Deciding before you model what you want to compare to what
  5. How to translate your analysis goals into R code
  6. Understanding a little bit about statistics
  7. When something goes wrong, being able to accurately attribute your difficulty to one of the above topics
-

These are all skills you can achieve through practice, experience, and occasional guidance from someone more skilled than you. It is exactly like acquiring any other skill or craft. At first it will be confusing, you’ll make some mistakes, and it won’t look so good.

-
-
-

The first hat I ever knit:

-
- - -
-
-
-

The most recent hat I knit:

-
- - -
-
-
-

The way I improved my knitting is exactly the same as how you can improve your R programming ability:

-
  • I knit a lot (almost every day).
  • I memorized a bunch of stuff.
  • I remembered where to look up the stuff I didn’t have memorized.
  • My knitting became more “idiomatic” (i.e. I started knitting like how other knitters knit).
  • I learned how to identify and fix mistakes without undoing my entire project.
  • I developed good workspace hygiene & organization.
  • As I got the basics down, I started researching and incorporating fussy little details into my work.
-

Most of the content of the course is devoted to core R programming (things you should be memorizing or remembering where to find help), but I’ll try my best to annotate portions of the notes which correspond to workspace hygiene, being idiomatic, etc, so that you can distinguish between them.

-
-
-
-

Course Outline

-

The course will follow the workflow outlined at the beginning: begin → summarize → visualize → analyze.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Week  Monday                               Thursday
1     --                                   Intro - Basics & R Notebooks
2     Data Frames & Factors                Split-Apply-Combine, Reshaping
3     ggplot2                              Fitting Linear Models
4     map functions & fitting many models  Mixed Effects Linear models
5     Bootstraps & Visualization           --
-
-

Workspace Hygiene

- -

If you have a directory planning structure that you’re happy with, go ahead and do that. But if how to organize your R analysis life is something you’d like to get out of this course, I’d recommend the following directory structure & naming conventions.

-
├── lsa_2017
-│   └── r_modelling*
-│       ├── assignments
-│       ├── data
-│       └── lectures
-        
-

The r_modelling directory will be the home directory for the course. I would recommend creating a new R Notebook for each lecture (more on that in a moment) and giving them a naming convention like:

-
01_lecture.Rmd
-02_lecture.Rmd
-

Right now eliminate the impulse to create any folders or file names with spaces in them.
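
As a rough sketch of the layout and naming advice above, here is how the folders and file names could be generated from R itself. The use of tempdir() as the parent folder, and the names course_home and lecture_files, are my own assumptions for illustration; you would substitute wherever you actually keep your course files.

```r
# Hypothetical location: a temporary folder, so nothing touches your real files
course_home <- file.path(tempdir(), "lsa_2017", "r_modelling")

# dir.create() takes one path at a time, so loop over the three subfolders
for (subfolder in c("assignments", "data", "lectures")) {
  dir.create(file.path(course_home, subfolder),
             recursive = TRUE, showWarnings = FALSE)
}
list.files(course_home)

# Zero-padded lecture names keep the files in order when sorted by name
lecture_files <- sprintf("%02d_lecture.Rmd", 1:3)
lecture_files

# Check a set of names for spaces before saving anything
grepl(" ", c(lecture_files, "my lecture.Rmd"))
```

The last line returns TRUE only for the name containing a space, which is the one to avoid.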

-
-
-
-
-

R, RStudio and R Notebooks

-

We’re going to be using R, RStudio, and R Notebooks in this course, and it’s a little important to keep straight what these three things are:

-
-

R

-

R is a programming language that runs on your computer. At its barest bones, it looks like this:

-
-
- - -
-
-

You can type text into the prompt there, and if you’ve successfully memorized the right R commands, it’ll do some things.

-
-
-

RStudio

-

RStudio is like an Instagram filter over the top of R, to make your R use experience better. It visually organizes some important components of using R into panes, and offers code completion suggestions. For example, if you remember there’s something called a “Wilcoxon test”, but you don’t remember what the function in R is, you can start typing in Wilc, and this will happen:

-
-
- - -
-
-

RStudio’s autocompletion is really useful for a lot of other things, like reminding you what the column names are in your data frame, what the names of all the arguments to a function are, etc.

-

But perhaps the most valuable component in RStudio these days is its authoring tools, like R Notebooks.

-
-
-

R Notebooks

-

R Notebooks allow you to document your code in plain text, insert R Code chunks, and view the results of the R code all in one place, then compile it into a nice looking notebook.

-
-

~5 Minute Activity

-

Goals

-
  1. Start a new RStudio Project.
  2. Create a new R Notebook.
  3. Run some code in the R Notebook.
  4. Preview the R Notebook in HTML
-

Start a new RStudio Project

-

Create a new RStudio Project, either by using the menu options File > New Project or by clicking on the icon in the top right corner of the RStudio window. If you have created the directory structure above, choose Existing Directory and choose r_modelling. Otherwise, select the options New Directory then Empty Project and tell it the project name is r_modelling.

-

Create a new R Notebook

-

Open a new R Notebook using the menu command File > New File > R Notebook. If this is the first time you’ve opened an R Notebook on your computer, you’ll probably be faced with the following prompt:

-
-
- - -
-
-

Click “Yes”, and wait for the installation to finish. A window with a bunch of gobbledygook will pop up, and that’s fine. Once it’s all finished, the new file should open.

-

Run some code in the R Notebook

-

First, run the R code chunk that comes automatically in a new R Notebook by clicking on the green “play” button in the top right corner of the code chunk.

-

Next, insert a new R code chunk at the bottom of the notebook (directions for how to do so are already included in the new R Notebook), and inside, enter:

-
"Hello World"
-

Then run this code chunk by clicking the play button.

-

Preview the R Notebook in HTML

-

Click the “Preview” button at the top of the R Notebook panel to compile it into an HTML document. You will need to save the notebook first. In the lectures folder, save it as 00_practice.Rmd

-
-
-
-

Discussion

-

I’m going to recommend (for now at least) that you run all of your code through an R Notebook. It is possible to just type things into the R console, but that’s kind of like dictating a paper into thin air. Once you’ve spoken the words, they disappear and can be hard to recover.

-

My earlier advice would have been to write all of your code in an R script file, but that also separates the code from its results, which can be hard for beginners to keep track of.

-
- -
-
-
-

Installing R Packages

-

R comes with a lot of functionality installed, but one way that R is extensible is through users’ ability to contribute new code & data through its package management system. We’re going to be using a number of these packages in the course, especially since a few of them have fundamentally changed the way R programming works in the past 3 years. There’s also a course R package I’ve created to easily distribute sample datasets.

-

Here’s a basic diagram of how R packages work:

-
- - -
-
-

Installing Packages

-
-

install.packages()

-

Most R packages are distributed through CRAN (the Comprehensive R Archive Network). When you run the function install.packages("x"), R checks whether the package "x" exists on CRAN, and installs it on your computer if it does. You may be asked to choose a “CRAN mirror” the first time you run install.packages(). This is because there are many copies of CRAN distributed across the internet. I’d recommend choosing the first option called 0-Cloud.
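
If you would rather not be prompted for a mirror, one option is to set it yourself before installing anything. This is only a minimal sketch: it assumes the cloud mirror at https://cloud.r-project.org (the address behind the 0-Cloud option), and it changes the setting for the current R session only.

```r
# Record the CRAN mirror to use for the rest of this R session
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which mirror install.packages() will now contact
getOption("repos")[["CRAN"]]
```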

-
-
-

install_github()

-

For package developers, getting a package onto CRAN can be a bit of a pain, so some packages (and development versions of many) are also available on GitHub, and can be easily installed with devtools::install_github("username/package").

-
-
-
-

Installing packages is different from loading packages

-

Installing a package is different from loading packages. Installing a package only downloads and configures the code on your computer. In order to use the contents of a package, you need to load it into your R session with library().

-
  • You only need to run install.packages() once to install a package, or to update a package.
  • You need to run library() at the start of every new R session in order to use the functionality from that package.
-

For example, ggplot() is a function from the package ggplot2. I have already installed ggplot2 on my computer, but if I try to use ggplot() before loading the package with library(), I’ll get the error that the function was not found.

- - - -
foo <- ggplot()
- - -
Error in ggplot() : could not find function "ggplot"
- - - - - - -
library("ggplot2")
-foo <- ggplot()
- - - -
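
To make the installed versus loaded distinction concrete, here is a small sketch of how you could check each state yourself. The helper names is_installed() and is_attached() are hypothetical ones I made up for this example, not functions that come with R. I use the stats package because it ships with R, so the code runs without installing anything.

```r
# Hypothetical helper: TRUE if the package's files exist in a library on this computer
is_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

# Hypothetical helper: TRUE if library() has attached the package in this session
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

is_installed("stats")               # installed along with R itself
is_attached("stats")                # attached by default when R starts
is_installed("notarealpackage123")  # a made-up name, so nothing is found
```

A package that is installed but not yet attached is exactly the situation that produces the "could not find function" error.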
-

~2 Minute Activity

-

Let’s install all of the packages we’re going to use in the course. Double check that you’re connected to the internet.

-

Create a notebook for this lecture called 01_lecture.Rmd. Copy-paste the following into an R code chunk and run it:

- - - -
install.packages(
-  c("tidyverse",
-    "devtools")
-)
-
-library("devtools")
-
-install_github("jofrhwld/lsa2017")
- - - -
-
-
-
-
-

R Basics

-

We’re now going to run through some very basics of R, specifically:

-
  • Basic Data Types
  • Basic Calculations
  • Assignment
  • Vectors
  • Indexing
-

Create a new R Notebook. Change the Title field to Intro to R, and save it as 0_lecture.Rmd in the folder lectures.

-

As we come to a code chunk in the lecture, either copy-paste or re-type it into a new code chunk in your lecture R notebook, and run it.

-
-

Basic Calculations

-

One way to think of R is as an overblown calculator.

- - - -
3+3 
- - -
[1] 6
- - -
2*4
- - -
[1] 8
- - -
(369-1)/6
- - -
[1] 61.33333
- - - -
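
The parentheses in the last calculation matter. As a quick sketch (the variable names are just for illustration), here is the same arithmetic with and without them, plus a few other operators you would find on a calculator:

```r
with_parens <- (369 - 1) / 6    # subtraction happens first, then division
without_parens <- 369 - 1 / 6   # division happens first: 369 minus one sixth
with_parens
without_parens

2^3       # exponentiation
7 %/% 2   # integer division
7 %% 2    # remainder after division
```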

But it’s not all that useful to do a bunch of calculations without saving the results for later, which is where assignment comes in.

-
-
-

Assignment

-

You can assign values to variables using the assignment operator: <- or -> (but in practice, only use <-).

-
-

variable <- value

-
- - - -
x <- 10
-y <- 2*3
- - - -

Once you’ve assigned a value to a variable, you can reuse the value stored in that variable for other purposes, like just printing it out again

- - - -
x
- - -
[1] 10
- - -
y
- - -
[1] 6
- - - -

Or adding the two values together

- - - -
x + y 
- - -
[1] 16
- - - -

In short, you can use these variables x and y like they are the values you assigned to them. If this is your first time programming, here are a few things to clarify:

-

Note

-
  • x and y didn’t exist before you created them by assigning values to them.
  • You could have chosen almost any name for these variables.
  • You can just as easily assign new values to these variables.
-
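
One consequence of the last point that trips people up: assigning a new value to a variable does not go back and update anything you calculated from its old value. A minimal sketch:

```r
x <- 10
y <- x * 2   # y is calculated once, from the value x has right now
x <- 3       # giving x a new value does not change y

x
y
```

Here x ends up as 3 while y is still 20. To update y you would need to run the line that defines it again.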
-

Idiom

-

Naming Things

-

x and y are lousy names for variables. When it comes to naming variables, there’s a famous saying:

-
-

“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton

-
-

For best practices on naming variables, I’ll refer you to the tidyverse style guide by Hadley Wickham. To briefly summarize it:

-
  • Use only lowercase letters and numbers.
  • Use _ to separate words in a variable name.
  • You’re actually not able to start a variable name with a number.
-
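
If you are ever unsure whether R will accept a name, one way to check is with make.names(), which rewrites any name R would reject. This is a sketch with made-up candidate names: a name is usable as-is when make.names() hands it back unchanged.

```r
# Made-up candidate names, two fine and two that break the rules
candidates <- c("model_1", "model_full", "1_model", "model 1")

# make.names() repairs invalid names, so unchanged names were already valid
is_valid <- make.names(candidates) == candidates
is_valid
```

The name starting with a number and the name containing a space both come back FALSE.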

Also, be guided by The Principle of Least Effort. Use the smallest number of characters that is still clearly interpretable.

- - - -
# Good Names
-model_1
-model_full
-
-
-# Bad Names
-the_first_model_I_ever_fit
-just_trying_out_a_model_with_all_predictors
-m_01
-m_agdf
- - - -

Also, just use good judgment. There is nothing in R preventing you from doing stuff like this to yourself.

- - - -
five <- 10
-ten <- 5
-
-yellow <- "green"
- - - -
-

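Another thing to keep in mind is that R can’t handle formatting characters like commas, spaces, currency symbols, or percent signs in numeric values; stick to the digits 0 through 9 and a decimal point. Of the examples here, only thousand <- 1000 works, and all of the others fail: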

- - - -
# no commas
-thousand <- 1,000
- - -
Error: unexpected ',' in "thousand <- 1,"
- - - - - - -
# no spaces
-thousand <- 1 000
- - -
Error: unexpected numeric constant in "thousand <- 1 000"
- - - - - - -
# like this
-thousand <- 1000
- - - - - - -
# no currencies
-dollars <- $1000
- - -
Error: unexpected '$' in "dollars <- $"
- - - - - - -
# no percentages
-percent <- 51%
- - -
Error: unexpected input in "percent <- 51%"
- - - -
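
If your numbers arrive as text with this kind of formatting (from a spreadsheet, say), one possible fix is to strip the extra characters and then convert. This is a sketch with made-up values, not the only way to do it:

```r
# Made-up values, stored as character data because of the formatting
raw <- c("1,000", "$1000", "51%")

# Delete commas, dollar signs and percent signs, then convert to numbers
cleaned <- gsub("[,$%]", "", raw)
as.numeric(cleaned)
```

Note that "51%" becomes 51, not 0.51. Dividing by 100 is a separate step if you want a proportion.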
-

Additional data types

-

In addition to numbers, other basic data types in R are character and logical.

- - - -
# character data
-digital_words <- c("fam",
-                   "Harambe",
-                   "tweetstorm",
-                   "@")
- - - - - - -
# logical values
-TRUE
- - -
[1] TRUE
- - -
# a logical test
-(10/2) < 3
- - -
[1] FALSE
- - - -
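
When you are not sure which type a value is, class() will tell you. The sketch here also shows what happens if you mix types in one vector (the name mixed is just for illustration): R quietly converts everything to character.

```r
class(10)      # numeric
class("fam")   # character
class(TRUE)    # logical

# Mixing types in one vector converts every value to character
mixed <- c(1, "fam", TRUE)
mixed
class(mixed)
```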
-

On using quotes

-

When you enter characters without quotes around them, R assumes you’re referring to a variable. If you try to do the assignment above without the quotes, you’ll get an error.

- - - -
digital_words_fail <- c(fam,
-                        Harambe,
-                        tweetstorm)
- - -
Error: object 'fam' not found
- - - -

Here, R saw fam, which isn’t in quotes, searched the environment for any variables named fam and couldn’t find any.

-

When you put characters in quotes, R assumes it’s a character value, even if there’s a variable by the same name.

- - - -
digital_words
- - -
[1] "fam"        "Harambe"    "tweetstorm" "@"         
- - -
"digital_words"
- - -
[1] "digital_words"
- - - -
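
Here is a small sketch of both behaviours side by side. The variable fam and its value are made up for this example:

```r
# Now a variable called fam really does exist
fam <- "family"

# Without quotes R looks up the variable; with quotes it keeps the literal text
with_and_without_quotes <- c(fam, "fam")
with_and_without_quotes
```

The first element is the stored value "family", and the second is just the characters "fam".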
-
-
-
-

Vectors

-

Vectors are essentially lists of data, and can contain characters, numbers, or TRUE/FALSE values. There are a number of ways to create vectors in R, and data manipulation will frequently produce subvectors of data.

-
  • 1:10
    • This produces a vector of integers from 1 to 10. Reversing the order of the numbers will produce a vector of decreasing values.
  • c(...)
    • This produces a vector of whatever is passed as an argument to c().
      • c(1,2,3,4)
  • seq(from,to,...)
    • This produces a sequence of numbers either by a given increment or evenly spaced to a given length.
      • seq(1,10,by=0.5)
      • seq(1,10,length=11)
  • rep(x,...)
    • This produces a vector of repetitions of x by a given number of times.
      • rep(1,6)
      • rep(1:3,2)
      • rep("hello world",4)
-
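
To see what these functions return, here is a short sketch you can run. The specific arguments are my own choices for illustration, including the each argument to rep(), which isn't covered above.

```r
10:1                          # decreasing integers from 10 down to 1

half_steps <- seq(1, 10, by = 0.5)
length(half_steps)            # how many values a step size of 0.5 produces

seq(1, 10, length.out = 4)    # 4 evenly spaced values from 1 to 10

rep(1:3, 2)                   # the whole vector, repeated twice
rep(1:3, each = 2)            # each value repeated twice before moving on
```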
-

Vector Arithmetic

-
-

Vector and A Number

-

A pretty cool feature of R is how you can do arithmetic with vectors. For example, let’s say you’ve interviewed a bunch of speakers of the following ages:

- - - -
ages <- c(18, 35, 41, 62)
- - - -

If you wanted to know the year of birth of these speakers, it’s as easy as:

- - - -
2017 - ages
- - -
[1] 1999 1982 1976 1955
- - - -

R has taken each value in ages, and subtracted it from 2017, and created a new vector with the results.

-

Or, if you wanted to know in which year these speakers turned 17, it’s as easy as:

- - - -
(2017 - ages) + 17
- - -
[1] 2016 1999 1993 1972
- - - -
-
-

Vector and a Vector

-

Or, let’s say these speakers weren’t all interviewed the same year. Half were interviewed in the 90s, and half in the 2000s.

- - - -
interview_year <- c(1995, 1996, 2003, 2004)
- - - -

Getting each speaker’s year of birth is as simple as:

- - - -
interview_year - ages
- - -
[1] 1977 1961 1962 1942
- - - -

This worked because the two vectors, interview_year and ages, were the same length. R took the first value of ages and subtracted it from the first value of interview_year, the second value of ages and subtracted it from the second value of interview_year, etc., creating a new vector of the results. You could easily assign this output to a new variable.

- - - -
dob <- interview_year - ages
- - - -

Of course, if you now wanted to know what year these speakers turned 17, you could do it like so:

- - - -
(interview_year - ages) + 17
- - -
[1] 1994 1978 1979 1959
- - - -
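
It is worth knowing what happens when the two vectors are not the same length, because R usually won't stop you. In this sketch (the vectors two_years and the three-year example are made up), R recycles the shorter vector, and only warns when the lengths don't divide evenly:

```r
ages <- c(18, 35, 41, 62)

# Two years against four ages: the years are reused in order, with no warning
two_years <- c(1995, 2004)
two_years - ages

# Three years against four ages: R still returns an answer, but warns about it.
# tryCatch() is used here only to capture the warning text as a value.
recycling_warning <- tryCatch(
  {
    c(1995, 2004, 2010) - ages
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
recycling_warning
```

Silent recycling is a common source of wrong answers, so it's a good habit to check that your vectors are the length you expect.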
-

~5 Minute Activity

-

A Starbucks Grande filter coffee in the UK currently costs £1.85. The value of £1 before the Brexit vote was about $1.49. After the vote, it dropped down to about $1.31, and lately it’s been closer to $1.27.

-

Using vector arithmetic as much as possible, find out how the value in dollars of my coffee has changed.

-
-
-
-
-
-

Indexing

-

If you have a bunch of values stored in a vector, and you want to pull out specific ones, you can do so by indexing it with square brackets [].

-
-

Indexing by Position

-

Let’s start by indexing by position.

-
-

vector[position]

-
-

R has some built in vectors for you to use, like one called letters. We haven’t defined letters, and it’s not listed as being in your R environment, but it’s there.

- - - -
letters
- - -
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r"
-[19] "s" "t" "u" "v" "w" "x" "y" "z"
- - - -

The first value in a vector has index 1, the second index 2, and so on. If you’ve forgotten what the 19th letter of the alphabet is, you can find it out like so:

- - - -
letters[19]
- - -
[1] "s"
- - - -

If instead of just one number, you use another vector to index letters, you’ll get back out another vector.

- - - -
yes <- c(25, 5, 19)
-letters[yes]
- - -
[1] "y" "e" "s"
- - -
abba <- c(1, 2, 2, 1)
-letters[abba]
- - -
[1] "a" "b" "b" "a"
- - - -
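
A few more things you can do with positions, as a quick sketch: use a range, use negative numbers to drop positions, and see what happens when you ask for a position that doesn't exist.

```r
letters[1:3]        # a range of positions: the first three letters

letters[-(1:23)]    # negative positions drop values: everything except the first 23

letters[27]         # there is no 27th letter, so R returns NA rather than an error
```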
-
-

Logical Indexing

-

You can also index by logical values.

-
-

vector[true false vector]

-
-

Let’s come back to our vector of speaker’s ages

- - - -
ages
- - -
[1] 18 35 41 62
- - - -

If we make another vector of TRUE and FALSE values of the same length, we can use it to index ages.

- - - -
logical_vec <- c(T, F, T, F)
-ages[logical_vec]
- - -
[1] 18 41
- - - -

You only get back values where the index vector was TRUE.

-

Of course, what you’ll usually do is generate a vector of TRUE and FALSE values by using a logical operator.

- - - -
ages > 40
-ages[ages > 40]
- - - -
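
Here is the same idea broken into steps, as a sketch. The name over_40 is my own; sum() and which() are two handy extras for working with the TRUE and FALSE values.

```r
ages <- c(18, 35, 41, 62)

# Step 1: the logical test produces one TRUE or FALSE per speaker
over_40 <- ages > 40
over_40

# Step 2: use that vector to keep only the matching ages
ages[over_40]

# Counting and locating the TRUE values
sum(over_40)     # TRUE counts as 1, so this is how many speakers are over 40
which(over_40)   # the positions of the speakers over 40
```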
-

~2 Minute Activity

-

Let’s assume our speakers had the following names:

- - - -
speaker_names <- c("Charlie", "Skyler", "Sawyer", "Jamie")
- - - -

Using logical indexing and the ages in ages and year of interview in interview_year (or just dob, if you assigned anything to that variable), find out which speakers were born after 1960.

-
-
-
-
-

Logical Operators

-

The following operators will return a vector of TRUE and FALSE values.

-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Operator  Meaning
==        exactly equal to
!=        not equal to
>         greater than
<         less than
>=        greater than or equal to
<=        less than or equal to
-
-

You can use these to compare vectors to single values, as we’ve seen, but you can also compare vectors to vectors if they are the same length. Comparison is done elementwise.

- - - -
group_a <- c(20, 10, 13, 60)
-group_b <- c(11, 31,  2,  9)
-group_a < group_b
- - -
[1] FALSE  TRUE FALSE FALSE
- - - -

There are three more operators that have an effect on TRUE and FALSE vectors.

-
- - - - - - - - - - - - - - - - - - - - - -
Operator  Meaning
!         not x (changes all T to F and F to T)
|         x or y
&         x and y
-
- - - -
x <- c(T, T, F, F)
-y <- c(T, F, T, F)
- - - - - - -
cbind(
-  x = x,
-  y = y,
-  and = x&y, 
-  or = x|y
-)
- - -
         x     y   and    or
-[1,]  TRUE  TRUE  TRUE  TRUE
-[2,]  TRUE FALSE FALSE  TRUE
-[3,] FALSE  TRUE FALSE  TRUE
-[4,] FALSE FALSE FALSE FALSE
- - - -
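
Where these operators really earn their keep is in combining conditions for indexing. A sketch using the speaker ages from earlier (the cutoffs are arbitrary):

```r
ages <- c(18, 35, 41, 62)

ages[ages > 20 & ages < 60]   # both conditions must hold: ages between 20 and 60
ages[ages < 20 | ages > 60]   # either condition may hold: the youngest and oldest
ages[!(ages > 40)]            # flip the test: ages that are NOT over 40
```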
-
-

%in%

-

This gets its own heading because it’s so useful, and you’ll use it a lot. If you say a %in% b, R checks every value in a to see if it’s in b.

-
-

value %in% vector

-
- - - -
# Was Sage in our study?
-"Sage" %in% speaker_names
- - -
[1] FALSE
- - - - - - -
# Was Schuyler in our study?
-"Schuyler" %in% speaker_names
- - -
[1] FALSE
- - -
# Yes, but not spelled that way.
-"Skyler" %in% speaker_names
- - -
[1] TRUE
- - - -

The first item can also be a vector.

- - - -
# How about all of these people?
-check_names <- c("Oakley", "Charlie", "Azaria", "Landry", "Skyler", "Justice")
-check_names %in% speaker_names
- - -
[1] FALSE  TRUE FALSE FALSE  TRUE FALSE
- - -
check_names[check_names %in% speaker_names]
- - -
[1] "Charlie" "Skyler" 
- - -
check_names[!(check_names %in% speaker_names)]
- - -
[1] "Oakley"  "Azaria"  "Landry"  "Justice"
- - - -
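
Because %in% returns TRUE and FALSE values, you can use it to index a different vector of the same length. In this sketch the vector wanted is made up, and match() is a related function that returns positions instead of TRUE and FALSE:

```r
speaker_names <- c("Charlie", "Skyler", "Sawyer", "Jamie")
ages <- c(18, 35, 41, 62)

# A made-up list of people we want the ages for
wanted <- c("Skyler", "Jamie", "Sage")

# Test each speaker name against the list, then use the result to index ages
ages[speaker_names %in% wanted]

# match() gives the position of each wanted name, or NA if it isn't there
match(wanted, speaker_names)
```

This only works because speaker_names and ages are in the same order, one entry per speaker.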
- - -
-
- -
KipJbnN0YWxsaW5nKiogYSBwYWNrYWdlIGlzIGRpZmZlcmVudCBmcm9tICoqbG9hZGluZyoqIHBhY2thZ2VzLiBJbnN0YWxsaW5nIGEgcGFja2FnZSBvbmx5IGRvd25sb2FkcyBhbmQgY29uZmlndXJlcyB0aGUgY29kZSBvbiB5b3VyIGNvbXB1dGVyLiBJbiBvcmRlciB0byAqdXNlKiB0aGUgY29udGVudHMgb2YgYSBwYWNrYWdlLCB5b3UgbmVlZCB0byBsb2FkIGl0IGludG8geW91ciBSIHNlc3Npb24gd2l0aCBgbGlicmFyeSgpYC4KCi0gWW91IG9ubHkgbmVlZCB0byBydW4gYGluc3RhbGwucGFja2FnZXMoKWAgb25jZSB0byBpbnN0YWxsIGEgcGFja2FnZSwgb3IgdG8gdXBkYXRlIGEgcGFja2FnZS4KLSBZb3UgbmVlZCB0byBydW4gYGxpYnJhcnkoKWAgYXQgdGhlIHN0YXJ0IG9mIGV2ZXJ5IG5ldyBSIHNlc3Npb24gaW4gb3JkZXIgdG8gdXNlIHRoZSBmdW5jdGlvbmFsaXR5IGZyb20gdGhhdCBwYWNrYWdlLgoKRm9yIGV4YW1wbGUsIGBnZ3Bsb3QoKWAgaXMgYSBmdW5jdGlvbiBmcm9tIHRoZSBwYWNrYWdlIGBnZ3Bsb3QyYC4gSSBoYXZlIGFscmVhZHkgaW5zdGFsbGVkIGBnZ3Bsb3QyYCBvbiBteSBjb21wdXRlciwgYnV0IGlmIEkgdHJ5IHRvIHVzZSBgZ2dwbG90KClgIGJlZm9yZSBsb2FkaW5nIHRoZSBwYWNrYWdlIHdpdGggYGxpYnJhcnkoKWAsIEknbGwgZ2V0IHRoZSBlcnJvciB0aGF0IHRoZSBmdW5jdGlvbiB3YXMgbm90IGZvdW5kLgoKYGBge3J9CmZvbyA8LSBnZ3Bsb3QoKQpgYGAKCmBgYHtyfQpsaWJyYXJ5KCJnZ3Bsb3QyIikKZm9vIDwtIGdncGxvdCgpCmBgYAoKCjxkaXYgY2xhc3MgPSAiYm94IGJyZWFrIj4KPHNwYW4gY2xhc3M9ImJpZy1sYWJlbCI+fjIgTWludXRlIEFjdGl2aXR5PC9zcGFuPgoKTGV0J3MgaW5zdGFsbCBhbGwgb2YgdGhlIHBhY2thZ2VzIHdlJ3JlIGdvaW5nIHRvIHVzZSBpbiB0aGUgY291cnNlLiBEb3VibGUgY2hlY2sgdGhhdCB5b3UncmUgY29ubmVjdGVkIHRvIHRoZSBpbnRlcm5ldC4KCgpDcmVhdGUgYSBub3RlYm9vayBmb3IgdGhpcyBsZWN0dXJlIGNhbGxlZCBgMDFfbGVjdHVyZS5SbWRgLiBDb3B5LXBhc3RlIHRoZSBmb2xsb3dpbmcgaW50byBhbiBSIGNvZGUgY2h1bmsgYW5kIHJ1biBpdDoKCmBgYHtyfQppbnN0YWxsLnBhY2thZ2VzKAogIGMoInRpZHl2ZXJzZSIsCiAgICAiZGV2dG9vbHMiKQopCgpsaWJyYXJ5KCJkZXZ0b29scyIpCgppbnN0YWxsX2dpdGh1Yigiam9mcmh3bGQvbHNhMjAxNyIpCmBgYAoKPC9kaXY+CgotLS0tLS0tLS0tLS0tLS0tLS0tCgojIFIgQmFzaWNzCgpXZSdyZSBub3cgZ29pbmcgdG8gcnVuIHRocm91Z2ggc29tZSB2ZXJ5IGJhc2ljcyBvZiBSLCBzcGVjaWZpY2FsbHk6CgotIEJhc2ljIERhdGEgVHlwZXMKLSBCYXNpYyBDYWxjdWxhdGlvbnMKLSBBc3NpZ25tZW50Ci0gVmVjdG9ycwotIEluZGV4aW5nCgpDcmVhdGUgYSBuZXcgUiBOb3RlYm9vay4gQ2hhbmdlIHRoZSBgVGl0bGVgIGZpZWxkIHRvIGBJbnRybyB0byBSYCwgYW5kIHNhdmUgaXQgYXMgYDBfbGVjdHVy
ZS5SbWRgIGluIHRoZSBmb2xkZXIgYGxlY3R1cmVzYC4KCkFzIHdlIGNvbWUgdG8gYSBjb2RlIGNodW5rIGluIHRoZSBsZWN0dXJlLCBlaXRoZXIgY29weS1wYXN0ZSBvciByZS10eXBlIGl0IGludG8gYSBuZXcgY29kZSBjaHVuayBpbiB5b3VyIGxlY3R1cmUgUiBub3RlYm9vaywgYW5kIHJ1biBpdC4KCiMjIEJhc2ljIENhbGN1bGF0aW9ucwoKT25lIHdheSB0byB0aGluayBvZiBSIGlzIGFzIGFuIG92ZXJibG93biBjYWxjdWxhdG9yLgoKYGBge3J9CjMrMyAKMio0CigzNjktMSkvNgpgYGAKCkJ1dCBpdCdzIG5vdCBhbGwgdGhhdCB1c2VmdWwgdG8gZG8gYSBidW5jaCBvZiBjYWxjdWxhdGlvbnMgd2l0aG91dCBzYXZpbmcgdGhlIHJlc3VsdHMgZm9yIGxhdGVyLCB3aGljaCBpcyB3aGVyZSBhc3NpZ25tZW50IGNvbWVzIGluLgoKIyMgQXNzaWdubWVudApZb3UgY2FuIGFzc2lnbiAqKnZhbHVlcyoqIHRvICoqdmFyaWFibGVzKiogdXNpbmcgdGhlIGFzc2lnbm1lbnQgb3BlcmF0b3I6IGA8LWAgb3IgYC0+YCAoYnV0IGluIHByYWN0aWNlLCBvbmx5IHVzZSBgPC1gKS4KCjxkaXYgc3R5bGU9ImZvbnQtZmFtaWx5Om1vbm9zcGFjZTtmb250LXNpemU6eHgtbGFyZ2U7dGV4dC1hbGlnbjpjZW50ZXI7Ij4KPHNwYW4gc3R5bGU9ImNvbG9yOiM3NDc0NzQiPnZhcmlhYmxlPC9zcGFuPiA8c3BhbiBzdHlsZT0iY29sb3I6cmVkIj48LTwvc3Bhbj4gPHNwYW4gc3R5bGU9ImNvbG9yOiM3NDc0NzQiPnZhbHVlPC9zcGFuPgo8L2Rpdj4KCmBgYHtyfQp4IDwtIDEwCnkgPC0gMiozCmBgYAoKT25jZSB5b3UndmUgYXNzaWduZWQgYSB2YWx1ZSB0byBhIHZhcmlhYmxlLCB5b3UgY2FuIHJldXNlIHRoZSB2YWx1ZSBzdG9yZWQgaW4gdGhhdCB2YXJpYWJsZSBmb3Igb3RoZXIgcHVycG9zZXMsIGxpa2UganVzdCBwcmludGluZyBpdCBvdXQgYWdhaW4KYGBge3J9CngKeQpgYGAKCk9yIGFkZGluZyB0aGUgdHdvIHZhbHVlcyB0b2dldGhlcgpgYGB7cn0KeCArIHkgCmBgYAoKSW4gc2hvcnQsIHlvdSBjYW4gdXNlIHRoZXNlIHZhcmlhYmxlcyBgeGAgYW5kIGB5YCBsaWtlIHRoZXkgKmFyZSogdGhlIHZhbHVlcyB5b3UgYXNzaWduZWQgdG8gdGhlbS4gSWYgdGhpcyBpcyB5b3VyIGZpcnN0IHRpbWUgcHJvZ3JhbW1pbmcsIGhlcmUgYXJlIGEgZmV3IHRoaW5ncyB0byBjbGFyaWZ5OgoKKipOb3RlKioKCi0gYHhgIGFuZCBgeWAgZGlkbid0IGV4aXN0IGJlZm9yZSB5b3UgY3JlYXRlZCB0aGVtIGJ5IGFzc2lnbmluZyB2YWx1ZXMgdG8gdGhlbS4KLSBZb3UgY291bGQgaGF2ZSBjaG9zZW4gKmFsbW9zdCogYW55IG5hbWUgZm9yIHRoZXNlIHZhcmlhYmxlcy4KLSBZb3UgY2FuIGp1c3QgYXMgZWFzaWxseSBhc3NpZ24gKm5ldyogIHZhbHVlcyB0byB0aGVzZSB2YXJpYWJsZXMuCgoKCjxkaXYgY2xhc3MgPSAiYm94IGlkaW9tIj4KPHNwYW4gY2xhc3MgPSAibGFiZWwiPklkaW9tPC9zcGFuPgoKIyMjIyBOYW1pbmcgVGhpbmdzCgpgeGAgYW5kIGB5YCBhcmUgbG91c3kgbmFtZXMg
Zm9yIHZhcmlhYmxlcy4gV2hlbiBpdCBjb21lcyB0byBuYW1pbmcgdmFyaWFibGVzLCB0aGVyZSdzIGEgZmFtb3VzIHNheWluZzoKCj4g4oCcVGhlcmUgYXJlIG9ubHkgdHdvIGhhcmQgdGhpbmdzIGluIENvbXB1dGVyIFNjaWVuY2U6IGNhY2hlIGludmFsaWRhdGlvbiBhbmQgbmFtaW5nIHRoaW5ncy7igJ0gCj4g4oCUIFBoaWwgS2FybHRvbgoKRm9yIGJlc3QgcHJhY3RpY2VzIG9uIG5hbWluZyB2YXJpYWJsZXMsIEknbGwgcmVmZXIgeW91IHRvIFt0aGUgdGlkeXZlcnNlIHN0eWxlIGd1aWRlIGJ5IEhhZGxleSBXaWNraGFtXShodHRwOi8vc3R5bGUudGlkeXZlcnNlLm9yZy9zeW50YXguaHRtbCNvYmplY3QtbmFtZXMpLiBUbyBicmllZmx5IHN1bW1hcml6ZSBpdDoKCgoKCi0gVXNlIG9ubHkgbG93ZXJjYXNlIGxldHRlcnMgYW5kIG51bWJlcnMuCi0gVXNlIGBfYCB0byBzZXBhcmF0ZSB3b3JkcyBpbiBhIGEgdmFyaWFibGUgbmFtZS4KLSBZb3UncmUgYWN0dWFsbHkgbm90IGFibGUgdG8gc3RhcnQgYSB2YXJpYWJsZSBuYW1lIHdpdGggYSBudW1iZXIuCgpBbHNvLCBiZSBndWlkZWQgYnkgVGhlIFByaW5jaXBsZSBvZiBMZWFzdCBFZmZvcnQuIFVzZSB0aGUgbWluaW1hbCBhbW1vdW50IG9mIGNoYXJhY3RlcnMgdGhhdCBhcmUgc3RpbGwgY2xlYXJseSBpbnRlcnByZXRhYmxlLgoKYGBge3J9CiMgR29vZCBOYW1lcwptb2RlbF8xCm1vZGVsX2Z1bGwKCgojIEJhZCBOYW1lcwp0aGVfZmlyc3RfbW9kZWxfSV9ldmVyX2ZpdApqdXN0X3RyeWluZ19vdXRfYV9tb2RlbF93aXRoX2FsbF9wcmVkaWN0b3JzCm1fMDEKbV9hZ2RmCmBgYAoKQWxzbywganVzdCB1c2UgZ29vZCBqdWRnbWVudC4gVGhlcmUgaXMgbm90aGluZyBpbiBSIHByZXZlbnRpbmcgeW91IGZyb20gZG9pbmcgc3R1ZmYgbGlrZSB0aGlzIHRvIHlvdXJzZWxmLgoKYGBge3J9CmZpdmUgPC0gMTAKdGVuIDwtIDUKCnllbGxvdyA8LSAiZ3JlZW4iCmBgYAoKPC9kaXY+CgoKQW5vdGhlciB0aGluZyB0byBrZWVwIGluIG1pbmQgaXMgdGhhdCBSIGNhbid0IGhhbmRsZSBhbnkgb3RoZXIgY2hhcmFjdGVycyBpbiBudW1lcmljIHZhbHVlcyBvdGhlciB0aGFuIGAwYCB0aHJvdWdoIGA5YCBhbmQgZGVjaW1hbCBwbGFjZXMuIEFsbCBvZiB0aGVzZSB3aWxsIGZhaWw6CgpgYGB7cn0KIyBubyBjb21tYXMKdGhvdXNhbmQgPC0gMSwwMDAKYGBgCgpgYGB7cn0KIyBubyBzcGFjZXMKdGhvdXNhbmQgPC0gMSAwMDAKYGBgCgpgYGB7cn0KIyBsaWtlIHRoaXMKdGhvdXNhbmQgPC0gMTAwMApgYGAKCmBgYHtyfQojIG5vIGN1cnJlbmNpZXMKZG9sbGFycyA8LSAkMTAwMApgYGAKCmBgYHtyfQojIG5vIHBlcmNlbnRhZ2VzCgpwZXJjZW50IDwtIDUxJQpgYGAKCiMjIyBBZGRpdGlvbmFsIGRhdGEgdHlwZXMKCkluIGFkZGl0aW9uIHRvIG51bWJlcnMsIG90aGVyIGJhc2ljIGRhdGEgdHlwZXMgaW4gUiBhcmUgKipjaGFyYWN0ZXIqKiBhbmQgKipsb2dpY2FsKiouCgpgYGB7cn0KIyBjaGFyYWN0ZXIgZGF0YQpkaWdp
dGFsX3dvcmRzIDwtIGMoImZhbSIsCiAgICAgICAgICAgICAgICAgICAiSGFyYW1iZSIsCiAgICAgICAgICAgICAgICAgICAidHdlZXRzdG9ybSIsCiAgICAgICAgICAgICAgICAgICAiQCIpCmBgYAoKYGBge3J9CiMgbG9naWNhbCB2YWx1ZXMKVFJVRQoKIyBhIGxvZ2ljYWwgdGVzdAooMTAvMikgPCAzCmBgYAoKCiMjIyMgT24gdXNpbmcgcXVvdGVzCgpXaGVuIHlvdSBlbnRlciBjaGFyYWN0ZXJzICp3aXRob3V0KiBxdW90ZXMgYXJvdW5kIHRoZW0sIFIgYXNzdW1lcyB5b3UncmUgcmVmZXJyaW5nIHRvIGEgdmFyaWFibGUuIElmIHlvdSB0cmllZCB0byBkbyB0aGUgYXNzaWdubWVudCBhYm92ZSB3aXRob3V0IHRoZSBxdW90ZXMsIHlvdSdsbCBnZXQgYW4gZXJyb3IuCgpgYGB7cn0KZGlnaXRhbF93b3Jkc19mYWlsIDwtIGMoZmFtLAogICAgICAgICAgICAgICAgICAgICAgICBIYXJhbWJlLAogICAgICAgICAgICAgICAgICAgICAgICB0d2VldHN0b3JtKQpgYGAKCkhlcmUsIFIgc2F3IGBmYW1gLCB3aGljaCBpc24ndCBpbiBxdW90ZXMsIHNlYXJjaGVkIHRoZSBlbnZpcm9ubWVudCBmb3IgYW55IHZhcmlhYmxlcyBuYW1lZCBgZmFtYCBhbmQgY291bGRuJ3QgZmluZCBhbnkuCgpXaGVuIHlvdSBwdXQgY2hhcmFjdGVycyBpbiBxdW90ZXMsIFIgYXNzdW1lcyBpdCdzIGEgY2hhcmFjdGVyIHZhbHVlLCAqZXZlbiBpZiB0aGVyZSdzIGEgdmFyaWFibGUgYnkgdGhlIHNhbWUgbmFtZSouCgpgYGB7cn0KZGlnaXRhbF93b3JkcwoiZGlnaXRhbF93b3JkcyIKYGBgCgoKIyMgVmVjdG9ycwoKVmVjdG9ycyBhcmUgZXNzZW50aWFsbHkgbGlzdHMgb2YgZGF0YSwgYW5kIGNhbiBjb250YWluIGNoYXJhY3RlcnMsIG51bWJlcnMsIG9yIFRSVUUgRkFMU0UgdmFsdWVzLiBUaGVyZSBhcmUgYSBudW1iZXIgb2Ygd2F5cyB0byBjcmVhdGUgdmVjdG9ycyBpbiBSLCBhbmQgZnJlcXVlbnRseSBkb2luZyBkYXRhIG1hbmlwdWxhdGlvbiB3aWxsIHByb2R1Y2Ugc3VidmVjdG9ycyBvZiBkYXRhLiAKCi0gYDE6MTBgCiAgICAtIFRoaXMgcHJvZHVjZXMgYSB2ZWN0b3Igb2YgaW50ZWdlcnMgZnJvbSAxIHRvIDEwLiBSZXZlcnNpbmcgdGhlIG9yZGVyIG9mIHRoZSBudW1iZXJzIHdpbGwgcHJvZHVjZSBhIHZlY3RvciBvZiBkZWNyZWFzaW5nIHZhbHVlcy4KLSBgYyguLi4pYAogICAgLSBUaGlzIHByb2R1Y2VzIGEgdmVjdG9yIG9mIHdoYXRldmVyIGlzIHBhc3NlZCBhcyBhbiBhcmd1bWVudCB0byBgYygpYC4KICAgICAgICAtIGBjKDEsMiwzLDQpYAotIGBzZXEoZnJvbSx0bywuLi4pYAogICAgLSBUaGlzIHByb2R1Y2VzIGEgc2VxdWVuY2Ugb2YgbnVtYmVycyBlaXRoZXIgYnkgYSBnaXZlbiBpbmNyZW1lbnQgb3IgZXZlbmx5IHNwYWNlZCB0byBhIGdpdmVuIGxlbmd0aC4KICAgICAgICAtIGBzZXEoMSwxMCxieT0wLjUpYAogICAgICAgIC0gYHNlcSgxLDEwLGxlbmd0aD0xMSlgCi0gYHJlcCh4LC4uLilgCiAgICAtIFRoaXMgcHJvZHVjZXMgYSB2ZWN0b3Igb2YgcmVwZXRp
dGlvbnMgb2YgeCBieSBhIGdpdmVuIG51bWJlciBvZiB0aW1lcy4KICAgICAgICAtIGByZXAoMSw2KWAKICAgICAgICAtIGByZXAoMTozLDIpYAogICAgICAgIC0gYHJlcCgiaGVsbG8gd29ybGQiLDQpYAoKIyMjIFZlY3RvciBBcml0aG1ldGljCgojIyMjIFZlY3RvciBhbmQgQSBOdW1iZXIKCkEgcHJldHR5IGNvb2wgYW5kIHVuaXF1ZSBmZWF0dXJlIG9mIFIgaXMgaG93IHlvdSBjYW4gZG8gYXJpdGhtZXRpYyB3aXRoIHZlY3RvcnMuIEZvciBleGFtcGxlLCBsZXQncyBzYXkgeW91J3ZlIGludGVydmlld2VkIGEgYnVuY2ggb2Ygc3BlYWtlcnMgb2YgdGhlIGZvbGxvd2luZyBhZ2VzCgpgYGB7cn0KYWdlcyA8LSBjKDE4LCAzNSwgNDEsIDYyKQpgYGAKCklmIHlvdSB3YW50ZWQgdG8ga25vdyB0aGUgeWVhciBvZiBiaXJ0aCBvZiB0aGVzZSBzcGVha2VycywgaXQncyBhcyBlYXN5IGFzOgoKYGBge3J9CjIwMTcgLSBhZ2VzCmBgYAoKUiBoYXMgdGFrZW4gZWFjaCB2YWx1ZSBpbiBgYWdlc2AsIGFuZCBzdWJ0cmFjdGVkIGl0IGZyb20gYDIwMTdgLCBhbmQgY3JlYXRlZCBhIG5ldyB2ZWN0b3Igd2l0aCB0aGUgcmVzdWx0cy4gCgoKT3IsIGlmIHlvdSB3YW50ZWQgdG8ga25vdyBpbiB3aGljaCB5ZWFyIHRoZXNlIHNwZWFrZXJzIHR1cm5lZCAxNywgaXQncyBhcyBlYXN5IGFzOgoKYGBge3J9CigyMDE3IC0gYWdlcykgKyAxNwpgYGAKCiMjIyMgVmVjdG9yIGFuZCBhIFZlY3RvcgoKT3IsIGxldCdzIHNheSB0aGVzZSBzcGVha2VycyB3ZXJlbid0IGFsbCBpbnRlcnZpZXdlZCB0aGUgc2FtZSB5ZWFyLiBIYWxmIHdlcmUgaW50ZXJ2aWV3ZWQgaW4gdGhlIDkwcywgYW5kIGhhbGYgaW4gdGhlIDIwMDBzLgoKYGBge3J9CmludGVydmlld195ZWFyIDwtIGMoMTk5NSwgMTk5NiwgMjAwMywgMjAwNCkKYGBgCgpHZXR0aW5nIGVhY2ggc3BlYWtlcidzIGRhdGUgb2YgYmlydGggaXMgYXMgc2ltcGxlIGFzOgoKYGBge3J9CmludGVydmlld195ZWFyIC0gYWdlcwpgYGAKClRoaXMgd29ya2VkIGJlY2F1c2UgdGhlIHR3byB2ZWN0b3JzLCBgaW50ZXJ2aWV3X3llYXJgIGFuZCBgYWdlc2Agd2VyZSB0aGUgc2FtZSBsZW5ndGguIFIgdG9vayB0aGUgZmlyc3QgdmFsdWVzIG9mIGBhZ2VgIGFuZCBzdWJ0cmFjdGVkIGl0IGZyb20gdGhlIGZpcnN0IHZhbHVlIG9mIGBpbnRlcnZpZXdfeWVhcmAsIHRoZSBzZWNvbmQgdmFsdWUgb2YgYGFnZWAgYW5kIHN1YnRyYWN0ZWQgaXQgZnJvbSB0aGUgc2Vjb25kIHZhbHVlIG9mIGBpbnRlcnZpZXdfeWVhcmAsIGV0YywgY3JlYXRpbmcgbmV3IHZlY3RvciBvZiB0aGUgcmVzdWx0LiBZb3UgY291bGQgZWFzaWxseSBhc3NpZ24gdGhpcyBvdXRwdXQgdG8gYSBuZXcgdmFyaWFibGUuCgpgYGB7cn0KZG9iIDwtIGludGVydmlld195ZWFyIC0gYWdlcwpgYGAKCgoKT2YgY291cnNlLCBpZiB5b3Ugbm93IHdhbnRlZCB0byBrbm93IHdoYXQgeWVhciB0aGVzZSBzcGVha2VycyB0dXJuZWQgMTcsIHlvdSBjb3VsZCBkbyBpdCBsaWtlIHNvOgoK
YGBge3J9CihpbnRlcnZpZXdfeWVhciAtIGFnZXMpICsgMTcKYGBgCgoKCjxkaXYgY2xhc3MgPSAiYm94IGJyZWFrIj4KPHNwYW4gY2xhc3MgPSAiYmlnLWxhYmVsIj5+NSBNaW51dGUgQWN0aXZpdHk8L3NwYW4+CgpBIFN0YXJidWNrcyBHcmFuZGUgZmlsdGVyIGNvZmZlZSBpbiB0aGUgVUsgY3VycmVudGx5IGNvc3RzIMKjMS44NS4gVGhlIHZhbHVlIG9mIMKjMSBiZWZvcmUgdGhlIEJyZXhpdCB2b3RlIHdhcyBhYm91dCBcJDEuNDkuIEFmdGVyIHRoZSB2b3RlLCBpdCBkcm9wcGVkIGRvd24gdG8gYWJvdXQgXCQxLjMxLCBhbmQgbGF0ZWx5IGl0J3MgYmVlbiBjbG9zZXIgdG8gXCQxLjI3LgoKVXNpbmcgdmVjdG9yIGFyaXRobWV0aWMgYXMgbXVjaCBhcyBwb3NzaWJsZSwgZmluZCBvdXQgaG93IHRoZSB2YWx1ZSBpbiBkb2xsYXJzIG9mIG15IGNvZmZlZSBoYXMgY2hhbmdlZC4KCjwvZGl2PgoKCgoKCiMjIEluZGV4aW5nCgpJZiB5b3UgaGF2ZSBhIGJ1bmNoIG9mIHZhbHVlcyBzdG9yZWQgaW4gYSB2ZWNvciwgYW5kIHlvdSB3YW50IHRvIHB1bGwgb3V0IHNwZWNpZmljIG9uZXMsIHlvdSBjYW4gZG8gc28gYnkgaW5kZXhpbmcgaXQgd2l0aCBzcXVhcmUgYnJhY2tldHMgYFtdYC4gCgoKCiMjIyBJbmRleGluZyBieSBQb3NpdGlvbgpMZXQncyBzdGFydCBieSBpbmRleGluZyBieSBwb3NpdGlvbi4KCjxkaXYgc3R5bGU9ImZvbnQtZmFtaWx5Om1vbm9zcGFjZTtmb250LXNpemU6eHgtbGFyZ2U7dGV4dC1hbGlnbjpjZW50ZXI7Ij4KPHNwYW4gc3R5bGU9ImNvbG9yOiM3NDc0NzQiPnZlY3Rvcjwvc3Bhbj48c3BhbiBzdHlsZT0iY29sb3I6cmVkIj5bPC9zcGFuPjxzcGFuIHN0eWxlPSJjb2xvcjojNzQ3NDc0Ij5wb3NpdGlvbjwvc3Bhbj48c3BhbiBzdHlsZT0iY29sb3I6cmVkIj48c3BhbiBzdHlsZT0iY29sb3I6cmVkIj5dPC9zcGFuPgo8L2Rpdj4KClIgaGFzIHNvbWUgYnVpbHQgaW4gdmVjdG9ycyBmb3IgeW91IHRvIHVzZSwgbGlrZSBvbmUgY2FsbGVkIGBsZXR0ZXJzYC4gV2UgaGF2ZW4ndCBkZWZpbmVkIGBsZXR0ZXJzYCwgYW5kIGl0J3Mgbm90IGxpc3RlZCBhcyBiZWluZyBpbiB5b3VyIFIgZW52aXJvbm1lbnQsIGJ1dCBpdCdzIHRoZXJlLgoKYGBge3J9CmxldHRlcnMKYGBgCgpUaGUgZmlyc3QgdmFsdWUgaW4gYSB2ZWN0b3IgaGFzIGluZGV4IGAxYCwgdGhlIHNlY29uZCBpbmRleCBgMmAsIGFuZCBzbyBvbi4KSWYgeW91J3ZlIGZvcmdvdHRlbiB3aGF0IHRoZSAxOXRoIGxldHRlciBvZiB0aGUgYWxwaGFiZXQgaXMsIHlvdSBjYW4gZmluZCBpdCBvdXQgbGlrZSBzbzoKCmBgYHtyfQpsZXR0ZXJzWzE5XQpgYGAKCklmIGluc3RlYWQgb2YganVzdCBvbmUgbnVtYmVyLCB5b3UgdXNlIGFub3RoZXIgdmVjdG9yIHRvIGluZGV4IGBsZXR0ZXJzYCwgeW91J2xsIGdldCBiYWNrIG91dCBhbm90aGVyIHZlY3Rvci4KCmBgYHtyfQp5ZXMgPC0gYygyNSwgNSwgMTkpCmxldHRlcnNbeWVzXQoKYWJiYSA8LSBjKDEsIDIsIDIsIDEpCmxldHRlcnNbYWJi
YV0KYGBgCgojIyMgTG9naWNhbCBJbmRleGluZwoKWW91IGNhbiBhbHNvIGluZGV4IGJ5IGxvZ2ljYWwgdmFsdWVzLiAKCjxkaXYgc3R5bGU9ImZvbnQtZmFtaWx5Om1vbm9zcGFjZTtmb250LXNpemU6eHgtbGFyZ2U7dGV4dC1hbGlnbjpjZW50ZXI7Ij4KPHNwYW4gc3R5bGU9ImNvbG9yOiM3NDc0NzQiPnZlY3Rvcjwvc3Bhbj48c3BhbiBzdHlsZT0iY29sb3I6cmVkIj5bPC9zcGFuPjxzcGFuIHN0eWxlPSJjb2xvcjojNzQ3NDc0Ij50cnVlIGZhbHNlIHZlY3Rvcjwvc3Bhbj48c3BhbiBzdHlsZT0iY29sb3I6cmVkIj48c3BhbiBzdHlsZT0iY29sb3I6cmVkIj5dPC9zcGFuPgo8L2Rpdj4KCkxldCdzIGNvbWUgYmFjayB0byBvdXIgdmVjdG9yIG9mIHNwZWFrZXIncyBhZ2VzCgpgYGB7cn0KYWdlcwpgYGAKCklmIHdlIG1ha2UgYW5vdGhlciB2ZWN0b3Igb2YgYFRSVUVgIGFuZCBgRkFMU0VgIHZhbHVlcyBvZiB0aGUgc2FtZSBsZW5ndGgsIHdlIGNhbiB1c2UgaXQgdG8gaW5kZXggYHRlc3RfdmVjYC4KCmBgYHtyfQpsb2dpY2FsX3ZlYyA8LSBjKFQsIEYsIFQsIEYpCmFnZXNbbG9naWNhbF92ZWNdCmBgYAoKWW91IG9ubHkgZ2V0IGJhY2sgdmFsdWVzIHdoZXJlIHRoZSBpbmRleCB2ZWN0b3Igd2FzIGBUUlVFYC4KCk9mIGNvdXJzZSwgd2hhdCB5b3UnbGwgdXN1YWxseSBkbyBpcyBnZW5lcmF0ZSBhIHZlY3RvcmUgb2YgYFRSVUVgIGFuZCBgRkFMU0VgIHZhbHVlcyBieSB1c2luZyBhIGxvZ2ljYWwgb3BlcmF0b3IuCgpgYGB7cn0KYWdlcyA+IDQwCmFnZXNbYWdlcyA+IDQwXQpgYGAKCgo8ZGl2IGNsYXNzID0gImJveCBicmVhayI+CjxzcGFuIGNsYXNzID0gImJpZy1sYWJlbCI+fjIgTWludXRlIEFjdGl2aXR5PC9zcGFuPgoKTGV0J3MgYXNzdW1lIG91ciBzcGVha2VycyBoYWQgdGhlIGZvbGxvd2luZyBuYW1lczoKCmBgYHtyfQpzcGVha2VyX25hbWVzIDwtIGMoIkNoYXJsaWUiLCAiU2t5bGVyIiwgIlNhd3llciIsICJKYW1pZSIpCmBgYAoKVXNpbmcgbG9naWNhbCBpbmRleGluZyBhbmQgdGhlIGFnZXMgaW4gYGFnZXNgIGFuZCB5ZWFyIG9mIGludGVydmlldyBpbiBgaW50ZXJ2aWV3X3llYXJgIChvciBqdXN0IGBkb2JgLCBpZiB5b3UgYXNzaWduZWQgYW55dGhpbmcgdG8gdGhhdCB2YXJpYWJsZSksIGZpbmQgb3V0IHdoaWNoIHNwZWFrZXJzIHdlcmUgYm9ybiBhZnRlciAxOTYwLgoKPC9kaXY+CgojIyBMb2dpY2FsIE9wZXJhdG9ycwoKVGhlIGZvbGxvd2luZyBvcGVyYXRvcnMgd2lsbCByZXR1cm4gYSB2ZWN0b3Igb2YgYFRSVUVgIGFuZCBgRkFMU0VgIHZhbHVlcy4KCjxkaXYgc3R5bGU9IndpZHRoPTUwJSI+CnxPcGVyYXRvciB8IE1lYW5pbmcgfAp8Oi0tLXw6LS0tfAp8YD09YCB8IGV4YWN0bHkgZXF1YWwgdG98CnxgIT1gIHwgbm90IGVxdWFsIHRvIHwKfGA+YHwgZ3JlYXRlciB0aGFufAp8YDxgIHwgbGVzcyB0aGFuIHwKfGA+PWAgfCBncmVhdGVyIHRoYW4gb3IgZXF1YWwgdG98CnxgPGB8IGxlc3MgdGhhbiB8CnxgPD1gIHwgbGVzcyB0
aGFuIG9yIGVxdWFsIHRvfAo8L2Rpdj4KCllvdSBjYW4gdXNlIHRoZXNlIHRvIGNvbXBhcmUgdmVjdG9ycyB0byBzaW5nbGUgdmFsdWVzLCBhcyB3ZSd2ZSBzZWVuLCBidXQgeW91IGNhbiBhbHNvIGNvbXBhcmUgdmVjdG9ycyB0byB2ZWN0b3JzICppZiB0aGV5IGFyZSB0aGUgc2FtZSBsZW5ndGgqLiBDb21wYXJpc29uIGlzIGRvbmUgZWxlbWVudHdpc2UuCgpgYGB7cn0KZ3JvdXBfYSA8LSBjKDIwLCAxMCwgMTMsIDYwKQpncm91cF9iIDwtIGMoMTEsIDMxLCAgMiwgIDkpCgpncm91cF9hIDwgZ3JvdXBfYgpgYGAKCgpUaGVyZSBhcmUgdGhyZWUgbW9yZSBvcGVyYXRvcnMgdGhhdCBoYXZlIGFuIGVmZmVjdCBvbiBgVFJVRWAgYW5kIGBGQUxTRWAgdmVjdG9ycy4KCjxkaXYgc3R5bGU9IndpZHRoOjUwJSI+CnxPcGVyYXRvciB8IE1lYW5pbmcgfAp8Oi0tLXw6LS0tfAp8YCFgIHwgbm90IHggPGJyPiBjaGFuZ2VzIGFsbCBgVGAgdG8gYEZgIGFuZCBgRmAgdG8gYFRgfAp8YHxgIHwgeCBvciB5IHwKfGAmYHwgeCBhbmQgeXwKPC9kaXY+CgpgYGB7cn0KeCA8LSBjKFQsIFQsIEYsIEYpCnkgPC0gYyhULCBGLCBULCBGKQpgYGAKCmBgYHtyfQpjYmluZCgKICB4ID0geCwKICB5ID0geSwKICBhbmQgPSB4JnksIAogIG9yID0geHx5CikKYGBgCgojIyBgJWluJWAKClRoaXMgZ2V0cyBpdHMgb3duIGhlYWRpbmcgYmVjYXVzZSBpdCdzIHNvIHVzZWZ1bCwgYW5kIHlvdSdsbCB1c2UgaXQgYSBsb3QuIElmIHlvdSBzYXkgYGEgJWluJSBiYCwgUiBjaGVja3MgZXZlcnkgdmFsdWUgaW4gYGFgIHRvIHNlZSBpZiBpdCdzIGluIGBiYC4KCjxkaXYgc3R5bGU9ImZvbnQtZmFtaWx5Om1vbm9zcGFjZTtmb250LXNpemU6eHgtbGFyZ2U7dGV4dC1hbGlnbjpjZW50ZXI7Ij4KPHNwYW4gc3R5bGU9ImNvbG9yOiM3NDc0NzQiPnZhbHVlPC9zcGFuPiA8c3BhbiBzdHlsZT0iY29sb3I6cmVkIj4laW4lPC9zcGFuPiA8c3BhbiBzdHlsZT0iY29sb3I6Izc0NzQ3NCI+dmVjdG9yPC9zcGFuPgo8L2Rpdj4KCmBgYHtyfQojIFdhcyBTYWdlIGluIG91ciBzdHVkeT8KCiJTYWdlIiAlaW4lIHNwZWFrZXJfbmFtZXMKYGBgCgpgYGB7cn0KIyBXYXMgU2NodXlsZXIgaW4gb3VyIHN0dWR5PwoKIlNjaHV5bGVyIiAlaW4lIHNwZWFrZXJfbmFtZXMKCiMgWWVzLCBidXQgbm90IHNwZWxsZWQgdGhhdCB3YXkuCgoiU2t5bGVyIiAlaW4lIHNwZWFrZXJfbmFtZXMKYGBgCgpUaGUgZmlyc3QgaXRlbSBjYW4gYWxzbyBiZSBhIHZlY3Rvci4KCmBgYHtyfQojIEhvdyBhYm91dCBhbGwgb2YgdGhlc2UgcGVvcGxlPwoKY2hlY2tfbmFtZXMgPC0gYygiT2FrbGV5IiwgIkNoYXJsaWUiLCAiQXphcmlhIiwgIkxhbmRyeSIsICJTa3lsZXIiLCAiSnVzdGljZSIpCmNoZWNrX25hbWVzICVpbiUgc3BlYWtlcl9uYW1lcwpjaGVja19uYW1lc1tjaGVja19uYW1lcyAlaW4lIHNwZWFrZXJfbmFtZXNdCmNoZWNrX25hbWVzWyEoY2hlY2tfbmFtZXMgJWluJSBzcGVha2VyX25hbWVzKV0KYGBgCgoKCjxociAvPgoK
PCEtLS0gaHR0cDovL3d3dy5iYWJ5bmFtZXMxMDAwLmNvbS9nZW5kZXItbmV1dHJhbC8gLS0+
- - -
-
- -
- - - - - - - - diff --git a/teaching/courses/2017_lsa/lectures/Session_1.nb.qmd b/teaching/courses/2017_lsa/lectures/Session_1.nb.qmd new file mode 100755 index 0000000..f7f1e62 --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/Session_1.nb.qmd @@ -0,0 +1,624 @@ +--- +title: "Introduction to R" +image: figures/cran_package.png +order: 1 +knitr: + opts_chunk: + error: true + warning: false +--- + +## Hellos + +Welcome to *Statistical Modelling with R*. If there is one thing to remember from this course, it is that your analysis workflow should look something like this: + +![](figures/workflow.svg){fig-align="center"} + +------------------------------------------------------------------------ + +## The process of learning R and Modelling + +These are some of the core areas I figure are necessary to getting good at statistical modelling in R: + +1. Using R (and RStudio) well +2. Feeling comfortable and fluid reorganizing and summarizing data +3. **Visualizing Data** +4. Deciding before you model what you want to compare to what +5. How to translate your analysis goals into R code +6. Understanding a little bit about statistics +7. When something goes wrong, being able to accurately attribute your difficulty to one of the above topics + +These are all skills you can achieve through practice, experience, and occasional guidance from someone more skilled than you. It is exactly like acquiring any other skill or craft. At first it will be confusing, you'll make some mistakes, and it won't look so good. I think + +::: {layout="[45,-10,45]"} +![The first hat I ever knit](figures/firsthat.jpg) + +![The most recent hat I knit](figures/lasthat.jpg) +::: + +The way I improved my knitting is exactly the same as how you can improve your R programming ability: + +- I knit a lot (almost every day). +- I memorized a bunch of stuff. +- Remembered where to look up the stuff I don't have memorized. +- My knitting became more "idiomatic" (i.e. 
I started knitting like how other knitters knit). +- I learned how to identify and fix mistakes without undoing my entire project. +- I developed good workspace hygiene & organization. +- As I got the basics down, I started researching and incorporating fussy little details into my work. + +Most of the content of the course is devoted to core R programming (things you should be memorizing or remembering where to find help), but I'll try my best to annotate portions of the notes which correspond to workspace hygiene, being idiomatic, etc, so that you can distinguish between them. + +------------------------------------------------------------------------ + +## Course Outline + +The course will follow the workflow outlined at the beginning: `begin → summarize → visualize → analyze`. + +| Week | Monday | Thursday | +|------------------:|:-------------------------:|:------------------------:| +| 1 | -- | Intro - Basics & R Notebooks | +| 2 | Data Frames & Factors | Split-Apply-Combine, Reshaping | +| 3 | ggplot2 | Fitting Linear Models | +| 4 | map functions & fitting many models | Mixed Effects Linear models | +| 5 | Bootstraps & Visualization | -- | + +::: callout-tip +## Workspace Hygiene + +### Recommended Course Directory Structure + +If you have a directory planning structure that you're happy with, go ahead and do that. But if how to organize your R analysis life is something you'd like to get out of this course, I'd recommend the following directory structure & naming conventions. + +``` +├── lsa_2017 +│   └── r_modelling* +│   ├── assignments +│   ├── data +│   └── lectures + +``` + +The r_modelling directory will be the home directory for the course. I would recommend creating a new R Notebook for each lecture (more on that in a moment) and giving them a naming convention like: + +``` +01_lecture.Rmd +02_lecture.Rmd +``` + +Right now **eliminate the impulse to create any folders or file names with spaces in them**. 
+::: + +------------------------------------------------------------------------ + +## R, RStudio and R Notebooks + +We're going to be using R, RStudio, and R Notebooks in this course, and it's a little important to keep straight what these three things are: + +### R + +**R** is a programming language that runs on your computer. At its barest bones, it looks like this: + +![](figures/2__R.png){fig-align="center" width="80%"} + +You can type text into the prompt there, and if you've successfully memorized the right R commands, it'll do some things. + +### RStudio + +**RStudio** is like an Instagram filter over the top of R, to make your R use experience better. It visually organizes some important components of using R into panes, and offers *code completion* suggestions. For example, if you remember there's something called a "Wilcoxon test", but you don't remember what the function in R is, you can start typing in `Wilc`, and this will happen: + +![](figures/codeCompletion.png){fig-align="center" width="80%"} + +RStudio's autocompletion is really useful for a lot of other things, like reminding you what the column names are in your data frame, what the names of all the arguments to a function are, etc. + +But perhaps the most valuable component in RStudio these days is its authoring tools, like R Notebooks. + +### R Notebooks + +R Notebooks allow you to document your code in plain text, insert R code chunks, and view the results of the R code all in one place, then compile it into a nice-looking notebook. + +::: callout-note +## \~5 Minute Activity + +#### Goals + +1. Start a new RStudio Project. +2. Create a new R Notebook. +3. Run some code in the R Notebook. +4. Preview the R Notebook in HTML. + +#### Start a new RStudio Project + +Create a new RStudio Project, either by using the menu options `File > New Project` or by clicking on the icon in the top right corner of the RStudio window. 
If you have created the directory structure above, choose *Existing Directory* and select `r_modelling`. Otherwise, select the options *New Directory* then *Empty Project* and tell it the project name is `r_modelling`. + +#### Create a new R Notebook + +Open a new R Notebook using the menu command `File > New File > R Notebook`. If this is the first time you've opened an R Notebook on your computer, you'll probably be faced with the following prompt: + +::: half-img +![](figures/install.png) +::: + +Click "Yes", and wait for the installation to finish. A window with a bunch of gobbledygook will pop up, and that's fine. Once it's all finished, the new file should open. + +#### Run some code in the R Notebook + +First, run the R code chunk that comes automatically in a new R Notebook by clicking on the green "play" button in the top right corner of the code chunk. + +Next, insert a new R code chunk at the bottom of the notebook (directions for how to do so are already included in the new R Notebook), and inside, enter: + +``` +"Hello World" +``` + +Then run this code chunk by clicking the play button. + +#### Preview the R Notebook in HTML + +Click the "Preview" button at the top of the R Notebook panel to compile it into an HTML document. You will need to save the notebook first. In the `lectures` folder, save it as `00_practice.Rmd`. +::: + +### Discussion + +I'm going to recommend (for now at least) that you run all of your code through an R Notebook. It is possible to just type things into the R console, but that's kind of like dictating a paper into thin air. Once you've spoken the words, they disappear and can be hard to recover. + +My earlier advice would have been to write all of your code in an R script file, but that also separates the code from its results, which can be hard for beginners to keep track of. 
+ +------------------------------------------------------------------------ + +## Installing R Packages + +R comes with a lot of functionality installed, but one way that R is extensible is through users' ability to contribute new code & data through its package management system. We're going to be using a number of these packages in the course, especially since a few of them have fundamentally changed the way R programming works in the past 3 years. There's also a course R package I've created to easily distribute sample datasets. + +Here's a basic diagram of how R packages work: + +![](figures/cran_package.png){fig-align="center" width="100%"} + +### Installing Packages + +#### `install.packages()` + +Most R packages are distributed through CRAN (Comprehensive R Archive Network). When you run the function `install.packages("x")`, R checks whether the package `"x"` exists on CRAN, and installs it on your computer if it does. You may be asked to choose a "CRAN mirror" the first time you run `install.packages()`. This is because there are many copies of CRAN distributed across the internet. I'd recommend choosing the first option, called `0-Cloud`. + +#### `install_github()` + +For package developers, getting a package onto CRAN can be a bit of a pain, so some packages (and development versions of many) are also available on GitHub, and can be easily installed with `devtools::install_github("username/package")`. + +### Installing packages is different from loading packages + +**Installing** a package is different from **loading** a package. Installing a package only downloads and configures the code on your computer. In order to *use* the contents of a package, you need to load it into your R session with `library()`. + +- You only need to run `install.packages()` once to install a package, or to update a package. +- You need to run `library()` at the start of every new R session in order to use the functionality from that package. 
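The difference between "installed" and "loaded" can be checked from inside R. What follows is a minimal sketch rather than part of the lecture: it uses `tools`, a package that ships with R, only so that nothing has to be downloaded, and the names `pkg`, `loaded_before` and `loaded_after` are made up for illustration.

```r
pkg <- "tools"  # illustrative choice: ships with R, so no download is needed

# Installed? TRUE if the package is on this computer and its code can be found.
requireNamespace(pkg, quietly = TRUE)

# Loaded? search() lists everything attached in the current session.
loaded_before <- paste0("package:", pkg) %in% search()

# Attach the package for this session (this is the every-session step).
library(pkg, character.only = TRUE)

loaded_after <- paste0("package:", pkg) %in% search()

# Compare the two checks side by side.
c(before = loaded_before, after = loaded_after)
```

If `tools` was not already attached when the chunk ran, the two values differ even though the package was installed the whole time, which is the distinction the two bullet points describe.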
+ +For example, `ggplot()` is a function from the package `ggplot2`. I have already installed `ggplot2` on my computer, but if I try to use `ggplot()` before loading the package with `library()`, I'll get the error that the function was not found. + +```{r} +foo <- ggplot() +``` + +```{r} +library("ggplot2") +foo <- ggplot() +``` + +::: callout-note +## \~2 minute activity + +Let's install all of the packages we're going to use in the course. Double-check that you're connected to the internet. + +Create a notebook for this lecture called `01_lecture.Rmd`. Copy-paste the following into an R code chunk and run it: + +```{r} +#| eval: false +install.packages( + c("tidyverse", + "devtools") +) + +library("devtools") + +install_github("jofrhwld/lsa2017") +``` +::: + +------------------------------------------------------------------------ + +## R Basics + +We're now going to run through some of the very basics of R, specifically: + +- Basic Data Types +- Basic Calculations +- Assignment +- Vectors +- Indexing + +Create a new R Notebook. Change the `Title` field to `Intro to R`, and save it as `0_lecture.Rmd` in the folder `lectures`. + +As we come to a code chunk in the lecture, either copy-paste or re-type it into a new code chunk in your lecture R notebook, and run it. + +### Basic Calculations + +One way to think of R is as an overblown calculator. + +```{r} +3+3 +2*4 +(369-1)/6 +``` + +But it's not all that useful to do a bunch of calculations without saving the results for later, which is where assignment comes in. + +### Assignment + +You can assign **values** to **variables** using the assignment operator: `<-` or `->` (but in practice, only use `<-`). 
+ +::: {style="font-family:monospace;font-size:xx-large;text-align:center;"} +[variable]{style="color:#747474"} [\<-]{style="color:red"} [value]{style="color:#747474"} +::: + +```{r} +x <- 10 +y <- 2*3 +``` + +Once you've assigned a value to a variable, you can reuse the value stored in that variable for other purposes, like just printing it out again: + +```{r} +x +y +``` + +Or adding the two values together: + +```{r} +x + y +``` + +In short, you can use these variables `x` and `y` like they *are* the values you assigned to them. If this is your first time programming, here are a few things to clarify: + +**Note** + +- `x` and `y` didn't exist before you created them by assigning values to them. +- You could have chosen *almost* any name for these variables. +- You can just as easily assign *new* values to these variables. + +::: callout-tip +## Naming Things + +`x` and `y` are lousy names for variables. When it comes to naming variables, there's a famous saying: + +> "There are only two hard things in Computer Science: cache invalidation and naming things." --- Phil Karlton + +For best practices on naming variables, I'll refer you to [the tidyverse style guide by Hadley Wickham](http://style.tidyverse.org/syntax.html#object-names). To briefly summarize it: + +- Use only lowercase letters and numbers. +- Use `_` to separate words in a variable name. +- You're actually not able to start a variable name with a number. + +Also, be guided by The Principle of Least Effort. Use the minimal number of characters that is still clearly interpretable. + +``` +# Good Names +model_1 +model_full + + +# Bad Names +the_first_model_I_ever_fit +just_trying_out_a_model_with_all_predictors +m_01 +m_agdf +``` + +Also, just use good judgment. There is nothing in R preventing you from doing stuff like this to yourself. 
+ +```{r} +five <- 10 +ten <- 5 + +yellow <- "green" +``` +::: + +Another thing to keep in mind is that R can't handle characters like commas, spaces, currency symbols, or percent signs inside numeric values; stick to the digits `0` through `9` and a decimal point. Of the assignments below, only the plain `1000` works; all of the others will fail: + +```{r} +# no commas +thousand <- 1,000 +``` + +```{r} +# no spaces +thousand <- 1 000 +``` + +```{r} +# like this +thousand <- 1000 +``` + +```{r} +# no currencies +dollars <- $1000 +``` + +```{r} +# no percentages + +percent <- 51% +``` + +#### Additional data types + +In addition to numbers, other basic data types in R are **character** and **logical**. + +```{r} +# character data +digital_words <- c("fam", + "Harambe", + "tweetstorm", + "@") +``` + +```{r} +# logical values +TRUE + +# a logical test +(10/2) < 3 +``` + +#### On using quotes + +When you enter characters *without* quotes around them, R assumes you're referring to a variable. If you try to do the assignment above without the quotes, you'll get an error. + +```{r} +digital_words_fail <- c(fam, + Harambe, + tweetstorm) +``` + +Here, R saw `fam`, which isn't in quotes, searched the environment for any variables named `fam` and couldn't find any. + +When you put characters in quotes, R assumes it's a character value, *even if there's a variable by the same name*. + +```{r} +digital_words +"digital_words" +``` + +### Vectors + +Vectors are essentially lists of data, and can contain characters, numbers, or `TRUE`/`FALSE` values. There are a number of ways to create vectors in R, and data manipulation will frequently produce subvectors of data. + +- `1:10` + - This produces a vector of integers from 1 to 10. Reversing the order of the numbers will produce a vector of decreasing values. +- `c(...)` + - This produces a vector of whatever is passed as an argument to `c()`. + - `c(1,2,3,4)` +- `seq(from,to,...)` + - This produces a sequence of numbers either by a given increment or evenly spaced to a given length. 
+    - `seq(1,10,by=0.5)`
+    - `seq(1,10,length.out=11)`
+- `rep(x,...)`
+    - This produces a vector of repetitions of x by a given number of times.
+    - `rep(1,6)`
+    - `rep(1:3,2)`
+    - `rep("hello world",4)`
+
+#### Vector Arithmetic
+
+##### Vector and a Number
+
+A pretty cool feature of R is how you can do arithmetic with vectors. For example, let's say you've interviewed a bunch of speakers of the following ages:
+
+```{r}
+ages <- c(18, 35, 41, 62)
+```
+
+If you wanted to know the year of birth of these speakers, it's as easy as:
+
+```{r}
+2017 - ages
+```
+
+R has taken each value in `ages`, and subtracted it from `2017`, and created a new vector with the results.
+
+Or, if you wanted to know in which year these speakers turned 17, it's as easy as:
+
+```{r}
+(2017 - ages) + 17
+```
+
+##### Vector and a Vector
+
+Or, let's say these speakers weren't all interviewed the same year. Half were interviewed in the 90s, and half in the 2000s.
+
+```{r}
+interview_year <- c(1995, 1996, 2003, 2004)
+```
+
+Getting each speaker's year of birth is as simple as:
+
+```{r}
+interview_year - ages
+```
+
+This worked because the two vectors, `interview_year` and `ages`, were the same length. R took the first value of `ages` and subtracted it from the first value of `interview_year`, the second value of `ages` and subtracted it from the second value of `interview_year`, etc., creating a new vector of the results. You could easily assign this output to a new variable.
+
+```{r}
+dob <- interview_year - ages
+```
+
+Of course, if you now wanted to know what year these speakers turned 17, you could do it like so:
+
+```{r}
+(interview_year - ages) + 17
+```
+
+::: callout-note
+## \~5 minute activity
+
+A Starbucks Grande filter coffee in the UK currently costs £1.85. The value of £1 before the Brexit vote was about \$1.49. After the vote, it dropped down to about \$1.31, and lately it's been closer to \$1.27.
+
+Using vector arithmetic as much as possible, find out how the value in dollars of my coffee has changed.
+:::
+
+### Indexing
+
+If you have a bunch of values stored in a vector, and you want to pull out specific ones, you can do so by indexing it with square brackets `[]`.
+
+#### Indexing by Position
+
+Let's start by indexing by position.
+
+::: {style="font-family:monospace;font-size:xx-large;text-align:center;"}
+[vector]{style="color:#747474"}[\[]{style="color:red"}[position]{style="color:#747474"}[\]]{style="color:red"}
+:::
+
+R has some built-in vectors for you to use, like one called `letters`. We haven't defined `letters`, and it's not listed as being in your R environment, but it's there.
+
+```{r}
+letters
+```
+
+The first value in a vector has index `1`, the second index `2`, and so on. If you've forgotten what the 19th letter of the alphabet is, you can find it out like so:
+
+```{r}
+letters[19]
+```
+
+If instead of just one number, you use another vector to index `letters`, you'll get back out another vector.
+
+```{r}
+yes <- c(25, 5, 19)
+letters[yes]
+
+abba <- c(1, 2, 2, 1)
+letters[abba]
+```
+
+#### Logical Indexing
+
+You can also index by logical values.
+
+::: {style="font-family:monospace;font-size:xx-large;text-align:center;"}
+[vector]{style="color:#747474"}[\[]{style="color:red"}[true false vector]{style="color:#747474"}[\]]{style="color:red"}
+:::
+
+Let's come back to our vector of speakers' ages.
+
+```{r}
+ages
+```
+
+If we make another vector of `TRUE` and `FALSE` values of the same length, we can use it to index `ages`.
+
+```{r}
+logical_vec <- c(T, F, T, F)
+ages[logical_vec]
+```
+
+You only get back values where the index vector was `TRUE`.
+
+Of course, what you'll usually do is generate a vector of `TRUE` and `FALSE` values by using a logical operator.
+
+```{r}
+ages > 40
+ages[ages > 40]
+```
+
+::: callout-note
+## \~2 minute activity
+
+Let's assume our speakers had the following names:
+
+```{r}
+speaker_names <- c("Charlie", "Skyler", "Sawyer", "Jamie")
+```
+
+Using logical indexing and the ages in `ages` and year of interview in `interview_year` (or just `dob`, if you assigned anything to that variable), find out which speakers were born after 1960.
+:::
+
+### Logical Operators
+
+The following operators will return a vector of `TRUE` and `FALSE` values.
+
+| Operator | Meaning                  |
+|----------|--------------------------|
+| `==`     | exactly equal to         |
+| `!=`     | not equal to             |
+| `>`      | greater than             |
+| `>=`     | greater than or equal to |
+| `<`      | less than                |
+| `<=`     | less than or equal to    |
+
+You can use these to compare vectors to single values, as we've seen, but you can also compare vectors to vectors *if they are the same length*. Comparison is done elementwise.
+
+```{r}
+group_a <- c(20, 10, 13, 60)
+group_b <- c(11, 31, 2, 9)
+
+group_a < group_b
+```
+
+There are three more operators that have an effect on `TRUE` and `FALSE` vectors.
+
+| Operator | Meaning                                          |
+|----------|--------------------------------------------------|
+| `!`      | not x
changes all `T` to `F` and `F` to `T` | +| \| | `x` or `y` | +| `&` | `x` and `y` | + +```{r} +x <- c(T, T, F, F) +y <- c(T, F, T, F) +``` + +```{r} +cbind( + x = x, + y = y, + and = x&y, + or = x|y +) +``` + +### `%in%` + +This gets its own heading because it's so useful, and you'll use it a lot. If you say `a %in% b`, R checks every value in `a` to see if it's in `b`. + +::: {style="font-family:monospace;font-size:xx-large;text-align:center;"} +[value]{style="color:#747474"} [%in%]{style="color:red"} [vector]{style="color:#747474"} +::: + +```{r} +# Was Sage in our study? + +"Sage" %in% speaker_names +``` + +```{r} +# Was Schuyler in our study? + +"Schuyler" %in% speaker_names + +# Yes, but not spelled that way. + +"Skyler" %in% speaker_names +``` + +The first item can also be a vector. + +```{r} +# How about all of these people? + +check_names <- c("Oakley", "Charlie", "Azaria", "Landry", "Skyler", "Justice") +check_names %in% speaker_names +check_names[check_names %in% speaker_names] +check_names[!(check_names %in% speaker_names)] +``` + +------------------------------------------------------------------------ diff --git a/teaching/courses/2017_lsa/lectures/Session_2.nb.html b/teaching/courses/2017_lsa/lectures/Session_2.nb.html deleted file mode 100644 index 40e16a7..0000000 --- a/teaching/courses/2017_lsa/lectures/Session_2.nb.html +++ /dev/null @@ -1,1343 +0,0 @@ - - - - - - - - - - - - - -Data and Data Frames - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - -
-
-
-
-
- -
- - - - - - - - -
-

The Agenda

-
    -
  • Talk about organizing, structuring & storing your data.
  • -
  • Review some important data input/output options in R.
  • -
  • Review about how data frames work in R.
  • -
-
-

Setup

-
-

~2 minute setup

-

Make sure that your current RStudio project is set to your course project. Create and save your R notebook for today (I would recommend 02_lecture.Rmd). Clear the workspace of anything left over from last time with the menu options Session > Clear Workspace.

-

Load the important packages for today’s work:

- - - -
library(lsa2017)
-library(tidyverse)
- - - -
-
-
-
-
-

Data Collection and Storage

-
-

General Principles of Data Collection

-
-

Over-collect (for some things)

-

When collecting data in the first place, over-collect if at all possible or ethical. The world is a very complex place, so there is no way you could cram it all into a bottle, but give it your best shot! If during the course of your data analysis, you find that it would have been really useful to have data on, say, duration, as well as formant frequencies, it becomes costly to recollect that data, especially if you haven’t laid the proper trail for yourself. On the other hand, automation of acoustic analysis or data processing can cut down on this costliness.

-

This doesn’t go for personal information on human subjects, though. It’s important from an ethics standpoint to ask for everything you’ll need, but not more. You don’t want to collect an enormous demographic profile on your participants if you won’t wind up using it, especially if you know you won’t use it to begin with.

-
-
-

Preserve HiD Info

-

If, for instance, you’re collecting data on the effect of voicing on preceding vowel duration, preserve high dimensional data coding, like Lexical Item, or the transcription of the following segment. These high dimensional codings probably won’t be too useful for your immediate analysis, but they will allow you to procedurally extract additional features from them at a later time. For example, if you have a column called fol_seg, which is just a transcription of the following segment, it is easy to create a new column called manner with code that looks like this:

- - - -
table(iy_ah$fol_seg)
- - -

-  AA0   AA1   AE1   AH0   AH1   AO1   AY1     B    CH     D    DH   EH1   ER0     F     G 
-    2     1     1   371     2     1    36  1588  1201  1920   507     2     5   124   140 
-   HH   IH0   IH1   IY1    JH     K     L     M     N    NG   OW0   OW2     P     R     S 
-   10   255     1     4   126  3156  2888  1589  5397    26     1     2  4963     1  1479 
-   SH    SP     T    TH     V     W     Y     Z    ZH 
-  217   107 12690    96  2167    13     4  3693    32 
- - - - - - -
iy_ah <- iy_ah %>%
-            mutate(manner = recode(fol_seg, B = 'stop',
-                                            CH = 'affricate',
-                                            D = 'stop',
-                                            DH = 'fricative',
-                                            `F` = 'fricative',
-                                            G = 'stop',
-                                            HH = 'fricative',
-                                            JH = 'affricate',
-                                            K = 'stop',
-                                            L = 'liquid',
-                                            M = 'nasal',
-                                            N = 'nasal',
-                                            NG = 'nasal',
-                                            P = 'stop',
-                                            R = 'liquid',
-                                            S = 'fricative',
-                                            SH = 'fricative',
-                                            SP = 'pause',
-                                            `T` = 'stop',
-                                            TH = 'fricative',
-                                            V = 'fricative',
-                                            W = 'glide',
-                                            Y = 'glide',
-                                            Z = 'fricative',
-                                            ZH = 'fricative',
-                                            .default = 'vowel'))
-table(iy_ah$manner)
- - -

-affricate fricative     glide    liquid     nasal     pause      stop     vowel 
-     1327      8325        17      2889      7012       107     24457       684 
- - - -
-
-

Leave A Trail of Crumbs

-

Be sure to answer this question: How can I preserve a record of this observation in such a way that I can quickly return to it and gather more data on it if necessary? If you fail to successfully answer this question, then you’ll be lost in the woods if you ever want to restudy, and the only way home is to replicate the study from scratch.

-
-
-

Give Meaningful Names

-

Give meaningful names both to predictor columns and to the labels of nominal observations. Keeping a readme describing the data is still a good idea, but at least now the data is approachable at first glance.

-
-
-

Distinguish between 0 and NA

-

I have worked with some spreadsheets where missing data was given a value of 0, which will mess things up. For example, /oy/ is a fairly rarely occurring phoneme in English, and it’s possible that a speaker won’t produce any tokens in a short interview. In a spreadsheet of mean F1 and F2 for all vowels, that speaker should get an NA for /oy/, not 0.
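
To see why the distinction matters, here is a minimal sketch. The F1 values and the names `f1_zero` and `f1_na` are invented for illustration and are not part of the course data.

```r
# Hypothetical per-vowel mean F1 values (Hz) for one speaker who produced
# no /oy/ tokens; the numbers are invented for illustration.
f1_zero <- c(iy = 300, ae = 700, oy = 0)   # missing value coded as 0
f1_na   <- c(iy = 300, ae = 700, oy = NA)  # missing value coded as NA

# Coding the gap as 0 silently drags the average down.
mean_zero <- mean(f1_zero)
mean_zero

# With NA, R returns NA unless you say how to handle the gap.
mean_na <- mean(f1_na)
mean_na

# na.rm = TRUE averages only the values that were actually observed.
mean_observed <- mean(f1_na, na.rm = TRUE)
mean_observed
```

The 0 version gives a plausible-looking but wrong number without any warning, while the NA version forces you to decide how to handle the missing vowel.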

-
-
-
-

Storing Data

-

When we store data, it should be:

-
    -
  1. Raw Raw data is the most useful data. It’s impossible to move down to smaller granularity from a coarser, summarized granularity. Summary tables etc. are nice for publishing in a paper document, but raw data is what we need for asking novel research questions with old data.

  2. -
  2. Open formatted Do not use proprietary database software for long term storage of your data. I have heard enough stories about interesting data sets that are no longer accessible for research either because the software they are stored in is defunct, or current versions are not backwards compatible. At that point, your data is property of Microsoft, or whoever. Store your data as raw text, delimited in some way (I prefer tabs).

  4. -
  5. Consistent I think this is most important when you may have data in many separate files. Each file and its headers should be consistently named and formatted. They should be consistently delimited and commented also. There is nothing worse than inconsistent headers and erratic comments, labels, headers or NA characters in a corpus. (Automation also helps here.)

  6. -
  7. Documented Produce a readme describing the data, how it was collected and processed, and describe every variable and its possible values.

  8. -
-
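
As a rough sketch of what raw, open-formatted storage can look like from R, the block below writes a tiny data frame to a tab-delimited text file and reads it back. The data frame `vowels`, its values, and the use of `tempfile()` in place of a real project path are all assumptions made for the example.

```r
# A tiny made-up data frame; column names and values are illustrative only.
vowels <- data.frame(speaker = c("Charlie", "Skyler"),
                     vowel   = c("iy", "oy"),
                     F1      = c(310, NA))

# tempfile() stands in for a real path in your project's data directory.
out_path <- tempfile(fileext = ".txt")

# Write plain text, tab-delimited, with a header row and NA spelled "NA".
write.table(vowels, file = out_path, sep = "\t",
            quote = FALSE, row.names = FALSE, na = "NA")

# Reading it back recovers the same columns and the missing value.
vowels_back <- read.delim(out_path)
vowels_back
```

Any text editor or other programming language can open the resulting file, which is the point of avoiding proprietary formats.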
-
-
-
-

Structuring Data

-
-

Breaking Bad Spreadsheet Habits

-

Let’s start off by looking at a picture of a data organization approach that might look familiar, and is a very bad way to do things:

-
- - -
-

This spreadsheet has a fairly strict organizational structure, but is virtually hopeless for doing any kind of serious statistical analysis. It’s also verging on irreparable using R. This is because the data in this spreadsheet is organized to be easy to look at with your eyeballs 👀.

-

But looking at neatly organized data in a spreadsheet is not a statistical analysis technique. So we need to start organizing our data in a way that isn’t easy to look at, but is easy to graph and analyze.

-
-
-

Better Habits

-

Everyone working with data (in R or otherwise) should read Hadley Wickham’s paper on Tidy Data (https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html). If you have been organizing your data like the picture above, there are a few guidelines not discussed in that paper, namely:

-
-

Follow these rules

-
    -
  1. The first row of the data must be the names of the data columns.
  2. -
  3. All other rows must be the data, and nothing else.
  4. -
  5. You cannot use empty rows or empty columns as visual aids to look at the data.
  6. -
  7. The spreadsheet must not contain any summary cells. No final row called “Average” or final column called “Total”. We can create these in R, and they make data processing more complicated if they’re included in the raw data.
  8. -
-
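
The last rule is easier to accept once you see how little code a summary takes. The sketch below uses an invented table of token counts (the `counts` data frame and its numbers are assumptions for illustration) and computes the "Total" and "Average" values on demand instead of storing them in the raw data.

```r
# Made-up token counts per speaker; names and numbers are illustrative only.
counts <- data.frame(speaker = c("Charlie", "Skyler", "Sawyer"),
                     iy      = c(12, 20, 7),
                     ah      = c(30, 25, 14))

# The raw data has no "Total" column or "Average" row; compute them on demand.
totals_per_speaker <- rowSums(counts[, c("iy", "ah")])
means_per_vowel    <- colMeans(counts[, c("iy", "ah")])

totals_per_speaker
means_per_vowel
```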
-
-

Semantics of Data Structure

-

In the semantics of data structure Wickham lays out, there are three important primitives:

-
    -
  1. Variables
  2. -
  3. Values
  4. -
  5. Observations
  6. -
-
-

Defining the primitives

-
-
Variables
-

Variables are the collections of values of interest in data analysis. For example, let’s say you were doing a study on unnormalized vowel space size by just looking at /i:/ and /ɑ/. The variables in that study could be:

-
    -
  • speaker
  • -
  • word
  • -
  • phoneme
  • -
  • duration
  • -
  • F1
  • -
  • F2
  • -
  • word_frequency
  • -
-
-
-
Values
-

Values are, as the name implies, the possible values that each variable can have, for example:

-
    -
  • speaker: "Oakley", "Charlie", "Azaria", ...
  • -
  • word: "street", "thirteen", "not", "got", ...
  • -
  • phoneme: "iy", "ah"
  • -
-
-
-
Observations
-

An observation is the minimal unit across which all variables are collected. For example, in the vowel space study, one observation would be one instance of an uttered vowel for which you record who the speaker was, the word, the duration, F1, F2, etc.

-
-
-
-

Organizing data with these primitives

-

Once you’ve thought through what the variables, values and observations are for your study, the principle of how to organize them is simple:

-
    -
  1. Each variable forms a column.
  2. -
  3. Each observation forms a row.
  4. -
-
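
As a small sketch of that layout, here are three invented vowel tokens arranged with one variable per column and one observation per row. Every value below is made up purely to show the shape of the table.

```r
# Three invented observations (one row per uttered vowel token);
# every value here is made up purely to show the layout.
tokens <- data.frame(speaker  = c("Oakley", "Oakley", "Charlie"),
                     word     = c("street", "got", "thirteen"),
                     phoneme  = c("iy", "ah", "iy"),
                     duration = c(0.11, 0.09, 0.14),
                     F1       = c(320, 710, 305),
                     F2       = c(2300, 1200, 2450))

# Each variable is a column, each observation is a row.
str(tokens)
```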

For the vowel space size study, you might want to wind up with a plot that looks like this:

- - - -

- - - -

It wouldn’t be uncommon to see the data untidily organized like this:

- - - -
- -
- - - -
-

~5 Minute Activity

-

In small groups, figure out the following:

-
    -
  • What are the variables in the data frame above?
  • -
  • What are the values?
  • -
  • What are the observations?
  • -
  • How should the table above be re-organized?
  • -
- - - - -
-
-
-
-
-
-
-

Data Frames

-

So far we have discussed the following types of values in R:

-
    -
  • numerical
  • -
  • character
  • -
  • logical
  • -
-

And we’ve discussed the following data structures.

-
    -
  • vectors
  • -
-

Here, we’ll cover one new data structure:

-
    -
  • data frames
  • -
-

Data Frames are the data structure we’ll be using the most in R. When you begin thinking about data frames, a useful starting place is to think of them as spreadsheets, with columns and rows (but we’ll eventually abandon spreadsheet thinking). Let’s start out by creating a very simple data frame using the data.frame() function.

- - - -
  pitch <- data.frame(speaker_names = c("Charlie", "Skyler", "Sawyer", "Jamie"),
-                      ages = c(18, 35, 41, 62),
-                      F0 = c(114, 189, 189, 199))
-  pitch
- - -
- -
- - - -
-

Finding your way around

-

The pitch data frame has four rows, and three columns. The rows are just numbered 1 through 4, and the three columns are named speaker_names, ages and F0. To find out how many rows and columns a data frame has, you can use the nrow() and ncol() functions.

- - - -
  nrow(pitch)
- - -
[1] 4
- - -
  ncol(pitch)
- - -
[1] 3
- - - -

Most data frames you’re going to work with have a lot more rows than that. For example, iy_ah is a data frame that is bundled in the lsa2017 package.

- - - -
  nrow(iy_ah)
- - -
[1] 44818
- - - -

That’s too many rows to look at just in the console. One option is to use the head() function, which just prints the first 6 rows.

- - - -
  head(iy_ah)
- - -
- -
- - - -

Another option is to use the summary() function.

- - - -
  summary(iy_ah)
- - -
   idstring              age            sex                 year      years_of_schooling
- Length:44818       Min.   :18.00   Length:44818       Min.   :1973   Length:44818      
- Class :character   1st Qu.:30.00   Class :character   1st Qu.:1980   Class :character  
- Mode  :character   Median :45.00   Mode  :character   Median :1985   Mode  :character  
-                    Mean   :46.69                      Mean   :1989                     
-                    3rd Qu.:65.00                      3rd Qu.:2002                     
-                    Max.   :93.00                      Max.   :2010                     
-    vowel               word                 F1               F2              dur        
- Length:44818       Length:44818       Min.   : 186.6   Min.   : 598.6   Min.   :0.0500  
- Class :character   Class :character   1st Qu.: 398.3   1st Qu.:1355.1   1st Qu.:0.0800  
- Mode  :character   Mode  :character   Median : 528.6   Median :1879.5   Median :0.1100  
-                                       Mean   : 570.6   Mean   :1889.1   Mean   :0.1214  
-                                       3rd Qu.: 732.5   3rd Qu.:2386.7   3rd Qu.:0.1500  
-                                       Max.   :1428.9   Max.   :3690.4   Max.   :0.8700  
-  plt_vclass          pre_seg            fol_seg            context         
- Length:44818       Length:44818       Length:44818       Length:44818      
- Class :character   Class :character   Class :character   Class :character  
- Mode  :character   Mode  :character   Mode  :character   Mode  :character  
-                                                                            
-                                                                            
-                                                                            
-  word_trans             F1_n              F2_n            manner         
- Length:44818       Min.   :-3.0055   Min.   :-2.4533   Length:44818      
- Class :character   1st Qu.:-1.4855   1st Qu.:-0.7021   Class :character  
- Mode  :character   Median :-0.7261   Median : 0.8525   Mode  :character  
-                    Mean   :-0.2476   Mean   : 0.5639                     
-                    3rd Qu.: 1.0521   3rd Qu.: 1.7315                     
-                    Max.   : 4.3707   Max.   : 4.8674                     
- - - -

summary() is a function that works on almost every kind of object.

-
-
-

Indexing Data Frames

-

Since data frames are 2 dimensional (rows are one dimension, columns are another), the way you index them is a little bit more complicated than with vectors. It still uses square brackets, though, but these square brackets have two positions:

-
-

df[row number, column number]

-
-

If you specify a specific row number, but leave the column number blank, you’ll get back that row and all columns.

- - - -
  pitch[1,]
- - -
- -
- - - -

Alternatively, if you specify just the column number, but leave the rows blank, you’ll get back all of the values for that column.

- - - -
  pitch[,2]
- - -
[1] 18 35 41 62
- - - -

When you specify both, you get back the value in the specified row and column

- - - -
  pitch[1,2]
- - -
[1] 18
- - - -

However, there is a special indexing operator for data frames that takes advantage of their named columns: $.

-
-

df$column_name

-
- - - -
  pitch$speaker_names
- - -
[1] Charlie Skyler  Sawyer  Jamie  
-Levels: Charlie Jamie Sawyer Skyler
- - - -

After accessing the column of a data frame, you can index it just like it’s a vector.

- - - -
  pitch$speaker_names[1]
- - -
[1] Charlie
-Levels: Charlie Jamie Sawyer Skyler
- - - -

If you really want to, you can do logical indexing of data frames like so:

- - - -
  pitch[pitch$speaker_names == "Charlie", ]
- - -
- -
- - - -

But there’s also a function called filter() that you can use to do the same thing. filter() takes a data frame as its first argument, and then a logical statement referring to one or more of the data frame’s columns.

- - - -
  filter(pitch, speaker_names == "Charlie")
- - -
- -
- - -
  filter(pitch, ages > 18, F0 > 190)
- - -
- -
- - - -
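
When filter() is given several conditions separated by commas, it keeps only the rows where all of them are true. For comparison, here is a sketch of the same selection written with base R logical indexing, re-creating the pitch data frame from earlier in this section so the block runs on its own.

```r
# The same small data frame used in this section.
pitch <- data.frame(speaker_names = c("Charlie", "Skyler", "Sawyer", "Jamie"),
                    ages = c(18, 35, 41, 62),
                    F0 = c(114, 189, 189, 199))

# Both conditions must hold, so they are joined with `&`.
keep <- pitch$ages > 18 & pitch$F0 > 190

# Rows where `keep` is TRUE, all columns.
older_high_f0 <- pitch[keep, ]
older_high_f0
```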
-

~5 Minute Activity

-

First, review the documentation of the iy_ah data set with ?iy_ah. Using filter() and nrow(), find out what percent of /i:/ tokens have a duration less than 90ms (0.09s).

-
-
-
-
-
-

Reading Data into R

-

R can easily read comma-separated (.csv) files and tab-delimited files into its memory.1 You can read them in with read.csv() and read.delim(), respectively. If your data is unavoidably in an Excel spreadsheet, there is a package called readxl with a function called read_excel(). If you have the readxl package installed, I strongly recommend reading over its documentation on sheet geometry by calling up the vignette like so:

- - - -
vignette("sheet-geometry", package = "readxl")
- - - -

Last Minute Update: There is also a package for reading data in from Google spreadsheets: https://github.com/jennybc/googlesheets. I haven’t used it yet, but it’s gotten good reviews.

-

When loading a data file into R, you are just loading it into the R workspace. Any alterations or modifications you make to the data frame will not be reflected in the file in your system, just in the copy in the R workspace.
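
A minimal sketch of this behavior, using a throwaway file made with `tempfile()` and invented values (both are assumptions for the example, not course data):

```r
# tempfile() stands in for a real data file; the contents are made up.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(speaker = c("Charlie", "Skyler"), F0 = c(114, 189)),
          file = csv_path, row.names = FALSE)

# Load a copy into the workspace and change the copy.
pitch_copy <- read.csv(csv_path)
pitch_copy$F0 <- pitch_copy$F0 + 100

# Re-reading shows the file on disk still holds the original values.
on_disk_before <- read.csv(csv_path)$F0
on_disk_before

# Only an explicit write changes what is stored on disk.
write.csv(pitch_copy, file = csv_path, row.names = FALSE)
on_disk_after <- read.csv(csv_path)$F0
on_disk_after
```

In practice you would usually write modified data to a new file and leave the raw file untouched.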

-

The tricky thing now is that the way that feels most natural or normal for you to organize and name your files and folders doesn’t necessarily translate into a good way for R (or other programming languages) to look at them. In order to load a file into R, you need to provide read.csv() or read.delim() with the “path” to the file, which is just a text string.
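
Because the path is just a character string, you can build and check it like any other value. This is a small sketch that assumes your working directory is the course project and that it contains a folder called data; both are assumptions, so substitute your own folder and file names.

```r
# Hypothetical folder and file names; substitute your own.
data_path <- file.path("data", "joe_vowels.csv")

# file.path() simply joins the pieces with "/".
data_path

# TRUE only if a file really exists at that location on your system.
file.exists(data_path)
```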

-

For example, here’s a screenshot of a data file I’d like to load into R.

-
- - -
-

I have the option turned on in my system to see the full path at the bottom of the file window, so you can see a full list of all of the folders this data file is embedded in. In order to read this data into R, you need to type out the full path, although a nice thing about

- - - -
  joe_vowels <- read.csv("~/ownCloud/DocSyncUoE/Courses/LSA/data/joe_vowels.csv")
- - - -

If you’re not sure what it looks like on your system, use the file.choose() function.

- - - -
  file.choose()
- - - -

That’ll launch the default visual file browser for your system. After browsing around and clicking on a file, file.choose() will print the character string that represents the path to that file into the console.

-
-

Hygiene

-

Don’t rely heavily on file.choose(). Sometimes, I’ve seen R scripts with the following line of code in it:

- - - -
data <- read.csv(file.choose())
- - - -

Please never do this. I would caution against using file.choose() in any code, scripts or notebooks at all. Only ever use it to refresh your memory of where your data is located. By always writing out the text of the path to the data, you

-
    -
  • produce more transparent code
  • -
  • allow yourself to re-run your analysis without needing to click around
  • -
  • ensure that you’re using the same data file every single time
  • -
-
-

One pretty cool thing is that if a data file is up on a website somewhere, you can just access it by passing the url to read.csv() or read.delim().2 Here is some sample data on the Donner Party.3

- - - -
  donner <- read.csv("http://jofrhwld.github.io/data/donner.csv")
-  head(donner)
- - - -
-

~5 minute activity

-

Download the file joe_vowels.csv from the course Canvas. Save it to the data directory for the course, or wherever you would like to keep it. Read it into R. What’s my mean F1 and F2 across all of my vowels?

-
-
-
-
-

Cleaning up data

-

We’ve discussed how data ought to be tidily organized, and we’ve now gone over how to load data, and minimally explore dataframes in R. Let’s quickly go over how to tidy up messy data a little.

-

First, let’s look at the wide iy_ah_wide dataframe, which is part of the lsa2017 package.

- - - -
iy_ah_wide
- - -
- -
- - - -

The problems with this data are:

-
    -
  • There are values spread across the columns.
  • -
  • Individual column names have combined these values with some variables.
  • -
-

Getting to a tidier format of the data will involve a three step process:

-
    -
  1. Converting this wide data format to a long data format.
  2. -
  3. Separating the vowel class values from the formant variable.
  4. -
  5. Spreading the formant variables back out along the column space.
  6. -
-

We can do this easily with the functions gather(), separate() and spread() from the tidyr package.

-

As a smaller illustration for people who may feel uneasy about vowels and formants, I’ll be demonstrating each of these steps with a simpler data set about how many apples and oranges two people bought, and how many they ate.

- - - -
fruit <- data.frame(person = c("Oakley", "Charlie"),
-                 apples_bought = c(5, 3),
-                 apples_ate = c(1, 2),
-                 oranges_bought = c(5, 4),
-                 oranges_ate = c(3, 3))
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
personapples_boughtapples_ateoranges_boughtoranges_ate
Oakley5153
Charlie3243
- - - - - - -

Note, even though the column labels look different, this is equivalent to a table formatted with merged column label cells.

-
-
- - -
-
-
-

Gathering Columns

-

The gather() function makes wide data long. It takes the following arguments:

-
-

gather(data, key, value, cols)

-
-
    -
  • data -
      -
    • The data you want to reshape. It must be a data frame.
    • -
  • -
  • key and value -
      -
    • These are new column names that you want to create. gather() is going to take the column names and put them in the column you give to key, and the values from all the cells and put them in the column you call value.
    • -
  • -
  • cols -
      -
    • An indication of which columns you want to gather, either a vector of column names, a vector of column numbers, or some specialized methods for gather() that we’ll discuss.
    • -
  • -
-

Here’s how that’ll work for the fruit data. We’ll tell gather() to gather columns 2 through 5.

- - - -
fruit_long <- gather(data = fruit,
-                     key = fruit_behavior,
-                     value = number,
-                     2:5)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
personfruit_behaviornumber
Oakleyapples_bought5
Charlieapples_bought3
Oakleyapples_ate1
Charlieapples_ate2
Oakleyoranges_bought5
Charlieoranges_bought4
Oakleyoranges_ate3
Charlieoranges_ate3
- - - - - - -

gather() has returned a new data frame. It has created a new column called fruit_behavior, because we told it to with the key argument, and it has created a new column called number, because we told it to with the value argument. It has taken all of the column names of the columns we told it to gather, and put them into the fruit_behavior column, and the numeric values from the columns we told it to gather, and put them into the number column. It has also repeated the rows of the other columns (person) as logically necessary.

-

Now, we told it to gather column numbers 2 through 5, but this would have also worked:

- - - -
gather(data = fruit, 
-       key = fruit_behavior, 
-       value = number, 
-       c("apples_bought","apples_ate", "oranges_bought", "oranges_ate"))
- - - -

gather() also has a more convenient method of specifying the columns you want to gather by passing it a named range of columns. We want to gather all columns from apples_bought to oranges_ate, so we can tell it to do so with apples_bought:oranges_ate.

- - - -
gather(data = fruit, 
-       key = fruit_behavior, 
-       value = number, 
-       apples_bought:oranges_ate)
- - - -
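
In tidyr releases newer than the one these notes were written against (1.0.0 and later), gather() is kept for existing code but pivot_longer() is the recommended function for new code. As a hedged sketch, the fruit reshaping above could be written as follows; the name `fruit_long2` is just an illustrative choice, and the rows may come out in a different order than with gather().

```r
library(tidyr)  # pivot_longer() needs tidyr 1.0.0 or later

# The same fruit data used above, re-created so this block runs on its own.
fruit <- data.frame(person = c("Oakley", "Charlie"),
                    apples_bought = c(5, 3),
                    apples_ate = c(1, 2),
                    oranges_bought = c(5, 4),
                    oranges_ate = c(3, 3))

# Same reshaping as the gather() calls above: column names go into
# fruit_behavior, cell values go into number.
fruit_long2 <- pivot_longer(fruit,
                            cols = apples_bought:oranges_ate,
                            names_to = "fruit_behavior",
                            values_to = "number")
fruit_long2
```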

Ok, let’s do this now to the iy_ah_wide data, gathering all of the columns from ah_F1 to iy_F2.

- - - -
iy_ah_step1 <- gather(data = iy_ah_wide, 
-                      key = vowel_formant, 
-                      value = hz, 
-                      ah_F1:iy_F2)
-iy_ah_step1
- - -
- -
- - - -

For the fruit data, the only un-gathered column was person, but for iy_ah_wide, idstring, age, sex, and year, were all ungathered. Here you can see how all rows of ungathered columns are repeated as logically necessary.

-
-
-

Separating Columns

-

There is still a problem with both the fruit_long and the iy_ah_step1 data frames, which is that two different kinds of data are merged within one column. For iy_ah_step1, the vowel class and formant variable are merged together (e.g. ah_F1) and for fruit_long, the fruit and behavior are merged together (e.g. apples_bought). We need to separate these, with a very aptly named function called separate().

-
-

separate(data, col, into, sep)

-
-
    -
  • data -
      -
    • Again, the data frame you want to do this separation to.
    • -
  • -
  • col -
      -
    • The name of the column you want to separate.
    • -
  • -
  • into -
      -
    • A character vector of the new column names you want to create.
    • -
  • -
  • sep -
      -
    • The character or regex pattern you want to use to split up the values in col.
    • -
  • -
-

Here’s how it works for fruit_long.

- - - -
fruit_separate <- separate(data = fruit_long,
-                           col = fruit_behavior,
-                           into = c("fruit", "behavior"),
-                           sep = "_")
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
person   fruit    behavior  number
Oakley   apples   bought    5
Charlie  apples   bought    3
Oakley   apples   ate       1
Charlie  apples   ate       2
Oakley   oranges  bought    5
Charlie  oranges  bought    4
Oakley   oranges  ate       3
Charlie  oranges  ate       3
- - - - - - -

It has returned a new data frame with the fruit_behavior column split into two new columns, named after what I passed to the into argument. It split up fruit_behavior based on what I passed to sep, which was the underscore character.
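Because a character value passed to sep is treated as a regular expression, delimiters that have a special meaning in regexes need escaping. The sketch below uses a made-up data frame (fruit_dots is hypothetical, not part of the course data) where the delimiter is a period.

```r
library(tidyr)

# Hypothetical data where the delimiter is a period rather than an underscore
fruit_dots <- data.frame(fruit_behavior = c("apples.bought", "oranges.ate"),
                         number = c(5, 3),
                         stringsAsFactors = FALSE)

# In a regex, "." matches any character, so it has to be escaped as "\\."
fruit_dots_sep <- separate(data = fruit_dots,
                           col = fruit_behavior,
                           into = c("fruit", "behavior"),
                           sep = "\\.")
fruit_dots_sep
```

This is one more reason to prefer a delimiter like the underscore, which needs no escaping.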

-

Let’s do this for iy_ah_step1 now.

- - - -
iy_ah_step2 <- separate(iy_ah_step1, 
-                        vowel_formant, 
-                        into = c("vowel", "formant"),
-                        sep = "_")
-iy_ah_step2
- - -
- -
- - - -

We now have two separate columns for vowel and formant.

-
-

Hygiene

-

I have been very helpful and used underscores to merge together the values we want to separate. Be helpful to yourself, and be consistent in the semantics of how you use potential delimiters like - and _. Here’s an example of being helpful to yourself:

-
project_subject_firstname-lastname
-
-EDI_1_Stuart-Duddingston
-EDI_2_Connor-Black-Macdowall
-EDI_3_Mhairi
-

This is helpful because, when you separate by underscore, you’ll have something tidy:

-
EDI    1    Stuart-Duddingston
-EDI    2    Connor-Black-Macdowall
-EDI    3    Mhairi
-

If you use - for everything, you’ll have chaos when you try to separate them, because some speakers have “double-barreled” names and some speakers have only first names:

-
# Input:
-EDI-1-Stuart-Duddingston
-EDI-2-Connor-Black-Macdowall
-EDI-3-Mhairi
-
-# Becomes
-
-EDI    1    Stuart    Duddingston
-EDI    2    Connor    Black        Macdowall
-EDI    3    Mhairi
-

This goes beyond R programming. You should make some decisions and stick with them for all of your data analysis, including file naming, Praat tier naming, etc.
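The difference can be sketched with separate() itself. The data frames and the new column names below (speakers_good, speakers_bad, name, and so on) are made up for illustration. The extra and fill arguments are set explicitly so that the mismatched rows are handled silently instead of raising warnings.

```r
library(tidyr)

# Underscores between fields, hyphens only inside the name
speakers_good <- data.frame(
  id = c("EDI_1_Stuart-Duddingston",
         "EDI_2_Connor-Black-Macdowall",
         "EDI_3_Mhairi"),
  stringsAsFactors = FALSE)

tidy <- separate(data = speakers_good,
                 col = id,
                 into = c("project", "subject", "name"),
                 sep = "_")
tidy

# Hyphens used for everything
speakers_bad <- data.frame(
  id = c("EDI-1-Stuart-Duddingston",
         "EDI-2-Connor-Black-Macdowall",
         "EDI-3-Mhairi"),
  stringsAsFactors = FALSE)

# Rows now split into 4, 5 and 3 pieces:
# extra = "drop" discards the fifth piece, fill = "right" pads with NA
messy <- separate(data = speakers_bad,
                  col = id,
                  into = c("project", "subject", "firstname", "lastname"),
                  sep = "-",
                  extra = "drop",
                  fill = "right")
messy
```

In the second result, part of one speaker's name is silently lost and another speaker gets a missing lastname, which is exactly the chaos a consistent delimiter scheme avoids.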

-
-
-
-

Spreading Columns

-

We’ve got one last step, which is spreading the values in some rows across the column space. With the fruit data, we might not want a single column called behavior, but rather two columns called bought and ate. For the vowel data, we definitely don’t want one column called formant. We want one called F1 and one called F2. We can do this with the spread() function.

-
-

spread(data, key, value)

-
-
    -
  • data -
      -
    • Again, the data we want to work with.
    • -
  • -
  • key -
      -
    • The column whose values you want to spread across the column space.
    • -
  • -
  • value -
      -
    • The column with values that you want to fill in the cells.
    • -
  • -
-

Here’s how that looks with the fruit_separate data.

- - - -
fruit_spread <- spread(data = fruit_separate,
-                       key = behavior,
-                       value = number)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
person   fruit    ate  bought
Charlie  apples   2    3
Charlie  oranges  3    4
Oakley   apples   1    5
Oakley   oranges  3    5
- - - - - - -

This has created a new data frame. I told spread() to spread the values in behavior across the column space. Because it had only two unique values in it (bought and ate), it has created two new columns called bought and ate. After creating these new columns, it had to fill in the new cells with some values, and I told it to use the values in number for that.

-

Here’s how that works with iy_ah_step2.

- - - -
iy_ah_step3 <- spread(data = iy_ah_step2,
-                      key = formant,
-                      value = hz)
-iy_ah_step3
- - -
- -
- - - -

Now, we’ve finally gotten to a tidy data format. In our next meeting, we’ll discuss how to chain these three functions into one easy-to-read process.
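As an aside, newer versions of tidyr (1.0.0 and later) also provide pivot_longer() and pivot_wider(), which the tidyr documentation now recommends over gather() and spread(). The sketch below is not what this course uses. It assumes a recent tidyr is installed and rebuilds the fruit data so that it runs on its own. It reaches the same tidy shape, with the separating step folded into pivot_longer() through names_sep.

```r
library(tidyr)

fruit <- data.frame(person = c("Oakley", "Charlie"),
                    apples_bought = c(5, 3),
                    apples_ate = c(1, 2),
                    oranges_bought = c(5, 4),
                    oranges_ate = c(3, 3),
                    stringsAsFactors = FALSE)

# Gather and separate in one step: each column name is split on "_"
# into a fruit part and a behavior part
fruit_longer <- pivot_longer(fruit,
                             cols = apples_bought:oranges_ate,
                             names_to = c("fruit", "behavior"),
                             names_sep = "_",
                             values_to = "number")

# Spread the behavior values back out into their own columns
fruit_tidy <- pivot_wider(fruit_longer,
                          names_from = behavior,
                          values_from = number)
fruit_tidy
```

The result has one row per person and fruit, with bought and ate as separate columns, matching fruit_spread above apart from row and column order.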

-
-

Idiom

-

You might have noticed that in the function calls above, I’ve put a new line between individual function arguments. I can do this because R ignores extra white-space, including new lines, between the arguments inside a function call’s parentheses. I could have written these with just spaces between each argument, but that would be too visually crowded.

- - - -
# compare
-
-# One line
-fruit_separate <- separate(data = fruit_long, col = fruit_behavior, into = c("fruit", "behavior"), sep = "_")
-
-# New Lines
-fruit_separate <- separate(data = fruit_long, 
-                           col = fruit_behavior, 
-                           into = c("fruit", "behavior"), 
-                           sep = "_")
-
- - - -

I encourage you to use new lines similarly to give yourself “some space to breathe”. Don’t be shy about it. But if you put new lines between some arguments, you should really put new lines between all arguments.
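One caveat is worth a small sketch: new lines are only ignored while an expression is still incomplete, such as inside open parentheses or after a trailing operator. The variable names below are made up for illustration.

```r
# Inside the parentheses of a call, line breaks between arguments are ignored
total_a <- sum(1,
               2,
               3)

# An operator left at the end of a line tells R the expression continues
total_b <- 1 +
  2

# Here the first line is already a complete expression, so R stops there;
# the second line is evaluated separately and its result is not assigned
total_c <- 1
+ 2

total_a
total_b
total_c
```

So total_a is 6 and total_b is 3, but total_c is only 1, because the assignment was finished before the line break.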

-
- -
-
-
-
-
    -
  1. My personal aesthetic preference is for tab-delimited files.

  2. In older versions of R, this doesn’t work if the file is served over an encrypted connection, i.e. if its URL begins with https://. Recent versions of R can usually read https:// URLs directly.

  3. “The Donner Party (sometimes called the Donner-Reed Party) was a group of American pioneer migrants who set out for California in a wagon train. Delayed by a series of mishaps, they spent the winter of 1846–47 snowbound in the Sierra Nevadas. Some of the migrants resorted to cannibalism to survive, eating those who had succumbed to starvation and sickness.” https://en.wikipedia.org/wiki/Donner_Party
-
- -
aGUgb25seSB1bi1nYXRoZXJlZCBjb2x1bW4gd2FzIGBwZXJzb25gLCBidXQgZm9yIGBpeV9haF93aWRlYCwgYGlkc3RyaW5nYCwgYGFnZWAsIGBzZXhgLCBhbmQgYHllYXJgLCB3ZXJlIGFsbCB1bmdhdGhlcmVkLiBIZXJlIHlvdSBjYW4gc2VlIGhvdyBhbGwgcm93cyBvZiB1bmdhdGhlcmVkIGNvbHVtbnMgYXJlIHJlcGVhdGVkIGFzIGxvZ2ljYWxseSBuZWNlc3NhcnkuCgojIyBTZXBhcmF0aW5nIENvbHVtbnMKClRoZXJlIGlzIHN0aWxsIGEgcHJvYmxlbSB3aXRoIGJvdGggdGhlIGBmcnVpdF9sb25nYCBhbmQgdGhlIGBpeV9haF9zdGVwMWAgZGF0YSBmcmFtZXMsIHdoaWNoIGlzIHRoYXQgdHdvIGRpZmZlcmVudCBraW5kcyBvZiBkYXRhIGFyZSBtZXJnZWQgd2l0aGluIG9uZSBjb2x1bW4uIEZvciBgaXlfYWhfc3RlcDFgLCB0aGUgdm93ZWwgY2xhc3MgYW5kIGZvcm1hbnQgdmFyaWFibGUgYXJlIG1lcmdlZCB0b2dldGhlciAoZS5nLiBgYWhfRjFgKSBhbmQgZm9yIGBmcnVpdF9sb25nYCwgdGhlIGZydWl0IGFuZCBiZWhhdmlvciBhcmUgbWVyZ2VkIHRvZ2V0aGVyIChlLmcuIGBhcHBsZV9ib3VnaHRgKS4gV2UgbmVlZCB0byBzZXBhcmF0ZSB0aGVzZSwgd2l0aCBhIHZlcnkgYXB0bHkgbmFtZWQgZnVuY3Rpb24gY2FsbGVkIGBzZXBhcmF0ZSgpYAoKPGRpdiBjbGFzcyA9ICJpbGx1c3RyYXRlIj4Kc2VwYXJhdGUoPHNwYW4gY2xhc3MgPSAicG9wIj5kYXRhPC9zcGFuPiwgPHNwYW4gY2xhc3MgPSAicG9wIj5jb2w8L3NwYW4+LCA8c3BhbiBjbGFzcyA9ICJwb3AiPmludG88L3NwYW4+LCA8c3BhbiBjbGFzcyA9ICJwb3AiPnNlcDwvc3Bhbj4pCjwvZGl2PgoKLSBgZGF0YWAKICAgIC0gQWdhaW4sdGhlIGRhdGEgZnJhbWUgeW91IHdhbnQgdG8gZG8gdGhpcyBzZXBhcmF0aW9uIHRvLgotIGBjb2xgCiAgICAtIFRoZSBuYW1lIG9mIHRoZSBjb2x1bW4geW91IHdhbnQgdG8gc2VwYXJhdGUuCi0gYGludG9gCiAgICAtIEEgY2hhcmFjdGVyIHZlY3RvciBvZiB0aGUgbmV3IGNvbHVtbiBuYW1lcyB5b3Ugd2FudCB0byBjcmVhdGUuCi0gYHNlcGAKICAgIC0gVGhlIGNoYXJhY3RlciBvciByZWdleCBwYXR0ZXJuIHlvdSB3YW50IHRvIHVzZSB0byBzcGxpdCB1cCB0aGUgdmFsdWVzIGluIGBjb2xgLgoKSGVyZSdzIGhvdyBpdCB3b3JrcyBmb3IgYGZydWl0X2xvbmdgLiAKCmBgYHtyfQpmcnVpdF9zZXBhcmF0ZSA8LSBzZXBhcmF0ZShkYXRhID0gZnJ1aXRfbG9uZywKICAgICAgICAgICAgICAgICAgICAgICAgICAgY29sID0gZnJ1aXRfYmVoYXZpb3IsCiAgICAgICAgICAgICAgICAgICAgICAgICAgIGludG8gPSBjKCJmcnVpdCIsICJiZWhhdmlvciIpLAogICAgICAgICAgICAgICAgICAgICAgICAgICBzZXAgPSAiXyIpCmBgYApgYGB7ciBlY2hvID0gRiwgcmVzdWx0cz0nYXNpcyd9CmthYmxlKGZydWl0X3NlcGFyYXRlKQpgYGAKCkl0IGhhcyByZXR1cm5lZCBhIG5ldyBkYXRhIGZyYW1lIHdpdGggdGhlIGBmcnVpdF9iZWhhdmlvcmAgY29sdW1uIHNw
bGl0IGludG8gdHdvIG5ldyBjb2x1bW5zLCBuYW1lZCBhZnRlciB3aGF0IEkgcGFzc2VkIHRvIHRoZSBgaW50b2AgYXJndW1lbnQuIEl0IHNwbGl0IHVwIGBmcnVpdF9iZWhhdmlvcmAgYmFzZWQgb24gd2hhdCBJIHBhc3NlZCB0byBgc2VwYCwgd2hpY2ggd2FzIHRoZSB1bmRlcnNjb3JlIGNoYXJhY3Rlci4gCgpMZXQncyBkbyB0aGlzIGZvciBgaXlfYWhfc3RlcDFgIG5vdy4KCgpgYGB7cn0KaXlfYWhfc3RlcDIgPC0gc2VwYXJhdGUoaXlfYWhfc3RlcDEsIAogICAgICAgICAgICAgICAgICAgICAgICB2b3dlbF9mb3JtYW50LCAKICAgICAgICAgICAgICAgICAgICAgICAgaW50byA9IGMoInZvd2VsIiwgImZvcm1hbnQiKSwKICAgICAgICAgICAgICAgICAgICAgICAgc2VwID0gIl8iKQppeV9haF9zdGVwMgpgYGAKCldlIG5vdyBoYXZlIHR3byBzZXBhcmF0ZSBjb2x1bW5zIGZvciBgdm93ZWxgIGFuZCBgZm9ybWFudGAuCgo8ZGl2IGNsYXNzID0gImJveCBoeWdpZW5lIj4KPHNwYW4gY2xhc3MgPSAibGFiZWwiPkh5Z2llbmU8L3NwYW4+CgpJIGhhdmUgYmVlbiB2ZXJ5IGhlbHBmdWwgYW5kIHVzZWQgdW5kZXJzY29yZXMgdG8gbWVyZ2UgdG9nZXRoZXIgdGhlIHZhbHVlcyB3ZSB3YW50IHRvIHNlcGFyYXRlLiBCZSBoZWxwZnVsIHRvIHlvdXJzZWxmLCBhbmQgYmUgY29uc2lzdGVudCBpbiB0aGUgc2VtYW50aWNzIG9mIGhvdyB5b3UgdXNlZCBwb3RlbnRpYWwgZGVsaW1pdGVycyBsaWtlIGAtYCBhbmQgYF9gLiBIZXJlJ3MgYW4gZXhhbXBsZSBvZiBiZWluZyBoZWxwZnVsIHRvIHlvdXJzZWxmOgoKYGBgCnByb2plY3Rfc3ViamVjdF9maXJzdG5hbWUtbGFzdG5hbWUKCkVESV8xX1N0dWFydC1EdWRkaW5nc3RvbgpFRElfMl9Db25ub3ItQmxhY2stTWFjZG93YWxsCkVESV8zX01oYWlyaQpgYGAKVGhpcyBpcyBoZWxwZnVsLCBiZWNhdXNlIHdoZW4geW91IHNlcGFyYXRlIGJ5IHVuZGVyc2NvcmUsIHlvdSdsbCBoYXZlIHNvbWV0aGluZyB0aWR5CgpgYGAKRURJICAgIDEgICAgU3R1YXJ0LUR1ZGRpbmdzdG9uCkVESSAgICAyICAgIENvbm5vci1CbGFjay1NYWNkb3dhbGwKRURJICAgIDMgICAgTWhhaXJpCmBgYAoKSWYgeW91IHVzZWQgYC1gIGZvciBldmVyeXRoaW5nLCB5b3UnbGwgaGF2ZSBjaGFvcyB3aGVuIHlvdSB0cnkgdG8gc2VwYXJhdGUgdGhlbSBiZWNhdXNlIHNvbWUgc3BlYWtlcnMgaGF2ZSAiZG91YmxlIGJhcnJlbGVkIiBuYW1lcywgYW5kIHNvbWUgc3BlYWtlcnMgaGF2ZSBvbmx5IGZpcnN0IG5hbWVzOgoKYGBgCiMgSW5wdXQ6CkVESS0xLVN0dWFydC1EdWRkaW5nc3RvbgpFREktMi1Db25ub3ItQmxhY2stTWFjZG93YWxsCkVESS0zLU1oYWlyaQoKIyBCZWNvbWVzCgpFREkgICAgMSAgICBTdHVhcnQgICAgRHVkZGluZ3N0b24KRURJICAgIDIgICAgQ29ubm9yICAgIEJsYWNrICAgICAgICBNYWNkb3dhbGwKRURJICAgIDMgICAgTWhhaXJpCmBgYAoKVGhpcyBnb2VzIGJleW9uZCBSIHByb2dyYW1taW5nLiBZb3Ugc2hvdWxkIG1ha2Ug
c29tZSBkZWNpc2lvbnMgYW5kIHN0aWNrIHdpdGggdGhlbSBmb3IgYWxsIG9mIHlvdXIgZGF0YSBhbmFseXNpcywgaW5jbHVkaW5nIGZpbGUgbmFtaW5nLCBQcmFhdCB0aWVyIG5hbWluZywgZXRjLgoKPC9kaXY+CgojIyBTcHJlYWRpbmcgY29sdW1ucwoKV2UndmUgZ290IG9uZSBsYXN0IHN0ZXAsIHdoaWNoIGlzIHNwcmVhZGluZyB0aGUgdmFsdWVzIGluIHNvbWUgcm93cyBhY3Jvc3MgdGhlIGNvbHVtbiBzcGFjZS4gV2l0aCB0aGUgYGZydWl0YCBkYXRhLCB3ZSBtaWdodCBub3Qgd2FudCBhIGNvbHVtbiBjYWxsZWQgYGJlaGF2aW9yYCwgYnV0IGFjdHVhbGx5IGhhdmUgdHdvIGNvbHVtbnMgY2FsbGVkIGBib3VnaHRgIGFuZCBgYXRlYC4gRm9yIHRoZSB2b3dlbCBkYXRhLCB3ZSBkZWZpbml0ZWx5IGRvbid0IHdhbnQgb25lIGNvbHVtbiBjYWxsZWQgYGZvcm1hbnRgLiBXZSB3YW50IG9uZSBjYWxsZWQgYEYxYCBhbmQgb25lIGNhbGxlZCBgRjJgLiBXZSBjYW4gZG8gdGhpcyB3aXRoIHRoZSBgc3ByZWFkKClgIGZ1bmN0aW9uLgoKPGRpdiBjbGFzcyA9ICJpbGx1c3RyYXRlIj4Kc3ByZWFkKDxzcGFuIGNsYXNzID0gInBvcCI+ZGF0YTwvc3Bhbj4sIDxzcGFuIGNsYXNzID0gInBvcCI+a2V5PC9zcGFuPiwgPHNwYW4gY2xhc3MgPSAicG9wIj52YWx1ZTwvc3Bhbj4pCjwvZGl2PgoKLSBgZGF0YWAKICAgIC0gQWdhaW4sIHRoZSBkYXRhIHdlIHdhbnQgdG8gd29yayB3aXRoLgotIGBrZXlgCiAgICAtIFRoZSBjb2x1bW4gd2hvc2UgdmFsdWVzIHlvdSB3YW50IHRvIHNwcmVhZCBhY3Jvc3MgdGhlIGNvbHVtbiBzcGFjZS4KLSBgdmFsdWVgCiAgICAtIFRoZSBjb2x1bW4gd2l0aCB2YWx1ZXMgdGhhdCB5b3Ugd2FudCB0byBmaWxsIGluIHRoZSBjZWxscy4KCkhlcmUncyBob3cgdGhhdCBsb29rcyB3aXRoIHRoZSBgZnJ1aXRfc2VwYXJhdGVgIGRhdGEuCgpgYGB7cn0KZnJ1aXRfc3ByZWFkIDwtIHNwcmVhZChkYXRhID0gZnJ1aXRfc2VwYXJhdGUsCiAgICAgICAgICAgICAgICAgICAgICAga2V5ID0gYmVoYXZpb3IsCiAgICAgICAgICAgICAgICAgICAgICAgdmFsdWUgPSBudW1iZXIpCmBgYApgYGB7ciBlY2hvID0gRiwgcmVzdWx0cyA9ICdhc2lzJ30Ka2FibGUoZnJ1aXRfc3ByZWFkKQpgYGAKClRoaXMgaGFzIGNyZWF0ZWQgYSBuZXcgZGF0YSBmcmFtZS4gSSB0b2xkIGBzcHJlYWQoKWAgdG8gc3ByZWFkIHRoZSB2YWx1ZXMgaW4gYGJlaGF2aW9yYCBhY3Jvc3MgdGhlIGNvbHVtbiBzcGFjZS4gQmVjYXVzZSBpdCBoYWQgb25seSB0d28gdW5pcXVlIHZhbHVlcyBpbiBpdCAoYGJvdWdodGAgYW5kIGBhdGVgKSwgaXQgaGFzIGNyZWF0ZWQgdHdvIG5ldyBjb2x1bW5zIGNhbGxlZCBgYm91Z2h0YCBhbmQgYGF0ZWAuIEFmdGVyIGNyZWF0aW5nIHRoZXNlIG5ldyBjb2x1bW5zLCBpdCBoYWQgdG8gZmlsbCBpbiB0aGUgbmV3IGNlbGxzIHdpdGggc29tZSB2YWx1ZXMsIGFuZCBJIHRvbGQgaXQgdG8gdXNlIHRoZSB2YWx1ZXMgaW4gYG51bWJlcmAgZm9yIHRo
YXQuCgpIZXJlJ3MgaG93IHRoYXQgd29ya3Mgd2l0aCBgaXlfYWhfc3RlcDJgLgoKYGBge3J9Cml5X2FoX3N0ZXAzIDwtIHNwcmVhZChkYXRhID0gaXlfYWhfc3RlcDIsCiAgICAgICAgICAgICAgICAgICAgICBrZXkgPSBmb3JtYW50LAogICAgICAgICAgICAgICAgICAgICAgdmFsdWUgPSBoeikKaXlfYWhfc3RlcDMKYGBgCgpOb3csIHdlJ3ZlIGZpbmFsbHkgZ290dGVuIHRvIGEgdGlkeSBkYXRhIGZvcm1hdC4gSW4gb3VyIG5leHQgbWVldGluZywgd2UnbGwgZGlzY3VzcyBob3cgdG8gY2hhaW4gdGhlc2UgdGhyZWUgZnVuY3Rpb25zIGludG8gb25lIGVhc3kgdG8gcmVhZCBwcm9jZXNzLgoKPGRpdiBjbGFzcyA9ICJib3ggaWRpb20iPgo8c3BhbiBjbGFzcyA9ICJsYWJlbCI+SWRpb208L3NwYW4+CgpZb3UgbWlnaHQgaGF2ZSBub3RpY2VkIHRoYXQgaW4gdGhlIGZ1bmN0aW9ucyBhYm92ZSwgSSd2ZSBwdXQgYSBuZXcgbGluZSBiZXR3ZWVuIGluZGl2aWR1YWwgZnVuY3Rpb24gYXJndW1lbnRzLiBJJ3ZlIGRvbmUgdGhpcyBiZWNhdXNlIHdoaXRlLXNwYWNlIGRvZXNuJ3QgbWF0dGVyIHdoZW4gaXQgY29tZXMgdG8gUi4gSSBjb3VsZCBoYXZlIHdyaXR0ZW4gdGhlc2Ugd2l0aCBqdXN0IHNwYWNlcyBiZXR3ZWVuIGVhY2ggYXJndW1lbnQsIGJ1dCB0aGF0IHdvdWxkIGJlIHRvbyB2aXN1YWxseSBjcm93ZGVkLgoKYGBge3J9CiMgY29tcGFyZQoKIyBPbmUgbGluZQpmcnVpdF9zZXBhcmF0ZSA8LSBzZXBhcmF0ZShkYXRhID0gZnJ1aXRfbG9uZywgY29sID0gZnJ1aXRfYmVoYXZpb3IsIGludG8gPSBjKCJmcnVpdCIsICJiZWhhdmlvciIpLCBzZXAgPSAiXyIpCgojIE5ldyBMaW5lcwpmcnVpdF9zZXBhcmF0ZSA8LSBzZXBhcmF0ZShkYXRhID0gZnJ1aXRfbG9uZywgCiAgICAgICAgICAgICAgICAgICAgICAgICAgIGNvbCA9IGZydWl0X2JlaGF2aW9yLCAKICAgICAgICAgICAgICAgICAgICAgICAgICAgaW50byA9IGMoImZydWl0IiwgImJlaGF2aW9yIiksIAogICAgICAgICAgICAgICAgICAgICAgICAgICBzZXAgPSAiXyIpCgpgYGAKCkkgZW5jb3VyYWdlIHlvdSB0byB1c2UgbmV3IGxpbmVzIHNpbWlsYXJseSB0byBnaXZlIHlvdXJzZWxmICJzb21lIHNwYWNlIHRvIGJyZWF0aGUiLiBEb24ndCBiZSBzaHkgYWJvdXQgaXQuIEJ1dCwgaWYgeW91IHB1dCBuZXdsaW5lcyBiZXR3ZWVuICpzb21lKiBhcmd1bWVudHMsIHlvdSBzaG91bGQgcmVhbGx5IHB1dCBuZXcgbGluZXMgYmV0d2VlbiAqYWxsKiBhcmd1bWVudHMuCjwvZGl2Pgo=
- - -
-
- -
- - - - - - - - diff --git a/teaching/courses/2017_lsa/lectures/Session_2.nb.qmd b/teaching/courses/2017_lsa/lectures/Session_2.nb.qmd new file mode 100644 index 0000000..eca33f3 --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/Session_2.nb.qmd @@ -0,0 +1,678 @@ +--- +title: "Data and Data Frames" +order: 2 +knitr: + opts_chunk: + error: true + warning: false +--- + +# The Agenda + +- Talk about organizing, structuring & storing your data. +- Review some important data input/output options in R. +- Review how data frames work in R. + +## Setup + +::: callout-note +## \~2 minute setup + +Make sure that your current RStudio project is set to your course project. Create and save your R notebook for today (I would recommend `02_lecture.Rmd`). Clear the workspace of anything left over from last time with the menu options `Session > Clear Workspace`. + +Load the important packages for today's work: + +```{r} +library(lsa2017) +library(tidyverse) +``` +::: + +------------------------------------------------------------------------ + +# Data Collection and Storage + +## General Principles of Data Collection + +### Over-collect (for some things) + +When collecting data in the first place, over-collect if at all possible or ethical. The world is a very complex place, so there is no way you could cram it all into a bottle, but give it your best shot! If during the course of your data analysis, you find that it would have been really useful to have data on, say, duration, as well as formant frequencies, it becomes costly to recollect that data, especially if you haven't laid the proper trail for yourself. On the other hand, *automation* of acoustic analysis or data processing can cut down on this costliness. + +This doesn't go for personal information on human subjects, though. It's important from an ethics standpoint to ask for everything you'll need, but not more. 
You don't want to collect an enormous demographic profile on your participants if you won't wind up using it, especially if you know you won't use it to begin with. + +### Preserve HiD Info + +If, for instance, you're collecting data on the effect of voicing on preceding vowel duration, preserve high dimensional data coding, like Lexical Item, or the transcription of the following segment. These high dimensional codings probably won't be too useful for your immediate analysis, but they will allow you to procedurally extract additional features from them at a later time. For example, if you have a column called `fol_seg`, which is just a transcription of the following segment, it is easy to create a new column called `manner` with code that looks like this: + +```{r} +table(iy_ah$fol_seg) +``` + +```{r} +iy_ah <- iy_ah %>% + mutate( + manner = recode( + fol_seg, + B = 'stop', + CH = 'affricate', + D = 'stop', + DH = 'fricative', + `F` = 'fricative', + G = 'stop', + HH = 'fricative', + JH = 'affricate', + K = 'stop', + L = 'liquid', + M = 'nasal', + N = 'nasal', + NG = 'nasal', + P = 'stop', + R = 'liquid', + S = 'fricative', + SH = 'fricative', + SP = 'pause', + `T` = 'stop', + TH = 'fricative', + V = 'fricative', + W = 'glide', + Y = 'glide', + Z = 'fricative', + ZH = 'fricative', + .default = 'vowel' + ) + ) +table(iy_ah$manner) +``` + +### Leave A Trail of Crumbs + +Be sure to answer this question: How can I preserve a record of this observation in such a way that I can quickly return to it and gather more data on it if necessary? If you fail to successfully answer this question, then you'll be lost in the woods if you ever want to restudy, and the only way home is to replicate the study from scratch. + +### Give Meaningful Names + +Give meaningful names both to predictor columns and to the labels of nominal observations. Keeping a readme describing the data is still a good idea, but at least now the data is approachable at first glance. 
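
To make the naming advice concrete, here is a minimal sketch in base R. The data frame `raw`, its cryptic column names `v1` and `v2`, and the numeric codes are all hypothetical, invented only for illustration.

```{r}
# Hypothetical raw data with cryptic column names and numeric codes.
raw <- data.frame(
  v1 = c(1, 2, 1),
  v2 = c(0.112, 0.087, 0.140)
)

# Rename the columns so they say what they hold (illustrative names).
names(raw) <- c("voicing", "duration_s")

# Replace the numeric codes with readable labels for the nominal variable.
raw$voicing <- factor(raw$voicing,
                      levels = c(1, 2),
                      labels = c("voiced", "voiceless"))

raw
```

After this step, someone opening the data can read `voicing` and `duration_s` without consulting the readme first.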
+ +### Distinguish between `0` and `NA` + +I have worked with some spreadsheets where missing data was given a value of `0`, which will mess things up. For example, /oy/ is a fairly rarely occurring phoneme in English, and it's possible that a speaker won't produce any tokens in a short interview. In a spreadsheet of mean F1 and F2 for all vowels, that speaker should get an `NA` for /oy/, **not** `0`. + +## Storing Data + +When we store data, it should be: + +Raw + +: Raw data is the most useful data. It's impossible to move down to smaller granularity from a coarser, summarized granularity. Summary tables etc. are nice for publishing in a paper document, but raw data is what we need for asking novel research questions with old data. + +Open Formatted + +: Do not use proprietary database software for long term storage of your data. I have heard enough stories about interesting data sets that are no longer accessible for research either because the software they are stored in is defunct, or current versions are not backwards compatible. At that point, your data is property of Microsoft, or whoever. Store your data as raw text, delimited in some way (I prefer tabs). + +Consistent + +: I think this is most important when you may have data in many separate files. Each file and its headers should be consistently named and formatted. They should be consistently delimited and commented also. There is nothing worse than inconsistent headers and erratic comments, labels, headers or NA characters in a corpus. (Automation also helps here.) + +Documented + +: Produce a readme describing the data, how it was collected and processed, and describe every variable and its possible values. 
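
The storage points above (plain text, tab-delimited, `NA` rather than `0` for missing values) can be sketched with base R's `write.table()` and `read.delim()`. The data frame `vowel_means`, its values, and the file name are hypothetical, made up for this illustration; the file is written to a temporary directory.

```{r}
# Hypothetical data: speaker "s02" produced no /oy/ tokens,
# so the mean F1 is missing (NA), not 0.
vowel_means <- data.frame(
  speaker = c("s01", "s02"),
  vowel   = c("oy", "oy"),
  mean_F1 = c(512.3, NA)
)

# Write tab-delimited plain text; missing values are written as "NA".
out_path <- file.path(tempdir(), "vowel_means.txt")
write.table(vowel_means,
            file = out_path,
            sep = "\t",
            quote = FALSE,
            row.names = FALSE,
            na = "NA")

# Read it back; read.delim() expects tabs and a header row by default.
vowel_means_in <- read.delim(out_path, stringsAsFactors = FALSE)

# The missing value survives the round trip...
is.na(vowel_means_in$mean_F1)

# ...and can be skipped explicitly, rather than being averaged in as 0.
mean(vowel_means_in$mean_F1, na.rm = TRUE)
```

Had the missing cell been stored as `0`, the mean would have been silently halved, which is the kind of mess the `0` versus `NA` distinction is meant to avoid.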
+ +------------------------------------------------------------------------ + +# Structuring Data + +## Breaking Bad Spreadsheet Habits + +Let's start off by looking at a picture of a data organization approach that might look familiar, and is a *very bad* way to do things: + +![](figures/bad_spreadsheet.png){fig-align="center" width="100%"} + +This spreadsheet has a fairly strict organizational structure, but is virtually hopeless for doing any kind of serious statistical analysis. It's also verging on irreparable using R. This is because the data in this spreadsheet is organized to be easy to look at with your eyeballs 👀. + +But looking at neatly organized data in a spreadsheet is not a statistical analysis technique. So we need to start organizing our data in a way that isn't easy to look at, but *is* easy to graph and analyze. + +## Better Habits + +Everyone working with data (in R or otherwise) should read Hadley Wickham's paper on Tidy Data. If you are coming off of organizing your data like the picture above, there are a few guidelines not discussed in that paper, namely: + +#### Follow these rules + +1. The first row of the data *must* be the names of the data columns. +2. All other rows *must* be the data, and nothing else. +3. You cannot use empty rows or empty columns as visual aids to look at the data. +4. The spreadsheet must not contain any summary cells. No final row called "Average" or final column called "Total". We can create these in R, and they make data processing more complicated if they're included in the raw data. + +### Semantics of Data Structure + +In the semantics of data structure Wickham lays out, there are three important primitives: + +1. Variables +2. Values +3. Observations + +#### Defining the primitives + +##### Variables + +Variables are the collections of values of interest in data analysis. For example, let's say you were doing a study on unnormalized vowel space size by just looking at /i:/ and /ɑ/. 
The variables in that study could be: + +- `speaker` +- `word` +- `phoneme` +- `duration` +- `F1` +- `F2` +- `word_frequency` + +##### Values + +Values are, as the name implies, the possible values that each variable can have, for example: + +- `speaker`: `"Oakley"`, `"Charlie"`, `"Azaria"`, `...` +- `word`: `"street"`, `"thirteen"`, `"not"`, `"got"`, `...` +- `phoneme`: `"iy"`, `"ah"` + +##### Observations + +An observation is the minimal unit across which all variables are collected. For example, in the vowel space study, one observation would be one instance of an uttered vowel for which you record who the speaker was, the word, the duration, F1, F2, etc. + +#### Organizing data with these primitives + +Once you've thought through what the variables, values and observations are for your study, the principle of how to organize them is simple: + +1. Each variable forms a column. +2. Each observation forms a row. + +For the vowel space size study, you might want to wind up with a plot that looks like this: + +```{r} +#| echo: false +#| dev: svg +iy_ah %>% + group_by(idstring, sex, age, plt_vclass) %>% + summarise(F1 = mean(F1), + F2 = mean(F2)) %>% + ggplot(aes(F2, F1, shape = plt_vclass, color = sex))+ + geom_point()+ + scale_y_reverse()+ + scale_x_reverse()+ + scale_color_brewer(palette = "Dark2")+ + scale_shape_discrete("vowel")+ + theme_minimal()+ + coord_fixed() + +``` + +It wouldn't be uncommon to see the data untidily organized like this: + +```{r} +#| echo: false +iy_ah %>% + group_by(idstring, sex, age, plt_vclass) %>% + summarise(F1 = mean(F1), + F2 = mean(F2)) %>% + gather(formant, value, F1:F2)%>% + mutate(var = paste(plt_vclass, formant, sep = "_")) %>% + select(-plt_vclass, -formant)%>% + spread(var, value) +``` + +::: callout-note +## \~5 minute activity + +In small groups, figure out the following: + +- What are the variables in the data frame above? +- What are the values? +- What are the observations? +- How should the table above be re-organized? 
+ +```{r} +#| echo: false +iy_ah %>% + group_by(idstring, sex, age, plt_vclass) %>% + summarise(F1 = mean(F1), + F2 = mean(F2)) +``` +::: + +# Data Frames + +So far we have discussed the following types of values in R: + +- numerical +- character +- logical + +And we've discussed the following data structures. + +- vectors + +Here, we'll cover one new data structure: + +- data frames + +Data Frames are the data structure we'll be using the most in R. When you begin thinking about data frames, a useful starting place is to think of them as spreadsheets, with columns and rows (but we'll eventually abandon spreadsheet thinking). Let's start out by creating a very simple data frame using the `data.frame()` function. + +```{r} +pitch <- data.frame( + speaker_names = c("Charlie", "Skyler", "Sawyer", "Jamie"), + ages = c(18, 35, 41, 62), + F0 = c(114, 189, 189, 199) +) + +pitch +``` + +## Finding your way around + +The `pitch` data frame has four rows, and three columns. The rows are just numbered 1 through 4, and the three columns are named `speaker_names`, `ages` and `F0`. To find out how many rows and columns a data frame has, you can use the `nrow()` and `ncol()` functions. + +```{r} +nrow(pitch) +ncol(pitch) +``` + +Most data frames you're going to work with have a lot more rows than that. For example, `iy_ah` is a data frame that is bundled in the `lsa2017` package. + +```{r} +nrow(iy_ah) +``` + +That's too many rows to look at just in the console. One option is to use the `head()` function, that just prints the first 6 rows. + +```{r} +head(iy_ah) +``` + +Another option is to use the `summary()` function. + +```{r} +summary(iy_ah) +``` + +`summary()` is a function that works on almost every kind of object. + +## Indexing Data Frames + +Since data frames are 2 dimensional (rows are one dimension, columns are another), the way you index them is a little bit more complicated than with vectors. 
It still uses square brackets, though, but these square brackets have two positions: + +::: {style="font-family:monospace;font-size:xx-large;text-align:center;"} +[df]{style="color:#747474"}[\[]{style="color:red"}[row number]{style="color:#747474"}[,]{style="color:red"} [column number]{style="color:#747474"}[\]]{style="color:red"} +::: + +If you specify a specific row number, but leave the column number blank, you'll get back that row and all columns. + +```{r} +pitch[1,] +``` + +Alternatively, if you specify just the column number, but leave the rows blank, you'll get back all of the values for that column. + +```{r} +pitch[,2] +``` + +When you specify both, you get back the value in the specified row and column. + +```{r} +pitch[1,2] +``` + +However, there is a special indexing operator for data frames that takes advantage of their named columns: `$`. + +::: {style="font-family:monospace;font-size:xx-large;text-align:center"} +[df]{style="color:#747474"}[\$]{style="color:red"}[column_name]{style="color:#747474"} +::: + +```{r} +pitch$speaker_names +``` + +After accessing the column of a data frame, you can index it just like it's a vector. + +```{r} +pitch$speaker_names[1] +``` + +If you really want to, you can do logical indexing of data frames like so: + +```{r} +pitch[pitch$speaker_names == "Charlie", ] +``` + +But there's also a function called `filter()` that you can use to do the same thing. `filter()` takes a data frame as its first argument, and then a logical statement referring to one or more of the data frame's columns. + +```{r} +filter(pitch, speaker_names == "Charlie") +filter(pitch, ages > 18, F0 > 190) +``` + +::: callout-note +## \~5 minute activity + +First, review the documentation of the `iy_ah` data set with `?iy_ah`. Using `filter()` and `nrow()`, find out what percent of /i:/ tokens have a duration less than 90ms (0.09s). 
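
As a hint, here is the general pattern on the small `pitch` data frame rather than on `iy_ah`: divide the number of rows that pass a `filter()` by the total number of rows. This is only a sketch; it assumes `filter()` comes from the `dplyr` package (loaded as part of the tidyverse), and the names `older` and `percent_older` are made up for illustration.

```{r}
library(dplyr)

# The toy data frame from earlier in the lecture.
pitch <- data.frame(
  speaker_names = c("Charlie", "Skyler", "Sawyer", "Jamie"),
  ages = c(18, 35, 41, 62),
  F0 = c(114, 189, 189, 199)
)

# Rows that satisfy the condition.
older <- filter(pitch, ages > 18)

# Percent of all rows that satisfy it.
percent_older <- 100 * nrow(older) / nrow(pitch)
percent_older
```

The same two-step shape (filter, then compare row counts) applies to the activity, with the condition and the denominator swapped for the ones the question asks about.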
+::: + +------------------------------------------------------------------------ + +# Reading Data into R + +R can easily read comma-separated (.csv) files and tab-delimited files into its memory.[^1] You can read them in with `read.csv()` and `read.delim()`, respectively. If your data is unavoidably in an Excel spreadsheet, there is a package called `readxl` with a function called `read_excel()`. If you have the `readxl` package installed, I *strongly* recommend reading over its documentation on sheet geometry by calling up the vignette like so: + +[^1]: My personal aesthetic preference is for tab-delimited files. + +```{r} +#| eval: false +vignette("sheet-geometry", package = "readxl") +``` + +**Last Minute Update**: There is also a package for reading data in from Google spreadsheets. I haven't used it yet, but it's gotten good reviews. + +When loading a data file into R, you are just loading it into the R workspace. Any alterations or modifications you make to the data frame will not be reflected in the file in your system, *just* in the copy in the R workspace. + +The tricky thing now is that the way that feels most natural or normal for you to organize and name your files and folders doesn't necessarily translate into a good way for R (or other programming languages) to look at them. In order to load a file into R, you need to provide `read.csv()` or `read.delim()` with the "path" to the file, which is just a text string. + +For example, here's a screenshot of a data file I'd like to load into R. + +![](figures/file.png){fig-align="center" width="100%"} + +I have the option turned on in my system to see the full path at the bottom of the file window, so you can see a full list of all of the folders this data file is embedded in. 
In order to read this data into R, you need to type out the full path, although a nice thing about + +```{r} +#| eval: false +joe_vowels <- read.csv("~/ownCloud/DocSyncUoE/Courses/LSA/data/joe_vowels.csv") +``` + +If you're not sure what it looks like on your system, use the `file.choose()` function. + +```{r eval = F} +#| eval: false +file.choose() +``` + +That'll launch the default visual file browser for your system. After browsing around and clicking on a file, `file.choose()` will print the character string that represents the path to that file into the console. + +::: callout-tip +## Hygiene + +Don't rely heavily on `file.choose()`. Sometimes, I've seen R scripts with the following line of code in it: + +```{r} +#| eval: false +data <- read.csv(file.choose()) +``` + +Please never do this. I would caution against using it in any code, scripts or notebooks at all. Only ever use it to refresh your memory of where your data is located. By always writing out the text of the path to the data, you + +- produce more transparent code +- allow yourself to re-run your analysis without needing to click around +- ensure that you're using the same data file every single time +::: + +One pretty cool thing is that if a data file is up on a website somewhere, you can just access it by passing the url to `read.csv()` or `read.delim()`.[^2] Here is some sample data on the Donner Party.[^3] + +[^2]: This used to fail for URLs beginning with `https://`; recent versions of R read them directly. + +[^3]: "The Donner Party (sometimes called the Donner-Reed Party) was a group of American pioneer migrants who set out for California in a wagon train. Delayed by a series of mishaps, they spent the winter of 1846--47 snowbound in the Sierra Nevadas. Some of the migrants resorted to cannibalism to survive, eating those who had succumbed to starvation and sickness." 
+ +```{r} +donner <- read.csv("https://jofrhwld.github.io/data/donner.csv") +head(donner) +``` + +::: {.box .break} +[\~5 minute activity]{.big-label} + +Download the file `joe_vowels.csv` from the course Canvas. Save it to the data directory for the course, or wherever you would like to keep it. Read it into R. What's my mean F1 and F2 across all of my vowels? +::: + +------------------------------------------------------------------------ + +# Cleaning up data + +We've discussed how data ought to be tidily organized, and we've now gone over how to load data, and minimally explore dataframes in R. Let's quickly go over how to tidy up messy data a little. + +First, let's look at the wide `iy_ah_wide` dataframe, which is part of the `lsa2017` package. + +```{r} +iy_ah_wide +``` + +The problem with this data is + +- There are *values* spread across the columns. +- Individual column names have combined these *values* with some *variables*. + +Getting to a tidier format of the data will involve a three step process: + +1. Converting this wide data format to a long data format. +2. Separating the vowel class values from the formant variable. +3. Spreading the formant variables back out along the column space. + +We can do this easily with the functions `gather()`, `separate()` and `spread()` from the `tidyr` package. + +For a smaller illustrative purpose for people who may feel uneasy about vowels and formants, I'll be illustrating each of these steps with a simpler data set about how many apples and oranges two people bought, and how many they ate. + +```{r} +fruit <- data.frame(person = c("Oakley", "Charlie"), + apples_bought = c(5, 3), + apples_ate = c(1, 2), + oranges_bought = c(5, 4), + oranges_ate = c(3, 3)) +``` + +```{r echo = F, results = 'asis'} +library(knitr) +kable(fruit) +``` + +Note, even though the column labels look different, this is equivalent to a table formatted with merged column label cells. 
+ +::: half-img +![](figures/merge_tab.png) +::: + +## Gathering Columns + +The `gather()` function makes *wide* data *long.* It takes the following arguments: + +::: illustrate +gather([data]{.pop}, [key]{.pop}, [value]{.pop}, cols) +::: + +- `data` + - Obviously, the data you want to reshape. It must be a data frame. +- `key` and `value` + - These are new column names that you want to create. `gather()` is going to take the column names and put them in the column you give to `key`, and the values from all the cells and put them in the column you call `value`. +- `cols` + - An indication of which columns you want to gather, either a vector of column names, a vector of column numbers, or some specialized methods for `gather()` that we'll discuss. + +Here's how that'll work for the fruit data. We'll tell `gather()` to gather columns 2 through 5. + +```{r} +fruit_long <- gather(data = fruit, + key = fruit_behavior, + value = number, + 2:5) +``` + +```{r echo=F, results = 'asis'} +kable(fruit_long) +``` + +`gather()` has returned a new data frame. It has created a new column called `fruit_behavior`, because we told it to with the `key` argument, and it has created a new column called `number`, because we told it to with the `value` argument. It has taken all of the column names of the columns we told it to gather, and put them into the `fruit_behavior` column, and the numeric values from the columns we told it to gather, and put them into the `number` column. It has also repeated the rows of the other columns (`person`) as logically necessary. + +Now, we told it to gather column numbers 2 through 5, but this would have also worked: + +```{r} +gather(data = fruit, + key = fruit_behavior, + value = number, + c("apples_bought","apples_ate", "oranges_bought", "oranges_ate")) +``` + +`gather()` also has a more convenient method of specifying the columns you want to gather by passing it a named range of columns. 
We want to gather all columns from `apples_bought` to `oranges_ate`, so we can tell it to do so with `apples_bought:oranges_ate`. + +```{r} +gather(data = fruit, + key = fruit_behavior, + value = number, + apples_bought:oranges_ate) +``` + +Ok, let's do this now to the `iy_ah_wide` data, gathering all of the columns from `ah_F1` to `iy_F2`. + +```{r} +iy_ah_step1 <- gather(data = iy_ah_wide, + key = vowel_formant, + value = hz, + ah_F1:iy_F2) +iy_ah_step1 +``` + +For the fruit data, the only un-gathered column was `person`, but for `iy_ah_wide`, `idstring`, `age`, `sex`, and `year`, were all ungathered. Here you can see how all rows of ungathered columns are repeated as logically necessary. + +## Separating Columns + +There is still a problem with both the `fruit_long` and the `iy_ah_step1` data frames, which is that two different kinds of data are merged within one column. For `iy_ah_step1`, the vowel class and formant variable are merged together (e.g. `ah_F1`) and for `fruit_long`, the fruit and behavior are merged together (e.g. `apples_bought`). We need to separate these, with a very aptly named function called `separate()`. + +::: illustrate +separate([data]{.pop}, [col]{.pop}, [into]{.pop}, [sep]{.pop}) +::: + +- `data` + - Again, the data frame you want to do this separation to. +- `col` + - The name of the column you want to separate. +- `into` + - A character vector of the new column names you want to create. +- `sep` + - The character or regex pattern you want to use to split up the values in `col`. + +Here's how it works for `fruit_long`. + +```{r} +fruit_separate <- separate(data = fruit_long, + col = fruit_behavior, + into = c("fruit", "behavior"), + sep = "_") +``` + +```{r echo = F, results='asis'} +kable(fruit_separate) +``` + +It has returned a new data frame with the `fruit_behavior` column split into two new columns, named after what I passed to the `into` argument. 
It split up `fruit_behavior` based on what I passed to `sep`, which was the underscore character. + +Let's do this for `iy_ah_step1` now. + +```{r} +iy_ah_step2 <- separate(iy_ah_step1, + vowel_formant, + into = c("vowel", "formant"), + sep = "_") +iy_ah_step2 +``` + +We now have two separate columns for `vowel` and `formant`. + +::: {.box .hygiene} +[Hygiene]{.label} + +I have been very helpful and used underscores to merge together the values we want to separate. Be helpful to yourself, and be consistent in the semantics of how you use potential delimiters like `-` and `_`. Here's an example of being helpful to yourself: + +``` +project_subject_firstname-lastname + +EDI_1_Stuart-Duddingston +EDI_2_Connor-Black-Macdowall +EDI_3_Mhairi +``` + +This is helpful, because when you separate by underscore, you'll have something tidy: + +``` +EDI 1 Stuart-Duddingston +EDI 2 Connor-Black-Macdowall +EDI 3 Mhairi +``` + +If you use `-` for everything, you'll have chaos when you try to separate them, because some speakers have "double-barreled" names, and some speakers have only first names: + +``` +# Input: +EDI-1-Stuart-Duddingston +EDI-2-Connor-Black-Macdowall +EDI-3-Mhairi + +# Becomes: + +EDI 1 Stuart Duddingston +EDI 2 Connor Black Macdowall +EDI 3 Mhairi +``` + +This goes beyond R programming. You should make some decisions and stick with them for all of your data analysis, including file naming, Praat tier naming, etc. +::: + +## Spreading Columns + +We've got one last step, which is spreading the values in some rows across the column space. With the `fruit` data, we might not want one column called `behavior`, but instead two columns called `bought` and `ate`. For the vowel data, we definitely don't want one column called `formant`. We want one called `F1` and one called `F2`. We can do this with the `spread()` function. + +::: illustrate +spread([data]{.pop}, [key]{.pop}, [value]{.pop}) +::: + +- `data` + - Again, the data we want to work with. 
+- `key` + - The column whose values you want to spread across the column space. +- `value` + - The column with the values that you want to fill the new cells with. + +Here's how that looks with the `fruit_separate` data. + +```{r} +fruit_spread <- spread(data = fruit_separate, + key = behavior, + value = number) +``` + +```{r echo = F, results = 'asis'} +kable(fruit_spread) +``` + +This has created a new data frame. I told `spread()` to spread the values in `behavior` across the column space. Because it had only two unique values in it (`bought` and `ate`), it has created two new columns called `bought` and `ate`. After creating these new columns, it had to fill in the new cells with some values, and I told it to use the values in `number` for that. + +Here's how that works with `iy_ah_step2`. + +```{r} +iy_ah_step3 <- spread(data = iy_ah_step2, + key = formant, + value = hz) +iy_ah_step3 +``` + +Now, we've finally gotten to a tidy data format. In our next meeting, we'll discuss how to chain these three functions into one easy-to-read process. + +::: {.box .idiom} +[Idiom]{.label} + +You might have noticed that in the functions above, I've put a new line between individual function arguments. I've done this because extra white space, including line breaks inside a function call's parentheses, doesn't matter to R. I could have written these with just spaces between each argument, but that would be too visually crowded. + +```{r} +# Compare + +# One line +fruit_separate <- separate(data = fruit_long, col = fruit_behavior, into = c("fruit", "behavior"), sep = "_") + +# New lines +fruit_separate <- separate(data = fruit_long, + col = fruit_behavior, + into = c("fruit", "behavior"), + sep = "_") + +``` + +I encourage you to use new lines similarly to give yourself "some space to breathe". Don't be shy about it. But, if you put new lines between *some* arguments, you should really put new lines between *all* arguments. 
+::: diff --git a/teaching/courses/2017_lsa/lectures/_metadata.yml b/teaching/courses/2017_lsa/lectures/_metadata.yml new file mode 100644 index 0000000..f687063 --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/_metadata.yml @@ -0,0 +1,10 @@ +author: + - name: + given: Josef + family: Fruehwald +date: 2017-7-5 +date-format: "MMMM YYYY" +date-modified: 2023-12-18 +format: + html: + code-tools: true \ No newline at end of file diff --git a/teaching/courses/2017_lsa/lectures/figures/2__R.png b/teaching/courses/2017_lsa/lectures/figures/2__R.png new file mode 100755 index 0000000..2bfe1b0 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/2__R.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/DinoSequentialSmaller.gif b/teaching/courses/2017_lsa/lectures/figures/DinoSequentialSmaller.gif new file mode 100755 index 0000000..559430e Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/DinoSequentialSmaller.gif differ diff --git a/teaching/courses/2017_lsa/lectures/figures/Enlight9.jpg b/teaching/courses/2017_lsa/lectures/figures/Enlight9.jpg new file mode 100755 index 0000000..da5bba2 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/Enlight9.jpg differ diff --git a/teaching/courses/2017_lsa/lectures/figures/IMG_4054.jpg b/teaching/courses/2017_lsa/lectures/figures/IMG_4054.jpg new file mode 100755 index 0000000..32f3138 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/IMG_4054.jpg differ diff --git a/teaching/courses/2017_lsa/lectures/figures/RProject.png b/teaching/courses/2017_lsa/lectures/figures/RProject.png new file mode 100755 index 0000000..69103b3 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/RProject.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/aic_bic.png b/teaching/courses/2017_lsa/lectures/figures/aic_bic.png new file mode 100755 index 0000000..7d0d505 Binary files /dev/null and 
b/teaching/courses/2017_lsa/lectures/figures/aic_bic.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/bad_spreadsheet.png b/teaching/courses/2017_lsa/lectures/figures/bad_spreadsheet.png new file mode 100755 index 0000000..d5e01cc Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/bad_spreadsheet.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/before.jpg b/teaching/courses/2017_lsa/lectures/figures/before.jpg new file mode 100755 index 0000000..34a8912 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/before.jpg differ diff --git a/teaching/courses/2017_lsa/lectures/figures/codeCompletion.png b/teaching/courses/2017_lsa/lectures/figures/codeCompletion.png new file mode 100755 index 0000000..79c2bc9 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/codeCompletion.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/complete_pooling.svg b/teaching/courses/2017_lsa/lectures/figures/complete_pooling.svg new file mode 100755 index 0000000..9bc2409 --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/figures/complete_pooling.svg @@ -0,0 +1,194 @@ + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + Speaker A + + + + Speaker A + + + + Speaker A + + + + + Data + + + + + "Complete Pooling" + + diff --git a/teaching/courses/2017_lsa/lectures/figures/cran_package.png b/teaching/courses/2017_lsa/lectures/figures/cran_package.png new file mode 100755 index 0000000..25b4fc5 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/cran_package.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/cran_package.svg b/teaching/courses/2017_lsa/lectures/figures/cran_package.svg new file mode 100755 index 0000000..bfdcee1 --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/figures/cran_package.svg @@ -0,0 +1,360 @@ + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + CRAN + + 
github + install.packages("devtools") + install_github("jofrhwld/lsa2017") + + + + + + + + Installed R Packages + devtools + lsa2017 + + + + + + Your current + session + library("lsa2017") + + + + > head(anae) speakerID age sex city state dialect word vclass F1 F2 ... 1 1 50 M SiouxFalls SD North pull u 465 691 ...2 1 50 M SiouxFalls SD North coal uh 397 708 ...3 1 50 M SiouxFalls SD North boys oy 465 724 ...4 1 50 M SiouxFalls SD North pull u 465 724 ...5 1 50 M SiouxFalls SD North boy oy 432 776 ...6 1 50 M SiouxFalls SD North pull u 432 760 ... + + diff --git a/teaching/courses/2017_lsa/lectures/figures/file.png b/teaching/courses/2017_lsa/lectures/figures/file.png new file mode 100755 index 0000000..c854e70 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/file.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/firsthat.jpg b/teaching/courses/2017_lsa/lectures/figures/firsthat.jpg new file mode 100755 index 0000000..40ea23f Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/firsthat.jpg differ diff --git a/teaching/courses/2017_lsa/lectures/figures/fixef.png b/teaching/courses/2017_lsa/lectures/figures/fixef.png new file mode 100755 index 0000000..4f51ac2 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/fixef.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/inaccurate.png b/teaching/courses/2017_lsa/lectures/figures/inaccurate.png new file mode 100755 index 0000000..90c06f2 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/inaccurate.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/install.png b/teaching/courses/2017_lsa/lectures/figures/install.png new file mode 100755 index 0000000..3dd737e Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/install.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/install2.png b/teaching/courses/2017_lsa/lectures/figures/install2.png new file mode 
100755 index 0000000..e07f495 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/install2.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/lasthat.jpg b/teaching/courses/2017_lsa/lectures/figures/lasthat.jpg new file mode 100755 index 0000000..a914d86 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/lasthat.jpg differ diff --git a/teaching/courses/2017_lsa/lectures/figures/lrt.png b/teaching/courses/2017_lsa/lectures/figures/lrt.png new file mode 100755 index 0000000..7d55cb7 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/lrt.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/merge_tab.png b/teaching/courses/2017_lsa/lectures/figures/merge_tab.png new file mode 100755 index 0000000..34f4c2f Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/merge_tab.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/no_pooling.svg b/teaching/courses/2017_lsa/lectures/figures/no_pooling.svg new file mode 100755 index 0000000..eed50e6 --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/figures/no_pooling.svg @@ -0,0 +1,272 @@ + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + Data + + + + Data + + + + + + Speaker A + + + + Speaker A + + + + Speaker A + + + + + Data + + + + + "No Pooling" + diff --git a/teaching/courses/2017_lsa/lectures/figures/partial_pooling.svg b/teaching/courses/2017_lsa/lectures/figures/partial_pooling.svg new file mode 100755 index 0000000..bea45ad --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/figures/partial_pooling.svg @@ -0,0 +1,237 @@ + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + Speaker A + + + + Speaker A + + + + Speaker A + + + + Data + + + + Data + + "Partial Pooling" + + Data + + + diff --git a/teaching/courses/2017_lsa/lectures/figures/ranef.png b/teaching/courses/2017_lsa/lectures/figures/ranef.png new file mode 100755 index 0000000..b8723df Binary files 
/dev/null and b/teaching/courses/2017_lsa/lectures/figures/ranef.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/regression.png b/teaching/courses/2017_lsa/lectures/figures/regression.png new file mode 100755 index 0000000..4e92c15 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/regression.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/resid.png b/teaching/courses/2017_lsa/lectures/figures/resid.png new file mode 100755 index 0000000..df39e72 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/resid.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/rsquared.png b/teaching/courses/2017_lsa/lectures/figures/rsquared.png new file mode 100755 index 0000000..d1f29b4 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/rsquared.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/runcode.png b/teaching/courses/2017_lsa/lectures/figures/runcode.png new file mode 100755 index 0000000..f39e843 Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/runcode.png differ diff --git a/teaching/courses/2017_lsa/lectures/figures/speaker_variances.svg b/teaching/courses/2017_lsa/lectures/figures/speaker_variances.svg new file mode 100755 index 0000000..bb9bc33 --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/figures/speaker_variances.svg @@ -0,0 +1,340 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + µ = 0.58 + σ = 0.35 + Distribution of Speakers + + Speakers' µs + µ = 0.15 + µ = 0.53 + µ = 0.81 + µ = 0.19 + µ = 0.78 + + + + + + + diff --git a/teaching/courses/2017_lsa/lectures/figures/stderr.png b/teaching/courses/2017_lsa/lectures/figures/stderr.png new file mode 100755 index 0000000..a7e92bc Binary files /dev/null and b/teaching/courses/2017_lsa/lectures/figures/stderr.png differ diff --git 
a/teaching/courses/2017_lsa/lectures/figures/workflow.svg b/teaching/courses/2017_lsa/lectures/figures/workflow.svg new file mode 100755 index 0000000..b9e4cfc --- /dev/null +++ b/teaching/courses/2017_lsa/lectures/figures/workflow.svg @@ -0,0 +1,121 @@ + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + begin + + + + summarize + visualize + analyze + + +