rdpeng · engineerchange · Nov 5, 2021
diff --git a/manuscript/apply.Rmd b/manuscript/apply.Rmd
@@ -47,7 +47,7 @@ Note that the actual looping is done internally in C code for efficiency reasons
 
 It's important to remember that `lapply()` always returns a list, regardless of the class of the input.
 
-Here's an example of applying the `mean()` function to all elements of a list. If the original list has names, the the names will be preserved in the output.
+Here's an example of applying the `mean()` function to all elements of a list. If the original list has names, then the names will be preserved in the output.
 
 
 ```{r}
@@ -86,7 +86,7 @@ x <- 1:4
 lapply(x, runif, min = 0, max = 10)
 ```
 
-So now, instead of the random numbers being between 0 and 1 (the default), the are all between 0 and 10.
+So now, instead of the random numbers being between 0 and 1 (the default), they are all between 0 and 10.
 
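-The `lapply()` function and its friends make heavy use of _anonymous_ functions. Anonymous functions are like members of [Project Mayhem](http://en.wikipedia.org/wiki/Fight_Club)---they have no names. These are functions are generated "on the fly" as you are using `lapply()`. Once the call to `lapply()` is finished, the function disappears and does not appear in the workspace.
+The `lapply()` function and its friends make heavy use of _anonymous_ functions. Anonymous functions are like members of [Project Mayhem](http://en.wikipedia.org/wiki/Fight_Club)---they have no names. These functions are generated "on the fly" as you are using `lapply()`. Once the call to `lapply()` is finished, the function disappears and does not appear in the workspace.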
 
@@ -165,7 +165,7 @@ where
 - `f` is a factor (or coerced to one) or a list of factors
-- `drop` indicates whether empty factors levels should be dropped
+- `drop` indicates whether empty factor levels should be dropped
 
-The combination of `split()` and a function like `lapply()` or `sapply()` is a common paradigm in R. The basic idea is that you can take a data structure, split it into subsets defined by another variable, and apply a function over those subsets. The results of applying tha function over the subsets are then collated and returned as an object. This sequence of operations is sometimes referred to as "map-reduce" in other contexts.
+The combination of `split()` and a function like `lapply()` or `sapply()` is a common paradigm in R. The basic idea is that you can take a data structure, split it into subsets defined by another variable, and apply a function over those subsets. The results of applying the function over the subsets are then collated and returned as an object. This sequence of operations is sometimes referred to as "map-reduce" in other contexts.
 
 Here we simulate some data and split it according to a factor variable. Note that we use the `gl()` function to "generate levels" in a factor variable.
 
@@ -413,13 +413,13 @@ With `mapply()`, instead we can do
 This passes the sequence `1:4` to the first argument of `rep()` and the sequence `4:1` to the second argument.
 
 
-Here's another example for simulating randon Normal variables.
+Here's another example for simulating random Normal variables.
 
 ```{r}
 noise <- function(n, mean, sd) {
       rnorm(n, mean, sd)
 }
-## Simulate 5 randon numbers
+## Simulate 5 random numbers
 noise(5, 1, 2)        
 
 ## This only simulates 1 set of numbers, not 5
@@ -484,9 +484,9 @@ Pretty cool, right?
 
 * The loop functions in R are very powerful because they allow you to conduct a series of operations on data using a compact form
 
-* The operation of a loop function involves iterating over an R object (e.g. a list or vector or matrix), applying a function to each element of the object, and the collating the results and returning the collated results.
+* The operation of a loop function involves iterating over an R object (e.g. a list or vector or matrix), applying a function to each element of the object, and then collating the results and returning the collated results.
 
-* Loop functions make heavy use of anonymous functions, which exist for the life of the loop function but are not stored anywhere
+* Loop functions make heavy use of anonymous functions, which exist for the life of the loop function but are not stored anywhere.
 
-* The `split()` function can be used to divide an R object in to subsets determined by another variable which can subsequently be looped over using loop functions.
+* The `split()` function can be used to divide an R object into subsets determined by another variable which can subsequently be looped over using loop functions.
 
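Reviewer note on the `apply.Rmd` hunks: the split-then-apply paradigm and the `mapply()` call described in the touched paragraphs can be sketched as follows. This is a minimal illustration and not code from the manuscript; the simulated data and the names `x`, `f`, `groups`, `means`, `means_list`, and `reps` are assumptions chosen for the example.

```r
## Illustrative sketch only: data and object names are made up
set.seed(1)
x <- c(rnorm(10), runif(10), rnorm(10, mean = 1))
f <- gl(3, 10)                        # factor with 3 levels, 10 values each

groups <- split(x, f)                 # list of 3 vectors named "1", "2", "3"
means <- sapply(groups, mean)         # named numeric vector, one mean per group

## lapply() always returns a list, and it preserves the names of its input
means_list <- lapply(groups, mean)

## mapply() walks over several arguments in parallel:
## rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
reps <- mapply(rep, 1:4, 4:1)
```

Because the four results of `rep()` have different lengths, `mapply()` cannot simplify them and returns a list here.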
diff --git a/manuscript/control.Rmd b/manuscript/control.Rmd
@@ -28,7 +28,7 @@ Commonly used control structures are
-- `next`: skip an interation of a loop
+- `next`: skip an iteration of a loop
 
 Most control structures are not used in interactive sessions, but
-rather when writing functions or longer expresisons. However, these
+rather when writing functions or longer expressions. However, these
 constructs do not have to be used in functions and it's a good idea to
 become familiar with them before we delve into functions.
 
@@ -58,8 +58,7 @@ an `else` clause.
 ```r
 if(<condition>) {
         ## do something
-} 
-else {
+} else {
         ## do something else
 }
 ```
@@ -123,12 +122,12 @@ if(<condition2>) {
 
 [Watch a video of this section](https://youtu.be/FbT1dGXCCxU)
 
-For loops are pretty much the only looping construct that you will
+`for` loops are pretty much the only looping construct that you will
 need in R. While you may occasionally find a need for other types of
 loops, in my experience doing data analysis, I've found very few
 situations where a for loop wasn't sufficient. 
 
-In R, for loops take an interator variable and assign it successive
+In R, for loops take an iterator variable and assign it successive
 values from a sequence or vector. For loops are most commonly used for
 iterating over the elements of an object (list, vector, etc.)
 
@@ -210,7 +209,7 @@ functions (discussed later).
 
 [Watch a video of this section](https://youtu.be/VqrS1Wghq1c)
 
-While loops begin by testing a condition. If it is true, then they
+`while` loops begin by testing a condition. If it is true, then they
 execute the loop body. Once the loop body is executed, the condition
 is tested again, and so forth, until the condition is false, after
 which the loop exits.
@@ -223,7 +222,7 @@ while(count < 10) {
 }
 ```
 
-While loops can potentially result in infinite loops if not written
+`while` loops can potentially result in infinite loops if not written
 properly. Use with care!
 
 Sometimes there will be more than one condition in the test.
@@ -259,7 +258,7 @@ not commonly used in statistical or data analysis applications but
 they do have their uses. The only way to exit a `repeat` loop is to
 call `break`.
 
-One possible paradigm might be in an iterative algorith where you may
+One possible paradigm might be in an iterative algorithm where you may
 be searching for a solution and you don't want to stop until you're
 close enough to the solution. In this kind of situation, you often
 don't know in advance how many iterations it's going to take to get
@@ -322,8 +321,8 @@ for(i in 1:100) {
 
 ## Summary
 
-- Control structures like `if`, `while`, and `for` allow you to
-  control the flow of an R program
+- Control structures, like `if`, `while`, and `for`, allow you to
+  control the flow of an R program.
 
 - Infinite loops should generally be avoided, even if (you believe)
   they are theoretically correct.
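
Reviewer note on the `control.Rmd` hunks: the `} else {` fix matters because, at the top level, R treats a closing `}` at the end of a line as a complete expression, so an `else` that starts the next line is a syntax error. The sketch below uses that form inside a `repeat` loop that stops once it is close enough to a solution. It is a hypothetical example, not code from the manuscript: the Newton iteration for the square root of 2 and the names `target`, `tol`, `max_iter`, and `converged` are assumptions.

```r
## Hypothetical example: Newton iteration for sqrt(2)
target <- 2
x0 <- 1
tol <- 1e-8
max_iter <- 100          # hard cap so the loop cannot run forever
iter <- 0

repeat {
        x1 <- 0.5 * (x0 + target / x0)   # next estimate
        iter <- iter + 1
        if(abs(x1 - x0) < tol) {
                converged <- TRUE
                break                    # close enough: leave the loop
        } else if(iter >= max_iter) {
                converged <- FALSE
                break                    # give up rather than loop forever
        } else {
                x0 <- x1
        }
}
```

Capping the number of iterations is one way to follow the summary's advice about avoiding infinite loops.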

diff --git a/manuscript/debugging.Rmd b/manuscript/debugging.Rmd
@@ -128,7 +128,7 @@ You can see now that the correct messages are printed without any warning or err
 
 ## Figuring Out What's Wrong
 
-The primary task of debugging any R code is correctly diagnosing what the problem is. When diagnosing a problem with your code (or somebody else's), it's important first understand what you were expecting to occur. Then you need to idenfity what *did* occur and how did it deviate from your expectations. Some basic questions you need to ask are
+The primary task of debugging any R code is correctly diagnosing what the problem is. When diagnosing a problem with your code (or somebody else's), it's important to first understand what you were expecting to occur. Then you need to identify what *did* occur and how it deviated from your expectations. Some basic questions you need to ask are
 
 - What was your input? How did you call the function?
 - What were you expecting? Output, messages, other results? 
@@ -269,11 +269,11 @@ Enter a frame number, or 0 to exit
 Selection:
 ```
 
-The `recover()` function will first print out the function call stack when an error occurrs. Then, you can choose to jump around the call stack and investigate the problem. When you choose a frame number, you will be put in the browser (just like the interactive debugger triggered with `debug()`) and will have the ability to poke around.
+The `recover()` function will first print out the function call stack when an error occurs. Then, you can choose to jump around the call stack and investigate the problem. When you choose a frame number, you will be put in the browser (just like the interactive debugger triggered with `debug()`) and will have the ability to poke around.
 
 ## Summary
 
-- There are three main indications of a problem/condition: `message`, `warning`, `error`; only an `error` is fatal
-- When analyzing a function with a problem, make sure you can reproduce the problem, clearly state your expectations and how the output differs from your expectation
-- Interactive debugging tools `traceback`, `debug`, `browser`, `trace`, and `recover` can be used to find problematic code in functions
+- There are three main indications of a problem/condition: `message`, `warning`, `error`; only an `error` is fatal.
+- When analyzing a function with a problem, make sure you can reproduce the problem, clearly state your expectations and how the output differs from your expectation.
+- Interactive debugging tools `traceback`, `debug`, `browser`, `trace`, and `recover` can be used to find problematic code in functions.
 - Debugging tools are not a substitute for thinking!
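
Reviewer note on the `debugging.Rmd` hunks: the three kinds of condition named in the summary can be sketched with base R alone. The helper `check_value()` and its messages are hypothetical and exist only for this illustration.

```r
## Hypothetical helper; the name and the messages are illustrative
check_value <- function(x) {
        if(!is.numeric(x)) {
                stop("'x' must be numeric")   # error: the call is abandoned
        }
        if(x < 0) {
                warning("'x' is negative")    # warning: not fatal by itself
        }
        message("checked a value")            # message: informational only
        invisible(x)
}

res_ok <- suppressMessages(check_value(3))

## tryCatch() exits at the first condition it has a handler for,
## so each call below returns the text of the condition it caught
res_warn <- tryCatch(check_value(-1), warning = function(w) conditionMessage(w))
res_err <- tryCatch(check_value("a"), error = function(e) conditionMessage(e))
```

Without the `tryCatch()` wrappers, only the third call would halt execution; the second would finish and report its warning afterwards.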
diff --git a/manuscript/dplyr.Rmd b/manuscript/dplyr.Rmd
@@ -40,7 +40,7 @@ Some of the key "verbs" provided by the `dplyr` package are
 
 * `%>%`: the "pipe" operator is used to connect multiple verb actions together into a pipeline
 
-The `dplyr` package as a number of its own data types that it takes advantage of. For example, there is a handy `print` method that prevents you from printing a lot of data to the console. Most of the time, these additional data types are transparent to the user and do not need to be worried about.
+The `dplyr` package has a number of its own data types that it takes advantage of. For example, there is a handy `print` method that prevents you from printing a lot of data to the console. Most of the time, these additional data types are transparent to the user and do not need to be worried about.
 
 
 
@@ -52,7 +52,7 @@ All of the functions that we will discuss in this Chapter will have a few common
 
 2. The subsequent arguments describe what to do with the data frame specified in the first argument, and you can refer to columns in the data frame directly without using the $ operator (just use the column names).
 
-3. The return result of a function is a new data frame
+3. The return result of a function is a new data frame.
 
 4. Data frames must be properly formatted and annotated for this to all be useful. In particular, the data must be [tidy](http://www.jstatsoft.org/v59/i10/paper). In short, there should be one observation per row, and each column should represent a feature or characteristic of that observation.
 
@@ -84,7 +84,7 @@ You may get some warnings when the package is loaded because there are functions
 
 ## `select()`
 
-For the examples in this chapter we will be using a dataset containing air pollution and temperature data for the [city of Chicago](http://www.biostat.jhsph.edu/~rpeng/leanpub/rprog/chicago_data.zip) in the U.S. The dataset is available from my web site.     
+For the examples in this chapter we will be using a dataset containing air pollution and temperature data for the [city of Chicago](http://www.biostat.jhsph.edu/~rpeng/leanpub/rprog/chicago_data.zip) in the U.S. The dataset is available from my website.     
 
 After unzipping the archive, you can load the data into R using the `readRDS()` function.
 
@@ -101,7 +101,7 @@ str(chicago)
 
 The `select()` function can be used to select columns of a data frame that you want to focus on. Often you'll have a large data frame containing "all" of the data, but any *given* analysis might only use a subset of variables or observations. The `select()` function allows you to get the few columns you might need.
 
-Suppose we wanted to take the first 3 columns only. There are a few ways to do this. We could for example use numerical indices. But we can also use the names directly.
+Suppose we wanted to take the first 3 columns only. There are a few ways to do this. We could, for example, use numerical indices. But we can also use the names directly.
 
 ```{r}
 names(chicago)[1:3]
@@ -197,7 +197,7 @@ and the last few rows.
 tail(select(chicago, date, pm25tmean2), 3)
 ```
 
-Columns can be arranged in descending order too by useing the special `desc()` operator.
+Columns can be arranged in descending order too by using the special `desc()` operator.
 
 ```{r}
 chicago <- arrange(chicago, desc(date))
@@ -221,7 +221,7 @@ Here you can see the names of the first five variables in the `chicago` data fra
 head(chicago[, 1:5], 3)
 ```
 
-The `dptp` column is supposed to represent the dew point temperature adn the `pm25tmean2` column provides the PM2.5 data. However, these names are pretty obscure or awkward and probably be renamed to something more sensible.
+The `dptp` column is supposed to represent the dew point temperature and the `pm25tmean2` column provides the PM2.5 data. However, these names are pretty obscure or awkward and probably need to be renamed to something more sensible.
 
 ```{r}
 chicago <- rename(chicago, dewpoint = dptp, pm25 = pm25tmean2)
@@ -365,11 +365,11 @@ Here we can see that `o3` tends to be low in the winter months and high in the s
 
 The `dplyr` package provides a concise set of operations for managing data frames. With these functions we can do a number of complex operations in just a few lines of code. In particular, we can often conduct the beginnings of an exploratory analysis with the powerful combination of `group_by()` and `summarize()`. 
 
-Once you learn the `dplyr` grammar there are a few additional benefits
+Once you learn the `dplyr` grammar there are a few additional benefits:
 
-* `dplyr` can work with other data frame "backends" such as SQL databases. There is an SQL interface for relational databases via the DBI package
+* `dplyr` can work with other data frame "backends", such as SQL databases. There is a SQL interface for relational databases via the DBI package.
 
-* `dplyr` can be integrated with the `data.table` package for large fast tables
+* `dplyr` can be integrated with the `data.table` package for large fast tables.
 
-The `dplyr` package is handy way to both simplify and speed up your data frame management code. It's rare that you get such a combination at the same time!
+The `dplyr` package is a handy way to both simplify and speed up your data frame management code. It's rare that you get such a combination at the same time!
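
Reviewer note on the `dplyr.Rmd` hunks: the verbs mentioned in the touched paragraphs (`rename()`, `group_by()`, `summarize()`, `arrange()` with `desc()`, and the pipe) can be chained as in the sketch below. It assumes the `dplyr` package is installed. The data frame `toy` is made up and only borrows the column names `pm25tmean2` and `dptp` from the chapter; it is not the Chicago dataset.

```r
library(dplyr)

## Toy data standing in for the chicago data frame; all values are made up
toy <- data.frame(
        year = c(2004, 2004, 2005, 2005),
        pm25tmean2 = c(10, 14, NA, 8),
        dptp = c(30, 34, 28, 26)
)

yearly <- toy %>%
        rename(pm25 = pm25tmean2, dewpoint = dptp) %>%
        group_by(year) %>%
        summarize(pm25 = mean(pm25, na.rm = TRUE),
                  dewpoint = mean(dewpoint)) %>%
        arrange(desc(year))           # most recent year first
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what lets the pipe connect them.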