Merge pull request #167 from mallewellyn/mary-review-changes

Initial changes in response to instructor feedback
carpentries-incubator · Mar 25, 2024 · 1c9ce40
2 parents e57d5d4 + 62e191b

Showing 3 changed files with 12 additions and 15 deletions.
diff --git a/_episodes_rmd/01-introduction-to-high-dimensional-data.Rmd b/_episodes_rmd/01-introduction-to-high-dimensional-data.Rmd
@@ -114,7 +114,7 @@ high-dimensional datasets it can also be difficult to identify a single response
 variable, making standard data exploration and analysis techniques less useful.
 
 Let's have a look at a simple dataset with lots of features to understand some
-of the challenges we are facing when working with high-dimensional data.
+of the challenges we are facing when working with high-dimensional data. 
 
 
 > ## Challenge 2 
@@ -166,6 +166,10 @@ of the challenges we are facing when working with high-dimensional data.
 > {: .solution}
 {: .challenge}
 
+Function documentation, including descriptions of function arguments, will be useful throughout
+this lesson. We can access it in R by running `?` followed by the function name.
+For example, the documentation for the `dim()` function can be accessed by running `?dim`.
+
 > ## Locating data with R - the **`here`** package
 > 
 > It is often desirable to access external datasets from inside R and to write 
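
Note on the help text added in this hunk: `?` looks up a help topic, which is usually a function name, and `dim()` is a base R function rather than a package. The following is a minimal sketch of the lookups the new paragraph describes. The matrix `m` is a hypothetical example object and is not part of the lesson.

```r
# Help pages are meant to be read in an interactive session,
# so the lookups are guarded here to keep the script runnable in batch mode.
if (interactive()) {
  ?dim          # shorthand for opening the help page of dim()
  help("dim")   # equivalent long form
}

# args() prints a function's arguments without opening the help page
args(matrix)

# formals() returns the same arguments as a list we can inspect
arg_names <- names(formals(matrix))
arg_names

# dim() itself returns the number of rows and columns of an object
m <- matrix(1:6, nrow = 2, ncol = 3)  # hypothetical example matrix
dim(m)
```

To look up help for a whole package rather than a single function, `help(package = "stats")` lists that package's documented topics.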

diff --git a/_episodes_rmd/04-principal-component-analysis.Rmd b/_episodes_rmd/04-principal-component-analysis.Rmd
@@ -76,31 +76,24 @@ resulting principal component could also be used as an effect in further analysi
 >    hospital with infectious respiratory disease. They would like to determine
 >    whether length of stay in hospital differs in patients with different
 >    respiratory diseases.
-> 2. An online retailer has collected data on user interactions with its online
->    app and has information on the number of times each user interacted with
->    the app, what products they viewed per interaction, and the type and cost
->    of these products. The retailer would like to use this information to
->    predict whether or not a user will be interested in a new product.
-> 3. A scientist has assayed gene expression levels in 1000 cancer patients and
+> 2. A scientist has assayed gene expression levels in 1000 cancer patients and
 >    has data from probes targeting different genes in tumour samples from
 >    patients. She would like to create new variables representing relative
 >    abundance of different groups of genes to i) find out if genes form
 >    subgroups based on biological function and ii) use these new variables
 >    in a linear regression examining how gene expression varies with disease
 >    severity.
-> 4. All of the above.
+> 3. Both of the above.
 > 
 > > ## Solution
 > > 
 > >
 > > In the first case, a regression model would be more suitable; perhaps a
 > > survival model.
-> > In the second, again a regression model, likely linear or logistic, would
-> > be more suitable.
-> > In the third example, PCA can help to identify modules of correlated
+> > In the second example, PCA can help to identify modules of correlated
 > > features that explain a large amount of variation within the data.
 > >
-> > Therefore the answer here is 3.
+> > Therefore the answer here is 2.
 > {: .solution}
 {: .challenge}
 
@@ -241,8 +234,8 @@ deviation of 1.
 > >    It also won't affect how quickly the output will be calculated, whether
 > >    continuous and categorical variables are present or not.
 > > 
-> >    It is done to ensure that all features have equal weighting in the resulting
-> >    PCs.
+> >    It is done to ensure that features with different ranges of values
+> >    have equal weighting in the resulting PCs (point 2).
 > > 
 > >  2. You may not want to standardise datasets which contain continuous variables
 > >    all measured on the same scale (e.g. gene expression data or RNA sequencing

diff --git a/_episodes_rmd/05-factor-analysis.Rmd b/_episodes_rmd/05-factor-analysis.Rmd
@@ -265,7 +265,7 @@ text(
 > > biologically, as we would expect prostate enlargement to be associated
 > > with greater weight. The groupings of lcavol, lcp, and lpsa also make
 > > sense biologically, as larger cancer volume may be expected to be
-> > associated with greater cancer spead and therefore higher PSA in the blood.
+> > associated with greater cancer spread and therefore higher PSA in the blood.
 > {: .solution}
 {: .challenge}