From 0630b194d82c4741d17d9e6c6a076be70e9ff4c1 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 08:31:54 +0000
Subject: [PATCH 01/15] add computational benefits, task 1

---
 episodes/03-regression-regularisation.Rmd | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 7e767e60..d820348c 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -46,7 +46,8 @@ a linear model where there are more features (predictor variables) than there ar
 models for each feature and sharing information among these models. Now we will
 take a look at an alternative approach called regularisation. Regularisation can be used to
 stabilise coefficient estimates (and thus to fit models with more features than observations)
-and even to select a subset of relevant features.
+and even to select a subset of relevant features. In addition, regularisation is often very fast
+computationally and is thus practically useful.
 
 First, let us check out what happens if we try to fit a linear model to high-dimensional data!
 We start by reading in the data from the last lesson:

From fe486966f82a02a67fe5c89e2c57dea4f200708c Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 08:41:04 +0000
Subject: [PATCH 02/15] reorder introductory paragraph to clarify differences, task 2

---
 episodes/03-regression-regularisation.Rmd | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index d820348c..1820ca91 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -42,12 +42,12 @@
 One reason that we need special statistical tools for high-dimensional data is that standard
 linear models cannot handle high-dimensional data sets -- one cannot fit
 a linear model where there are more features (predictor variables) than there are observations
-(data points). In the previous lesson we dealt with this problem by fitting individual
+(data points). In the previous lesson, we dealt with this problem by fitting individual
 models for each feature and sharing information among these models. Now we will
-take a look at an alternative approach called regularisation. Regularisation can be used to
-stabilise coefficient estimates (and thus to fit models with more features than observations)
-and even to select a subset of relevant features. In addition, regularisation is often very fast
-computationally and is thus practically useful.
+take a look at an alternative approach that can be used to fit models with more
+features than observations by stabilising coefficient estimates. This approach is called
+regularisation. Compared to many other methods, regularisation is also often very fast
+and can therefore be extremely useful in practice.
 
 First, let us check out what happens if we try to fit a linear model to high-dimensional data!
 We start by reading in the data from the last lesson:
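Aside (not one of the patches): the hunk above introduces the episode's running experiment, fitting a linear model to high-dimensional data. A minimal R sketch of the failure it refers to, using simulated stand-in data rather than the lesson's methylation objects (all object names below are ours):

```r
## More features (50) than observations (10), as in the lesson's setting
set.seed(42)
x <- matrix(rnorm(10 * 50), nrow = 10)
y <- rnorm(10)
fit <- lm(y ~ x)
## lm() can estimate at most as many coefficients as there are observations;
## the remainder come back NA ("not defined because of singularities")
sum(is.na(coef(fit)))
```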
From bf96ccbe9243a463612347d8ac4c83b2db88a1ac Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 08:45:09 +0000
Subject: [PATCH 03/15] add sentence to motivate discussion of singularities, task 3

---
 episodes/03-regression-regularisation.Rmd | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 1820ca91..a7173c1b 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -79,7 +79,8 @@ high! The summary also says that we were unable to estimate
 effect sizes for `r format(sum(is.na(coef(fit))), big.mark=",")` features
 because of "singularities". What this means is that R couldn't find a way to
 perform the calculations necessary due to the fact that we have more features
-than observations.
+than observations. We explain what singularities are and why they appear when fitting
+models to high-dimensional data below.
 
 > ## Singularities
 >

From 4be70e113a9049b325dc81c901b475097aa5 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 08:59:01 +0000
Subject: [PATCH 04/15] clarify reason for large effect sizes, task 4

Is this a fair summary?
---
 episodes/03-regression-regularisation.Rmd | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index a7173c1b..93c0ac8c 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -75,7 +75,9 @@ summary(fit)
 ```
 
 You can see that we're able to get some effect size estimates, but they seem very
-high! The summary also says that we were unable to estimate
+high! This is common when fitting a linear regression model with a large number of features,
+often since the model cannot distinguish between the effects of many, correlated features.
+The summary also says that we were unable to estimate
 effect sizes for `r format(sum(is.na(coef(fit))), big.mark=",")` features
 because of "singularities". What this means is that R couldn't find a way to
 perform the calculations necessary due to the fact that we have more features
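Aside (not one of the patches): patch 04 attributes the inflated effect sizes to correlated features. A toy R illustration of that instability, again on simulated data rather than the lesson's:

```r
## Two nearly identical features: least squares cannot separate their effects
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.01)  # almost an exact copy of x1
y  <- x1 + rnorm(100)
coef(lm(y ~ x1 + x2))
## The individual estimates are highly unstable and can be very large with
## opposite signs, even though their sum stays near the true combined effect of 1
```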
From 4eac9e31a90ad4b425069b6e4fa4774468c15e46 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 09:11:24 +0000
Subject: [PATCH 05/15] clarify why large effect sizes and singularities, tasks 4 and 5

Do you agree?
---
 episodes/03-regression-regularisation.Rmd | 15 +++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 93c0ac8c..8a972f31 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -75,15 +75,14 @@ summary(fit)
 ```
 
 You can see that we're able to get some effect size estimates, but they seem very
-high! This is common when fitting a linear regression model with a large number of features,
-often since the model cannot distinguish between the effects of many, correlated features.
-The summary also says that we were unable to estimate
+high! The summary also says that we were unable to estimate
 effect sizes for `r format(sum(is.na(coef(fit))), big.mark=",")` features
-because of "singularities". What this means is that R couldn't find a way to
-perform the calculations necessary due to the fact that we have more features
-than observations. We explain what singularities are and why they appear when fitting
-models to high-dimensional data below.
-
+because of "singularities". We clarify what singularities are in the note below
+but this essentially means that R couldn't find a way to
+perform the calculations necessary to fit the model. Large effect sizes and singularities are common
+when naively fitting linear regression models with a large number of features (i.e., to high-dimensional data),
+often since the model cannot distinguish between the effects of many, correlated features and
+when we have more features than observations.
 
 > ## Singularities
 >

From ebc6fc1431c0621b96907d6494399024f05f9f52 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 09:16:25 +0000
Subject: [PATCH 06/15] reframe correlated features section

doesn't the previous example show this too? Collinearity isn't a distinct
issue to having singularities?
---
 episodes/03-regression-regularisation.Rmd | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 8a972f31..24851f61 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -116,12 +116,9 @@ when we have more features than observations.
 > ## Correlated features -- common in high-dimensional data
 >
-> So, we can't fit a standard linear model to high-dimensional data. But there
-> is another issue. In high-dimensional datasets, there
+> In high-dimensional datasets, there
 > are often multiple features that contain redundant information (correlated features).
->
-> We have seen in the first episode that correlated features can make it hard
-> (or impossible) to correctly infer parameters. If we visualise the level of
+> If we visualise the level of
 > correlation between sites in the methylation dataset, we can see that many
 > of the features essentially represent the same information - there are many
 > off-diagonal cells, which are deep red or blue. For example, the following
 > heatmap visualises the correlations for the first 500 features in the
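Aside (not one of the patches): the next patch rewrites the singularities note around the calculation $(X^TX)^{-1}X^Ty$. A small R sketch of that calculation, and of where it breaks down, on simulated data:

```r
## Closed-form least squares on a well-posed problem:
## 10 observations, an intercept plus 2 features
set.seed(2)
X <- cbind(1, matrix(rnorm(10 * 2), nrow = 10))
y <- rnorm(10)
solve(t(X) %*% X) %*% t(X) %*% y   # (X^T X)^{-1} X^T y

## With more features than observations, t(X) %*% X is singular: its
## determinant is (numerically) zero, and solve() on it would fail with
## "system is computationally singular"
X_wide <- matrix(rnorm(10 * 50), nrow = 10)
det(t(X_wide) %*% X_wide)
```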
From 515cbc805d28f46795904bc26fa243c850afb267 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 09:39:48 +0000
Subject: [PATCH 07/15] rewrite singularities description

check this is correct
---
 episodes/03-regression-regularisation.Rmd | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 24851f61..1b349098 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -89,24 +89,22 @@ when we have more features than observations.
 > The message that `lm` produced is not necessarily the most intuitive. What
 > are "singularities", and why are they an issue? A singular matrix
 > is one that cannot be
-> [inverted](https://en.wikipedia.org/wiki/Invertible_matrix).
-> The inverse of an $n \times n$ square matrix $A$ is the matrix $B$ for which
-> $AB = BA = I_n$, where $I_n$ is the $n \times n$ identity matrix.
->
-> Why is the inverse important? Well, to find the
-> coefficients of a linear model of a matrix of predictor features $X$ and an
-> outcome vector $y$, we may perform the calculation
+> [inverted](https://en.wikipedia.org/wiki/Invertible_matrix). R uses
+> inverse operations to fit linear models (finds the coefficients) using:
 >
 > $$
-> (X^TX)^{-1}X^Ty
+> (X^TX)^{-1}X^Ty,
 > $$
 >
-> You can see that, if we're unable to find the inverse of the matrix $X^TX$,
-> then we'll be unable to find the regression coefficients.
+> where $X$ is a matrix of predictor features and $y$ is the outcome vector.
+> Thus, if the matrix $X^TX$ cannot be inverted to give $(X^TX)^{-1}$, R
+> cannot fit the model and returns the singularities error.
 >
-> Why might this be the case?
+> Why might R be unable to calculate $(X^TX)^{-1}$ and return singularities errors?
 > Well, when the [determinant](https://en.wikipedia.org/wiki/Determinant)
-> of the matrix is zero, we are unable to find its inverse.
+> of the matrix is zero, we are unable to find its inverse. The determinant
+> of the matrix is zero when there are more features than observations or when
+> the features are highly correlated.
 >
 > ```{r determinant}
 > xtx <- t(methyl_mat) %*% methyl_mat

From 90155d7d1bd749fdc12a505b0db254e167761dff Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 09:42:49 +0000
Subject: [PATCH 08/15] minor wording change, singularities, task 6

---
 episodes/03-regression-regularisation.Rmd | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 1b349098..1ec473f7 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -87,10 +87,10 @@ when we have more features than observations.
 > ## Singularities
 >
 > The message that `lm` produced is not necessarily the most intuitive. What
-> are "singularities", and why are they an issue? A singular matrix
+> are "singularities" and why are they an issue? A singular matrix
 > is one that cannot be
 > [inverted](https://en.wikipedia.org/wiki/Invertible_matrix). R uses
-> inverse operations to fit linear models (finds the coefficients) using:
+> inverse operations to fit linear models (find the coefficients) using:
 >
 > $$
 > (X^TX)^{-1}X^Ty,
@@ -103,7 +103,7 @@ when we have more features than observations.
 > Why might R be unable to calculate $(X^TX)^{-1}$ and return singularities errors?
 > Well, when the [determinant](https://en.wikipedia.org/wiki/Determinant)
 > of the matrix is zero, we are unable to find its inverse. The determinant
-> of the matrix is zero when there are more features than observations or when
+> of the matrix is zero when there are more features than observations or often when
 > the features are highly correlated.
 >
 > ```{r determinant}
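Aside (not one of the patches): patches 09 to 11 below debate where to first say that regularisation helps with correlated features. A sketch of what regularisation buys, assuming the glmnet package (the package and calls here are our assumption, not something these patches introduce); ridge regression fits all features at once even when they outnumber observations:

```r
library("glmnet")                       # assumed available
set.seed(3)
x <- matrix(rnorm(10 * 50), nrow = 10)  # 10 observations, 50 features
y <- rnorm(10)
ridge <- glmnet(x, y, alpha = 0)        # alpha = 0 requests a ridge (L2) penalty
## Every feature receives a finite, shrunken estimate: no NAs, no singularities
head(coef(ridge, s = min(ridge$lambda)))
```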
From 53772595a817541577148d63c8d70030e7220f4c Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 09:49:29 +0000
Subject: [PATCH 09/15] remove text at the end of correlation section

if keeping the other changes
---
 episodes/03-regression-regularisation.Rmd | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 1ec473f7..cfc81f50 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -136,10 +136,7 @@ library("ComplexHeatmap")
 > )
 > ```
 >
-> Correlation between features can be problematic for technical reasons. If it is
-> very severe, it may even make it impossible to fit a model! This is in addition to
-> the fact that with more features than observations, we can't even estimate
-> the model properly. Regularisation can help us to deal with correlated features.
+Regularisation can help us to deal with correlated features.
 {: .callout}

From 856eaf06e2304fd09aacbde5f08c02b8b6d294d5 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 09:51:41 +0000
Subject: [PATCH 10/15] tasks 7 and 8 addressed

if keeping other changes
---
 episodes/03-regression-regularisation.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index cfc81f50..cdfae8ee 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -136,7 +136,7 @@ library("ComplexHeatmap")
 > )
 > ```
 >
-Regularisation can help us to deal with correlated features.
+> Regularisation can help us to deal with correlated features.
 {: .callout}

From 3d28a550ad4c2b9cc8c95258f8767fd14c0c23b9 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 09:56:00 +0000
Subject: [PATCH 11/15] remove sentence until we're about to discuss regularisation

---
 episodes/03-regression-regularisation.Rmd | 1 -
 1 file changed, 1 deletion(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index cdfae8ee..2ac63621 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -136,7 +136,6 @@ library("ComplexHeatmap")
 > )
 > ```
 >
-> Regularisation can help us to deal with correlated features.
 {: .callout}

From ea3e28ffcdb9cbe1f80dd0c8c17916127b81b55e Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Thu, 29 Feb 2024 10:10:25 +0000
Subject: [PATCH 12/15] change "singularities errors"

---
 episodes/03-regression-regularisation.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 2ac63621..2d61f00d 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -98,9 +98,9 @@ when we have more features than observations.
 >
 > where $X$ is a matrix of predictor features and $y$ is the outcome vector.
 > Thus, if the matrix $X^TX$ cannot be inverted to give $(X^TX)^{-1}$, R
-> cannot fit the model and returns the singularities error.
+> cannot fit the model and returns the error that there are singularities.
 >
-> Why might R be unable to calculate $(X^TX)^{-1}$ and return singularities errors?
+> Why might R be unable to calculate $(X^TX)^{-1}$ and return the error that there are singularities?
 > Well, when the [determinant](https://en.wikipedia.org/wiki/Determinant)
 > of the matrix is zero, we are unable to find its inverse. The determinant
 > of the matrix is zero when there are more features than observations or often when
 > the features are highly correlated.
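Aside (not one of the patches): the reworded sentence that patch 13 below touches lists two causes of singularities, correlated features and having more features than observations. Both show up as rank deficiency of the design matrix, which a quick simulated check makes concrete:

```r
## More features than observations: rank is capped by the number of rows
set.seed(4)
X_wide <- matrix(rnorm(10 * 50), nrow = 10)
qr(X_wide)$rank                               # 10, far below the 50 columns

## Perfectly correlated features: rank falls below the number of columns
x1 <- rnorm(100)
X_dup <- cbind(x1, x2 = rnorm(100), x3 = x1)  # x3 duplicates x1 exactly
qr(X_dup)$rank                                # 2, not 3
```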
From 2fa3452a658da876763f03aef877f57124fc92e8 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Tue, 12 Mar 2024 17:59:25 +0000
Subject: [PATCH 13/15] and to or

Co-authored-by: Ailith Ewing <54178580+ailithewing@users.noreply.github.com>
---
 episodes/03-regression-regularisation.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 2d61f00d..d6ebf86d 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -81,7 +81,7 @@ because of "singularities". We clarify what singularities are in the note below
 but this essentially means that R couldn't find a way to
 perform the calculations necessary to fit the model. Large effect sizes and singularities are common
 when naively fitting linear regression models with a large number of features (i.e., to high-dimensional data),
-often since the model cannot distinguish between the effects of many, correlated features and
+often since the model cannot distinguish between the effects of many, correlated features or
 when we have more features than observations.
 
 > ## Singularities

From e7d80c2462dcf1933930e9944d4c9ab9fec9d0a8 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Tue, 12 Mar 2024 17:59:54 +0000
Subject: [PATCH 14/15] remove essentially 1

Co-authored-by: Ailith Ewing <54178580+ailithewing@users.noreply.github.com>
---
 episodes/03-regression-regularisation.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index d6ebf86d..14a1d71d 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -78,7 +78,7 @@ You can see that we're able to get some effect size estimates, but they seem ver
 high! The summary also says that we were unable to estimate
 effect sizes for `r format(sum(is.na(coef(fit))), big.mark=",")` features
 because of "singularities". We clarify what singularities are in the note below
-but this essentially means that R couldn't find a way to
+but this means that R couldn't find a way to
 perform the calculations necessary to fit the model. Large effect sizes and singularities are common
 when naively fitting linear regression models with a large number of features (i.e., to high-dimensional data),
 often since the model cannot distinguish between the effects of many, correlated features or

From 3f9d51f3b0c0d3000ccd59595f01c235acbc42e1 Mon Sep 17 00:00:00 2001
From: Mary Llewellyn
Date: Tue, 12 Mar 2024 18:00:22 +0000
Subject: [PATCH 15/15] remove essentially 2

Co-authored-by: Ailith Ewing <54178580+ailithewing@users.noreply.github.com>
---
 episodes/03-regression-regularisation.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/episodes/03-regression-regularisation.Rmd b/episodes/03-regression-regularisation.Rmd
index 14a1d71d..d7b51e94 100644
--- a/episodes/03-regression-regularisation.Rmd
+++ b/episodes/03-regression-regularisation.Rmd
@@ -118,7 +118,7 @@ when we have more features than observations.
 > are often multiple features that contain redundant information (correlated features).
 > If we visualise the level of
 > correlation between sites in the methylation dataset, we can see that many
-> of the features essentially represent the same information - there are many
+> of the features represent the same information - there are many
 > off-diagonal cells, which are deep red or blue. For example, the following
 > heatmap visualises the correlations for the first 500 features in the
 > `methylation` dataset (we selected 500 features only as it can be hard to
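Aside (not one of the patches): the final hunk's context describes the correlation heatmap of the first 500 methylation features. A scaled-down stand-in, simulating a handful of features that share a common signal; the ComplexHeatmap package is the one the lesson's own hunks load via `library("ComplexHeatmap")`:

```r
library("ComplexHeatmap")
set.seed(5)
signal <- rnorm(50)
## 10 features, each a noisy copy of the same underlying signal
mat <- sapply(1:10, function(i) signal + rnorm(50, sd = 0.5))
colnames(mat) <- paste0("feature_", 1:10)
Heatmap(cor(mat), name = "correlation")  # many strong off-diagonal cells
```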