diff --git a/docs/source/guide/lre-5-theory.md b/docs/source/guide/lre-5-theory.md
index 96e74e704..0bf40c810 100644
--- a/docs/source/guide/lre-5-theory.md
+++ b/docs/source/guide/lre-5-theory.md
@@ -19,13 +19,13 @@ The user guide for LRE in Mitiq is currently under construction.
 # What is the theory behind LRE?
 
 Layerwise Richardson Extrapolation (LRE), an error mitigation technique, introduced in
-{cite}`Russo_2024_LRE` works by creating multiple noise-scaled variations of the input
+{cite}`Russo_2024_LRE` extends the ideas found in ZNE by allowing users to create multiple noise-scaled variations of the input
 circuit such that the noiseless expectation value is extrapolated from the execution of each
 noisy circuit.
 
 Similar to [ZNE](zne.md), this process works in two steps:
 
-- **Step 1:** Intentionally create multiple noise-scaled but logically equivalent circuits through unitary folding.
+- **Step 1:** Intentionally create multiple noise-scaled but logically equivalent circuits by scaling each layer or chunk of the input circuit through unitary folding.
 
 - **Step 2:** Extrapolate to the noiseless limit using multivariate richardson extrapolation.
 
@@ -44,7 +44,7 @@ Suppose we're interested in the value of some observable in an $n$-qubit circuit
 
 Each layer can have a different scale factor and we can create $M$ such variations of the scaled circuit. Let $\{λ_1, λ_2, λ_3, \ldots, λ_M\}$ be the scale factors vectors used to create multiple variations of the noise-scaled circuits $\{C_{λ_1}, C_{λ_2}, C_{λ_3}, \ldots, C_{λ_M}\}$ such that each vector $λ_i$ defines the scale factors for the different layers in the input circuit $\{{λ^1}_i, {λ^2}_i, {λ^3}_i, \ldots, {λ^l}_i\}^T$.
 
-If $d$ is the chosen degree of our multivariate polynomial, $M_j(λ_i, d)$ corresponds to the terms in the polynomial. In general, the monomial terms for a variable $l$ up to degree $d$ can be determined through the [stars and bars method](https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29).
+If $d$ is the chosen degree of our multivariate polynomial, $M_j(λ_i, d)$ denotes the $j$-th monomial term of the polynomial evaluated at $λ_i$, with the terms arranged in order of increasing degree. In general, the monomial terms in $l$ variables up to degree $d$ can be counted through the [stars and bars method](https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29).
 
 $$
 \text{total number of terms in the monomial basis with max degree } d = \binom{d + l}{d}
@@ -54,32 +54,34 @@ $$
 \text{number of terms in the monomial basis with total degree } d = \binom{d + l - 1}{d}
 $$
 
-These monomial terms define the rows of the square sample matrix where $a_{i,j}=M_j(λ_i, d)$.
+These monomial terms define the rows of the square sample matrix, in which the number of monomial terms $N$ equals the number of scale factor vectors $M$, as shown below:
 
 $$
 \mathbf{A}(\Lambda, d) = 
 \begin{bmatrix}
-    a_{1,1} & a_{1,2} & \cdots & a_{1,M} \\
-    a_{2,1} & a_{2,2} & \cdots & a_{2,M} \\
+    M_1(λ_1, d) & M_2(λ_1, d) & \cdots & M_N(λ_1, d) \\
+    M_1(λ_2, d) & M_2(λ_2, d) & \cdots & M_N(λ_2, d) \\
     \vdots & \vdots & \ddots & \vdots \\
-    a_{N,1} & a_{N,2} & \cdots & a_{N,M}
+    M_1(λ_N, d) & M_2(λ_N, d) & \cdots & M_N(λ_N, d)
 \end{bmatrix}
 $$
 
-Each monomial term in the sample matrix is evaluated using the values in the scale factor vectors. We aim to define the zero-noise limit as a linear combination of the noisy expectation values. Finding the coefficients in the linear combination becomes a problem solvable through a system of linear equations $Ac = z$ where $c$ is the coefficients vector, $z$ is the vector of expectation values and $\mathbf{A}$ is the sample matrix evaluated using the values in the scale factor vectors.
+Each monomial term in the sample matrix $\mathbf{A}$ is evaluated using the values in the scale factor vectors. In Step 2, we aim to define $O_{\mathrm{LRE}}$ as a linear combination of the noisy expectation values.
+
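+Finding the coefficients in the linear combination reduces to solving the system of linear equations $\mathbf{A}^T c = \mathbf{e}_1$, where $c$ is the coefficients vector $(\eta_1, \eta_2, \ldots, \eta_N)^T$, $\mathbf{e}_1 = (1, 0, \ldots, 0)^T$ and $\mathbf{A}$ is the sample matrix. This follows because the coefficients $b$ of the interpolating polynomial satisfy $\mathbf{A} b = z$, where $z$ is the vector of the noisy expectation values, and the zero-noise value is the constant term $b_1 = \mathbf{e}_1^T \mathbf{A}^{-1} z$.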
 
 ## Step 2: Extrapolate to the noiseless limit
 
-Each noise scaled circuit $C_{λ_i}$ has an expectation value associated with it $\langle O(λ_i) \rangle$ such that we can define a vector of the noisy expectation values $z = (\langle O(λ_1) \rangle, \langle O(λ_2) \rangle, \ldots, \langle O(λ_M)\rangle)^T$. These have a coefficient of linear combination associated with them such that 
+Each noise-scaled circuit $C_{λ_i}$ has an associated expectation value $\langle O(λ_i) \rangle$, so we can define the vector of noisy expectation values $z = (\langle O(λ_1) \rangle, \langle O(λ_2) \rangle, \ldots, \langle O(λ_M)\rangle)^T$. Each expectation value is weighted by a coefficient $\eta_i$ in the linear combination shown below:
 
 $$
-O_{\mathrm{LRE}} = \sum_{i=1}^{M} \eta_i \langle O(\boldsymbol{\lambda}_i) \rangle.
+O_{\mathrm{LRE}} = \sum_{i=1}^{M} \eta_i \langle O(λ_i) \rangle.
 $$
 
-The system of linear equations is used to find the numerous $\eta_i$. As we only need to find the noiseless expectation value, we do not need to calculate the full vector of linear combination coefficients if we use the [Lagrange interpolation formula](https://files.eric.ed.gov/fulltext/EJ1231189.pdf). 
+The system of linear equations is used to find the coefficients $\eta_i$ in vector $c$. As we only need the noiseless expectation value, we can avoid solving the system explicitly by using the [Lagrange interpolation formula](https://files.eric.ed.gov/fulltext/EJ1231189.pdf) evaluated at $λ = 0$, which expresses each $\eta_i$ as a ratio of determinants.
 
 $$
 O_{\rm LRE} = \sum_{i=1}^M \langle O (\boldsymbol{\lambda}_i)\rangle  \frac{\det \left(\mathbf{M}_i (\boldsymbol{0}) \right)}{\det \left(\mathbf{A}\right)}.
 $$
 
-To get the matrix $\mathbf{M}_i(\mathbf{0})$, replace the $i$-th row of the sample matrix $\mathbf{A}$ by $\mathbf{e}_1=(1, 0, \ldots, 0)^T$.
+To get the matrix $\mathbf{M}_i(\mathbf{0})$, replace the $i$-th row of the sample matrix $\mathbf{A}$ by $\mathbf{e}_1^T$, where $\mathbf{e}_1=(1, 0, \ldots, 0)^T$ collects the monomial terms evaluated at zero: $M_1(0, d) = 1$ and all other monomial terms vanish.
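
Note on the hunk above: the following is a minimal sketch of the determinant-ratio formula the added text describes, assuming two layers, degree 1, and made-up expectation values. It is not Mitiq's implementation; the helper names `monomial_exponents`, `build_sample_matrix`, `det`, and `lagrange_coefficients` are hypothetical and chosen only for illustration. It uses exact fractions so the result can be checked by hand.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb, prod


def monomial_exponents(num_layers, degree):
    """Exponent tuples of every monomial in `num_layers` variables,
    listed by increasing total degree (the constant term comes first)."""
    exponents = []
    for total in range(degree + 1):
        for combo in combinations_with_replacement(range(num_layers), total):
            exponents.append(tuple(combo.count(k) for k in range(num_layers)))
    return exponents


def build_sample_matrix(scale_vectors, degree):
    """Row i holds every monomial evaluated at scale factor vector i."""
    exps = monomial_exponents(len(scale_vectors[0]), degree)
    return [
        [prod(Fraction(x) ** e for x, e in zip(lam, exp)) for exp in exps]
        for lam in scale_vectors
    ]


def det(matrix):
    """Determinant by cofactor expansion; adequate for small matrices."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def lagrange_coefficients(scale_vectors, degree):
    """eta_i = det(M_i(0)) / det(A), with row i of A replaced by (1, 0, ..., 0)."""
    num_terms = comb(degree + len(scale_vectors[0]), degree)
    if len(scale_vectors) != num_terms:
        raise ValueError("need as many scale factor vectors as monomial terms")
    a = build_sample_matrix(scale_vectors, degree)
    det_a = det(a)
    e1 = [Fraction(1)] + [Fraction(0)] * (num_terms - 1)
    return [det(a[:i] + [e1] + a[i + 1:]) / det_a for i in range(num_terms)]


# Assumed inputs: two layers, degree 1, three scale factor vectors.
scale_vectors = [(1, 1), (3, 1), (1, 3)]
etas = lagrange_coefficients(scale_vectors, degree=1)

# Made-up noisy values generated from 1 + 0.1 * x + 0.2 * y.
noisy = [Fraction(13, 10), Fraction(15, 10), Fraction(17, 10)]
o_lre = sum(eta * z for eta, z in zip(etas, noisy))
print(etas, o_lre)
```

Because the made-up values come from a degree-1 polynomial with constant term 1, the weighted sum recovers that constant term exactly, which is the zero-noise value in this toy setting.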