From ac24ca7296b2d275f884e89c49d1140a6864cebc Mon Sep 17 00:00:00 2001 From: Tom Wright Date: Fri, 25 Sep 2020 14:31:43 -0500 Subject: [PATCH 1/6] Initial addition of glossary --- docs/src/glossary.md | 57 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) create mode 100644 docs/src/glossary.md diff --git a/docs/src/glossary.md b/docs/src/glossary.md new file mode 100644 index 000000000..be03f3f60 --- /dev/null +++ b/docs/src/glossary.md @@ -0,0 +1,57 @@ +# ChainRules Glossary + +This glossary serves as a quick reference for common terms used in the field of Automatic Differentiation, as well as those used throughout the documentation relating specifically to ChainRules. + +##Definitions: + +###Adjoint: + +The conjugate transpose of the Jacobian for a given function `f`. + +###Derivative: + +The derivative of a function `y = f(x)` with respect to the independent variable `x` denoted `f'(x)` or `dy/dx` is the rate of change of the dependent variable `y` with respect to the change of the independent variable `x`. In multiple dimensions, we may refer to the gradient of a function, or its directional derivative. + +###Differential: + +The differential of a given function `y = f(x)` denoted `dy` is the product of the derivative function `f'(x)` and the increment of the independent variable `dx`. In multiple dimensions, it is the sum of these products across each dimension (using the partial derivative and the given independent variable's increment). + +###Directional Derivative: + +The directional derivative of a function `f` at any given point in any given unit-direction is the gradient multiplied by the direction. It represents the rate of change of `f` in that direction. + +###F-rule: + +A function used in forward-mode differentiation. For a given function `f`, it takes in the positional and keyword arguments of `f` and returns the primal result and the pushforward. 
+ +###Gradient: + +The gradient of a scalar function `f` represented by `∇f` is a vector function whose components are the partial derivatives of `f` with respect to each dimension of the domain of `f`. + +###Jacobian: + +The Jacobian of a vector-valued function `f` is the matrix of `f`'s first-order partial derivatives. + +###Jacobian Transpose Vector Product (j'vp): + +The product of the adjoint of the Jacobian and the vector in question. A description of the pullback in terms of its Jacobian. + +###Jacobian Vector Product (jvp): + +The product of the Jacobian and the vector in question. It is a description of the pushforward in terms of its Jacobian. + +###Primal: + +Something relating to the original problem, as opposed to relating to the derivative. For example in `y = f(x)`, `f` is the primal function, and computing `f(x)` is doing the primal computation. `y` is the primal return, and `x` is a primal argument. `typeof(y)` and `typeof(x)` are both primal types. + +###Pullback: + +`Pullback(f)` describes the sensitivity of the input of `f` as a function of (for the relative change to) the sensitivity of the output of `f`. Can be represented as the dot product of a vector (left) the adjoint Jacobian (right). + +###Pushforward: + +`Pushforward(f)` describes the sensitivity of the output of `f` as a function of (for the relative change to) the sensitivity of the input of `f`. Can be represented as the dot product of the Jacobian (left) and a vector (right). + +###R-rule: + +A function used in reverse-mode differentiation. For a given function `f`, it takes in the positional and keyword arguments of `f` and returns the primal result and the pullback. 
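The jvp and j'vp entries above can be made concrete with a small numeric sketch. This is plain Python for illustration only (ChainRules is a Julia package); the names `f`, `jacobian`, `jvp`, and `vjp` are hypothetical helpers, not anything defined by ChainRules:

```python
def f(x):
    """A small primal function mapping (a, b) -> (a**2, a*b)."""
    a, b = x
    return (a ** 2, a * b)

def jacobian(x):
    """Hand-written 2x2 Jacobian of f at x, as nested lists."""
    a, b = x
    return [[2 * a, 0.0],
            [b, a]]

def jvp(J, v):
    """Jacobian-vector product J @ v: the pushforward applied to a tangent vector v."""
    return [sum(J[i][k] * v[k] for k in range(len(v))) for i in range(len(J))]

def vjp(J, w):
    """Transposed product J' @ w: the pullback applied to a cotangent vector w.
    (J is real here, so the adjoint is just the transpose.)"""
    return [sum(J[i][k] * w[i] for i in range(len(J))) for k in range(len(J[0]))]

J = jacobian((3.0, 2.0))          # [[6.0, 0.0], [2.0, 3.0]]
print(jvp(J, [1.0, 0.0]))         # [6.0, 2.0] -- rate of change along input direction (1, 0)
print(vjp(J, [0.0, 1.0]))         # [2.0, 3.0] -- sensitivity of output component 2 to each input
```

Note how the jvp maps an input-space vector to an output-space vector, while the j'vp maps the other way: that direction reversal is exactly the forward-mode/reverse-mode split.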
From 75d2d396d300475fa0507a6d0b5f8f2b9c278a46 Mon Sep 17 00:00:00 2001 From: Tom Wright Date: Tue, 29 Sep 2020 14:35:34 -0500 Subject: [PATCH 2/6] Add differential type defs --- docs/src/glossary.md | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/docs/src/glossary.md b/docs/src/glossary.md index be03f3f60..a4787f05d 100644 --- a/docs/src/glossary.md +++ b/docs/src/glossary.md @@ -10,12 +10,26 @@ The conjugate transpose of the Jacobian for a given function `f`. ###Derivative: -The derivative of a function `y = f(x)` with respect to the independent variable `x` denoted `f'(x)` or `dy/dx` is the rate of change of the dependent variable `y` with respect to the change of the independent variable `x`. In multiple dimensions, we may refer to the gradient of a function, or its directional derivative. +The derivative of a function `y = f(x)` with respect to the independent variable `x` denoted `f'(x)` or `dy/dx` is the rate of change of the dependent variable `y` with respect to the change of the independent variable `x`. In multiple dimensions, we may refer to the gradient of a function. ###Differential: The differential of a given function `y = f(x)` denoted `dy` is the product of the derivative function `f'(x)` and the increment of the independent variable `dx`. In multiple dimensions, it is the sum of these products across each dimension (using the partial derivative and the given independent variable's increment). +In ChainRules, differentials are types ("differential types") and correspond to primal types. A differential should represent a difference between two primal values. + +####* Natural Differential: + +A natural differential type for a given primal type is the type people would intuitively associate with representing the difference between two values of the primal type. + +####Structural Differential: + +If a given primal type `P` does not have a natural differential, we need to come up with one that makes sense. 
These are called structural differentials and are represented as `Composite{P, <:NamedTuple}`. + +####Semi-Structural Differential: + +A structural differential that contains at least one natural differential field. + ###Directional Derivative: The directional derivative of a function `f` at any given point in any given unit-direction is the gradient multiplied by the direction. It represents the rate of change of `f` in that direction. @@ -42,7 +56,7 @@ The product of the Jacobian and the vector in question. It is a description of t ###Primal: -Something relating to the original problem, as opposed to relating to the derivative. For example in `y = f(x)`, `f` is the primal function, and computing `f(x)` is doing the primal computation. `y` is the primal return, and `x` is a primal argument. `typeof(y)` and `typeof(x)` are both primal types. +Something relating to the original problem, as opposed to relating to the derivative. In ChainRules, primals are types ("primal types"). ###Pullback: @@ -55,3 +69,9 @@ Something relating to the original problem, as opposed to relating to the deriva ###R-rule: A function used in reverse-mode differentiation. For a given function `f`, it takes in the positional and keyword arguments of `f` and returns the primal result and the pullback. + +###Thunk: + +If we wish to delay the computation of a derivative for whatever reason, we wrap it in a `Thunk` or `ImplaceableThunk`. It holds off on computing the wrapped derivative until it is needed. 
+ + From 0283a39d47dd06c49ea671a7661852d9bc5646e8 Mon Sep 17 00:00:00 2001 From: Tom Wright Date: Tue, 29 Sep 2020 14:36:48 -0500 Subject: [PATCH 3/6] Typo fix --- docs/src/glossary.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/src/glossary.md b/docs/src/glossary.md index a4787f05d..3b1afd00e 100644 --- a/docs/src/glossary.md +++ b/docs/src/glossary.md @@ -18,7 +18,7 @@ The differential of a given function `y = f(x)` denoted `dy` is the product of t In ChainRules, differentials are types ("differential types") and correspond to primal types. A differential should represent a difference between two primal values. -####* Natural Differential: +####Natural Differential: A natural differential type for a given primal type is the type people would intuitively associate with representing the difference between two values of the primal type. @@ -30,6 +30,14 @@ If a given primal type `P` does not have a natural differential, we need to come A structural differential that contains at least one natural differential field. +####Thunk: + +An "unnatural" differential type. If we wish to delay the computation of a derivative for whatever reason, we wrap it in a `Thunk` or `ImplaceableThunk`. It holds off on computing the wrapped derivative until it is needed. + +####Zero: + +`Zero()` can also be a differential type. If you have trouble understanding the rules enforced upon differential types, consider this one first, as `Zero()` is the trivial vector space. + ###Directional Derivative: The directional derivative of a function `f` at any given point in any given unit-direction is the gradient multiplied by the direction. It represents the rate of change of `f` in that direction. @@ -70,8 +78,4 @@ Something relating to the original problem, as opposed to relating to the deriva A function used in reverse-mode differentiation. 
For a given function `f`, it takes in the positional and keyword arguments of `f` and returns the primal result and the pullback. -###Thunk: - -If we wish to delay the computation of a derivative for whatever reason, we wrap it in a `Thunk` or `ImplaceableThunk`. It holds off on computing the wrapped derivative until it is needed. - From 7b9872364916b26b4f0a824b28b2b0ec0e3e515a Mon Sep 17 00:00:00 2001 From: Tom Wright Date: Tue, 1 Dec 2020 13:15:16 -0600 Subject: [PATCH 4/6] Adding Automatic Differentiation def --- docs/src/glossary.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/src/glossary.md b/docs/src/glossary.md index 3b1afd00e..ef57bbaf0 100644 --- a/docs/src/glossary.md +++ b/docs/src/glossary.md @@ -8,6 +8,10 @@ This glossary serves as a quick reference for common terms used in the field of The conjugate transpose of the Jacobian for a given function `f`. +###Automatic Differentiation: + +Automatic Differentiation is the process of applying numerical methods to solving derivative problems, most often algorithmically, using computer programs to achieve high degrees of accuracy. + ###Derivative: The derivative of a function `y = f(x)` with respect to the independent variable `x` denoted `f'(x)` or `dy/dx` is the rate of change of the dependent variable `y` with respect to the change of the independent variable `x`. In multiple dimensions, we may refer to the gradient of a function. 
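The `Thunk` entry added in the patches above is essentially delayed evaluation. A minimal sketch of the idea, in Python for illustration; this `Thunk` is a toy class mirroring the concept, not the ChainRules type:

```python
class Thunk:
    """Toy delayed-computation wrapper (a sketch of the concept, not ChainRules' type)."""
    def __init__(self, compute):
        self._compute = compute   # zero-argument function producing the derivative
        self._value = None
        self.evaluated = False

    def unthunk(self):
        # Force the wrapped computation on first use, then cache the result.
        if not self.evaluated:
            self._value = self._compute()
            self.evaluated = True
        return self._value

t = Thunk(lambda: 2 * 3.0)   # nothing is computed yet
print(t.evaluated)           # False
print(t.unthunk())           # 6.0 -- forced on demand
print(t.evaluated)           # True
```

If the derivative is never requested (e.g. the AD system discards that partial), the wrapped work is never done, which is the point of thunking.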
From 3146ca36050d2e99d6de76fc0d14222d75f9e348 Mon Sep 17 00:00:00 2001 From: Tom Wright Date: Wed, 9 Dec 2020 09:20:49 -0600 Subject: [PATCH 5/6] Addressing comments --- docs/src/glossary.md | 68 +++++++++++++++++++++++++++----------------- 1 file changed, 42 insertions(+), 26 deletions(-) diff --git a/docs/src/glossary.md b/docs/src/glossary.md index ef57bbaf0..1483ff811 100644 --- a/docs/src/glossary.md +++ b/docs/src/glossary.md @@ -14,72 +14,88 @@ Automatic Differentiation is the process of applying numerical methods to solvin ###Derivative: -The derivative of a function `y = f(x)` with respect to the independent variable `x` denoted `f'(x)` or `dy/dx` is the rate of change of the dependent variable `y` with respect to the change of the independent variable `x`. In multiple dimensions, we may refer to the gradient of a function. +The derivative of a function `y = f(x)` with respect to the independent variable `x` denoted `f'(x)` or `dy/dx` is the rate of change of the dependent variable `y` with respect to the change of the independent variable `x`. In multiple dimensions, the derivative is not defined. +Instead there are the partial derivatives, the directional derivative and the jacobian (called gradient for scalar-valued functions). ###Differential: The differential of a given function `y = f(x)` denoted `dy` is the product of the derivative function `f'(x)` and the increment of the independent variable `dx`. In multiple dimensions, it is the sum of these products across each dimension (using the partial derivative and the given independent variable's increment). -In ChainRules, differentials are types ("differential types") and correspond to primal types. A differential should represent a difference between two primal values. +In ChainRules, differentials are types ("differential types") and correspond to primal types. A differential type should represent a difference between two primal typed values. 
####Natural Differential: -A natural differential type for a given primal type is the type people would intuitively associate with representing the difference between two values of the primal type. +A natural differential type for a given primal type is a `ChainRules.jl` specific term for the type people would intuitively associate with representing the difference between two values of the primal type. This is in contrast to the structural differential. +* **Note:** Not to be confused with the [natural gradient](https://towardsdatascience.com/natural-gradient-ce454b3dcdfa), which is an unrelated concept. -####Structural Differential: - -If a given primal type `P` does not have a natural differential, we need to come up with one that makes sense. These are called structural differentials and are represented as `Composite{P, <:NamedTuple}`. +**eg.** A natural differential type for the primal type `DateTime` could be `Hours` -####Semi-Structural Differential: +####Structural Differential: -A structural differential that contains at least one natural differential field. +If a given primal type `P` does not have a natural differential, we need to come up with one that makes sense. These are called structural differentials and are `ChainRules.jl` specific terms represented as `Composite{P}`, and mirror the structure of the primal type. ####Thunk: -An "unnatural" differential type. If we wish to delay the computation of a derivative for whatever reason, we wrap it in a `Thunk` or `ImplaceableThunk`. It holds off on computing the wrapped derivative until it is needed. +If we wish to delay the computation of a derivative for whatever reason, we wrap it in a [`Thunk`](https://en.wikipedia.org/wiki/Thunk) or `InplaceableThunk`. It holds off on computing the wrapped derivative until it is needed. + +For the purposes of `ChainRules.jl`, the `AbstractThunk` subtype is an "unnatural" differential type. It is a function set up to act like a differential. 
####Zero: -`Zero()` can also be a differential type. If you have trouble understanding the rules enforced upon differential types, consider this one first, as `Zero()` is the trivial vector space. +The additive identity for differentials. It represents the hard zero (ie adding it to anything returns the original thing). `Zero()` can also be a differential type. ###Directional Derivative: -The directional derivative of a function `f` at any given point in any given unit-direction is the gradient multiplied by the direction. It represents the rate of change of `f` in that direction. +The directional derivative of a function `f` at any given point in any given unit-direction is the gradient multiplied by the direction (ie. the Jacobian Vector Product). It represents the rate of change of `f` in the given direction. This gets computed by the pushforward function. -###F-rule: +###`frule`: -A function used in forward-mode differentiation. For a given function `f`, it takes in the positional and keyword arguments of `f` and returns the primal result and the pushforward. +A forward-mode rule that describes how to propagate the sensitivity in the forward direction. + +The `frule` fuses the primal computation and the pushforward. It takes in the primal function name, the primal arguments and their matching partial derivatives. It returns the primal output, and the matching directional derivative (jvp). ###Gradient: -The gradient of a scalar function `f` represented by `∇f` is a vector function whose components are the partial derivatives of `f` with respect to each dimension of the domain of `f`. +The gradient of a scalar function `f` represented by `∇f` is a vector function whose components are the partial derivatives of `f` with respect to each dimension of the domain of `f`. This is equivalent to the Jacobian for scalar-valued functions. ###Jacobian: The Jacobian of a vector-valued function `f` is the matrix of `f`'s first-order partial derivatives. 
-###Jacobian Transpose Vector Product (j'vp): +###Primal: -The product of the adjoint of the Jacobian and the vector in question. A description of the pullback in terms of its Jacobian. +Something relating to the original problem, as opposed to relating to the derivative. +Such as: + - The primal function being the function that is to be differentiated + - The primal inputs being the inputs to that function (the point that the derivative is being calculated at) + - The primal outputs being the result of applying the primal function to the primal inputs + - The primal pass (also called the forward pass) where the computation is run to get the primal outputs (generally before doing a derivative (i.e. reverse pass) in reverse mode AD). + - The primal computation which is the part of the code that is run during the primal pass and must at least compute the primal outputs (but may compute other things to use during the derivative pass). + - The primal types being the types of the primal inputs/outputs. -###Jacobian Vector Product (jvp): +###Pullback: -The product of the Jacobian and the vector in question. It is a description of the pushforward in terms of its Jacobian. +`Pullback(f)` describes the sensitivity of a quantity to the input of `f` as a function of its sensitivity to the output of `f`. Can be represented as the dot product of a vector and the adjoint of the Jacobian. -###Primal: +####Jacobian Transpose Vector Product (j'vp): -Something relating to the original problem, as opposed to relating to the derivative. In ChainRules, primals are types ("primal types"). +The product of the adjoint of the Jacobian and the vector in question. A description of the pullback in terms of its Jacobian. -###Pullback: +###Pushforward: -`Pullback(f)` describes the sensitivity of the input of `f` as a function of (for the relative change to) the sensitivity of the output of `f`. Can be represented as the dot product of a vector (left) the adjoint Jacobian (right). 
+`Pushforward(f)` describes the sensitivity of a quantity to the output of `f` as a function of its sensitivity to the input of `f`. Can be represented as the dot product of the Jacobian and a vector. ####Jacobian Vector Product (jvp): + +The product of the Jacobian and the vector in question. + +* **Note:** +The jvp is a description of the pushforward in terms of its Jacobian and is often used interchangeably with the term pushforward as a result. Strictly speaking, the pushforward computes the jvp (ie the jvp is not normally seen as the name of a function). ###`rrule`: A reverse-mode rule that describes how to propagate the sensitivity in the reverse direction. The `rrule` fuses the primal computation and the pullback. It takes in the primal function name and the primal arguments. It returns the primal output and the propagation rule (j'vp). From d854bce4e840dc9741e6a3e05afc11e30e2cc13d Mon Sep 17 00:00:00 2001 From: Tom Wright Date: Wed, 9 Dec 2020 09:44:27 -0600 Subject: [PATCH 6/6] Adding internal and external links --- docs/src/glossary.md | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/src/glossary.md b/docs/src/glossary.md index 1483ff811..5e47b78bf 100644 --- a/docs/src/glossary.md +++ b/docs/src/glossary.md @@ -1,58 +1,58 @@ # ChainRules Glossary -This glossary serves as a quick reference for common terms used in the field of Automatic Differentiation, as well as those used throughout the documentation relating specifically to ChainRules. 
+This glossary serves as a quick reference for common terms used in the field of [Automatic Differentiation](#automatic-differentiation), as well as those used throughout the documentation relating specifically to [`ChainRules.jl`](https://www.juliadiff.org/ChainRulesCore.jl/stable/index.html). ##Definitions: ###Adjoint: -The conjugate transpose of the Jacobian for a given function `f`. +The adjoint is the conjugate transpose of the [Jacobian](#jacobian) for a given function `f`. ###Automatic Differentiation: -Automatic Differentiation is the process of applying numerical methods to solving derivative problems, most often algorithmically, using computer programs to achieve high degrees of accuracy. +[Automatic Differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) is the process of computing derivatives algorithmically: a computer program is differentiated by applying the chain rule to its elementary operations, achieving a high degree of accuracy. ###Derivative: -The derivative of a function `y = f(x)` with respect to the independent variable `x` denoted `f'(x)` or `dy/dx` is the rate of change of the dependent variable `y` with respect to the change of the independent variable `x`. In multiple dimensions, the derivative is not defined. -Instead there are the partial derivatives, the directional derivative and the jacobian (called gradient for scalar-valued functions). +The [derivative](https://en.wikipedia.org/wiki/Derivative) of a function `y = f(x)` with respect to the independent variable `x` denoted `f'(x)` or `dy/dx` is the rate of change of the dependent variable `y` with respect to the change of the independent variable `x`. In multiple dimensions, the derivative is not defined. +Instead there are the [partial derivatives](https://en.wikipedia.org/wiki/Partial_derivative), the [directional derivative](#directional-derivative) and the [jacobian](#jacobian) (called [gradient](#gradient) for scalar-valued functions). 
###Differential: -The differential of a given function `y = f(x)` denoted `dy` is the product of the derivative function `f'(x)` and the increment of the independent variable `dx`. In multiple dimensions, it is the sum of these products across each dimension (using the partial derivative and the given independent variable's increment). +The [differential](https://en.wikipedia.org/wiki/Differential_(mathematics)) of a given function `y = f(x)` denoted `dy` is the product of the derivative function `f'(x)` and the increment of the independent variable `dx`. In multiple dimensions, it is the sum of these products across each dimension (using the partial derivative and the given independent variable's increment). In ChainRules, differentials are types ("differential types") and correspond to primal types. A differential type should represent a difference between two primal typed values. ####Natural Differential: -A natural differential type for a given primal type is a `ChainRules.jl` specific term for the type people would intuitively associate with representing the difference between two values of the primal type. This is in contrast to the structural differential. +A natural differential type for a given primal type is a `ChainRules.jl` specific term for the type people would intuitively associate with representing the difference between two values of the primal type. This is in contrast to the [structural differential](#structural-differential). * **Note:** Not to be confused with the [natural gradient](https://towardsdatascience.com/natural-gradient-ce454b3dcdfa), which is an unrelated concept. **eg.** A natural differential type for the primal type `DateTime` could be `Hours` ####Structural Differential: -If a given primal type `P` does not have a natural differential, we need to come up with one that makes sense. 
These are called structural differentials and are `ChainRules.jl` specific terms represented as `Composite{P}`, and mirror the structure of the primal type. +If a given [primal](#primal) type `P` does not have a [natural differential](#natural-differential), we need to come up with one that makes sense. These are called structural differentials and are `ChainRules.jl` specific terms represented as `Composite{P}`, and mirror the structure of the primal type. ####Thunk: If we wish to delay the computation of a derivative for whatever reason, we wrap it in a [`Thunk`](https://en.wikipedia.org/wiki/Thunk) or `InplaceableThunk`. It holds off on computing the wrapped [derivative](#derivative) until it is needed. For the purposes of `ChainRules.jl`, the `AbstractThunk` subtype is an "unnatural" differential type. It is a function set up to act like a differential. ####Zero: The additive [identity](https://en.wikipedia.org/wiki/Identity_(mathematics)) for differentials. It represents the hard zero (ie adding it to anything returns the original thing). `Zero()` can also be a differential type. ###Directional Derivative: -The directional derivative of a function `f` at any given point in any given unit-direction is the gradient multiplied by the direction (ie. the Jacobian Vector Product). It represents the rate of change of `f` in the given direction. This gets computed by the pushforward function. 
+The [directional derivative](https://en.wikipedia.org/wiki/Directional_derivative) of a function `f` at any given point in any given unit-direction is the [gradient](#gradient) multiplied by the direction (ie. the [Jacobian Vector Product](#pushforward)). It represents the rate of change of `f` in the given direction. This gets computed by the [pushforward](#pushforward) function. ###`frule`: A forward-mode rule that describes how to propagate the sensitivity in the forward direction. -The `frule` fuses the primal computation and the pushforward. It takes in the primal function name, the primal arguments and their matching partial derivatives. It returns the primal output, and the matching directional derivative (jvp). +The `frule` fuses the [primal](#primal) computation and the pushforward. It takes in the primal function name, the primal arguments and their matching [partial derivatives](https://en.wikipedia.org/wiki/Partial_derivative). It returns the primal output, and the matching [directional derivative](#directional-derivative) (jvp). ###Gradient: The gradient of a scalar function `f` represented by `∇f` is a vector function whose components are the partial derivatives of `f` with respect to each dimension of the domain of `f`. This is equivalent to the Jacobian for scalar-valued functions. ###Jacobian: -The Jacobian of a vector-valued function `f` is the matrix of `f`'s first-order partial derivatives. +The [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant) of a vector-valued function `f` is the matrix of `f`'s first-order [partial derivatives](https://en.wikipedia.org/wiki/Partial_derivative). ###Primal: -Something relating to the original problem, as opposed to relating to the derivative. +Something relating to the original problem, as opposed to relating to the [derivative](#derivative). 
Such as: - The primal function being the function that is to be differentiated - The primal inputs being the inputs to that function (the point that the derivative is being calculated at) - The primal outputs being the result of applying the primal function to the primal inputs - The primal pass (also called the forward pass) where the computation is run to get the primal outputs (generally before doing a derivative (i.e. reverse pass) in reverse mode AD). - The primal computation which is the part of the code that is run during the primal pass and must at least compute the primal outputs (but may compute other things to use during the derivative pass). - The primal types being the types of the primal inputs/outputs. ###Pullback: -`Pullback(f)` describes the sensitivity of a quantity to the input of `f` as a function of its sensitivity to the output of `f`. Can be represented as the dot product of a vector and the adjoint of the Jacobian. +[`Pullback(f)`](https://en.wikipedia.org/wiki/Pullback) describes the sensitivity of a quantity to the input of `f` as a function of its sensitivity to the output of `f`. Can be represented as the dot product of a vector and the [adjoint](#adjoint) of the [Jacobian](#jacobian). ####Jacobian Transpose Vector Product (j'vp): -The product of the adjoint of the Jacobian and the vector in question. A description of the pullback in terms of its Jacobian. +The product of the [adjoint](#adjoint) of the [Jacobian](#jacobian) and the vector in question. A description of the pullback in terms of its Jacobian. ###Pushforward: -`Pushforward(f)` describes the sensitivity of a quantity to the output of `f` as a function of its sensitivity to the input of `f`. Can be represented as the dot product of the Jacobian and a vector. +[`Pushforward(f)`](https://en.wikipedia.org/wiki/Pushforward) describes the sensitivity of a quantity to the output of `f` as a function of its sensitivity to the input of `f`. Can be represented as the dot product of the [Jacobian](#jacobian) and a vector. ####Jacobian Vector Product (jvp): -The product of the Jacobian and the vector in question. +The product of the [Jacobian](#jacobian) and the vector in question. * **Note:** -The jvp is a description of the pushforward in terms of its Jacobian and is often used interchangeably with the term pushforward as a result. Strictly speaking, the pushforward computes the jvp (ie the jvp is not normally seen as the name of a function). 
+The jvp is a description of the pushforward in terms of its [Jacobian](#jacobian) and is often used interchangeably with the term pushforward as a result. Strictly speaking, the pushforward computes the jvp (ie the jvp is not normally seen as the name of a function). ###`rrule`: A reverse-mode rule that describes how to propagate the sensitivity in the reverse direction. -The `rrule` fuses the primal computation and the pullback. It takes in the primal function name and the primal arguments. It returns the primal output and the propagation rule (j'vp). +The `rrule` fuses the [primal](#primal) computation and the pullback. It takes in the primal function name and the primal arguments. It returns the primal output and the propagation rule (j'vp).
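The `frule`/`rrule` entries describe a common AD pattern: return the primal result together with a propagation rule. A hedged sketch of that shape for `sin`, in Python for illustration; `frule_sin` and `rrule_sin` are made-up names mirroring the structure, not the ChainRules.jl API:

```python
import math

def frule_sin(x, dx):
    """Forward-mode sketch: return the primal output and the jvp for a tangent dx."""
    return math.sin(x), math.cos(x) * dx

def rrule_sin(x):
    """Reverse-mode sketch: return the primal output and a pullback closing over x."""
    def pullback(dy):
        # j'vp: for a scalar function the adjoint is just the derivative.
        return math.cos(x) * dy
    return math.sin(x), pullback

y, dy = frule_sin(0.0, 1.0)   # primal and directional derivative in one pass
y2, pb = rrule_sin(0.0)       # primal now, sensitivity later
print(y, dy)                  # 0.0 1.0
print(y2, pb(1.0))            # 0.0 1.0
```

The structural difference is visible here: the forward rule consumes the input sensitivity immediately, while the reverse rule returns a closure so the output sensitivity can be supplied after the whole primal pass has finished.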