From 3c83538398d9c0b22804bbaf876c7e51554fb1e8 Mon Sep 17 00:00:00 2001
From: Jason Brownlee
-This section presents a brief review of Holland's adaptive systems formalism described in [Holland1975] (Chapter 2). This presentation focuses particularly on the terms and their description, and has been hybridized with the concise presentation of the formalism by De Jong [Jong1975] (page 6). The formalism is divided into sections: 1) Primary Objects summarized in Table (below), and 2) Secondary Objects summarized in Table (below). Primary Objects are the conventional objects of an adaptive system: the environment $["e"]$, the strategy or adaptive plan that creates solutions in the environment $["s"]$, and the utility assigned to created solutions $["U"]$.
+This section presents a brief review of Holland's adaptive systems formalism described in [Holland1975] (Chapter 2). This presentation focuses particularly on the terms and their description, and has been hybridized with the concise presentation of the formalism by De Jong [Jong1975] (page 6). The formalism is divided into sections: 1) Primary Objects summarized in Table (below), and 2) Secondary Objects summarized in Table (below). Primary Objects are the conventional objects of an adaptive system: the environment $e$, the strategy or adaptive plan that creates solutions in the environment $s$, and the utility assigned to created solutions $U$.
-Secondary Objects extend beyond the primary objects providing the detail of the formalism. These objects suggest a broader context than that of the instance specific primary objects, permitting the evaluation and comparison of sets of objects such as plans ($["S"]$), environments ($["E"]$), search spaces ($["A"]$), and operators ($["O"]$).
+Secondary Objects extend beyond the primary objects providing the detail of the formalism. These objects suggest a broader context than that of the instance specific primary objects, permitting the evaluation and comparison of sets of objects such as plans ($S$), environments ($E$), search spaces ($A$), and operators ($O$).
-A given adaptive plan acts in discrete time $["t"]$, which is a useful simplification for analysis and computer simulation. A framework for a given adaptive system requires the definition of a set of strategies $["S"]$, a set of environments $["E"]$, and criterion for ranking strategies $["X"]$. A given adaptive plan is specified within this framework given the following set of objects: a search space $["A"]$, a set of operators $["O"]$, and feedback from the environment $["I"]$. Holland proposed a series of fundamental questions when considering the definition for an adaptive system, which he rephrases within the context of the formalism (see Table (below)).
+A given adaptive plan acts in discrete time $t$, which is a useful simplification for analysis and computer simulation. A framework for a given adaptive system requires the definition of a set of strategies $S$, a set of environments $E$, and a criterion for ranking strategies $X$. A given adaptive plan is specified within this framework given the following set of objects: a search space $A$, a set of operators $O$, and feedback from the environment $I$. Holland proposed a series of fundamental questions when considering the definition for an adaptive system, which he rephrased within the context of the formalism (see Table (below)).
Adaptive Systems
Adaptive Systems Formalism
@@ -60,25 +60,25 @@
Adaptive Systems Formalism
Description
-$["e"]$
+$e$
Environment
The environment of the system undergoing adaptation.
-$["s"]$
+$s$
Strategy
The adaptive plan which determines successive structural modifications in response to the environment.
-$["U"]$
+$U$
Utility
-A measure of performance or payoff of different structures in the environment. Maps a given solution ($["A"]$) to a real number evaluation.
+A measure of performance or payoff of different structures in the environment. Maps a given solution ($A$) to a real number evaluation.
@@ -87,45 +87,45 @@
Adaptive Systems Formalism
Description
-$["A"]$
+$A$
Search Space
The set of attainable structures, solutions, and the domain of action for an adaptive plan.
-$["E"]$
+$E$
Environments
-The range of different environments, where $["e"]$ is an instance. It may also represent the unknowns of the strategy about the environment.
+The range of different environments, where $e$ is an instance. It may also represent what is unknown to the strategy about the environment.
-$["O"]$
+$O$
Operators
-Set of operators applied to an instance of $["A"]$ at time $["t"]$ ($["A_t"]$) to transform it into $["A_{t+1}"]$.
+Set of operators applied to an instance of $A$ at time $t$ ($A_t$) to transform it into $A_{t+1}$.
-$["S"]$
+$S$
Strategies
-Set of plans applicable for a given environment (where $["s"]$ is an instance), that use operators from the set $["O"]$.
+Set of plans applicable for a given environment (where $s$ is an instance), that use operators from the set $O$.
-$["X"]$
+$X$
Criterion
-Used to compare strategies (in the set $["S"]$), under the set of environments ($["E"]$). Takes into account the efficiency of a plan in different environments.
+Used to compare strategies (in the set $S$), under the set of environments ($E$). Takes into account the efficiency of a plan in different environments.
-$["I"]$
+$I$
Feedback
-Set of possible environmental inputs and signals providing dynamic information to the system about the performance of a particular solution $["A"]$ in a particular environment $["E"]$.
+Set of possible environmental inputs and signals providing dynamic information to the system about the performance of a particular solution $A$ in a particular environment $E$.
-$["M"]$
+$M$
Memory
-The memory or retained parts of the input history ($["I"]$) for a solution ($["A"]$).
+The memory or retained parts of the input history ($I$) for a solution ($A$).
@@ -134,31 +134,31 @@
Adaptive Systems Formalism
To what parts of its environment is the organism (system, organization) adapting?
-What is $["E"]$?
+What is $E$?
How does the environment act upon the adapting organism (system, organization)?
-What is $["I"]$?
+What is $I$?
What structures are undergoing adaptation?
-What is $["A"]$?
+What is $A$?
What are the mechanisms of adaptation?
-What is $["O"]$?
+What is $O$?
What part of the history of its interaction with the environment does the organism (system, organization) retain in addition to that summarized in the structure tested?
-What is $["M"]$?
+What is $M$?
What limits are there to the adaptive process?
-What is $["S"]$?
+What is $S$?
How are different (hypotheses about) adaptive processes to be compared?
-What is $["X"]$?
+What is $X$?
Some Examples
From working within the formalism, Holland makes six observations regarding obstacles that may be encountered whilst investigating adaptive systems [Holland1975] (pages 159-160):
-Cavicchio provides perhaps one of the first applications of the formalism (after Holland) in his dissertation investigating Holland's reproductive plans [Cavicchio1970] (and to a lesser extent in [Cavicchio1972]). The work summarizes the formalism, presenting essentially the same framework, although he provides a specialization of the search space $["A"]$. The search space is broken down into a representation (codes), solutions (devices), and a mapping function from representation to solutions. The variation highlights the restriction the representation and mapping have on the designs available to the adaptive plan. Further, such mappings may not be one-to-one, there may be many instances in the representation space that map to the same solution (or the reverse).
+Cavicchio provides perhaps one of the first applications of the formalism (after Holland) in his dissertation investigating Holland's reproductive plans [Cavicchio1970] (and to a lesser extent in [Cavicchio1972]). The work summarizes the formalism, presenting essentially the same framework, although he provides a specialization of the search space $A$. The search space is broken down into a representation (codes), solutions (devices), and a mapping function from representation to solutions. The variation highlights the restriction the representation and mapping place on the designs available to the adaptive plan. Further, such mappings may not be one-to-one; there may be many instances in the representation space that map to the same solution (or the reverse).
-Although not explicitly defined, Holland's specification of structures $["A"]$ is clear in pointing out that the structures are not bound to a level of abstraction; the definition covers structures at all levels. Nevertheless, Cavicchio's specialization for a representation-solution mapping was demonstrated to be useful in his exploration of reproductive plans (early Genetic Algorithms). He proposed that an adaptive system is first order if the utility function $["U"]$ for structures on an environment encompasses feedback $["I"]$.
+Although not explicitly defined, Holland's specification of structures $A$ is clear in pointing out that the structures are not bound to a level of abstraction; the definition covers structures at all levels. Nevertheless, Cavicchio's specialization for a representation-solution mapping was demonstrated to be useful in his exploration of reproductive plans (early Genetic Algorithms). He proposed that an adaptive system is first order if the utility function $U$ for structures on an environment encompasses feedback $I$.
Cavicchio described the potential independence (component-wise) and linearity of the utility function with respect to the representation used. De Jong also employed the formalism to investigate reproductive plans in his dissertation research [Jong1975]. He indicated that the formalism covers the essential characteristics of adaptation, where the performance of a solution is a function of its characteristics and its environment. Adaptation is defined as a strategy for generating better-performing solutions to a problem by reducing initial uncertainty about the environment via feedback from the evaluation of individual solutions. De Jong used the formalism to define a series of genetic reproductive plans, which he investigated in the context of function optimization.

diff --git a/docs/nature-inspired/advanced/problem_solving.html b/docs/nature-inspired/advanced/problem_solving.html
index b69ad348..9dcbb589 100644
--- a/docs/nature-inspired/advanced/problem_solving.html
+++ b/docs/nature-inspired/advanced/problem_solving.html
@@ -182,7 +182,7 @@
Vector Quantization

-Vector Quantization (VQ) refers to a method of approximating a target function using a set of exemplar (prototype or codebook) vectors. The exemplars represent a discrete subset of the problem, generally restricted to the features of interest using the natural representation of the observations in the problem space, typically an an unconstrained $["n"]$-dimensional real valued space. The VQ method provides the advantage of a non-parametric model of a target function (like instance-based and lazy learning such as the $["k"]$-Nearest-Neighbor method (kNN)) using a symbolic representation that is meaningful in the domain (like tree-based approaches).
+Vector Quantization (VQ) refers to a method of approximating a target function using a set of exemplar (prototype or codebook) vectors. The exemplars represent a discrete subset of the problem, generally restricted to the features of interest using the natural representation of the observations in the problem space, typically an unconstrained $n$-dimensional real valued space. The VQ method provides the advantage of a non-parametric model of a target function (like instance-based and lazy learning such as the $k$-Nearest-Neighbor method (kNN)) using a symbolic representation that is meaningful in the domain (like tree-based approaches).
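To make the exemplar idea concrete, the following minimal Ruby sketch (an illustration, not one of the book's listings; the codebook layout and the use of Euclidean distance are assumptions) classifies an input by its nearest codebook vector:

# Minimal nearest-prototype sketch: each codebook vector is a Hash with
# a :vector of real values and a :label; Euclidean distance is assumed.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).map { |x, y| (x - y)**2 }.sum)
end

def classify(input, codebook)
  # The input is assigned the label of the closest exemplar.
  best = codebook.min_by { |c| euclidean_distance(input, c[:vector]) }
  best[:label]
end

codebook = [
  { vector: [0.1, 0.2], label: 'A' },
  { vector: [0.8, 0.9], label: 'B' }
]
puts classify([0.2, 0.1], codebook) # => A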
The promotion of compression addresses the storage and retrieval concerns of kNN, although the selection of codebook vectors (the so-called quantization problem) is a hard problem that is known to be NP-complete [Garey1982]. More recently Kuncheva and Bezdek have worked towards unifying quantization methods in the application to classification problems, referring to the approaches as Nearest Prototype Classifiers (NPC) and proposing a generalized nearest prototype classifier [Kuncheva1998] [Kuncheva1998a].

@@ -199,7 +199,7 @@
-Bootstrap Aggregation (bagging) involves partitioning the observations into $["N"]$ randomly chosen subsets (with re-selection), and training a different model on each [Breiman1996]. Although robust to noisy datasets, the approach requires careful consideration as to the consensus mechanism between the independent models for decision making.
+Bootstrap Aggregation (bagging) involves partitioning the observations into $N$ randomly chosen subsets (with re-selection), and training a different model on each [Breiman1996]. Although robust to noisy datasets, the approach requires careful consideration as to the consensus mechanism between the independent models for decision making.
Stacked Generalization (stacking) involves creating a sequence of models of generally different types arranged into a stack, where subsequently added models generalize the behavior (success or failure) of the model before it with the intent of correcting erroneous decision making [Wolpert1992] [Ting1999].

diff --git a/docs/nature-inspired/advanced/visualizing_algorithms.html b/docs/nature-inspired/advanced/visualizing_algorithms.html
index 9d73adcc..6c519047 100644
--- a/docs/nature-inspired/advanced/visualizing_algorithms.html
+++ b/docs/nature-inspired/advanced/visualizing_algorithms.html
@@ -64,7 +64,7 @@
-A continuous function optimization problem is typically visualized in two dimensions as a line where $["x=input, y=f(input)"]$ or three dimensions as a surface where $["x,y=input, z=f(input)"]$.
+A continuous function optimization problem is typically visualized in two dimensions as a line where $x=input, y=f(input)$ or three dimensions as a surface where $x,y=input, z=f(input)$.
Some functions may have many more dimensions, which if the function is linearly separable can be visualized in lower dimensions. Functions that are not linearly-separable may be able to make use of projection techniques such as Principal Component Analysis (PCA). For example, one may prepare a stratified sample of the search space as vectors with associated cost function values and use PCA to project the vectors onto a two-dimensional plane for visualization.

@@ -73,7 +73,7 @@
-Figure (below) provides an example of the Basin function in one dimension. The Basin function is a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$. The optimal solution for this function is $["(v_0,\\ldots,v_{n-1})=0.0"]$. Listing (below) provides the Gnuplot script used to prepare the plot ($["n=1"]$).
+Figure (below) provides an example of the Basin function in one dimension. The Basin function is a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$. The optimal solution for this function is $(v_0,\ldots,v_{n-1})=0.0$. Listing (below) provides the Gnuplot script used to prepare the plot ($n=1$).
-Both plots show the optimum in the center of the domain at $["x=0.0"]$ in one-dimension and $["x,y=0.0"]$ in two-dimensions.
+Both plots show the optimum in the center of the domain at $x=0.0$ in one dimension and $x,y=0.0$ in two dimensions.
diff --git a/docs/nature-inspired/evolution/differential_evolution.html b/docs/nature-inspired/evolution/differential_evolution.html
index 7860f345..e907f5c3 100644
--- a/docs/nature-inspired/evolution/differential_evolution.html
+++ b/docs/nature-inspired/evolution/differential_evolution.html
@@ -120,8 +120,8 @@
Listing (below) provides an example of the Differential Evolution algorithm implemented in the Ruby Programming Language.
-The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=3"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$.
+The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=3$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
The algorithm is an implementation of Differential Evolution with the DE/rand/1/bin configuration proposed by Storn and Price [Storn1997].
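As a hedged sketch of the DE/rand/1/bin step this configuration names: a mutant is formed from three distinct randomly chosen vectors and recombined with the target under binomial (uniform) crossover. The parameter names f (weighting factor) and cr (crossover rate) are illustrative:

# DE/rand/1/bin sketch: build a trial vector for target vector 'p0'.
# 'pop' is an array of real-valued vectors; f and cr are illustrative names.
def de_rand_1_bin(p0, pop, f, cr)
  p1, p2, p3 = pop.reject { |p| p.equal?(p0) }.sample(3)
  jrand = rand(p0.size) # ensure at least one component comes from the mutant
  Array.new(p0.size) do |j|
    if j == jrand || rand < cr
      p1[j] + f * (p2[j] - p3[j]) # differential mutation
    else
      p0[j] # inherit from the target (binomial crossover)
    end
  end
end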
diff --git a/docs/nature-inspired/evolution/evolution_strategies.html b/docs/nature-inspired/evolution/evolution_strategies.html
index 0fc4cb12..7327d082 100644
--- a/docs/nature-inspired/evolution/evolution_strategies.html
+++ b/docs/nature-inspired/evolution/evolution_strategies.html
@@ -58,8 +58,8 @@
Strategy
Procedure
-Instances of Evolution Strategies algorithms may be concisely described with a custom terminology in the form $["(\\mu,\\lambda)-ES"]$, where $["\\mu"]$ is number of candidate solutions in the parent generation, and $["\\lambda"]$ is the number of candidate solutions generated from the parent generation. In this configuration, the best $["\\mu"]$ are kept if $["\\lambda > \\mu"]$, where $["\\lambda"]$ must be great or equal to $["\\mu"]$. In addition to the so-called comma-selection Evolution Strategies algorithm, a plus-selection variation may be defined $["(\\mu + \\lambda)-ES"]$, where the best members of the union of the $["\\mu"]$ and $["\\lambda"]$ generations compete based on objective fitness for a position in the next generation. The simplest configuration is the $["(1+1)-ES"]$, which is a type of greedy hill climbing algorithm.
-Algorithm (below) provides a pseudocode listing of the $["(\\mu,\\lambda)-ES"]$ algorithm for minimizing a cost function. The algorithm shows the adaptation of candidate solutions that co-adapt their own strategy parameters that influence the amount of mutation applied to a candidate solutions descendants.
+Instances of Evolution Strategies algorithms may be concisely described with a custom terminology in the form $(\mu,\lambda)-ES$, where $\mu$ is the number of candidate solutions in the parent generation, and $\lambda$ is the number of candidate solutions generated from the parent generation. In this configuration, the best $\mu$ are kept if $\lambda > \mu$, where $\lambda$ must be greater than or equal to $\mu$. In addition to the so-called comma-selection Evolution Strategies algorithm, a plus-selection variation may be defined $(\mu + \lambda)-ES$, where the best members of the union of the $\mu$ and $\lambda$ generations compete based on objective fitness for a position in the next generation. The simplest configuration is the $(1+1)-ES$, which is a type of greedy hill climbing algorithm.
+Algorithm (below) provides a pseudocode listing of the $(\mu,\lambda)-ES$ algorithm for minimizing a cost function. The algorithm shows the adaptation of candidate solutions that co-adapt their own strategy parameters, which influence the amount of mutation applied to a candidate solution's descendants.
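A minimal sketch of the two selection schemes just described, assuming minimization and candidates stored as Hashes with a :cost key:

# Comma vs plus selection sketch; candidates are Hashes with a :cost key.
def comma_selection(children, mu)
  children.min_by(mu) { |c| c[:cost] } # best mu of the lambda children only
end

def plus_selection(parents, children, mu)
  (parents + children).min_by(mu) { |c| c[:cost] } # the union competes
end

In the comma scheme only the children compete, so parents are always discarded; the plus scheme is elitist because a good parent can survive indefinitely.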
Heuristics
Listing (below) provides an example of the Evolution Strategies algorithm implemented in the Ruby Programming Language.
-The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=2"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$.
+The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=2$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
The algorithm is an implementation of Evolution Strategies based on the simple version described by Bäck and Schwefel [Back1993b], which was also used as the basis of a detailed empirical study [Yao1997].
-The algorithm is an $["(30+20)-ES"]$ that adapts both the problem and strategy (standard deviations) variables.
+The algorithm is a $(30+20)-ES$ that adapts both the problem and strategy (standard deviations) variables.
More contemporary implementations may modify the strategy variables differently, and include an additional set of adapted strategy parameters to influence the direction of mutation (see [Rudolph2000] for a concise description).
diff --git a/docs/nature-inspired/evolution/evolutionary_programming.html b/docs/nature-inspired/evolution/evolutionary_programming.html
index 118fb007..5c870ca2 100644
--- a/docs/nature-inspired/evolution/evolutionary_programming.html
+++ b/docs/nature-inspired/evolution/evolutionary_programming.html
@@ -115,7 +115,7 @@
Heuristics
Code Listing
Listing (below) provides an example of the Evolutionary Programming algorithm implemented in the Ruby Programming Language.
-The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=2"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$.
+The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=2$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
The algorithm is an implementation of Evolutionary Programming based on the classical implementation for continuous function optimization by Fogel et al. [Fogel1991a] with per-variable adaptive variance based on Fogel's description for a self-adaptive variation on page 160 of his 1995 book [Fogel1995].
diff --git a/docs/nature-inspired/evolution/gene_expression_programming.html b/docs/nature-inspired/evolution/gene_expression_programming.html
index 8560469e..11a5b6c6 100644
--- a/docs/nature-inspired/evolution/gene_expression_programming.html
+++ b/docs/nature-inspired/evolution/gene_expression_programming.html
@@ -107,13 +107,13 @@
Procedure
Heuristics
Listing (below) provides an example of the Gene Expression Programming algorithm implemented in the Ruby Programming Language based on the seminal version proposed by Ferreira [Ferreira2001].
-The demonstration problem is an instance of symbolic regression $["f(x)=x^4+x^3+x^2+x"]$, where $["x\\in[1,10]"]$. The grammar used in this problem is: Functions: $["F=\\{+,-,\\div,\\times,\\}"]$ and Terminals: $["T=\\{x\\}"]$.
+The demonstration problem is an instance of symbolic regression $f(x)=x^4+x^3+x^2+x$, where $x\in[1,10]$. The grammar used in this problem is: Functions: $F=\{+,-,\div,\times\}$ and Terminals: $T=\{x\}$.
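The following is a rough Ruby sketch (helper names are assumptions) of how a K-expression over this function and terminal set can be decoded breadth-first into an expression tree and rendered as a Ruby expression string, as the implementation described next does:

# Breadth-first decode of a K-expression (e.g. "+*xxxxx") into a tree of
# Hash nodes; all functions in the grammar are binary.
FUNCS = %w(+ - * /)

def decode_kexpression(symbols)
  root = { sym: symbols.first, args: [] }
  queue = [root]
  i = 1
  until queue.empty?
    node = queue.shift
    next unless FUNCS.include?(node[:sym])
    2.times do
      child = { sym: symbols[i], args: [] }
      i += 1
      node[:args] << child
      queue << child
    end
  end
  root
end

# Depth-first render as a Ruby expression string for a given input value.
def to_ruby_expr(node, input)
  return input.to_s if node[:sym] == "x"
  left, right = node[:args].map { |c| to_ruby_expr(c, input) }
  "(#{left} #{node[:sym]} #{right})"
end

tree = decode_kexpression(%w(+ * x x x x x))
puts to_ruby_expr(tree, 2.0) # => ((2.0 * 2.0) + 2.0)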
The algorithm uses binary tournament selection, uniform crossover and point mutations. The K-expression is decoded to an expression tree in a breadth-first manner, which is then parsed depth first as a Ruby expression string for display and direct evaluation.

diff --git a/docs/nature-inspired/evolution/genetic_algorithm.html b/docs/nature-inspired/evolution/genetic_algorithm.html
index 13ce838c..3fda1873 100644
--- a/docs/nature-inspired/evolution/genetic_algorithm.html
+++ b/docs/nature-inspired/evolution/genetic_algorithm.html
@@ -98,7 +98,7 @@
-The demonstration problem is an instance of a symbolic regression, where a function must be devised to match a set of observations. In this case the target function is a quadratic polynomial $["x^2+x+1"]$ where $["x \\in [-1,1]"]$. The observations are generated directly from the target function without noise for the purposes of this example. In practical problems, if one knew and had access to the target function then the genetic program would not be required.
+The demonstration problem is an instance of a symbolic regression, where a function must be devised to match a set of observations. In this case the target function is a quadratic polynomial $x^2+x+1$ where $x \in [-1,1]$. The observations are generated directly from the target function without noise for the purposes of this example. In practical problems, if one knew and had access to the target function then the genetic program would not be required.
-The algorithm is configured to search for a program with the function set $["\\{ +, -, \\times, \\div \\}"]$ and the terminal set $["\\{ X, R \\}"]$, where $["X"]$ is the input value, and $["R"]$ is a static random variable generated for a program $["X \\in [-5,5]"]$. A division by zero returns a value of one.
+The algorithm is configured to search for a program with the function set $\{ +, -, \times, \div \}$ and the terminal set $\{ X, R \}$, where $X$ is the input value, and $R$ is a static random variable generated for a program, $R \in [-5,5]$. A division by zero returns a value of one.
The fitness of a candidate solution is calculated by evaluating the program on a range of random input values and calculating the Root Mean Squared Error (RMSE). The algorithm is configured with a 90% probability of crossover, 8% probability of reproduction (copying), and a 2% probability of mutation. For brevity, the algorithm does not implement the architecture altering genetic operation and does not bias crossover points towards functions over terminals.

diff --git a/docs/nature-inspired/evolution/grammatical_evolution.html b/docs/nature-inspired/evolution/grammatical_evolution.html
index e628f5f7..11fdfdf1 100644
--- a/docs/nature-inspired/evolution/grammatical_evolution.html
+++ b/docs/nature-inspired/evolution/grammatical_evolution.html
@@ -67,7 +67,7 @@
A grammar is defined in Backus Normal Form (BNF), which is a context free grammar expressed as a series of production rules comprised of terminals and non-terminals.
-A variable-length binary string representation is used for the optimization process. Bits are read from the a candidate solutions genome in blocks of 8 called a codon, and decoded to an integer (in the range between 0 and $["2^{8}-1"]$). If the end of the binary string is reached when reading integers, the reading process loops back to the start of the string, effectively creating a circular genome. The integers are mapped to expressions from the BNF until a complete syntactically correct expression is formed. This may not use a solutions entire genome, or use the decoded genome more than once given it's circular nature.
+A variable-length binary string representation is used for the optimization process. Bits are read from a candidate solution's genome in blocks of 8 called a codon, and decoded to an integer (in the range between 0 and $2^{8}-1$). If the end of the binary string is reached when reading integers, the reading process loops back to the start of the string, effectively creating a circular genome. The integers are mapped to expressions from the BNF until a complete syntactically correct expression is formed. This may not use a solution's entire genome, or may use the decoded genome more than once given its circular nature.
Algorithm (below) provides a pseudocode listing of the Grammatical Evolution algorithm for minimizing a cost function.
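A minimal sketch of the codon decoding just described (function names are assumptions): bits are consumed in blocks of 8, decoded as unsigned integers, and the read position wraps to give the circular genome:

# Decode a binary string into integer codons, 8 bits per codon, wrapping
# circularly if the requested count exceeds the genome length.
def decode_codons(bitstring, num_codons, codon_bits = 8)
  Array.new(num_codons) do |i|
    offset = (i * codon_bits) % bitstring.size
    codon = (0...codon_bits).map { |j| bitstring[(offset + j) % bitstring.size] }.join
    codon.to_i(2) # unsigned integer in [0, 2**codon_bits - 1]
  end
end

# Each decoded codon then selects a production rule, typically by
# taking the codon modulo the number of choices for the current rule.
codons = decode_codons("0110100111001010", 3)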
Listing (below) provides an example of the Grammatical Evolution algorithm implemented in the Ruby Programming Language based on the version described by O'Neill and Ryan [ONeill2001].
-The demonstration problem is an instance of symbolic regression $["f(x)=x^4+x^3+x^2+x"]$, where $["x\\in[1,10]"]$.
+The demonstration problem is an instance of symbolic regression $f(x)=x^4+x^3+x^2+x$, where $x\in[1,10]$.
The grammar used in this problem is:
<expr>

The production rules for the grammar in BNF are:

<expr> $::=$ <expr><op><expr> | (<expr><op><expr>) | <pre_op>(<expr>) | <var>
<op> $::=$ $+, -, \div, \times$
<var> $::=$ x, 1.0

The algorithm uses point mutation and a codon-respecting one-point crossover operator. Binary tournament selection is used to determine the parent population's contribution to the subsequent generation.
Binary strings are decoded to integers using an unsigned binary coding. Candidate solutions are then mapped directly into executable Ruby code and executed. A given candidate solution is evaluated by comparing its output against the target function and taking the sum of the absolute errors over a number of trials. The probabilities of point mutation, codon deletion, and codon duplication are hard coded as relative probabilities to each solution, although they should be parameters of the algorithm. In this case they are heuristically defined as $\frac{1.0}{L}$, $\frac{0.5}{NC}$ and $\frac{1.0}{NC}$ respectively, where $L$ is the total number of bits, and $NC$ is the number of codons in a given candidate solution.
Solutions are evaluated by generating a number of random samples from the domain and calculating the mean error of the program to the expected outcome. Programs that contain a single term or those that return an invalid (NaN) or infinite result are penalized with an enormous error value.

diff --git a/docs/nature-inspired/evolution/learning_classifier_system.html b/docs/nature-inspired/evolution/learning_classifier_system.html
index f2a4315d..d88bc69a 100644
--- a/docs/nature-inspired/evolution/learning_classifier_system.html
+++ b/docs/nature-inspired/evolution/learning_classifier_system.html
@@ -53,7 @@ -53,7 @@
The actors of the system include detectors, messages, effectors, feedback, and classifiers. Detectors are used by the system to perceive the state of the environment. Messages are the discrete information packets passed from the detectors into the system. The system performs information processing on messages, and messages may directly result in actions in the environment. Effectors control the actions of the system on and within the environment. In addition to the system actively perceiving via its detections, it may also receive directed feedback from the environment (payoff). Classifiers are condition-action rules that provide a filter for messages. If a message satisfies the conditional part of the classifier, the action of the classifier triggers. Rules act as message processors.
-Message a fixed length bitstring. A classifier is defined as a ternary string with an alphabet $["\\in \\{1, 0, \\#\\}"]$, where the $["\\#"]$ represents do not care (matching either 1 or 0).
+Messages are fixed-length bitstrings. A classifier is defined as a ternary string with an alphabet $\in \{1, 0, \#\}$, where the $\#$ represents do not care (matching either 1 or 0).
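A minimal sketch of this matching rule: a ternary condition matches a binary message when every position is either '#' or equal to the corresponding message bit:

# Returns true when the ternary condition (e.g. "1#0#") matches the
# binary message (e.g. "1100"); '#' matches either bit.
def matches?(condition, message)
  condition.chars.each_with_index.all? do |ch, i|
    ch == '#' || ch == message[i]
  end
end

puts matches?("1#0#", "1100") # => true
puts matches?("1#0#", "0100") # => false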
The processing loop for the Learning Classifier system is as follows:

@@ -115,17 +115,17 @@
Listing (below) provides an example of the Learning Classifier System algorithm implemented in the Ruby Programming Language.
-The problem is an instance of a Boolean multiplexer called the 6-multiplexer. It can be described as a classification problem, where each of the $["2^6"]$ patterns of bits is associated with a boolean class $["\\in \\{1,0\\}"]$. For this problem instance, the first two bits may be decoded as an address into the remaining four bits that specify the class (for example in 100011, '10' decode to the index of '2' in the remaining 4 bits making the class '1'). In propositional logic this problem instance may be described as $["F=(\\neg x_0) (\\neg x_1) x_2 + (\\neg x_0) x_1 x_3 + x_0 (\\neg x_1) x_4 + x_0 x_1 x_5"]$.
+The problem is an instance of a Boolean multiplexer called the 6-multiplexer. It can be described as a classification problem, where each of the $2^6$ patterns of bits is associated with a boolean class $\in \{1,0\}$. For this problem instance, the first two bits may be decoded as an address into the remaining four bits that specify the class (for example in 100011, '10' decodes to the index of '2' in the remaining 4 bits, making the class '1'). In propositional logic this problem instance may be described as $F=(\neg x_0) (\neg x_1) x_2 + (\neg x_0) x_1 x_3 + x_0 (\neg x_1) x_4 + x_0 x_1 x_5$.
The algorithm is an instance of XCS based on the description provided by Butz and Wilson [Butz2002a] with the parameters based on the application of XCS to Boolean multiplexer problems by Wilson [Wilson1995] [Wilson1998].
-The population is grown as needed, and subsumption which would be appropriate for the Boolean multiplexer problem was not used for brevity. The multiplexer problem is a single step problem, so the complexities of delayed payoff are not required. A number of parameters were hard coded to recommended values, specifically: $["\\alpha=0.1"]$, $["v=-0.5"]$, $["\\delta=0.1"]$ and $["P_{\\#}=\\frac{1}{3}"]$.
+The population is grown as needed, and subsumption, which would be appropriate for the Boolean multiplexer problem, was not used for brevity. The multiplexer problem is a single-step problem, so the complexities of delayed payoff are not required. A number of parameters were hard coded to recommended values, specifically: $\alpha=0.1$, $v=-0.5$, $\delta=0.1$ and $P_{\#}=\frac{1}{3}$.
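For reference, a small sketch of the 6-multiplexer target concept itself (the helper name is an assumption): the first two bits form an address that selects one of the remaining four bits as the class:

# 6-multiplexer target: address bits select which data bit is the class.
def multiplexer6(bits) # bits is a 6-character string such as "100011"
  address = bits[0, 2].to_i(2) # first two bits as an index 0..3
  bits[2 + address] # the addressed data bit is the class
end

puts multiplexer6("100011") # address '10' => index 2 => "1"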
def neg(bit)

diff --git a/docs/nature-inspired/evolution/nsga.html b/docs/nature-inspired/evolution/nsga.html
index a9928747..a334c2e7 100644
--- a/docs/nature-inspired/evolution/nsga.html
+++ b/docs/nature-inspired/evolution/nsga.html
@@ -111,9 +111,9 @@
Heuristics
Code Listing
Listing (below) provides an example of the Non-dominated Sorting Genetic Algorithm II (NSGA-II) implemented in the Ruby Programming Language.
-The demonstration problem is an instance of continuous multiple objective function optimization called SCH (problem one in [Deb2002]). The problem seeks the minimum of two functions: $["f1=\\sum_{i=1}^n x_{i}^2"]$ and $["f2=\\sum_{i=1}^n (x_{i}-2)^2"]$, $["-10\\leq x_i \\leq 10"]$ and $["n=1"]$. The optimal solution for this function are $["x \\in [0,2]"]$.
+The demonstration problem is an instance of continuous multiple objective function optimization called SCH (problem one in [Deb2002]). The problem seeks the minimum of two functions: $f1=\sum_{i=1}^n x_{i}^2$ and $f2=\sum_{i=1}^n (x_{i}-2)^2$, $-10\leq x_i \leq 10$ and $n=1$. The optimal solutions for this function are $x \in [0,2]$.
The algorithm is an implementation of NSGA-II based on the presentation by Deb et al. [Deb2002].
-The algorithm uses a binary string representation (16 bits per objective function parameter) that is decoded and rescaled to the function domain. The implementation uses a uniform crossover operator and point mutations with a fixed mutation rate of $["\\frac{1}{L}"]$, where $["L"]$ is the number of bits in a solution's binary string.
+The algorithm uses a binary string representation (16 bits per objective function parameter) that is decoded and rescaled to the function domain. The implementation uses a uniform crossover operator and point mutations with a fixed mutation rate of $\frac{1}{L}$, where $L$ is the number of bits in a solution's binary string.
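A hedged sketch of the decoding step described above (function name assumed): each 16-bit slice is read as an unsigned integer and rescaled linearly into the corresponding parameter's bounds:

# Decode a binary string into real-valued parameters, 16 bits each,
# rescaled from [0, 2**16 - 1] into [min, max] for each parameter.
def decode(bitstring, bounds, bits_per_param = 16)
  bounds.each_with_index.map do |(min, max), i|
    slice = bitstring[i * bits_per_param, bits_per_param]
    min + (max - min) * slice.to_i(2) / (2.0**bits_per_param - 1.0)
  end
end

decode("1111111111111111", [[-10.0, 10.0]]) # => [10.0]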
def objective1(vector)

@@ -321,7 +321,7 @@
References
Primary Sources
Srinivas and Deb proposed the NSGA inspired by Goldberg's notion of a non-dominated sorting procedure [Srinivas1994]. Goldberg proposed a non-dominated sorting procedure in his book in considering the biases in the Pareto optimal solutions provided by VEGA [Goldberg1989]. Srinivas and Deb's NSGA used the sorting procedure as a ranking selection method, and a fitness sharing niching method to maintain stable sub-populations across the Pareto front.
-Deb et al. later extended NSGA to address three criticism of the approach: the $["O(mN^3)"]$ time complexity, the lack of elitism, and the need for a sharing parameter for the fitness sharing niching method [Deb2000] [Deb2002].
+Deb et al. later extended NSGA to address three criticisms of the approach: the $O(mN^3)$ time complexity, the lack of elitism, and the need for a sharing parameter for the fitness sharing niching method [Deb2000] [Deb2002].
diff --git a/docs/nature-inspired/evolution/spea.html b/docs/nature-inspired/evolution/spea.html
index acb9e461..48342baa 100644
--- a/docs/nature-inspired/evolution/spea.html
+++ b/docs/nature-inspired/evolution/spea.html
@@ -55,9 +55,9 @@
Procedure
Algorithm (below) provides a pseudocode listing of the Strength Pareto Evolutionary Algorithm 2 (SPEA2) for minimizing a cost function.
The CalculateRawFitness function calculates the raw fitness as the sum of the strength values of the solutions that dominate a given candidate, where strength is the number of solutions that a given solution dominates.
The CandidateDensity function estimates the density of an area of the Pareto front as $\frac{1.0}{\sigma^k + 2}$ where $\sigma^k$ is the Euclidean distance of the objective values between a given solution and the $k$th nearest neighbor of the solution, and $k$ is the square root of the size of the population and archive combined.
The PopulateWithRemainingBest function iteratively fills the archive with the remaining candidate solutions in order of fitness.
The RemoveMostSimilar function truncates the archive population by removing those members with the smallest $\sigma^k$ values as calculated against the archive.
The SelectParents function selects parents from a population using a Genetic Algorithm selection method such as binary tournament selection.
The CrossoverAndMutation function performs the crossover and mutation genetic operators from the Genetic Algorithm.

@@ -98,7 +98,7 @@
- SPEA was designed for and is suited to combinatorial and continuous function multiple objective optimization problem instances.
- A binary representation can be used for continuous function optimization problems in conjunction with classical genetic operators such as one-point crossover and point mutation.
-- A $["k"]$ value of 1 may be used for efficiency whilst still providing useful results.
+- A $k$ value of 1 may be used for efficiency whilst still providing useful results.
- The size of the archive is commonly smaller than the size of the population.
- There is a lot of room for implementation optimization in density and Pareto dominance calculations (a minimal sketch of the density calculation follows this list).
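A minimal sketch of the CandidateDensity calculation described in the procedure above, assuming each solution stores its objective values under an :objectives key:

# SPEA2 density sketch: distance to the k-th nearest neighbour in
# objective space, mapped through 1/(sigma_k + 2).
def density(solution, union)
  dists = union.reject { |s| s.equal?(solution) }.map do |other|
    Math.sqrt(solution[:objectives].zip(other[:objectives])
                                   .map { |a, b| (a - b)**2 }.sum)
  end
  k = Math.sqrt(union.size).floor # sqrt of population plus archive size
  sigma_k = dists.sort[k - 1]     # k-th nearest neighbour distance
  1.0 / (sigma_k + 2.0)
end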
Heuristics
Code Listing
Listing (below) provides an example of the Strength Pareto Evolutionary Algorithm 2 (SPEA2) implemented in the Ruby Programming Language.
-The demonstration problem is an instance of continuous multiple objective function optimization called SCH (problem one in [Deb2002]). The problem seeks the minimum of two functions: $["f1=\\sum_{i=1}^n x_{i}^2"]$ and $["f2=\\sum_{i=1}^n (x_{i}-2)^2"]$, $["-10\\leq x_i \\leq 10"]$ and $["n=1"]$. The optimal solutions for this function are $["x \\in [0,2]"]$.
+The demonstration problem is an instance of continuous multiple objective function optimization called SCH (problem one in [Deb2002]). The problem seeks the minimum of two functions: $f1=\sum_{i=1}^n x_{i}^2$ and $f2=\sum_{i=1}^n (x_{i}-2)^2$, $-10\leq x_i \leq 10$ and $n=1$. The optimal solutions for this function are $x \in [0,2]$.
The algorithm is an implementation of SPEA2 based on the presentation by Zitzler, Laumanns, and Thiele [Zitzler2002].
-The algorithm uses a binary string representation (16 bits per objective function parameter) that is decoded and rescaled to the function domain. The implementation uses a uniform crossover operator and point mutations with a fixed mutation rate of $["\\frac{1}{L}"]$, where $["L"]$ is the number of bits in a solution's binary string.
+The algorithm uses a binary string representation (16 bits per objective function parameter) that is decoded and rescaled to the function domain. The implementation uses a uniform crossover operator and point mutations with a fixed mutation rate of $\frac{1}{L}$, where $L$ is the number of bits in a solution's binary string.
def objective1(vector)

diff --git a/docs/nature-inspired/immune/airs.html b/docs/nature-inspired/immune/airs.html
index 8b117cfd..ad0577f2 100644
--- a/docs/nature-inspired/immune/airs.html
+++ b/docs/nature-inspired/immune/airs.html
@@ -70,7 +70,7 @@
Procedure
$dist(x,c) = \sum_{i=1}^{n} (x_i - c_i)^2$
where $n$ is the number of attributes, $x$ is the input vector and $c$ is a given cell vector. The variation of cells during cloning (somatic hypermutation) is inversely proportional to the stimulation of a given cell to an input pattern.
Input:

@@ -117,20 +117,20 @@
- The AIRS was designed as a supervised algorithm for classification problem domains.
- The AIRS is non-parametric, meaning that it does not rely on assumptions about the structure of the function that it is approximating.
-- Real-values in input vectors should be normalized such that $["x \\in [0,1)"]$.
+- Real values in input vectors should be normalized such that $x \in [0,1)$.
- Euclidean distance is commonly used to measure the distance between real-valued vectors (affinity calculation), although other distance measures may be used (such as dot product), and data specific distance measures may be required for non-scalar attributes.
- Cells may be initialized with small random values or more commonly with values from instances in the training set.
-- A cell's affinity is typically minimizing, where as a cells stimulation is maximizing and typically $["\\in [0,1]"]$.
+- A cell's affinity is typically minimizing, whereas a cell's stimulation is maximizing and typically $\in [0,1]$.
Code Listing
Listing (below) provides an example of the Artificial Immune Recognition System implemented in the Ruby Programming Language.
-The problem is a contrived classification problem in a 2-dimensional domain $["x\\in[0,1], y\\in[0,1]"]$ with two classes: 'A' ($["x\\in[0,0.4999999], y\\in[0,0.4999999]"]$) and 'B' ($["x\\in[0.5,1], y\\in[0.5,1]"]$).
+The problem is a contrived classification problem in a 2-dimensional domain $x\in[0,1], y\in[0,1]$ with two classes: 'A' ($x\in[0,0.4999999], y\in[0,0.4999999]$) and 'B' ($x\in[0.5,1], y\in[0.5,1]$).
-The algorithm is an implementation of the AIRS2 algorithm [Watkins2002b]. An initial pool of memory cells is created, one cell for each class. Euclidean distance divided by the maximum possible distance in the domain is taken as the affinity and stimulation is taken as $["1.0-affinity"]$. The meta-dynamics for memory cells (competition for input patterns) is not performed and may be added into the implementation as an extension.
+The algorithm is an implementation of the AIRS2 algorithm [Watkins2002b]. An initial pool of memory cells is created, one cell for each class. Euclidean distance divided by the maximum possible distance in the domain is taken as the affinity, and stimulation is taken as $1.0-affinity$. The meta-dynamics for memory cells (competition for input patterns) is not performed and may be added into the implementation as an extension.
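A sketch of the affinity and stimulation calculations described (the maximum distance of $\sqrt{2}$ for the unit-square domain is an assumption of this example):

# Affinity: Euclidean distance normalized by the maximum possible
# distance in the domain; stimulation is its complement.
def affinity(v1, v2, max_dist)
  dist = Math.sqrt(v1.zip(v2).map { |a, b| (a - b)**2 }.sum)
  dist / max_dist
end

def stimulation(v1, v2, max_dist)
  1.0 - affinity(v1, v2, max_dist)
end

max_dist = Math.sqrt(2.0) # unit square domain assumed
puts stimulation([0.1, 0.2], [0.15, 0.25], max_dist)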
def random_vector(minmax)

diff --git a/docs/nature-inspired/immune/clonal_selection_algorithm.html b/docs/nature-inspired/immune/clonal_selection_algorithm.html
index c3d8d238..c6f9e146 100644
--- a/docs/nature-inspired/immune/clonal_selection_algorithm.html
+++ b/docs/nature-inspired/immune/clonal_selection_algorithm.html
@@ -103,18 +103,18 @@
Heuristics
- The CLONALG was designed as a general machine learning approach and has been applied to pattern recognition, function optimization, and combinatorial optimization problem domains.
- Binary string representations are used and decoded to a representation suitable for a specific problem domain.
-- The number of clones created for each selected member is calculated as a function of the repertoire size $["N_c=round(\\beta \\cdot N)"]$, where $["\\beta"]$ is the user parameter $["Clone_{rate}"]$.
+- The number of clones created for each selected member is calculated as a function of the repertoire size $N_c=round(\beta \cdot N)$, where $\beta$ is the user parameter $Clone_{rate}$.
- A rank-based affinity-proportionate function is used to determine the number of clones created for selected members of the population for pattern recognition problem instances.
- The number of random antibodies inserted each iteration is typically very low (1-2).
- Point mutations (bit-flips) are used in the hypermutation operation.
-- The function $["exp(-\\rho \\cdot f)"]$ is used to determine the probability of individual component mutation for a given candidate solution, where $["f"]$ is the candidates affinity (normalized maximizing cost value), and $["\\rho"]$ is the user parameter $["Mutation_{rate}"]$.
+- The function $exp(-\rho \cdot f)$ is used to determine the probability of individual component mutation for a given candidate solution, where $f$ is the candidate's affinity (normalized maximizing cost value), and $\rho$ is the user parameter $Mutation_{rate}$ (a minimal sketch of both calculations follows this list).
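A minimal sketch of the two calculations referenced in this list (parameter names are illustrative, mapping beta to the clone rate and rho to the mutation rate):

# Number of clones per selected member, from repertoire size n and
# clone rate beta: N_c = round(beta * n).
def num_clones(n, beta)
  (beta * n).round
end

# Probability of mutating each component, from normalized maximizing
# affinity f and mutation rate rho: exp(-rho * f).
def mutation_probability(f, rho)
  Math.exp(-rho * f)
end

puts num_clones(100, 0.1)           # => 10
puts mutation_probability(0.5, 2.5) # => ~0.2865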
Code Listing
Listing (below) provides an example of the Clonal Selection Algorithm (CLONALG) implemented in the Ruby Programming Language.
-The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=3"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$.
+The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=3$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
The algorithm is implemented as described by de Castro and Von Zuben for function optimization [Castro2002a].
diff --git a/docs/nature-inspired/immune/dca.html b/docs/nature-inspired/immune/dca.html
index cd843999..3dafc406 100644
--- a/docs/nature-inspired/immune/dca.html
+++ b/docs/nature-inspired/immune/dca.html
@@ -60,7 +60,7 @@
Strategy
Procedure
Algorithm (below) provides pseudocode for training a pool of cells in the Dendritic Cell Algorithm, specifically the Deterministic Dendritic Cell Algorithm. Mature migrated cells associate their collected input patterns with anomalies, whereas semi-mature migrated cells associate their collected input patterns as normal.
-The resulting migrated cells can then be used to classify input patterns as normal or anomalous. This can be done through sampling the cells and using a voting mechanism, or more elaborate methods such as a 'mature context antigen value' (MCAV) that uses $["\\frac{M}{Ag}"]$ (where $["M"]$ is the number of mature cells with the antigen and $["Ag"]$ is the sum of the exposures to the antigen by those mature cells), which gives a probability of a pattern being an anomaly.
+The resulting migrated cells can then be used to classify input patterns as normal or anomalous. This can be done through sampling the cells and using a voting mechanism, or more elaborate methods such as a 'mature context antigen value' (MCAV) that uses $\frac{M}{Ag}$ (where $M$ is the number of mature cells with the antigen and $Ag$ is the sum of the exposures to the antigen by those mature cells), which gives a probability of a pattern being an anomaly.
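A sketch of the MCAV calculation following the description above (the per-antigen counts are assumed to have been collected from the migrated cells):

# MCAV sketch: M is the number of mature cells exposed to the antigen,
# Ag the sum of exposures to the antigen by those cells; the ratio is
# treated as a probability of the pattern being an anomaly.
def mcav(num_mature_cells, total_exposures)
  return 0.0 if total_exposures.zero?
  num_mature_cells.to_f / total_exposures
end

# An antigen would then be labeled anomalous when its MCAV exceeds a
# chosen threshold.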
Input:

@@ -102,17 +102,17 @@
- The Dendritic Cell Algorithm is not specifically a classification algorithm; it may be considered a data filtering method for use in anomaly detection problems.
- The canonical algorithm is designed to operate on a single discrete, categorical or ordinal input and two probabilistic specific signals indicating the heuristic danger or safety of the input.
- The danger and safe signals are problem specific signals of the risk that the input pattern is an anomaly or is normal, both typically $\in [0,100]$.
- The danger and safe signals do not have to be reciprocal, meaning they may provide conflicting information.
-- Each cells migration threshold is set separately, typically $["\\in [5,15]"]$
+- Each cells migration threshold is set separately, typically $\in [5,15]$
Code Listing
Listing (below) provides an example of the Dendritic Cell Algorithm implemented in the Ruby Programming Language, specifically the Deterministic Dendritic Cell Algorithm (dDCA).
-The problem is a contrived anomaly-detection problem with ordinal inputs $["x\\ \\in\\ [0,50)"]$, where values that divide by 10 with no remainder are considered anomalies. Probabilistic safe and danger signal functions are provided, suggesting danger signals correctly with $["P(danger)=0.70"]$, and safe signals correctly with $["P(safe)=0.95"]$.
+The problem is a contrived anomaly-detection problem with ordinal inputs $x \in [0,50)$, where values that divide by 10 with no remainder are considered anomalies. Probabilistic safe and danger signal functions are provided, suggesting danger signals correctly with $P(danger)=0.70$, and safe signals correctly with $P(safe)=0.95$.
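A hedged sketch of the contrived problem's signal functions (the probabilities come from the text; the signal magnitudes of 100 and 0 are assumptions for illustration):

# Inputs in [0,50) divisible by 10 are anomalies. Each signal reports
# the true state with the stated probability; 100/0 magnitudes assumed.
def anomaly?(x)
  (x % 10).zero?
end

def danger_signal(x)
  seen_as_anomaly = rand < 0.70 ? anomaly?(x) : !anomaly?(x)
  seen_as_anomaly ? 100.0 : 0.0
end

def safe_signal(x)
  seen_as_normal = rand < 0.95 ? !anomaly?(x) : anomaly?(x)
  seen_as_normal ? 100.0 : 0.0
end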
The algorithm is an implementation of the Deterministic Dendritic Cell Algorithm (dDCA) as described in [Stibor2009] [Greensmith2008], with verification from [Greensmith2006a]. The algorithm was designed to be executed as three asynchronous processes in a real-time or semi-real time environment. For demonstration purposes, the implementation separated out the three main processes and executed them sequentially as a training and cell promotion phase followed by a test (labeling) phase.

diff --git a/docs/nature-inspired/immune/immune_network_algorithm.html b/docs/nature-inspired/immune/immune_network_algorithm.html
index 674f1c5b..bf630e45 100644
--- a/docs/nature-inspired/immune/immune_network_algorithm.html
+++ b/docs/nature-inspired/immune/immune_network_algorithm.html
@@ -112,8 +112,8 @@
Heuristics
The addition of random cells each iteration adds a random-restart-like capability to the algorithm. Suppression based on cell similarity provides a mechanism for reducing redundancy. The population size is dynamic, and if it continues to grow it may be an indication of a problem with many local optima, or that the affinity threshold may need to be increased.
-Affinity proportionate mutation is performed using $["c' = c + \\alpha \\times N(1,0)"]$ where $["\\alpha = \\frac{1}{\\beta} \\times exp(-f)"]$, $["N"]$ is a Guassian random number, and $["f"]$ is the fitness of the parent cell, $["\\beta"]$ controls the decay of the function and can be set to 100.
+Affinity proportionate mutation is performed using $c' = c + \alpha \times N(1,0)$ where $\alpha = \frac{1}{\beta} \times exp(-f)$, $N$ is a Gaussian random number, and $f$ is the fitness of the parent cell; $\beta$ controls the decay of the function and can be set to 100.
The affinity threshold is problem and representation specific; for example, the $AffinityThreshold$ may be set to an arbitrary value such as 0.1 on a continuous function domain, or calculated as a percentage of the size of the problem space.
The number of random cells inserted may be 40% of the population size.
The number of clones created for a cell may be small, such as 10.
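A sketch of the affinity-proportionate mutation described above, with $\beta = 100$ and a Box-Muller draw standing in for the Gaussian random number:

# Gaussian random number via the Box-Muller transform.
def random_gaussian(mean = 0.0, stdev = 1.0)
  u1 = 1.0 - rand # avoid log(0)
  u2 = rand
  mean + stdev * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# c' = c + alpha * N, with alpha = (1/beta) * exp(-f), where f is the
# (normalized) fitness of the parent cell and beta controls the decay.
def mutate_cell(vector, fitness, beta = 100.0)
  alpha = (1.0 / beta) * Math.exp(-fitness)
  vector.map { |c| c + alpha * random_gaussian }
end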
Code Listing
Listing (below) provides an example of the Optimization Artificial Immune Network (opt-aiNet) implemented in the Ruby Programming Language.
-The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=2"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$.
+The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=2$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
The algorithm is an implementation based on the specification by de Castro and Von Zuben [Castro2002c].
diff --git a/docs/nature-inspired/immune/negative_selection_algorithm.html b/docs/nature-inspired/immune/negative_selection_algorithm.html
index 0c07091c..7e019a11 100644
--- a/docs/nature-inspired/immune/negative_selection_algorithm.html
+++ b/docs/nature-inspired/immune/negative_selection_algorithm.html
@@ -109,7 +109,7 @@
Heuristics
- The Negative Selection Algorithm was designed for change detection, novelty detection, intrusion detection and similar pattern recognition and two-class classification problem domains.
-- Traditional negative selection algorithms used binary representations and binary matching rules such as Hamming distance, and $["r"]$-contiguous bits.
+- Traditional negative selection algorithms used binary representations and binary matching rules such as Hamming distance, and $r$-contiguous bits.
- A data representation should be selected that is most suitable for a given problem domain, and a matching rule is in turn selected or tailored to the data representation.
- Detectors can be prepared with no prior knowledge of the problem domain other than the known (normal or self) dataset.
- The algorithm can be configured to balance between detector convergence (quality of the matches) and the space complexity (number of detectors).
@@ -120,7 +120,7 @@Heuristics
Code Listing
Listing (below) provides an example of the Negative Selection Algorithm implemented in the Ruby Programming Language. -The demonstration problem is a two-class classification problem where samples are drawn from a two-dimensional domain, where $["x_i \\in [0,1]"]$. Those samples in $["1.0>x_i>0.5"]$ are classified as self and the rest of the space belongs to the non-self class. Samples are drawn from the self class and presented to the algorithm for the preparation of pattern detectors for classifying unobserved samples from the non-self class. +The demonstration problem is a two-class classification problem where samples are drawn from a two-dimensional domain, where $x_i \in [0,1]$. Those samples in $1.0>x_i>0.5$ are classified as self and the rest of the space belongs to the non-self class. Samples are drawn from the self class and presented to the algorithm for the preparation of pattern detectors for classifying unobserved samples from the non-self class. The algorithm creates a set of detectors that do not match the self data, and are then applied to a set of randomly generated samples from the domain. The algorithm uses a real-valued representation. The Euclidean distance function is used during matching and a minimum distance value is specified as a user parameter for approximate matches between patterns. The algorithm includes the additional computationally expensive check for duplicates in the preparation of the self dataset and the detector set.
diff --git a/docs/nature-inspired/introduction.html b/docs/nature-inspired/introduction.html index 86573f7c..94a40899 100644 --- a/docs/nature-inspired/introduction.html +++ b/docs/nature-inspired/introduction.html @@ -193,7 +193,7 @@Function Optimization
Problem Description
-Mathematically, optimization is defined as the search for a combination of parameters commonly referred to as decision variables ($["x = \\left\\{x_1, x_2, x_3, \\ldots x_n\\right\\}"]$) which minimize or maximize some ordinal quantity ($["c"]$) (typically a scalar called a score or cost) assigned by an objective function or cost function ($["f"]$), under a set of constraints ($["g = \\left\\{g_1, g_2, g_3, \\ldots g_n\\right\\}"]$). For example, a general minimization case would be as follows: $["f(x') \\leq f(x), \\forall x_i \\in x"]$. Constraints may provide boundaries on decision variables (for example in a real-value hypercube $["\\Re^n"]$), or may generally define regions of feasibility and in-feasibility in the decision variable space. In applied mathematics the field may be referred to as Mathematical Programming. More generally the field may be referred to as Global or Function Optimization given the focus on the objective function. For more general information on optimization refer to Horst et al. [Horst2000].
+Mathematically, optimization is defined as the search for a combination of parameters commonly referred to as decision variables ($x = \left\{x_1, x_2, x_3, \ldots x_n\right\}$) which minimize or maximize some ordinal quantity ($c$) (typically a scalar called a score or cost) assigned by an objective function or cost function ($f$), under a set of constraints ($g = \left\{g_1, g_2, g_3, \ldots g_n\right\}$). For example, a general minimization case would be as follows: $f(x') \leq f(x), \forall x_i \in x$. Constraints may provide boundaries on decision variables (for example in a real-value hypercube $\Re^n$), or may generally define regions of feasibility and in-feasibility in the decision variable space. In applied mathematics the field may be referred to as Mathematical Programming. More generally the field may be referred to as Global or Function Optimization given the focus on the objective function. For more general information on optimization refer to Horst et al. [Horst2000].
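As a concrete illustrative instance of these definitions (the particular cost function and box constraint are assumptions of the example):

# A two-variable minimization instance: f(x) = x1^2 + x2^2 subject to
# the box constraint -5 <= xi <= 5 (all choices here are illustrative).
def cost(x)
  x.map { |xi| xi**2 }.sum
end

def feasible?(x, min = -5.0, max = 5.0)
  x.all? { |xi| xi.between?(min, max) }
end

candidate = [1.5, -2.0]
puts cost(candidate) if feasible?(candidate) # => 6.25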
@@ -216,7 +216,7 @@Function Approximation
Problem Description
-Function Approximation is the problem of finding a function ($["f"]$) that approximates a target function ($["g"]$), where typically the approximated function is selected based on a sample of observations ($["x"]$, also referred to as the training set) taken from the unknown target function. +Function Approximation is the problem of finding a function ($f$) that approximates a target function ($g$), where typically the approximated function is selected based on a sample of observations ($x$, also referred to as the training set) taken from the unknown target function. In machine learning, the function approximation formalism is used to describe general problem types commonly referred to as pattern recognition, such as classification, clustering, and curve fitting (called a decision or discrimination function). Such general problem types are described in terms of approximating an unknown Probability Density Function (PDF), which underlies the relationships in the problem space, and is represented in the sample data. This perspective of such problems is commonly referred to as statistical machine learning and/or density estimation [Fukunaga1990] [Bishop1995].
@@ -288,7 +288,7 @@Inductive Learning
The method of acquiring information is called inductive learning or learning from example, where the approach uses the implicit assumption that specific examples are representative of the broader information content of the environment, specifically with regard to anticipated need. Many unconventional optimization approaches maintain a single candidate solution, a population of samples, or a compression thereof that provides both an instantaneous representation of all of the information acquired by the process, and the basis for generating and making future decisions.-This method of simultaneously acquiring and improving information from the domain and the optimization of decision making (where to direct future effort) is called the $["k"]$-armed bandit (two-armed and multi-armed bandit) problem from the field of statistical decision making known as game theory [Robbins1952] [Bergemann2006]. This formalism considers the capability of a strategy to allocate available resources proportional to the future payoff the strategy is expected to receive. The classic example is the 2-armed bandit problem used by Goldberg to describe the behavior of the genetic algorithm [Goldberg1989]. The example involves an agent that learns which one of the two slot machines provides more return by pulling the handle of each (sampling the domain) and biasing future handle pulls proportional to the expected utility, based on the probabilistic experience with the past distribution of the payoff. The formalism may also be used to understand the properties of inductive learning demonstrated by the adaptive behavior of most unconventional optimization algorithms. +This method of simultaneously acquiring and improving information from the domain and the optimization of decision making (where to direct future effort) is called the $k$-armed bandit (two-armed and multi-armed bandit) problem from the field of statistical decision making known as game theory [Robbins1952] [Bergemann2006]. This formalism considers the capability of a strategy to allocate available resources proportional to the future payoff the strategy is expected to receive. The classic example is the 2-armed bandit problem used by Goldberg to describe the behavior of the genetic algorithm [Goldberg1989]. The example involves an agent that learns which one of the two slot machines provides more return by pulling the handle of each (sampling the domain) and biasing future handle pulls proportional to the expected utility, based on the probabilistic experience with the past distribution of the payoff. The formalism may also be used to understand the properties of inductive learning demonstrated by the adaptive behavior of most unconventional optimization algorithms.
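A minimal sketch of the two-armed bandit intuition described above; the hidden payoff probabilities and the exploration rate are assumed for the demonstration. Pulls are biased toward the arm with the higher observed mean payoff.
def pull(payoff_prob)
  (rand() < payoff_prob) ? 1.0 : 0.0
end

payoffs = [0.4, 0.6]                # hidden payoff probability of each arm
totals, counts = [0.0, 0.0], [0, 0]
2.times {|i| totals[i] += pull(payoffs[i]); counts[i] += 1}  # sample each arm once
1000.times do
  means = [totals[0]/counts[0], totals[1]/counts[1]]
  # mostly exploit the arm with the higher expected utility, sometimes explore
  arm = (rand() < 0.1) ? rand(2) : (means[0] >= means[1] ? 0 : 1)
  totals[arm] += pull(payoffs[arm])
  counts[arm] += 1
end
puts "pulls per arm: #{counts.inspect}"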
The stochastic iterative process of generate and test can be computationally wasteful, potentially re-searching areas of the problem space already searched, and requiring many trials or samples in order to achieve a 'good enough' solution. diff --git a/docs/nature-inspired/neural/backpropagation.html b/docs/nature-inspired/neural/backpropagation.html index 7a40db57..ecaf123d 100644 --- a/docs/nature-inspired/neural/backpropagation.html +++ b/docs/nature-inspired/neural/backpropagation.html @@ -67,7 +67,7 @@
Procedure
$activation = \bigg(\sum_{k=1}^{n} w_{k} \times x_{ki}\bigg) + w_{bias} \times 1.0$
-where $["n"]$ is the number of weights and inputs, $["x_{ki}"]$ is the $["k^{th}"]$ attribute on the $["i^{th}"]$ input pattern, and $["w_{bias}"]$ is the bias weight. A logistic transfer function (sigmoid) is used to calculate the output for a neuron $["\\in [0,1]"]$ and provide nonlinearities between in the input and output signals: $["\\frac{1}{1+exp(-a)}"]$, where $["a"]$ represents the neuron activation.
+where $n$ is the number of weights and inputs, $x_{ki}$ is the $k^{th}$ attribute on the $i^{th}$ input pattern, and $w_{bias}$ is the bias weight. A logistic transfer function (sigmoid) is used to calculate the output for a neuron $\in [0,1]$ and provide nonlinearities between the input and output signals: $\frac{1}{1+exp(-a)}$, where $a$ represents the neuron activation.
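A minimal sketch of this activation and transfer, assuming a neuron is represented as an array of weights with the bias weight stored last:
def activate(weights, inputs)
  # weighted sum of the inputs plus the bias weight with a constant 1.0 input
  sum = weights[weights.size - 1] * 1.0
  inputs.each_with_index {|input, i| sum += weights[i] * input}
  return sum
end

def transfer(activation)
  # logistic (sigmoid) transfer function, output in [0,1]
  return 1.0 / (1.0 + Math.exp(-activation))
end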
The weight updates use the delta rule, specifically a modified delta rule where error is backwardly propagated through the network, starting at the output layer and weighted back through the previous layers. The following describes the back-propagation of error and weight updates for a single pattern. @@ -77,25 +77,25 @@
Procedure
$es_i = (c_i - o_i) \times td_i$
-where $["es_i"]$ is the error signal for the $["i^{th}"]$ node, $["c_i"]$ is the expected output and $["o_i"]$ is the actual output for the $["i^{th}"]$ node. The $["td"]$ term is the derivative of the output of the $["i^{th}"]$ node. If the sigmod transfer function is used, $["td_i"]$ would be $["o_i \\times (1-o_i)"]$ For the hidden nodes, the error signal is the sum of the weighted error signals from the next layer.
+where $es_i$ is the error signal for the $i^{th}$ node, $c_i$ is the expected output and $o_i$ is the actual output for the $i^{th}$ node. The $td$ term is the derivative of the output of the $i^{th}$ node. If the sigmoid transfer function is used, $td_i$ would be $o_i \times (1-o_i)$. For the hidden nodes, the error signal is the sum of the weighted error signals from the next layer.
$es_i = \bigg(\sum_{k=1}^n (w_{ik} \times es_k)\bigg) \times td_i$
-where $["es_i"]$ is the error signal for the $["i^{th}"]$ node, $["w_{ik}"]$ is the weight between the $["i^{th}"]$ and the $["k^{th}"]$ nodes, and $["es_k"]$ is the error signal of the $["k_th"]$ node.
+where $es_i$ is the error signal for the $i^{th}$ node, $w_{ik}$ is the weight between the $i^{th}$ and the $k^{th}$ nodes, and $es_k$ is the error signal of the $k^{th}$ node.
The error derivatives for each weight are calculated by combining the input to each node and the error signal for the node.
$ed_i = \sum_{k=1}^n es_i \times x_k$
-where $["ed_i"]$ is the error derivative for the $["i^{th}"]$ node, $["es_i"]$ is the error signal for the $["i^{th}"]$ node and $["x_k"]$ is the input from the $["k^{th}"]$ node in the previous layer. This process include the bias input that has a constant value.
+where $ed_i$ is the error derivative for the $i^{th}$ node, $es_i$ is the error signal for the $i^{th}$ node and $x_k$ is the input from the $k^{th}$ node in the previous layer. This process includes the bias input, which has a constant value.
-Weights are updated in a direction that reduces the error derivative $["ed_i"]$ (error assigned to the weight), metered by a learning coefficient. +Weights are updated in a direction that reduces the error derivative $ed_i$ (error assigned to the weight), metered by a learning coefficient.
$w_i(t+1) = w_i(t) + (ed_k \times learn_{rate})$-where $["w_i(t+1)"]$ is the updated $["i^{th}"]$ weight, $["ed_k"]$ is the error derivative for the $["k^{th}"]$ node and $["learn_{rate}"]$ is an update coefficient parameter. +where $w_i(t+1)$ is the updated $i^{th}$ weight, $ed_k$ is the error derivative for the $k^{th}$ node and $learn_{rate}$ is an update coefficient parameter.
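A minimal sketch of these error-signal and weight-update equations for sigmoid units, with all helper names assumed for illustration:
def transfer_derivative(output)
  # derivative of the sigmoid output: o * (1 - o)
  return output * (1.0 - output)
end

def output_error_signal(expected, output)
  # es = (c - o) * td for an output node
  return (expected - output) * transfer_derivative(output)
end

def update_weight(weight, error_derivative, learn_rate)
  # move the weight in the direction that reduces its error, metered by the learning rate
  return weight + error_derivative * learn_rate
end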
Input
: @@ -121,12 +121,12 @@Procedure
Heuristics
- The Back-propagation algorithm can be used to train a multi-layer network to approximate arbitrary non-linear functions and can be used for regression or classification problems.
-- Input and output values should be normalized such that $["x \\in [0,1)"]$.
+- Input and output values should be normalized such that $x \in [0,1)$.
- The weights can be updated in an online manner (after the exposure to each input pattern) or in batch (after a fixed number of patterns have been observed).
- Batch updates are expected to be more stable than online updates for some complex problems.
- A logistic (sigmoid) transfer function is commonly used to transfer the activation to a binary output value, although other transfer functions can be used such as the hyperbolic tangent (tanh), Gaussian, and softmax.
- It is good practice to expose the system to input patterns in a different random order each enumeration through the input set.
-- The initial weights are typically small random values $["\\in [0, 0.5]"]$.
+- The initial weights are typically small random values $\in [0, 0.5]$.
- Typically a small number of layers are used such as 2-4 given that the increase in layers results in an increase in the complexity of the system and the time required to train the weights.
- The learning rate can be varied during training, and it is common to introduce a momentum term to limit the rate of change.
- The weights of a given network can be initialized with a global optimization method before being refined using the Back-propagation algorithm.
diff --git a/docs/nature-inspired/neural/hopfield_network.html b/docs/nature-inspired/neural/hopfield_network.html index 26e44d63..261942fc 100644 --- a/docs/nature-inspired/neural/hopfield_network.html +++ b/docs/nature-inspired/neural/hopfield_network.html @@ -70,42 +70,42 @@Procedure
$w_{i,j} = \sum_{k=1}^{N} v_k^i\times v_k^j$-where $["w_{i,j}"]$ is the weight between neuron $["i"]$ and $["j"]$, $["N"]$ is the number of input patterns, $["v"]$ is the input pattern and $["v_k^i"]$ is the $["i^{th}"]$ attribute on the $["k^{th}"]$ input pattern. +where $w_{i,j}$ is the weight between neuron $i$ and $j$, $N$ is the number of input patterns, $v$ is the input pattern and $v_k^i$ is the $i^{th}$ attribute on the $k^{th}$ input pattern.
The propagation of the information through the network can be asynchronous, where a random node is selected each iteration, or synchronous, where the output is calculated for each node before being applied to the whole network. Propagation of the information continues until no more changes are made or until a maximum number of iterations has completed, after which the output pattern from the network can be read. The activation for a single node is calculated as follows:
$n_i = \sum_{j=1}^n w_{i,j}\times n_j$
-where $["n_i"]$ is the activation of the $["i^{th}"]$ neuron, $["w_{i,j}"]$ with the weight between the nodes $["i"]$ and $["j"]$, and $["n_j"]$ is the output of the $["j^{th}"]$ neuron. The activation is transferred into an output using a transfer function, typically a step function as follows:
+where $n_i$ is the activation of the $i^{th}$ neuron, $w_{i,j}$ is the weight between the nodes $i$ and $j$, and $n_j$ is the output of the $j^{th}$ neuron. The activation is transferred into an output using a transfer function, typically a step function as follows:
-\[["transfer(n_i) = \\left\\{ \\begin{array}{l l} 1 & \\quad if \\geq \\theta \\\\ -1 & \\quad if < \\theta \\\\ \\end{array} \\right. "]\] +\[transfer(n_i) = \left{ \begin{array}{l l} 1 & \quad if \geq \theta \ -1 & \quad if < \theta \ \end{array} \right. \]
-where the threshold $["\\theta"]$ is typically fixed at 0. +where the threshold $\theta$ is typically fixed at 0.
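A minimal sketch of the one-shot Hebbian weight calculation and the step transfer described above, assuming patterns are given as flat arrays of -1/1 values (names hypothetical):
def calculate_weight(patterns, i, j)
  # one-shot learning: sum the products of attributes i and j over all patterns
  return patterns.inject(0) {|sum, v| sum + v[i] * v[j]}
end

def transfer(activation, theta=0.0)
  # step transfer with the threshold fixed at zero
  return (activation >= theta) ? 1 : -1
end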
Heuristics
- The Hopfield network may be used to solve the recall problem of matching cues for an input pattern to an associated pre-learned pattern.
-- The transfer function for turning the activation of a neuron into an output is typically a step function $["f(a) \\in \\{-1,1\\}"]$ (preferred), or more traditionally $["f(a) \\in \\{0,1\\}"]$.
-- The input vectors are typically normalized to boolean values $["x \\in [-1,1]"]$.
+- The transfer function for turning the activation of a neuron into an output is typically a step function $f(a) \in \{-1,1\}$ (preferred), or more traditionally $f(a) \in \{0,1\}$.
+- The input vectors are typically normalized to boolean values $x \in \{-1,1\}$.
- The network can be propagated asynchronously (where a random node is selected and its output generated), or synchronously (where the outputs for all nodes are calculated before being applied).
- Weights can be learned in a one-shot or incremental method based on how much information is known about the patterns to be learned.
- All neurons in the network are typically both input and output neurons, although other network topologies have been investigated (such as the designation of input and output neurons).
-- A Hopfield network has limits on the patterns it can store and retrieve accurately from memory, described by $["N<0.15\\times n"]$ where $["N"]$ is the number of patterns that can be stored and retrieved and $["n"]$ is the number of nodes in the network.
+- A Hopfield network has limits on the patterns it can store and retrieve accurately from memory, described by $N<0.15\times n$ where $N$ is the number of patterns that can be stored and retrieved and $n$ is the number of nodes in the network.
Code Listing
Listing (below) provides an example of the Hopfield Network algorithm implemented in the Ruby Programming Language.
-The problem is an instance of a recall problem where patters are described in terms of a $["3 \\times 3"]$ matrix of binary values ($["\\in \\{-1,1\\}"]$). Once the network has learned the patterns, the system is exposed to perturbed versions of the patterns (with errors introduced) and must respond with the correct pattern. Two patterns are used in this example, specifically 'T', and 'U'.
+The problem is an instance of a recall problem where patterns are described in terms of a $3 \times 3$ matrix of binary values ($\in \{-1,1\}$). Once the network has learned the patterns, the system is exposed to perturbed versions of the patterns (with errors introduced) and must respond with the correct pattern. Two patterns are used in this example, specifically 'T' and 'U'.
-The algorithm is an implementation of the Hopfield Network with a one-shot training method for the network weights, given that all patterns are already known. The information is propagated through the network using an asynchronous method, which is repeated for a fixed number of iterations. The patterns are displayed to the console during the testing of the network, with the outputs converted from $["\\{-1,1\\}"]$ to $["\\{0,1\\}"]$ for readability.
+The algorithm is an implementation of the Hopfield Network with a one-shot training method for the network weights, given that all patterns are already known. The information is propagated through the network using an asynchronous method, which is repeated for a fixed number of iterations. The patterns are displayed to the console during the testing of the network, with the outputs converted from $\{-1,1\}$ to $\{0,1\}$ for readability.
def random_vector(minmax) diff --git a/docs/nature-inspired/neural/lvq.html b/docs/nature-inspired/neural/lvq.html index e5c3305a..91ea3592 100644 --- a/docs/nature-inspired/neural/lvq.html +++ b/docs/nature-inspired/neural/lvq.html @@ -68,7 +68,7 @@Procedure
$dist(x,c) = \sum_{i=1}^{n} (x_i - c_i)^2$-where $["n"]$ is the number of attributes, $["x"]$ is the input vector and $["c"]$ is a given codebook vector. +where $n$ is the number of attributes, $x$ is the input vector and $c$ is a given codebook vector.
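A minimal sketch of this distance calculation and of locating the best matching codebook vector with it; codebook vectors are assumed to be hashes with a :vector entry (names hypothetical):
def distance(x, c)
  # squared Euclidean distance between an input and a codebook vector
  sum = 0.0
  x.each_index {|i| sum += (x[i] - c[i]) ** 2.0}
  return sum
end

def best_match(codebook_vectors, input)
  # the best matching unit minimizes the distance to the input
  return codebook_vectors.min_by {|c| distance(input, c[:vector])}
end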
Input
: @@ -99,7 +99,7 @@Heuristics
- Learning Vector Quantization was designed for classification problems that have existing data sets that can be used to supervise the learning by the system. The algorithm does not support regression problems.
- LVQ is non-parametric, meaning that it does not rely on assumptions about the structure of the function that it is approximating.
-- Real-values in input vectors should be normalized such that $["x \\in [0,1)"]$.
+- Real-values in input vectors should be normalized such that $x \in [0,1)$.
- Euclidean distance is commonly used to measure the distance between real-valued vectors, although other distance measures may be used (such as dot product), and data specific distance measures may be required for non-scalar attributes.
- There should be sufficient training iterations to expose all the training data to the model multiple times.
- The learning rate is typically linearly decayed over the training period from an initial value to close to zero.
@@ -111,7 +111,7 @@Heuristics
Code Listing
Listing (below) provides an example of the Learning Vector Quantization algorithm implemented in the Ruby Programming Language. -The problem is a contrived classification problem in a 2-dimensional domain $["x\\in[0,1], y\\in[0,1]"]$ with two classes: 'A' ($["x\\in[0,0.4999999], y\\in[0,0.4999999]"]$) and 'B' ($["x\\in[0.5,1], y\\in[0.5,1]"]$). +The problem is a contrived classification problem in a 2-dimensional domain $x\in[0,1], y\in[0,1]$ with two classes: 'A' ($x\in[0,0.4999999], y\in[0,0.4999999]$) and 'B' ($x\in[0.5,1], y\in[0.5,1]$).
The algorithm was implemented using the LVQ1 variant where the best matching codebook vector is located and moved toward the input vector if it is the same class, or away if the classes differ. A linear decay was used for the learning rate that was updated after each pattern was exposed to the model. The implementation can easily be extended to the other variants of the method. diff --git a/docs/nature-inspired/neural/perceptron.html b/docs/nature-inspired/neural/perceptron.html index efe9b712..87951dbd 100644 --- a/docs/nature-inspired/neural/perceptron.html +++ b/docs/nature-inspired/neural/perceptron.html @@ -65,11 +65,11 @@
Procedure
$activation \leftarrow \sum_{k=1}^{n}\big( w_{k} \times x_{ki}\big) + w_{bias} \times 1.0$-where $["n"]$ is the number of weights and inputs, $["x_{ki}"]$ is the $["k^{th}"]$ attribute on the $["i^{th}"]$ input pattern, and $["w_{bias}"]$ is the bias weight. The weights are updated as follows: +where $n$ is the number of weights and inputs, $x_{ki}$ is the $k^{th}$ attribute on the $i^{th}$ input pattern, and $w_{bias}$ is the bias weight. The weights are updated as follows:
$w_{i}(t+1) = w_{i}(t) + \alpha \times (e(t)-a(t)) \times x_{i}(t)$
-where $["w_i"]$ is the $["i^{th}"]$ weight at time $["t"]$ and $["t+1"]$, $["\\alpha"]$ is the learning rate, $["e(t)"]$ and $["a(t)"]$ are the expected and actual output at time $["t"]$, and $["x_i"]$ is the $["i^{th}"]$ input. This update process is applied to each weight in turn (as well as the bias weight with its contact input).
+where $w_i$ is the $i^{th}$ weight at time $t$ and $t+1$, $\alpha$ is the learning rate, $e(t)$ and $a(t)$ are the expected and actual output at time $t$, and $x_i$ is the $i^{th}$ input. This update process is applied to each weight in turn (as well as the bias weight with its constant input).
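A minimal sketch of this update, applied in turn to each weight; the weights array is assumed to hold one weight per input plus the bias weight last, which uses a constant input of 1.0:
def update_weights(weights, inputs, expected, actual, alpha)
  weights.each_index do |i|
    input = (i < inputs.size) ? inputs[i] : 1.0   # last weight is the bias
    weights[i] += alpha * (expected - actual) * input
  end
end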
Input
: @@ -95,14 +95,14 @@Heuristics
@@ -112,7 +112,7 @@
- The Perceptron can be used to approximate arbitrary linear functions and can be used for regression or classification problems.
- The Perceptron cannot learn a non-linear mapping between the input and output attributes. The XOR problem is a classical example of a problem that the Perceptron cannot learn.
-- Input and output values should be normalized such that $["x \\in [0,1)"]$.
-- The learning rate ($["\\alpha \\in [0,1]"]$) controls the amount of change each error has on the system, lower learning rages are common such as 0.1.
+- Input and output values should be normalized such that $x \in [0,1)$.
+- The learning rate ($\alpha \in [0,1]$) controls the amount of change each error has on the system; lower learning rates are common, such as 0.1.
- The weights can be updated in an online manner (after the exposure to each input pattern) or in batch (after a fixed number of patterns have been observed).
- Batch updates are expected to be more stable than online updates for some complex problems.
- A bias weight is used with a constant input signal to provide stability to the learning process.
-- A step transfer function is commonly used to transfer the activation to a binary output value $["1 \\leftarrow activation \\geq 0"]$, otherwise $["0"]$.
+- A step transfer function is commonly used to transfer the activation to a binary output value $1 \leftarrow activation \geq 0$, otherwise $0$.
- It is good practice to expose the system to input patterns in a different random order each enumeration through the input set.
-- The initial weights are typically small random values, typically $["\\in [0, 0.5]"]$.
+- The initial weights are typically small random values $\in [0, 0.5]$.
Code Listing
The problem is the classical OR boolean problem, where the inputs of the boolean truth table are provided as the two inputs and the result of the boolean OR operation is expected as output.
-The algorithm was implemented using an online learning method, meaning the weights are updated after each input pattern is observed. A step transfer function is used to convert the activation into a binary output $["\\in\\{0,1\\}"]$. Random samples are taken from the domain to train the weights, and similarly, random samples are drawn from the domain to demonstrate what the network has learned. A bias weight is used for stability with a constant input of 1.0.
+The algorithm was implemented using an online learning method, meaning the weights are updated after each input pattern is observed. A step transfer function is used to convert the activation into a binary output $\in \{0,1\}$. Random samples are taken from the domain to train the weights, and similarly, random samples are drawn from the domain to demonstrate what the network has learned. A bias weight is used for stability with a constant input of 1.0.
def random_vector(minmax) diff --git a/docs/nature-inspired/neural/som.html b/docs/nature-inspired/neural/som.html index 7886300b..1e1a76a2 100644 --- a/docs/nature-inspired/neural/som.html +++ b/docs/nature-inspired/neural/som.html @@ -68,14 +68,14 @@Procedure
$dist(x,c) = \sum_{i=1}^{n} (x_i - c_i)^2$-where $["n"]$ is the number of attributes, $["x"]$ is the input vector and $["c"]$ is a given codebook vector. +where $n$ is the number of attributes, $x$ is the input vector and $c$ is a given codebook vector.
The neighbors of the BMU in the topological structure of the network are selected using a neighborhood size that is linearly decreased during the training of the network. The BMU and all selected neighbors are then adjusted toward the input vector using a learning rate that is also decreased linearly with the training cycles:
$c_i(t+1) = learn_{rate}(t) \times (c_i(t) - x_i)$
-where $["c_i(t)"]$ is the $["i^{th}"]$ attribute of a codebook vector at time $["t"]$, $["learn_{rate}"]$ is the current learning rate, an $["x_i"]$ is the $["i^{th}"]$ attribute of a input vector.
+where $c_i(t)$ is the $i^{th}$ attribute of a codebook vector at time $t$, $learn_{rate}$ is the current learning rate, and $x_i$ is the $i^{th}$ attribute of an input vector.
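A minimal sketch of adjusting a codebook vector toward an input vector, written in the equivalent "move toward the input" form of the update above (names hypothetical):
def adjust_vector(c_vec, in_vec, learn_rate)
  # move each attribute of the codebook vector a fraction of the way to the input
  c_vec.each_index do |i|
    c_vec[i] += learn_rate * (in_vec[i] - c_vec[i])
  end
end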
The neighborhood is typically square (called bubble) where all neighborhood nodes are updated using the same learning rate for the iteration, or Gaussian where the learning rate is proportional to the neighborhood distance using a Gaussian distribution (neighbors further away from the BMU are updated less). @@ -111,7 +111,7 @@
Heuristics
- The Self-Organizing Map was designed for unsupervised learning problems such as feature extraction, visualization and clustering. Some extensions of the approach can label the prepared codebook vectors which can be used for classification.
- SOM is non-parametric, meaning that it does not rely on assumptions about the structure of the function that it is approximating.
-- Real-values in input vectors should be normalized such that $["x \\in [0,1)"]$.
+- Real-values in input vectors should be normalized such that $x \in [0,1)$.
- Euclidean distance is commonly used to measure the distance between real-valued vectors, although other distance measures may be used (such as dot product), and data specific distance measures may be required for non-scalar attributes.
- There should be sufficient training iterations to expose all the training data to the model multiple times.
- The more complex the class distribution, the more codebook vectors will be required; some problems may need thousands.
@@ -126,10 +126,10 @@Heuristics
Code Listing
Listing (below) provides an example of the Self-Organizing Map algorithm implemented in the Ruby Programming Language.
-The problem is a feature detection problem, where the network is expected to learn a predefined shape based on being exposed to samples in the domain. The domain is two-dimensional $["x,y \\in [0,1]"]$, where a shape is pre-defined as a square in the middle of the domain $["x,y \\in [0.3,0.6]"]$. The system is initialized to vectors within the domain although is only exposed to samples within the pre-defined shape during training. The expectation is that the system will model the shape based on the observed samples.
+The problem is a feature detection problem, where the network is expected to learn a predefined shape based on being exposed to samples in the domain. The domain is two-dimensional $x,y \in [0,1]$, where a shape is pre-defined as a square in the middle of the domain $x,y \in [0.3,0.6]$. The system is initialized to vectors within the domain, although it is only exposed to samples within the pre-defined shape during training. The expectation is that the system will model the shape based on the observed samples.
-The algorithm is an implementation of the basic Self-Organizing Map algorithm based on the description in Chapter 3 of the seminal book on the technique [Kohonen1995]. The implementation is configured with a $["4 \\times 5"]$ grid of nodes, the Euclidean distance measure is used to determine the BMU and neighbors, a Bubble neighborhood function is used. Error rates are presented to the console, and the codebook vectors themselves are described before and after training. The learning process is incremental rather than batch, for simplicity.
+The algorithm is an implementation of the basic Self-Organizing Map algorithm based on the description in Chapter 3 of the seminal book on the technique [Kohonen1995]. The implementation is configured with a $4 \times 5$ grid of nodes, the Euclidean distance measure is used to determine the BMU and neighbors, and a Bubble neighborhood function is used. Error rates are presented to the console, and the codebook vectors themselves are described before and after training. The learning process is incremental rather than batch, for simplicity.
An extension to this implementation would be to visualize the resulting network structure in the domain - shrinking from a mesh that covers the whole domain, down to a mesh that only covers the pre-defined shape within the domain. diff --git a/docs/nature-inspired/physical/cultural_algorithm.html b/docs/nature-inspired/physical/cultural_algorithm.html index de83c351..5eb19f2d 100644 --- a/docs/nature-inspired/physical/cultural_algorithm.html +++ b/docs/nature-inspired/physical/cultural_algorithm.html @@ -105,7 +105,7 @@
Heuristics
Code Listing
Listing (below) provides an example of the Cultural Algorithm implemented in the Ruby Programming Language. -The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=2"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$. +The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=2$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
The Cultural Algorithm was implemented based on the description of the Cultural Algorithm Evolutionary Program (CAEP) presented by Reynolds [Reynolds1999]. diff --git a/docs/nature-inspired/physical/extremal_optimization.html b/docs/nature-inspired/physical/extremal_optimization.html index e6a35492..53670f7c 100644 --- a/docs/nature-inspired/physical/extremal_optimization.html +++ b/docs/nature-inspired/physical/extremal_optimization.html @@ -64,7 +64,7 @@
Strategy
Procedure
-Algorithm (below) provides a pseudocode listing of the Extremal Optimization algorithm for minimizing a cost function. The deterministic selection of the worst component in the
SelectWeakComponent
function and replacement in theSelectReplacementComponent
function is classical EO. If these decisions are probabilistic making use of $["\\tau"]$ parameter, this is referred to as $["\\tau"]$-Extremal Optimization. +Algorithm (below) provides a pseudocode listing of the Extremal Optimization algorithm for minimizing a cost function. The deterministic selection of the worst component in theSelectWeakComponent
function and replacement in theSelectReplacementComponent
function is classical EO. If these decisions are probabilistic, making use of a $\tau$ parameter, this is referred to as $\tau$-Extremal Optimization.
: @@ -96,9 +96,9 @@Procedure
Heuristics
@@ -108,7 +108,7 @@
- Extremal Optimization was designed for combinatorial optimization problems, although variations have been applied to continuous function optimization.
-- The selection of the worst component and the replacement component each iteration can be deterministic or probabilistic, the latter of which is referred to as $["\\tau"]$-Extremal Optimization given the use of a $["\\tau"]$ parameter.
+- The selection of the worst component and the replacement component each iteration can be deterministic or probabilistic, the latter of which is referred to as $\tau$-Extremal Optimization given the use of a $\tau$ parameter.
- The selection of an appropriate scoring function of the components of a solution is the most difficult part in the application of the technique.
-- For $["\\tau"]$-Extremal Optimization, low $["\\tau"]$ values are used (such as $["\\tau \\in [1.2,1.6]"]$) have been found to be effective for the TSP.
+- For $\tau$-Extremal Optimization, low $\tau$ values (such as $\tau \in [1.2,1.6]$) have been found to be effective for the TSP.
Code Listing
The algorithm is applied to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken from the TSPLIB. The problem seeks a permutation of the order to visit cities (called a tour) that minimizes the total distance traveled. The optimal tour distance for the Berlin52 instance is 7542 units.
-The algorithm implementation is based on the seminal work by Boettcher and Percus [Boettcher1999]. A solution is comprised of a permutation of city components. Each city can potentially form a connection to any other city, and the connections to other cities ordered by distance may be considered its neighborhood. For a given candidate solution, the city components of a solution are scored based on the neighborhood rank of the cities to which they are connected: $["fitness_k \\leftarrow \\frac{3}{r_i + r_j}"]$, where $["r_i"]$ and $["r_j"]$ are the neighborhood ranks of cities $["i"]$ and $["j"]$ against city $["k"]$. A city is selected for modification probabilistically where the probability of selecting a given city is proportional to $["n_i^{-\\tau}"]$, where $["n"]$ is the rank of city $["i"]$. The longest connection is broken, and the city is connected with another neighboring city that is also probabilistically selected.
+The algorithm implementation is based on the seminal work by Boettcher and Percus [Boettcher1999]. A solution is comprised of a permutation of city components. Each city can potentially form a connection to any other city, and the connections to other cities ordered by distance may be considered its neighborhood. For a given candidate solution, the city components of a solution are scored based on the neighborhood rank of the cities to which they are connected: $fitness_k \leftarrow \frac{3}{r_i + r_j}$, where $r_i$ and $r_j$ are the neighborhood ranks of cities $i$ and $j$ against city $k$. A city is selected for modification probabilistically where the probability of selecting a given city is proportional to $n_i^{-\tau}$, where $n_i$ is the rank of city $i$. The longest connection is broken, and the city is connected with another neighboring city that is also probabilistically selected.
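A minimal sketch of the rank-proportional selection used by $\tau$-EO, where a component with rank $r$ (rank 1 being the worst) is chosen with probability proportional to $r^{-\tau}$ (names assumed for illustration):
def select_rank(num_components, tau)
  # unnormalized selection weights, one per rank
  probs = (1..num_components).map {|rank| rank ** (-tau)}
  sum = probs.inject(0.0) {|s, p| s + p}
  point = rand() * sum
  probs.each_with_index do |p, i|
    point -= p
    return i + 1 if point <= 0.0   # return the selected rank
  end
  return num_components
end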
def euc_2d(c1, c2) diff --git a/docs/nature-inspired/physical/harmony_search.html b/docs/nature-inspired/physical/harmony_search.html index 23cb36e6..332ad5b7 100644 --- a/docs/nature-inspired/physical/harmony_search.html +++ b/docs/nature-inspired/physical/harmony_search.html @@ -67,7 +67,7 @@Procedure
$x' \leftarrow x + range \times \epsilon$
-where $["range"]$ is a the user parameter (pitch bandwidth) to control the size of the changes, and $["\\epsilon"]$ is a uniformly random number $["\\in [-1,1]"]$.
+where $range$ is a user parameter (pitch bandwidth) that controls the size of the changes, and $\epsilon$ is a uniformly random number $\in [-1,1]$.
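A minimal sketch of this pitch adjustment, assuming a bounds pair is used to keep the adjusted value inside the decision variable's range:
def adjust_pitch(value, range, bounds)
  epsilon = rand() * 2.0 - 1.0          # uniformly random in [-1,1]
  adjusted = value + range * epsilon
  adjusted = bounds[0] if adjusted < bounds[0]
  adjusted = bounds[1] if adjusted > bounds[1]
  return adjusted
end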
Input
: @@ -106,8 +106,8 @@Procedure
Heuristics
- Harmony Search was designed as a generalized optimization method for continuous, discrete, and constrained optimization and has been applied to numerous types of optimization problems.
-- The harmony memory considering rate (HMCR) $["\\in [0,1]"]$ controls the use of information from the harmony memory or the generation of a random pitch. As such, it controls the rate of convergence of the algorithm and is typically configured $["\\in [0.7,0.95]"]$.
-- The pitch adjustment rate (PAR) $["\\in [0,1]"]$ controls the frequency of adjustment of pitches selected from harmony memory, typically configured $["\\in [0.1,0.5]"]$. High values can result in the premature convergence of the search.
+- The harmony memory considering rate (HMCR) $\in [0,1]$ controls the use of information from the harmony memory or the generation of a random pitch. As such, it controls the rate of convergence of the algorithm and is typically configured $\in [0.7,0.95]$.
+- The pitch adjustment rate (PAR) $\in [0,1]$ controls the frequency of adjustment of pitches selected from harmony memory, typically configured $\in [0.1,0.5]$. High values can result in the premature convergence of the search.
- The pitch adjustment rate and the adjustment method (amount of adjustment or fret width) are typically fixed, having a linear effect through time. Non-linear methods have been considered, for example refer to Geem [Geem2010a].
- When creating a new harmony, aggregations of pitches can be taken from across musicians in the harmony memory.
- The harmony memory update is typically a greedy process, although other considerations such as diversity may be used where the most similar harmony is replaced.
@@ -117,7 +117,7 @@Heuristics
Code Listing
Listing (below) provides an example of the Harmony Search algorithm implemented in the Ruby Programming Language.
-The demonstration problem is an instance of a continuous function optimization that seeks $["min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=3"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$.
+The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=3$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
The algorithm implementation and parameterization are based on the description by Yang [Yang2009], with refinement from Geem [Geem2010a].
diff --git a/docs/nature-inspired/physical/memetic_algorithm.html b/docs/nature-inspired/physical/memetic_algorithm.html index ebc4d85a..7fe779dd 100644 --- a/docs/nature-inspired/physical/memetic_algorithm.html +++ b/docs/nature-inspired/physical/memetic_algorithm.html @@ -102,7 +102,7 @@Heuristics
Code Listing
Listing (below) provides an example of the Memetic Algorithm implemented in the Ruby Programming Language. -The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=3"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$. +The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=3$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$. The Memetic Algorithm uses a canonical Genetic Algorithm as the global search technique that operates on binary strings, uses tournament selection, point mutations, uniform crossover and a binary coded decimal decoding of bits to real values. A bit climber local search is used that performs probabilistic bit flips (point mutations) and only accepts solutions with the same or improving fitness.
diff --git a/docs/nature-inspired/physical/simulated_annealing.html b/docs/nature-inspired/physical/simulated_annealing.html index 4f94ca7e..fb271631 100644 --- a/docs/nature-inspired/physical/simulated_annealing.html +++ b/docs/nature-inspired/physical/simulated_annealing.html @@ -98,7 +98,7 @@Heuristics
- The convergence proof suggests that with a long enough cooling period, the system will always converge to the global optimum. The downside of this theoretical finding is that the number of samples taken for optimum convergence to occur on some problems may be more than a complete enumeration of the search space.
- Performance improvements can be given with the selection of a candidate move generation scheme (neighborhood) that is less likely to generate candidates of significantly higher cost.
- Restarting the cooling schedule using the best found solution so far can lead to an improved outcome on some problems.
-- A common acceptance method is to always accept improving solutions and accept worse solutions with a probability of $["P(accept) \\leftarrow \\exp(\\frac{e-e'}{T})"]$, where $["T"]$ is the current temperature, $["e"]$ is the energy (or cost) of the current solution and $["e'"]$ is the energy of a candidate solution being considered.
+- A common acceptance method is to always accept improving solutions and accept worse solutions with a probability of $P(accept) \leftarrow \exp(\frac{e-e'}{T})$, where $T$ is the current temperature, $e$ is the energy (or cost) of the current solution and $e'$ is the energy of a candidate solution being considered.
- The size of the neighborhood considered in generating candidate solutions may also change over time or be influenced by the temperature, starting initially broad and narrowing with the execution of the algorithm.
- A problem specific heuristic method can be used to provide the starting point for the search.
@@ -110,7 +110,7 @@Code Listing
The algorithm is applied to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken from the TSPLIB. The problem seeks a permutation of the order to visit cities (called a tour) that minimizes the total distance traveled. The optimal tour distance for the Berlin52 instance is 7542 units.
-The algorithm implementation uses a two-opt procedure for the neighborhood function and the classical $["P(accept) \\leftarrow \\exp(\\frac{e-e'}{T})"]$ as the acceptance function. A simple linear cooling regime is used with a large initial temperature which is decreased each iteration.
+The algorithm implementation uses a two-opt procedure for the neighborhood function and the classical $P(accept) \leftarrow \exp(\frac{e-e'}{T})$ as the acceptance function. A simple linear cooling regime is used with a large initial temperature, which is decreased each iteration.
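A minimal sketch of this acceptance rule for a minimization problem: improving moves are always accepted, and worse moves are accepted with probability $\exp(\frac{e-e'}{T})$ (names assumed):
def accept?(current_energy, candidate_energy, temperature)
  return true if candidate_energy <= current_energy   # always accept improvements
  return rand() < Math.exp((current_energy - candidate_energy) / temperature)
end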
def euc_2d(c1, c2) diff --git a/docs/nature-inspired/probabilistic/compact_genetic_algorithm.html b/docs/nature-inspired/probabilistic/compact_genetic_algorithm.html index a4559fd7..83afba81 100644 --- a/docs/nature-inspired/probabilistic/compact_genetic_algorithm.html +++ b/docs/nature-inspired/probabilistic/compact_genetic_algorithm.html @@ -59,7 +59,7 @@Strategy
Procedure
The Compact Genetic Algorithm maintains a real-valued prototype vector that represents the probability of each component being expressed in a candidate solution.
-Algorithm (below) provides a pseudocode listing of the Compact Genetic Algorithm for maximizing a cost function. The parameter $["n"]$ indicates the amount to update probabilities for conflicting bits in each algorithm iteration.
+Algorithm (below) provides a pseudocode listing of the Compact Genetic Algorithm for maximizing a cost function. The parameter $n$ indicates the amount by which the probabilities are updated for conflicting bits in each algorithm iteration.
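A minimal sketch of this vector update, with names assumed for illustration: where two sampled bit strings disagree, each probability is nudged by $\frac{1}{n}$ toward the bit held by the winner (the higher-scoring sample).
def update_vector(vector, winner, loser, n)
  vector.each_index do |i|
    next if winner[i] == loser[i]                      # only conflicting bits
    vector[i] += (winner[i] == 1) ? 1.0/n : -1.0/n
  end
end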
Input
: @@ -95,10 +95,10 @@Procedure
Heuristics
-- The vector update parameter ($["n"]$) influences the amount that the probabilities are updated each algorithm iteration.
-- The vector update parameter ($["n"]$) may be considered to be comparable to the population size parameter in the Genetic Algorithm.
+- The vector update parameter ($n$) influences the amount that the probabilities are updated each algorithm iteration.
+- The vector update parameter ($n$) may be considered to be comparable to the population size parameter in the Genetic Algorithm.
- Early results demonstrate that the cGA may be comparable to a standard Genetic Algorithm on classical binary string optimization problems (such as OneMax).
-- The algorithm may be considered to have converged if the vector probabilities are all either $["0"]$ or $["1"]$.
+- The algorithm may be considered to have converged if the vector probabilities are all either $0$ or $1$.
Procedure
Heuristics
- The Cross-Entropy Method was adapted for combinatorial optimization problems, although it has been applied to continuous function optimization as well as noisy simulation problems.
-- A alpha ($["\\alpha"]$) parameter or learning rate $["\\in [0.1]"]$ is typically set high, such as 0.7.
-- A smoothing function can be used to further control the updates the summaries of the distribution(s) of samples from the problem space. For example, in continuous function optimization a $["\\beta"]$ parameter may replace $["\\alpha"]$ for updating the standard deviation, calculated at time $["t"]$ as $["\\beta_{t} = \\beta - \\beta \\times (1-\\frac{1}{t})^q"]$, where $["\\beta"]$ is initially set high $["\\in [0.8, 0.99]"]$ and $["q"]$ is a small integer $["\\in [5, 10]"]$.
+- An alpha ($\alpha$) parameter or learning rate $\in [0,1]$ is typically set high, such as 0.7.
+- A smoothing function can be used to further control the updates to the summaries of the distribution(s) of samples from the problem space. For example, in continuous function optimization a $\beta$ parameter may replace $\alpha$ for updating the standard deviation, calculated at time $t$ as $\beta_{t} = \beta - \beta \times (1-\frac{1}{t})^q$, where $\beta$ is initially set high $\in [0.8, 0.99]$ and $q$ is a small integer $\in [5, 10]$ (see the sketch below).
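A minimal sketch of this smoothing schedule, with all parameter values assumed for illustration:
def beta_at(beta, t, q)
  # beta_t = beta - beta * (1 - 1/t)^q; starts at beta and decays toward zero as t grows
  return beta - beta * (1.0 - 1.0/t)**q
end

puts beta_at(0.9, 10, 5)  # smoothed coefficient at time t=10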
Code Listing
Listing (below) provides an example of the Cross-Entropy Method algorithm implemented in the Ruby Programming Language. -The demonstration problem is an instance of a continuous function optimization problem that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=3"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$. +The demonstration problem is an instance of a continuous function optimization problem that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=3$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
The algorithm was implemented based on a description of the Cross-Entropy Method algorithm for continuous function optimization by Rubinstein and Kroese in Chapter 5 and Appendix A of their book on the method [Rubinstein2004]. The algorithm maintains means and standard deviations of the distribution of samples for convenience. The means and standard deviations are initialized based on random positions in the problem space and the bounds of the whole problem space respectively. A smoothing parameter is not used on the standard deviations. diff --git a/docs/nature-inspired/sitemap.xml b/docs/nature-inspired/sitemap.xml index 5037e390..2b15e777 100644 --- a/docs/nature-inspired/sitemap.xml +++ b/docs/nature-inspired/sitemap.xml @@ -1,204 +1,204 @@
diff --git a/docs/nature-inspired/stochastic/adaptive_random_search.html b/docs/nature-inspired/stochastic/adaptive_random_search.html index b0b86fcf..3a01ac86 100644 --- a/docs/nature-inspired/stochastic/adaptive_random_search.html +++ b/docs/nature-inspired/stochastic/adaptive_random_search.html @@ -109,7 +109,7 @@ - http://www.cleveralgorithms.com/nature-inspired/evolution.html +http://cleveralgorithms.com/nature-inspired/evolution.html - http://www.cleveralgorithms.com/nature-inspired/advanced/racing_algorithms.html +http://cleveralgorithms.com/nature-inspired/advanced/racing_algorithms.html - http://www.cleveralgorithms.com/nature-inspired/advanced/paradigms.html +http://cleveralgorithms.com/nature-inspired/advanced/paradigms.html - http://www.cleveralgorithms.com/nature-inspired/advanced/problem_solving.html +http://cleveralgorithms.com/nature-inspired/advanced/problem_solving.html - http://www.cleveralgorithms.com/nature-inspired/advanced/new_algorithms.html +http://cleveralgorithms.com/nature-inspired/advanced/new_algorithms.html - http://www.cleveralgorithms.com/nature-inspired/advanced/testing_algorithms.html +http://cleveralgorithms.com/nature-inspired/advanced/testing_algorithms.html - http://www.cleveralgorithms.com/nature-inspired/advanced/visualizing_algorithms.html +http://cleveralgorithms.com/nature-inspired/advanced/visualizing_algorithms.html - http://www.cleveralgorithms.com/nature-inspired/swarm/bees_algorithm.html +http://cleveralgorithms.com/nature-inspired/swarm/bees_algorithm.html - http://www.cleveralgorithms.com/nature-inspired/swarm/ant_system.html +http://cleveralgorithms.com/nature-inspired/swarm/ant_system.html - http://www.cleveralgorithms.com/nature-inspired/swarm/bfoa.html +http://cleveralgorithms.com/nature-inspired/swarm/bfoa.html - http://www.cleveralgorithms.com/nature-inspired/swarm/pso.html +http://cleveralgorithms.com/nature-inspired/swarm/pso.html - http://www.cleveralgorithms.com/nature-inspired/swarm/ant_colony_system.html +http://cleveralgorithms.com/nature-inspired/swarm/ant_colony_system.html - http://www.cleveralgorithms.com/nature-inspired/swarm.html +http://cleveralgorithms.com/nature-inspired/swarm.html - http://www.cleveralgorithms.com/nature-inspired/copyright.html +http://cleveralgorithms.com/nature-inspired/copyright.html - http://www.cleveralgorithms.com/nature-inspired/index.html +http://cleveralgorithms.com/nature-inspired/index.html - http://www.cleveralgorithms.com/nature-inspired/physical.html +http://cleveralgorithms.com/nature-inspired/physical.html - http://www.cleveralgorithms.com/nature-inspired/probabilistic/compact_genetic_algorithm.html +http://cleveralgorithms.com/nature-inspired/probabilistic/compact_genetic_algorithm.html - http://www.cleveralgorithms.com/nature-inspired/probabilistic/umda.html +http://cleveralgorithms.com/nature-inspired/probabilistic/umda.html - http://www.cleveralgorithms.com/nature-inspired/probabilistic/cross_entropy.html +http://cleveralgorithms.com/nature-inspired/probabilistic/cross_entropy.html - http://www.cleveralgorithms.com/nature-inspired/probabilistic/boa.html +http://cleveralgorithms.com/nature-inspired/probabilistic/boa.html - http://www.cleveralgorithms.com/nature-inspired/probabilistic/pbil.html +http://cleveralgorithms.com/nature-inspired/probabilistic/pbil.html - http://www.cleveralgorithms.com/nature-inspired/physical/simulated_annealing.html +http://cleveralgorithms.com/nature-inspired/physical/simulated_annealing.html - 
http://www.cleveralgorithms.com/nature-inspired/physical/memetic_algorithm.html +http://cleveralgorithms.com/nature-inspired/physical/memetic_algorithm.html - http://www.cleveralgorithms.com/nature-inspired/physical/cultural_algorithm.html +http://cleveralgorithms.com/nature-inspired/physical/cultural_algorithm.html - http://www.cleveralgorithms.com/nature-inspired/physical/harmony_search.html +http://cleveralgorithms.com/nature-inspired/physical/harmony_search.html - http://www.cleveralgorithms.com/nature-inspired/physical/extremal_optimization.html +http://cleveralgorithms.com/nature-inspired/physical/extremal_optimization.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/reactive_tabu_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/reactive_tabu_search.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/grasp.html +http://cleveralgorithms.com/nature-inspired/stochastic/grasp.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/guided_local_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/guided_local_search.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/scatter_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/scatter_search.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/random_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/random_search.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/tabu_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/tabu_search.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/hill_climbing_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/hill_climbing_search.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/adaptive_random_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/adaptive_random_search.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/iterated_local_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/iterated_local_search.html - http://www.cleveralgorithms.com/nature-inspired/stochastic/variable_neighborhood_search.html +http://cleveralgorithms.com/nature-inspired/stochastic/variable_neighborhood_search.html - http://www.cleveralgorithms.com/nature-inspired/immune.html +http://cleveralgorithms.com/nature-inspired/immune.html - http://www.cleveralgorithms.com/nature-inspired/neural.html +http://cleveralgorithms.com/nature-inspired/neural.html - http://www.cleveralgorithms.com/nature-inspired/immune/clonal_selection_algorithm.html +http://cleveralgorithms.com/nature-inspired/immune/clonal_selection_algorithm.html - http://www.cleveralgorithms.com/nature-inspired/immune/negative_selection_algorithm.html +http://cleveralgorithms.com/nature-inspired/immune/negative_selection_algorithm.html - http://www.cleveralgorithms.com/nature-inspired/immune/dca.html +http://cleveralgorithms.com/nature-inspired/immune/dca.html - http://www.cleveralgorithms.com/nature-inspired/immune/immune_network_algorithm.html +http://cleveralgorithms.com/nature-inspired/immune/immune_network_algorithm.html - http://www.cleveralgorithms.com/nature-inspired/immune/airs.html +http://cleveralgorithms.com/nature-inspired/immune/airs.html - http://www.cleveralgorithms.com/nature-inspired/acknowledgments.html +http://cleveralgorithms.com/nature-inspired/acknowledgments.html - http://www.cleveralgorithms.com/nature-inspired/advanced.html +http://cleveralgorithms.com/nature-inspired/advanced.html - 
http://www.cleveralgorithms.com/nature-inspired/evolution/grammatical_evolution.html +http://cleveralgorithms.com/nature-inspired/evolution/grammatical_evolution.html - http://www.cleveralgorithms.com/nature-inspired/evolution/learning_classifier_system.html +http://cleveralgorithms.com/nature-inspired/evolution/learning_classifier_system.html - http://www.cleveralgorithms.com/nature-inspired/evolution/spea.html +http://cleveralgorithms.com/nature-inspired/evolution/spea.html - http://www.cleveralgorithms.com/nature-inspired/evolution/differential_evolution.html +http://cleveralgorithms.com/nature-inspired/evolution/differential_evolution.html - http://www.cleveralgorithms.com/nature-inspired/evolution/evolutionary_programming.html +http://cleveralgorithms.com/nature-inspired/evolution/evolutionary_programming.html - http://www.cleveralgorithms.com/nature-inspired/evolution/genetic_algorithm.html +http://cleveralgorithms.com/nature-inspired/evolution/genetic_algorithm.html - http://www.cleveralgorithms.com/nature-inspired/evolution/nsga.html +http://cleveralgorithms.com/nature-inspired/evolution/nsga.html - http://www.cleveralgorithms.com/nature-inspired/evolution/gene_expression_programming.html +http://cleveralgorithms.com/nature-inspired/evolution/gene_expression_programming.html - http://www.cleveralgorithms.com/nature-inspired/evolution/evolution_strategies.html +http://cleveralgorithms.com/nature-inspired/evolution/evolution_strategies.html - http://www.cleveralgorithms.com/nature-inspired/evolution/genetic_programming.html +http://cleveralgorithms.com/nature-inspired/evolution/genetic_programming.html - http://www.cleveralgorithms.com/nature-inspired/errata.html +http://cleveralgorithms.com/nature-inspired/errata.html - http://www.cleveralgorithms.com/nature-inspired/probabilistic.html +http://cleveralgorithms.com/nature-inspired/probabilistic.html - http://www.cleveralgorithms.com/nature-inspired/preface.html +http://cleveralgorithms.com/nature-inspired/preface.html - http://www.cleveralgorithms.com/nature-inspired/foreword.html +http://cleveralgorithms.com/nature-inspired/foreword.html - http://www.cleveralgorithms.com/nature-inspired/neural/som.html +http://cleveralgorithms.com/nature-inspired/neural/som.html - http://www.cleveralgorithms.com/nature-inspired/neural/lvq.html +http://cleveralgorithms.com/nature-inspired/neural/lvq.html - http://www.cleveralgorithms.com/nature-inspired/neural/perceptron.html +http://cleveralgorithms.com/nature-inspired/neural/perceptron.html - http://www.cleveralgorithms.com/nature-inspired/neural/hopfield_network.html +http://cleveralgorithms.com/nature-inspired/neural/hopfield_network.html - http://www.cleveralgorithms.com/nature-inspired/neural/backpropagation.html +http://cleveralgorithms.com/nature-inspired/neural/backpropagation.html - http://www.cleveralgorithms.com/nature-inspired/introduction.html +http://cleveralgorithms.com/nature-inspired/introduction.html - http://www.cleveralgorithms.com/nature-inspired/appendix1.html +http://cleveralgorithms.com/nature-inspired/appendix1.html - http://www.cleveralgorithms.com/nature-inspired/stochastic.html +http://cleveralgorithms.com/nature-inspired/stochastic.html Code Listing
Listing (below) provides an example of the Adaptive Random Search Algorithm implemented in the Ruby Programming Language, based on the specification for 'Adaptive Step-Size Random Search' by Schumer and Steiglitz [Schumer1968]. In the example, the algorithm runs for a fixed number of iterations and returns the best candidate solution discovered.
-The example problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0 < x_i < 5.0"]$ and $["n=2"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$.
+The example problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0 < x_i < 5.0$ and $n=2$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
def objective_function(vector) diff --git a/docs/nature-inspired/stochastic/grasp.html b/docs/nature-inspired/stochastic/grasp.html index 9f3c1835..b16508be 100644 --- a/docs/nature-inspired/stochastic/grasp.html +++ b/docs/nature-inspired/stochastic/grasp.html @@ -73,7 +73,7 @@Procedure
                        -Algorithm (below) provides the pseudocode the Greedy Randomized Construction function. The function involves the step-wise construction of a candidate solution using a stochastically greedy construction process. The function works by building a Restricted Candidate List (RCL) that constraints the components of a solution (features) that may be selected from each cycle. The RCL may be constrained by an explicit size, or by using a threshold ($["\\alpha \\in [0,1]"]$) on the cost of adding each feature to the current candidate solution.
                        +Algorithm (below) provides the pseudocode for the Greedy Randomized Construction function. The function involves the step-wise construction of a candidate solution using a stochastically greedy construction process. The function works by building a Restricted Candidate List (RCL) that constrains the components of a solution (features) that may be selected in each cycle. The RCL may be constrained by an explicit size, or by using a threshold ($\alpha \in [0,1]$) on the cost of adding each feature to the current candidate solution.
                        
Input
: @@ -106,8 +106,8 @@Procedure
Heuristics
-
@@ -224,8 +224,8 @@- The $["\\alpha"]$ threshold defines the amount of greediness of the construction mechanism, where values close to 0 may be too greedy, and values close to 1 may be too generalized.
-- As an alternative to using the $["\\alpha"]$ threshold, the RCL can be constrained to the top $["n\\%"]$ of candidate features that may be selected from each construction cycle.
+- The $\alpha$ threshold defines the amount of greediness of the construction mechanism, where values close to 0 may be too greedy, and values close to 1 may be too generalized.
 +
                      • As an alternative to using the $\alpha$ threshold, the RCL can be constrained to the top $n\%$ of candidate features that may be selected in each construction cycle.
                      • 
- The technique was designed for discrete problem classes such as combinatorial optimization problems.
Primary Sources
Learn More
 There are a vast number of review, application, and extension papers for GRASP. -Pitsoulis and Resende provide an extensive contemporary overview of the field as a review chapter [Pitsoulis2002], as does Resende and Ribeiro that includes a clear presentation of the use of the $["\\alpha"]$ threshold parameter instead of a fixed size for the RCL [Resende2003]. Festa and Resende provide an annotated bibliography as a review chapter that provides some needed insight into large amount of study that has gone into the approach [Festa2002]. -There are numerous extensions to GRASP, not limited to the popular Reactive GRASP for adapting $["\\alpha"]$ [Prais2000], the use of long term memory to allow the technique to learn from candidate solutions discovered in previous iterations, and parallel implementations of the procedure such as 'Parallel GRASP' [Pardalos1995]. +Pitsoulis and Resende provide an extensive contemporary overview of the field as a review chapter [Pitsoulis2002], as do Resende and Ribeiro, who include a clear presentation of the use of the $\alpha$ threshold parameter instead of a fixed size for the RCL [Resende2003]. Festa and Resende provide an annotated bibliography as a review chapter that gives some needed insight into the large amount of study that has gone into the approach [Festa2002]. +There are numerous extensions to GRASP, including the popular Reactive GRASP for adapting $\alpha$ [Prais2000], the use of long term memory to allow the technique to learn from candidate solutions discovered in previous iterations, and parallel implementations of the procedure such as 'Parallel GRASP' [Pardalos1995]. 
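 To make the construction step described above concrete, the following Ruby sketch builds an RCL from feature costs using the $\alpha$ threshold and samples from it. It is a hedged illustration under hypothetical feature names and costs, not the book's GRASP listing. # Sketch: one greedy randomized construction cycle (hypothetical data). def select_feature(candidate_costs, alpha) c_min = candidate_costs.values.min c_max = candidate_costs.values.max threshold = c_min + alpha * (c_max - c_min) # the RCL holds every feature whose incremental cost is within the threshold rcl = candidate_costs.select { |feature, cost| cost <= threshold }.keys rcl[rand(rcl.size)] end costs = {:a => 1.0, :b => 4.0, :c => 2.5} # incremental cost of adding each feature puts select_feature(costs, 0.6) # RCL is [:a, :c]; one is chosen at random A construction loop would call this repeatedly, recomputing the candidate costs after each selected feature is added to the partial solution. 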
diff --git a/docs/nature-inspired/stochastic/guided_local_search.html b/docs/nature-inspired/stochastic/guided_local_search.html index 0db0dbd9..a89537bb 100644 --- a/docs/nature-inspired/stochastic/guided_local_search.html +++ b/docs/nature-inspired/stochastic/guided_local_search.html @@ -52,12 +52,12 @@Strategy
Procedure
 Algorithm (below) provides a pseudocode listing of the Guided Local Search algorithm for minimization. -The Local Search algorithm used by the Guided Local Search algorithm uses an augmented cost function in the form $["h(s)=g(s)+\\lambda\\cdot\\sum_{i=1}^{M}f_i"]$, where -$["h(s)"]$ is the augmented cost function, $["g(s)"]$ is the problem cost function,$["\\lambda"]$ is the 'regularization parameter' (a coefficient for scaling the penalties), $["s"]$ is a locally optimal solution of $["M"]$ features, and $["f_i"]$ is the $["i"]$'th feature in locally optimal solution. The augmented cost function is only used by the local search procedure, the Guided Local Search algorithm uses the problem specific cost function without augmentation. +The Local Search algorithm used by the Guided Local Search algorithm uses an augmented cost function in the form $h(s)=g(s)+\lambda\cdot\sum_{i=1}^{M}f_i$, where +$h(s)$ is the augmented cost function, $g(s)$ is the problem cost function, $\lambda$ is the 'regularization parameter' (a coefficient for scaling the penalties), $s$ is a locally optimal solution of $M$ features, and $f_i$ is the $i$'th feature in the locally optimal solution. The augmented cost function is only used by the local search procedure; the Guided Local Search algorithm itself uses the problem-specific cost function without augmentation. 
 Penalties are only updated for those features in a locally optimal solution that maximize utility, updated by adding 1 to the penalty for the feature (a counter). -The utility for a feature is calculated as $["U_{feature}=\\frac{C_{feature}}{1+P_{feature}}"]$, where $["U_{feature}"]$ is the utility for penalizing a feature (maximizing), $["C_{feature}"]$ is the cost of the feature, and $["P_{feature}"]$ is the current penalty for the feature. +The utility for a feature is calculated as $U_{feature}=\frac{C_{feature}}{1+P_{feature}}$, where $U_{feature}$ is the utility for penalizing a feature (maximizing), $C_{feature}$ is the cost of the feature, and $P_{feature}$ is the current penalty for the feature. 
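 Read with the sum taken over the penalties of the features present in a solution, the augmented cost and the penalty update can be sketched as follows. This is a rough Ruby illustration under that reading; all names are assumptions, not the book's listing. # Sketch: augmented cost h(s) = g(s) + lambda * sum of penalties of features in s. def augmented_cost(problem_cost, feature_ids, penalties, lam) problem_cost + lam * feature_ids.inject(0.0) { |s, i| s + penalties[i] } end # Sketch: add 1 to the penalty of each maximal-utility feature of a local optimum. def update_penalties!(feature_ids, feature_costs, penalties) utilities = feature_ids.map { |i| feature_costs[i] / (1.0 + penalties[i]) } best = utilities.max feature_ids.each_with_index do |i, k| penalties[i] += 1 if utilities[k] == best end end 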
Input
: @@ -87,7 +87,7 @@Heuristics
- The Guided Local Search procedure is independent of the Local Search procedure embedded within it. A suitable domain-specific search procedure should be identified and employed.
- The Guided Local Search procedure may need to be executed for thousands to hundreds-of-thousands of iterations, each iteration of which assumes a run of a Local Search algorithm to convergence.
- The algorithm was designed for discrete optimization problems where a solution is comprised of independently assessable 'features' such as Combinatorial Optimization, although it has been applied to continuous function optimization modeled as binary strings.
-- The $["\\lambda"]$ parameter is a scaling factor for feature penalization that must be in the same proportion to the candidate solution costs from the specific problem instance to which the algorithm is being applied. As such, the value for $["\\lambda"]$ must be meaningful when used within the augmented cost function (such as when it is added to a candidate solution cost in minimization and subtracted from a cost in the case of a maximization problem).
+- The $\lambda$ parameter is a scaling factor for feature penalization that must be in the same proportion to the candidate solution costs from the specific problem instance to which the algorithm is being applied. As such, the value for $\lambda$ must be meaningful when used within the augmented cost function (such as when it is added to a candidate solution cost in minimization and subtracted from a cost in the case of a maximization problem).
@@ -101,8 +101,8 @@Code Listing
 A TSP-specific local search algorithm called 2-opt is used; it selects two points in a permutation and reconnects the tour, potentially untwisting the tour at the selected points. The stopping condition for 2-opt was configured to be a fixed number of non-improving moves. -The equation for setting $["\\lambda"]$ for TSP instances is $["\\lambda = \\alpha\\cdot\\frac{cost(optima)}{N}"]$, where $["N"]$ is the number of cities, $["cost(optima)"]$ is the cost of a local optimum found by a local search, and $["\\alpha\\in (0,1]"]$ (around 0.3 for TSP and 2-opt). The cost of a local optima was fixed to the approximated value of 15000 for the Berlin52 instance. -The utility function for features (edges) in the TSP is $["U_{edge}=\\frac{D_{edge}}{1+P_{edge}}"]$, where $["U_{edge}"]$ is the utility for penalizing an edge (maximizing), $["D_{edge}"]$ is the cost of the edge (distance between cities) and $["P_{edge}"]$ is the current penalty for the edge. +The equation for setting $\lambda$ for TSP instances is $\lambda = \alpha\cdot\frac{cost(optima)}{N}$, where $N$ is the number of cities, $cost(optima)$ is the cost of a local optimum found by a local search, and $\alpha\in (0,1]$ (around 0.3 for TSP and 2-opt). The cost of a local optimum was fixed to the approximate value of 15000 for the Berlin52 instance. +The utility function for features (edges) in the TSP is $U_{edge}=\frac{D_{edge}}{1+P_{edge}}$, where $U_{edge}$ is the utility for penalizing an edge (maximizing), $D_{edge}$ is the cost of the edge (distance between cities), and $P_{edge}$ is the current penalty for the edge. 
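 Ahead of the full listing, these two equations reduce to one-liners. In the sketch below, the 0.3 and 15000 values are the ones quoted above; the edge distance and penalty are illustrative. # Sketch: lambda for a TSP instance, and the utility of penalizing one edge. def tsp_lambda(alpha, local_optimum_cost, num_cities) alpha * (local_optimum_cost / num_cities.to_f) end def edge_utility(distance, penalty) distance / (1.0 + penalty) end puts tsp_lambda(0.3, 15000.0, 52) # ~86.5 for the Berlin52 settings above puts edge_utility(120.0, 2) # => 40.0 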
 def euc_2d(c1, c2) diff --git a/docs/nature-inspired/stochastic/random_search.html b/docs/nature-inspired/stochastic/random_search.html index bf25910e..b0da8ef1 100644 --- a/docs/nature-inspired/stochastic/random_search.html +++ b/docs/nature-inspired/stochastic/random_search.html @@ -87,7 +87,7 @@

                        Code Listing

                        
                        Listing (below) provides an example of the Random Search Algorithm implemented in the Ruby Programming Language. In the example, the algorithm runs for a fixed number of iterations and returns the best candidate solution discovered.
                        -The example problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=2"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$.
                        +The example problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=2$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$.
                        
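                        The core of the algorithm is small enough to sketch inline; the following is a compact rendering of the same sampling loop under the basin function, with a reduced iteration count (an illustration, not the listing itself).

                        # Sketch: pure random search over the basin function (illustrative).
                        def random_vector(bounds)
                          bounds.map { |min, max| min + (max - min) * rand }
                        end

                        bounds = [[-5.0, 5.0]] * 2
                        best = nil
                        100.times do
                          candidate = random_vector(bounds)
                          cost = candidate.inject(0.0) { |s, x| s + x**2 }  # f = sum(x_i^2)
                          best = {:vector => candidate, :cost => cost} if best.nil? || cost < best[:cost]
                        end
                        puts "best cost: #{best[:cost]}"
                        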
                        def objective_function(vector)
                        diff --git a/docs/nature-inspired/stochastic/scatter_search.html b/docs/nature-inspired/stochastic/scatter_search.html
                        index b8326dbc..3efeb1e8 100644
                        --- a/docs/nature-inspired/stochastic/scatter_search.html
                        +++ b/docs/nature-inspired/stochastic/scatter_search.html
                        @@ -99,7 +99,7 @@

                        Heuristics

                        
Code Listing
Listing (below) provides an example of the Scatter Search algorithm implemented in the Ruby Programming Language. -The example problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=3"]$. The optimal solution for this basin function is $["(v_1,\\ldots,v_{n})=0.0"]$. +The example problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=3$. The optimal solution for this basin function is $(v_1,\ldots,v_{n})=0.0$.
                        The algorithm is an implementation of Scatter Search as described in an application of the technique to unconstrained non-linear optimization by Glover [Glover2003b]. The seeds for initial solutions are generated as random vectors, as opposed to stratified samples. The example was further simplified by not including a restart strategy and by excluding diversity maintenance in the
                        
ReferenceSet
                        . A stochastic local search algorithm is used as the embedded heuristic, taking a stochastic step size in the range of half a percent of the search space.
                        diff --git a/docs/nature-inspired/swarm/ant_colony_system.html b/docs/nature-inspired/swarm/ant_colony_system.html
                        index 959c80a6..dcdfe26a 100644
                        --- a/docs/nature-inspired/swarm/ant_colony_system.html
                        +++ b/docs/nature-inspired/swarm/ant_colony_system.html
                        @@ -64,25 +64,25 @@

                        Strategy

                        
Procedure
                        Algorithm (below) provides a pseudocode listing of the main Ant Colony System algorithm for minimizing a cost function.
                        -The probabilistic step-wise construction of solution makes use of both history (pheromone) and problem-specific heuristic information to incrementally construct a solution piece-by-piece. Each component can only be selected if it has not already been chosen (for most combinatorial problems), and for those components that can be selected from given the current component $["i"]$, their probability for selection is defined as:
                        +The probabilistic step-wise construction of a solution makes use of both history (pheromone) and problem-specific heuristic information to incrementally construct a solution piece-by-piece. Each component can only be selected if it has not already been chosen (for most combinatorial problems), and for those components that can be selected (given the current component $i$), their probability for selection is defined as:
                        
$P_{i,j} \leftarrow \frac{\tau_{i,j}^{\alpha} \times \eta_{i,j}^{\beta}}{\sum_{k=1}^c \tau_{i,k}^{\alpha} \times \eta_{i,k}^{\beta}}$-where $["\\eta_{i,j}"]$ is the maximizing contribution to the overall score of selecting the component (such as $["\\frac{1.0}{distance_{i,j}}"]$ for the Traveling Salesman Problem), $["\\beta"]$ is the heuristic coefficient (commonly fixed at 1.0), $["\\tau_{i,j}"]$ is the pheromone value for the component, $["\\alpha"]$ is the history coefficient, and $["c"]$ is the set of usable components. A greediness factor ($["q0"]$) is used to influence when to use the above probabilistic component selection and when to greedily select the best possible component. +where $\eta_{i,j}$ is the maximizing contribution to the overall score of selecting the component (such as $\frac{1.0}{distance_{i,j}}$ for the Traveling Salesman Problem), $\beta$ is the heuristic coefficient (commonly fixed at 1.0), $\tau_{i,j}$ is the pheromone value for the component, $\alpha$ is the history coefficient, and $c$ is the set of usable components. A greediness factor ($q0$) is used to influence when to use the above probabilistic component selection and when to greedily select the best possible component.
A local pheromone update is performed for each solution that is constructed to dissuade following solutions to use the same components in the same order, as follows:
$\tau_{i,j} \leftarrow (1-\sigma) \times \tau_{i,j} + \sigma \times \tau_{i,j}^{0}$-where $["\\tau_{i,j}"]$ represents the pheromone for the component (graph edge) ($["i,j"]$), $["\\sigma"]$ is the local pheromone factor, and $["\\tau_{i,j}^{0}"]$ is the initial pheromone value. +where $\tau_{i,j}$ represents the pheromone for the component (graph edge) ($i,j$), $\sigma$ is the local pheromone factor, and $\tau_{i,j}^{0}$ is the initial pheromone value.
At the end of each iteration, the pheromone is updated and decayed using the best candidate solution found thus far (or the best candidate solution found for the iteration), as follows:
$\tau_{i,j} \leftarrow (1-\rho) \times \tau_{i,j} + \rho \times \Delta\tau{i,j}$-where $["\\tau_{i,j}"]$ represents the pheromone for the component (graph edge) ($["i,j"]$), $["\\rho"]$ is the decay factor, and $["\\Delta\\tau{i,j}"]$ is the maximizing solution cost for the best solution found so far if the component $["ij"]$ is used in the globally best known solution, otherwise it is 0. +where $\tau_{i,j}$ represents the pheromone for the component (graph edge) ($i,j$), $\rho$ is the decay factor, and $\Delta\tau{i,j}$ is the maximizing solution cost for the best solution found so far if the component $ij$ is used in the globally best known solution, otherwise it is 0.
Input
: @@ -116,11 +116,11 @@Procedure
Heuristics
diff --git a/docs/nature-inspired/swarm/ant_system.html b/docs/nature-inspired/swarm/ant_system.html index a6497267..8157dda8 100644 --- a/docs/nature-inspired/swarm/ant_system.html +++ b/docs/nature-inspired/swarm/ant_system.html @@ -68,14 +68,14 @@
- The Ant Colony System algorithm was designed for use with combinatorial problems such as the TSP, knapsack problem, quadratic assignment problems, graph coloring problems and many others.
-- The local pheromone (history) coefficient ($["\\sigma"]$) controls the amount of contribution history plays in a components probability of selection and is commonly set to 0.1.
-- The heuristic coefficient ($["\\beta"]$) controls the amount of contribution problem-specific heuristic information plays in a components probability of selection and is commonly between 2 and 5, such as 2.5.
-- The decay factor ($["\\rho"]$) controls the rate at which historic information is lost and is commonly set to 0.1.
-- The greediness factor ($["q0"]$) is commonly set to 0.9.
-- The total number of ants ($["m"]$) is commonly set low, such as 10.
 +
                      • The local pheromone (history) coefficient ($\sigma$) controls the amount of contribution history plays in a component's probability of selection and is commonly set to 0.1.
                      • 
 +
                      • The heuristic coefficient ($\beta$) controls the amount of contribution problem-specific heuristic information plays in a component's probability of selection and is commonly between 2 and 5, such as 2.5.
                      • 
+- The decay factor ($\rho$) controls the rate at which historic information is lost and is commonly set to 0.1.
+- The greediness factor ($q0$) is commonly set to 0.9.
+- The total number of ants ($m$) is commonly set low, such as 10.
Procedure
 $\tau_{i,j} \leftarrow (1-\rho) \times \tau_{i,j} + \sum_{k=1}^m \Delta_{i,j}^k$ -where $["\\tau_{i,j}"]$ represents the pheromone for the component (graph edge) ($["i,j"]$), $["\\rho"]$ is the decay factor, $["m"]$ is the number of ants, and $["\\sum_{k=1}^m \\Delta_{i,j}^k "]$ is the sum of $["\\frac{1}{S_{cost}} "]$ (maximizing solution cost) for those solutions that include component $["i,j"]$. The Pseudocode listing shows this equation as an equivalent as a two step process of decay followed by update for simplicity. +where $\tau_{i,j}$ represents the pheromone for the component (graph edge) ($i,j$), $\rho$ is the decay factor, $m$ is the number of ants, and $\sum_{k=1}^m \Delta_{i,j}^k$ is the sum of $\frac{1}{S_{cost}}$ (maximizing solution cost) for those solutions that include component $i,j$. For simplicity, the pseudocode listing shows this equation as an equivalent two-step process of decay followed by update. -The probabilistic step-wise construction of solution makes use of both history (pheromone) and problem-specific heuristic information to incrementally construction a solution piece-by-piece. Each component can only be selected if it has not already been chosen (for most combinatorial problems), and for those components that can be selected from (given the current component $["i"]$), their probability for selection is defined as: +The probabilistic step-wise construction of a solution makes use of both history (pheromone) and problem-specific heuristic information to incrementally construct a solution piece-by-piece. Each component can only be selected if it has not already been chosen (for most combinatorial problems), and for those components that can be selected (given the current component $i$), their probability for selection is defined as: $P_{i,j} \leftarrow \frac{\tau_{i,j}^{\alpha} \times \eta_{i,j}^{\beta}}{\sum_{k=1}^c \tau_{i,k}^{\alpha} \times \eta_{i,k}^{\beta}}$ -where $["\\eta_{i,j}"]$ is the maximizing contribution to the overall score of selecting the component (such as $["\\frac{1.0}{distance_{i,j}}"]$ for the Traveling Salesman Problem), $["\\alpha"]$ is the heuristic coefficient, $["\\tau_{i,j}"]$ is the pheromone value for the component, $["\\beta"]$ is the history coefficient, and $["c"]$ is the set of usable components. +where $\eta_{i,j}$ is the maximizing contribution to the overall score of selecting the component (such as $\frac{1.0}{distance_{i,j}}$ for the Traveling Salesman Problem), $\alpha$ is the history coefficient, $\tau_{i,j}$ is the pheromone value for the component, $\beta$ is the heuristic coefficient, and $c$ is the set of usable components. 
: @@ -112,10 +112,10 @@Procedure
Heuristics
diff --git a/docs/nature-inspired/swarm/bees_algorithm.html b/docs/nature-inspired/swarm/bees_algorithm.html index f3ef6294..5a5d9734 100644 --- a/docs/nature-inspired/swarm/bees_algorithm.html +++ b/docs/nature-inspired/swarm/bees_algorithm.html @@ -109,16 +109,16 @@
- The Ant Systems algorithm was designed for use with combinatorial problems such as the TSP, knapsack problem, quadratic assignment problems, graph coloring problems and many others.
-- The history coefficient ($["\\alpha"]$) controls the amount of contribution history plays in a components probability of selection and is commonly set to 1.0.
-- The heuristic coefficient ($["\\beta"]$) controls the amount of contribution problem-specific heuristic information plays in a components probability of selection and is commonly between 2 and 5, such as 2.5.
-- The decay factor ($["\\rho"]$) controls the rate at which historic information is lost and is commonly set to 0.5.
-- The total number of ants ($["m"]$) is commonly set to the number of components in the problem, such as the number of cities in the TSP.
 +
                      • The history coefficient ($\alpha$) controls the amount of contribution history plays in a component's probability of selection and is commonly set to 1.0.
                      • 
 +
                      • The heuristic coefficient ($\beta$) controls the amount of contribution problem-specific heuristic information plays in a component's probability of selection and is commonly between 2 and 5, such as 2.5.
                      • 
+- The decay factor ($\rho$) controls the rate at which historic information is lost and is commonly set to 0.5.
+- The total number of ants ($m$) is commonly set to the number of components in the problem, such as the number of cities in the TSP.
Procedure
Heuristics
- The Bees Algorithm was developed to be used with continuous and combinatorial function optimization problems.
-- The $["Patch_{size}"]$ variable is used as the neighborhood size. For example, in a continuous function optimization problem, each dimension of a site would be sampled as $["x_i \\pm (rand() \\times Patch_{size})"]$.
-- The $["Patch_{size}"]$ variable is decreased each iteration, typically by a constant amount (such as 0.95).
-- The number of elite sites ($["EliteSites_{num}"]$) must be $["<"]$ the number of sites ($["Sites_{num}"]$), and the number of elite bees ($["EliteBees_{num}"]$) is traditionally $["<"]$ the number of other bees ($["OtherBees_{num}"]$).
 +
                      • The $Patch_{size}$ variable is used as the neighborhood size. For example, in a continuous function optimization problem, each dimension of a site would be sampled as $x_i \pm (rand() \times Patch_{size})$.
                      • 
 +
                      • The $Patch_{size}$ variable is decreased each iteration, typically by a constant factor (such as 0.95).
                      • 
+- The number of elite sites ($EliteSites_{num}$) must be $<$ the number of sites ($Sites_{num}$), and the number of elite bees ($EliteBees_{num}$) is traditionally $<$ the number of other bees ($OtherBees_{num}$).
Code Listing
Listing (below) provides an example of the Bees Algorithm implemented in the Ruby Programming Language. -The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=3"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$. +The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=3$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$. The algorithm is an implementation of the Bees Algorithm as described in the seminal paper [Pham2006]. A fixed patch size decrease factor of 0.95 was applied each iteration.
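 The patch-based neighborhood sampling described in the heuristics above is compact enough to sketch directly. In the Ruby illustration below, the site, bounds, and patch size are assumed values; the 0.95 shrink factor is the one quoted above. # Sketch: sample one bee within the patch (neighborhood) of a site. def neighborhood_bee(site, patch_size, bounds) site.each_with_index.map do |x, i| v = (rand < 0.5) ? x + rand * patch_size : x - rand * patch_size v = bounds[i][0] if v < bounds[i][0] # clamp to the search bounds v = bounds[i][1] if v > bounds[i][1] v end end bounds = [[-5.0, 5.0]] * 3 bee = neighborhood_bee([1.0, -2.0, 0.5], 3.0, bounds) puts bee.inspect # after each iteration the patch would shrink: patch_size *= 0.95 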
diff --git a/docs/nature-inspired/swarm/bfoa.html b/docs/nature-inspired/swarm/bfoa.html index d2ce70d2..dbe1eaf2 100644 --- a/docs/nature-inspired/swarm/bfoa.html +++ b/docs/nature-inspired/swarm/bfoa.html @@ -66,14 +66,14 @@Strategy
Procedure
 Algorithm (below) provides a pseudocode listing of the Bacterial Foraging Optimization Algorithm for minimizing a cost function. Algorithm (below) provides the pseudocode listing for the chemotaxis and swim behaviour of the BFOA algorithm. -A bacteria cost is derated by its interaction with other cells. This interaction function ($["g()"]$) is calculated as follows: +A bacterium's cost is derated by its interaction with other cells. This interaction function ($g()$) is calculated as follows: $g(cell_k) = \sum_{i=1}^S\bigg[-d_{attr}\times exp\bigg(-w_{attr}\times \sum_{m=1}^P (cell_m^k - other_m^i)^2 \bigg) \bigg] + \sum_{i=1}^S\bigg[h_{repel}\times exp\bigg(-w_{repel}\times \sum_{m=1}^P (cell_m^k - other_m^i)^2 \bigg) \bigg]$ -where $["cell_k"]$ is a given cell, $["d_{attr}"]$ and $["w_{attr}"]$ are attraction coefficients, $["h_{repel}"]$ and $["w_{repel}"]$ are repulsion coefficients, $["S"]$ is the number of cells in the population, $["P"]$ is the number of dimensions on a given cells position vector. +where $cell_k$ is a given cell, $d_{attr}$ and $w_{attr}$ are attraction coefficients, $h_{repel}$ and $w_{repel}$ are repulsion coefficients, $S$ is the number of cells in the population, and $P$ is the number of dimensions of a given cell's position vector. -The remaining parameters of the algorithm are as follows $["Cells_{num}"]$ is the number of cells maintained in the population, $["N_{ed}"]$ is the number of elimination-dispersal steps, $["N_{re}"]$ is the number of reproduction steps, $["N_{c}"]$ is the number of chemotaxis steps, $["N_{s}"]$ is the number of swim steps for a given cell, $["Step_{size}"]$ is a random direction vector with the same number of dimensions as the problem space, and each value $["\\in [-1,1]"]$, and $["P_{ed}"]$ is the probability of a cell being subjected to elimination and dispersal. +The remaining parameters of the algorithm are as follows: $Cells_{num}$ is the number of cells maintained in the population, $N_{ed}$ is the number of elimination-dispersal steps, $N_{re}$ is the number of reproduction steps, $N_{c}$ is the number of chemotaxis steps, $N_{s}$ is the number of swim steps for a given cell, $Step_{size}$ is a random direction vector with the same number of dimensions as the problem space and each value $\in [-1,1]$, and $P_{ed}$ is the probability of a cell being subjected to elimination and dispersal. 
: @@ -137,17 +137,17 @@Heuristics
- The algorithm was designed for application to continuous function optimization problem domains.
- Given the loops in the algorithm, it can be configured numerous ways to elicit different search behavior. It is common to have a large number of chemotaxis iterations, and small numbers of the other iterations.
-- The default coefficients for swarming behavior (cell-cell interactions) are as follows $["d_{attract}=0.1"]$, $["w_{attract}=0.2"]$, $["h_{repellant}=d_{attract}"]$, and $["w_{repellant}=10"]$.
+- The default coefficients for swarming behavior (cell-cell interactions) are as follows $d_{attract}=0.1$, $w_{attract}=0.2$, $h_{repellant}=d_{attract}$, and $w_{repellant}=10$.
- The step size is commonly a small fraction of the search space, such as 0.1.
- During reproduction, typically half the population with a low health metric are discarded, and two copies of each member from the first (high-health) half of the population are retained.
-- The probability of elimination and dispersal ($["p_{ed}"]$) is commonly set quite large, such as 0.25.
+- The probability of elimination and dispersal ($p_{ed}$) is commonly set quite large, such as 0.25.
Code Listing
Listing (below) provides an example of the Bacterial Foraging Optimization Algorithm implemented in the Ruby Programming Language. -The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=2"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$. +The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=2$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$. The algorithm is an implementation based on the description on the seminal work [Passino2002]. The parameters for cell-cell interactions (attraction and repulsion) were taken from the paper, and the various loop parameters were taken from the 'Swarming Effects' example.
diff --git a/docs/nature-inspired/swarm/pso.html b/docs/nature-inspired/swarm/pso.html index 1606d596..14adad52 100644 --- a/docs/nature-inspired/swarm/pso.html +++ b/docs/nature-inspired/swarm/pso.html @@ -67,7 +67,7 @@Procedure
 $v_{i}(t+1) = v_{i}(t) + \big( c_1 \times rand() \times (p_{i}^{best} - p_{i}(t)) \big) + \big( c_2 \times rand() \times (p_{gbest} - p_{i}(t)) \big)$ -where $["v_{i}(t+1)"]$ is the new velocity for the $["i^{th}"]$ particle, $["c_1"]$ and $["c_2"]$ are the weighting coefficients for the personal best and global best positions respectively, $["p_{i}(t)"]$ is the $["i^{th}"]$ particle's position at time $["t"]$, $["p_{i}^{best}"]$ is the $["i^{th}"]$ particle's best known position, and $["p_{gbest}"]$ is the best position known to the swarm. The $["rand()"]$ function generate a uniformly random variable $["\\in [0,1]"]$. Variants on this update equation consider best positions within a particles local neighborhood at time $["t"]$. +where $v_{i}(t+1)$ is the new velocity for the $i^{th}$ particle, $c_1$ and $c_2$ are the weighting coefficients for the personal best and global best positions respectively, $p_{i}(t)$ is the $i^{th}$ particle's position at time $t$, $p_{i}^{best}$ is the $i^{th}$ particle's best known position, and $p_{gbest}$ is the best position known to the swarm. The $rand()$ function generates a uniformly random variable $\in [0,1]$. Variants on this update equation consider best positions within a particle's local neighborhood at time $t$. 
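 Both this velocity update and the position update mentioned next fit in a few lines of Ruby. In the sketch below the particle layout is an assumption, as are the coefficients $c_1 = c_2 = 2.0$; it is an illustration, not the book's listing. # Sketch: velocity then position update for one particle (illustrative layout). def update_particle!(particle, gbest, c1 = 2.0, c2 = 2.0) particle[:velocity] = particle[:velocity].each_with_index.map do |v, i| v + c1 * rand * (particle[:best][i] - particle[:position][i]) + c2 * rand * (gbest[i] - particle[:position][i]) end # position update: x(t+1) = x(t) + v(t+1) particle[:position] = particle[:position].zip(particle[:velocity]).map { |x, v| x + v } end particle = {:position => [1.0, 1.0], :velocity => [0.1, -0.2], :best => [0.5, 0.8]} update_particle!(particle, [0.0, 0.0]) puts particle[:position].inspect 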
A particle's position is updated using: @@ -125,7 +125,7 @@
Heuristics
Code Listing
Listing (below) provides an example of the Particle Swarm Optimization algorithm implemented in the Ruby Programming Language. -The demonstration problem is an instance of a continuous function optimization that seeks $["\\min f(x)"]$ where $["f=\\sum_{i=1}^n x_{i}^2"]$, $["-5.0\\leq x_i \\leq 5.0"]$ and $["n=3"]$. The optimal solution for this basin function is $["(v_0,\\ldots,v_{n-1})=0.0"]$. +The demonstration problem is an instance of a continuous function optimization that seeks $\min f(x)$ where $f=\sum_{i=1}^n x_{i}^2$, $-5.0\leq x_i \leq 5.0$ and $n=3$. The optimal solution for this basin function is $(v_0,\ldots,v_{n-1})=0.0$. The algorithm is a conservative version of Particle Swarm Optimization based on the seminal papers. The implementation limits the velocity at a pre-defined maximum, and bounds particles to the search space, reflecting their movement and velocity if the bounds of the space are exceeded. Particles are influenced by the best position found as well as their own personal best position. Natural extensions may consider limiting velocity with an inertia coefficient and including a neighborhood function for the particles.
diff --git a/web/generate.rb b/web/generate.rb index 7a2b25de..7b5f61ef 100644 --- a/web/generate.rb +++ b/web/generate.rb @@ -325,9 +325,9 @@ def character_processing(s) # sucks i know def post_process_text(s) # extract math - math, arrays = [], [] - s.scan(/\$([^$]+)\$/) {|m| math << m } # $$ - s.scan(/\\\[([^$]+)\\\]/) {|m| arrays << m } # \[ \] + # math, arrays = [], [] + # s.scan(/\$([^$]+)\$/) {|m| math << m } # $$ + # s.scan(/\\\[([^$]+)\\\]/) {|m| arrays << m } # \[ \] # citations s = replace_citations(s) # listings, algorithms, tables @@ -392,21 +392,23 @@ def post_process_text(s) # finally switch ` for ' (late in the subs) s = s.gsub("`", "'") - # put the math back - if !math.empty? - index = 0 - s = s.gsub(/\$([^$]+)\$/) do |m| - index += 1 - "$#{math[index - 1]}$" - end - end - if !arrays.empty? - index = 0 - s = s.gsub(/\\\[([^$]+)\\\]/) do |m| - index += 1 - "\\[#{arrays[index - 1]}\\]" - end - end + + # # put the math back + # if !math.empty? + # index = 0 + # s = s.gsub(/\$([^$]+)\$/) do |m| + # index += 1 + # "$#{math[index - 1]}$" + # end + # end + # puts(s) + # if !arrays.empty? + # index = 0 + # s = s.gsub(/\\\[([^$]+)\\\]/) do |m| + # index += 1 + # "\\[#{arrays[index - 1]}\\]" + # end + # end return s end @@ -1552,7 +1554,7 @@ def create_sitemap add_line(s, "") add_line(s, "") # html - host = "http://www.cleveralgorithms.com" + host = "http://cleveralgorithms.com" dir = "nature-inspired" # all pages Dir.entries(OUTPUT_DIR).each do |file|