<?xml version="1.0" encoding="UTF-8"?>
<?asciidoc-toc?>
<?asciidoc-numbered?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
<info>
<title>Principles of Artificial Neural Networks and Machine Learning for Bioinformatics Applications</title>
<date>2023-08-26</date>
</info>
<section xml:id="_principles_of_artificial_neural_networks_and_machine_learning_for_bioinformatics_applications">
<title>Principles of Artificial Neural Networks and Machine Learning for Bioinformatics Applications</title>
<simpara>Konstantinos Krampis*<superscript>1</superscript>, Eric Ross<superscript>2</superscript>, Olorunseun O. Ogunwobi<superscript>1</superscript>, Grace Ma<superscript>3</superscript>, Raja Mazumder<superscript>4</superscript>, Claudia Wultsch<superscript>1</superscript></simpara>
<simpara><superscript>1</superscript>Belfer Research Facility, Biological Sciences, Hunter College, City University of New York, NY, USA
<superscript>2</superscript>Fox Chase Cancer Center, Philadelphia, PA, USA
<superscript>3</superscript>Center for Asian Health, Lewis Katz School of Medicine, Temple University, Philadelphia, PA, USA
<superscript>4</superscript>Biochemistry and Molecular Biology, George Washington University, Washington D.C., USA</simpara>
<simpara><superscript>*</superscript>Corresponding Author, <emphasis>[email protected]</emphasis></simpara>
<section xml:id="_abstract">
<title>ABSTRACT</title>
<simpara>With the exponential growth of machine learning and the development of Artificial
Neural Networks (ANNs) in recent years, there is great opportunity to leverage
this approach and accelerate biological discoveries through applications in the
analysis of high-throughput data. Various types of datasets, including protein
or gene interaction networks, molecular structures, and cellular signalling
pathways, have already been utilized for machine learning by training ANNs for
inference and pattern classification. However, unlike regular data structures
commonly used in the fields of computer science and engineering, bioinformatics
datasets present challenges that require unique algorithmic approaches. The
recent development of geometric and deep learning approaches within the machine
learning field holds great promise for accelerating the analysis of complex
bioinformatics datasets. Here, we demonstrate the principles of ANNs and their
significance for bioinformatics machine learning by presenting the underlying
mathematical and statistical foundations from group theory, symmetry, and
linear algebra. Furthermore, the structure and functions of ANN algorithms,
which constitute the core principles of artificial intelligence, are explained
in relation to the bioinformatics data domain. In summary, this manuscript
provides guidance for researchers to understand the principles necessary for
practicing machine learning and artificial intelligence, with special
considerations for bioinformatics applications.</simpara>
<simpara><emphasis role="strong">Keywords:</emphasis> <emphasis>machine learning, artificial intelligence, bioinformatics, cancer biology, neural networks, symmetry, group theory, algorithms</emphasis></simpara>
</section>
<section xml:id="_simple_summary">
<title>SIMPLE SUMMARY</title>
<simpara>Here, we provide an overview of the foundational formalisms of Artificial
Neural Networks (ANNs), which serve as the basis for Artificial Intelligence
within the broader field of Machine Learning. The review is from the
perspective of bioinformatics data, and multiple examples showcasing the
applications of these formalisms to experimental scenarios are presented
herein. The mathematical formalisms are explained in detail, offering
biologists who are not Machine Learning experts the opportunity to understand
the algorithmic basis of Artificial Intelligence as it relates to
bioinformatics applications.</simpara>
</section>
<section xml:id="_introduction">
<title>INTRODUCTION</title>
<simpara>Artificial Intelligence (AI), Machine Learning (ML), and Deep
Learning (DL) are interconnected concepts with distinct differences: AI is
centered around developing machines capable of performing tasks that require
human intelligence, ML empowers computers to learn from data and make
predictions without explicit programming, and DL employs deep neural networks
to discern patterns from complex datasets. AI encompasses both ML and DL, which
function as subsets of AI. ML algorithms learn patterns from data to facilitate
accurate predictions or decisions and can be categorized into supervised,
unsupervised, and reinforcement learning. DL algorithms, drawing inspiration
from the human brain, utilize deep neural networks to learn and extract
patterns from large-scale datasets. DL has shown success in domains such as
image and speech recognition, natural language processing (NLP), and autonomous
driving.</simpara>
<simpara>In the last decade, technologies such as genomic sequencing have led to an
exponential increase [<link linkend="katz2022sequence">1</link>] in the data describing the
molecular elements, structure, and function of biological systems.
Additionally, data digitization and generation across varied fields such as
physics, software development, and social media [<link linkend="clissa2022survey">2</link>]
have yielded complex datasets of scales previously unavailable to scientists. AI
also provides many opportunities for healthcare, ranging from clinical
decision-support systems to deep-learning based health information management
systems. This abundance of data has played a pivotal role in the rapid progress of
machine learning, deep learning, and artificial intelligence. As a result, we
now have algorithms that can be trained to extract insights from data with a
level of sophistication that closely resembles human intuition.</simpara>
<simpara>While researchers have developed hundreds of successful algorithms, there are
currently only a few overarching principles for systematically organizing machine
learning algorithms. In a seminal <literal>proto-book</literal> by Bronstein et al.
[<link linkend="bronstein2021geometric">3</link>], various systematization principles for different
Artificial Neural Network (ANN) architectures and deep learning algorithms were
presented. These principles are founded on the concepts of symmetry and
mathematical group theory. Symmetry and invariance are central concepts in
physics, mathematics, and biological systems. Since the early 20<superscript>th</superscript> century,
it has been established that fundamental principles of nature are rooted in
symmetry [<link linkend="noether1918invariante">4</link>]. The authors also introduced the concept
of geometric deep learning and demonstrated how group theory, along with
function invariance and equivariance principles, can serve as a foundation for
composing and describing different deep learning algorithms. Along these lines,
the present manuscript explains the structure of ANNs and the core principles
of machine learning algorithms. Additionally, it offers a review of the
mathematical and statistical foundations pertinent to the development of
artificial intelligence applications using bioinformatics data.</simpara>
</section>
<section xml:id="_the_structure_of_artificial_intelligence_and_neural_networks">
<title>THE STRUCTURE OF ARTIFICIAL INTELLIGENCE AND NEURAL NETWORKS</title>
<simpara>We will begin by describing the structures and functions of deep learning and
Artificial Neural Networks (ANNs), which form the foundation of artificial
intelligence [<link linkend="li2019deep">5</link>]. We use a dataset consisting of <emphasis>n</emphasis> pairs of
<inlineequation><alt><![CDATA[\left( x_{i},y_{i} \right)_{n}]]></alt><mathphrase><![CDATA[\left( x_{i},y_{i} \right)_{n}]]></mathphrase></inlineequation>, where <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation>
represents <emphasis>n</emphasis> data points and <inlineequation><alt><![CDATA[y_{i}]]></alt><mathphrase><![CDATA[y_{i}]]></mathphrase></inlineequation> their corresponding labels.
Each <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation> data point can take the form of a number, a vector (an
array of numbers), or a matrix (a grid of numbers), storing various types of
bioinformatics data. The labels can assume different formats, such as binary
(two options), like <inlineequation><alt><![CDATA[y_{i} = 1]]></alt><mathphrase><![CDATA[y_{i} = 1]]></mathphrase></inlineequation> "inhibits cancer growth", or
<inlineequation><alt><![CDATA[y_{i} = 0]]></alt><mathphrase><![CDATA[y_{i} = 0]]></mathphrase></inlineequation> "does not inhibit cancer". The labels can also be
continuous numbers, for instance, <inlineequation><alt><![CDATA[y_{i} = 0.3]]></alt><mathphrase><![CDATA[y_{i} = 0.3]]></mathphrase></inlineequation> indicating 30%
inhibition, or a composite label such as <inlineequation><alt><![CDATA[y_{i} = \left( 0,1,0
\right),]]></alt><mathphrase><![CDATA[y_{i} = \left( 0,1,0
\right),]]></mathphrase></inlineequation> which signifies drug attributes like '0 - no inhibition', '1 - yes
for toxicity', '0 - not metabolized', respectively. Similarly, the input data
points can also be composite, for example, <inlineequation><alt><![CDATA[x_{i} = \left( 50,100
\right)]]></alt><mathphrase><![CDATA[x_{i} = \left( 50,100
\right)]]></mathphrase></inlineequation> representing two measurements for a single biological entity.
Regardless of the label structure, the primary objective of deep learning
algorithms and the overarching goal of artificial intelligence applications in
bioinformatics is to first train the ANN using data with known labels.
Subsequently, the ANN is utilized to classify newly generated data by
predicting their labels [<link linkend="Nair2021">6</link>].</simpara>
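<simpara>As a minimal illustration of this paired data structure, the following Python
sketch (with purely hypothetical variable names and values) stores a small set of
composite input points alongside binary, continuous, and composite labels:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

# n = 4 hypothetical data points, each a composite measurement x_i = (m1, m2),
# e.g. two assay readings for a single biological entity.
X = np.array([[50.0, 100.0],
              [12.0,  80.0],
              [75.0,  40.0],
              [30.0,  95.0]])

# Matching labels y_i. Binary labels: 1 = "inhibits cancer growth", 0 = "does not".
y_binary = np.array([1, 0, 1, 0])

# Continuous labels, e.g. 0.3 = 30% inhibition.
y_continuous = np.array([0.30, 0.05, 0.62, 0.10])

# Composite labels (inhibition, toxicity, metabolized), each coded 0/1.
y_composite = np.array([[0, 1, 0],
                        [1, 0, 1],
                        [0, 0, 0],
                        [1, 1, 0]])

print(X.shape, y_binary.shape, y_composite.shape)  # (4, 2) (4,) (4, 3)]]></programlisting>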
<simpara>The simplest structure of an artificial neural network, as depicted in <emphasis role="strong">Fig.1,</emphasis>
is "fully connected". In this structure, each neuron <emphasis>k</emphasis> within the ANN
possesses a specific number of incoming and outgoing connections. These
connections correspond to the quantity of neurons present in the previous and
next layers within the neural network [<link linkend="Nair2021">6</link>]. For example, the neuron
<inlineequation><alt><![CDATA[k_{1}^{(1)}]]></alt><mathphrase><![CDATA[k_{1}^{(1)}]]></mathphrase></inlineequation> of the <emphasis>First Layer (1)</emphasis> in <emphasis role="strong">Fig.1</emphasis> has
<inlineequation><alt><![CDATA[n = 2]]></alt><mathphrase><![CDATA[n = 2]]></mathphrase></inlineequation> incoming and <inlineequation><alt><![CDATA[n = 3]]></alt><mathphrase><![CDATA[n = 3]]></mathphrase></inlineequation> outgoing connections.
The two incoming connections arrive from the "input layer", which comprises two neurons,
and the three outgoing connections extend to the neurons of the internal ("hidden")
layer, denoted as <emphasis>Second Layer (2)</emphasis> in the figure. The designation “hidden”
is attributed to the internal layers because they do not directly receive input
data.</simpara>
<simpara>This concept parallels the behavior of neurons engaged in cognition within
animal brains, in contrast to sensory neurons. While the number of neurons in
the hidden layers can vary based on the complexity of the label classification
problem that the ANN is intended to address [<link linkend="uzair2020effects">7</link>], the input
layer must have a precise number of neurons that aligns with the structure of
the input data. In <emphasis role="strong">Fig. 1,</emphasis> for instance, there are two input neurons, and the
data can take the form <inlineequation><alt><![CDATA[x_{i} = \left( 50,100 \right)]]></alt><mathphrase><![CDATA[x_{i} = \left( 50,100 \right)]]></mathphrase></inlineequation>. Lastly, the
output layer consists of a number of neurons corresponding to the count of
labels <inlineequation><alt><![CDATA[y_{i}]]></alt><mathphrase><![CDATA[y_{i}]]></mathphrase></inlineequation> associated with each input data point in the
dataset. In <emphasis role="strong">Fig. 1,</emphasis> a single label is presented.</simpara>
<informalfigure role="middle">
<mediaobject>
<imageobject>
<imagedata fileref="Fig1.svg"/>
</imageobject>
<textobject><phrase>Fig1</phrase></textobject>
</mediaobject>
</informalfigure>
<simpara><?asciidoc-hr?></simpara>
<simpara><emphasis role="strong">Figure 1.</emphasis> An example <emphasis role="strong">Artificial Neural Network (ANN)</emphasis>. The signal
aggregation taking place on the second neuron
<inlineequation><alt><![CDATA[\sigma_{k_{2}^{(2)}}]]></alt><mathphrase><![CDATA[\sigma_{k_{2}^{(2)}}]]></mathphrase></inlineequation> of the second hidden layer, can be expressed
with the formula <inlineequation><alt><![CDATA[\sigma_{k_{2}^{(2)}} =
\sum_{k_{1,2,3}}^{(\begin{matrix} 1 \\ \end{matrix})}w_{k1}*x_{k1} +
w_{k2}*x_{k2} + w_{k3}*x_{k3} - b]]></alt><mathphrase><![CDATA[\sigma_{k_{2}^{(2)}} =
\sum_{k_{1,2,3}}^{(\begin{matrix} 1 \\ \end{matrix})}w_{k1}*x_{k1} +
w_{k2}*x_{k2} + w_{k3}*x_{k3} - b]]></mathphrase></inlineequation>, which is the aggregation of neuron signals
from the first layer, shown as red arrows in the figure. <emphasis>b</emphasis> represents the
threshold that needs to be overcome by the aggregation sum in order for the
neuron to fire, and then the neuron will transmit a signal along the line shown
towards the output on the final layer of the figure. The reader should refer to
the text for more details.
'''</simpara>
<simpara>Similar to neural networks in animal brains, the computational abstractions
used in machine learning and artificial intelligence model neurons as
computational units that execute signal summation and threshold activation
[<link linkend="Renganathan2019">8</link>]. Specifically, each artificial neuron performs a
summation of incoming signals from its connected neighboring neurons in the
preceding layer of the network, shown for example as red arrows in <emphasis role="strong">Fig.1</emphasis> for
<inlineequation><alt><![CDATA[\sigma_{k_{2}^{(2)}}]]></alt><mathphrase><![CDATA[\sigma_{k_{2}^{(2)}}]]></mathphrase></inlineequation> . The signal processing throughout the ANN
transitions from the input data <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation> on the leftmost layer
(<emphasis role="strong">Fig.1</emphasis>) to the output of data labels <inlineequation><alt><![CDATA[y_{i}]]></alt><mathphrase><![CDATA[y_{i}]]></mathphrase></inlineequation> on the rightmost
end. Within each neuron, when the aggregated input reaches a certain
threshold, the neuron "fires" and transmits a signal to the subsequent layer.</simpara>
<simpara>The signals entering the neuron can either be the data directly from the input
layer or signals generated by the activation of neurons in the intermediate
"hidden" layers. The summation and thresholding computation within each neuron
is represented with the function <inlineequation><alt><![CDATA[\sigma_{k} =
\sum_{1}^{k}w_{k}*x_{k} - b]]></alt><mathphrase><![CDATA[\sigma_{k} =
\sum_{1}^{k}w_{k}*x_{k} - b]]></mathphrase></inlineequation>, where <inlineequation><alt><![CDATA[w_{k}]]></alt><mathphrase><![CDATA[w_{k}]]></mathphrase></inlineequation> represents the
connection weights of the preceding neurons. Each connection arrow in <emphasis role="strong">Fig.1</emphasis>
has a distinct weight, such as, for example, <inlineequation><alt><![CDATA[x_{k1}]]></alt><mathphrase><![CDATA[x_{k1}]]></mathphrase></inlineequation> which is the
incoming signal from the neuron <inlineequation><alt><![CDATA[\sigma_{k_{1}^{(1)}}]]></alt><mathphrase><![CDATA[\sigma_{k_{1}^{(1)}}]]></mathphrase></inlineequation> to neuron
<inlineequation><alt><![CDATA[\sigma_{k_{2}^{(2)}}]]></alt><mathphrase><![CDATA[\sigma_{k_{2}^{(2)}}]]></mathphrase></inlineequation> , multiplied by the weight
<inlineequation><alt><![CDATA[w_{k1}]]></alt><mathphrase><![CDATA[w_{k1}]]></mathphrase></inlineequation>, which symbolizes the strength of the connection between
these two artificial neurons.</simpara>
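<simpara>The aggregation and threshold firing described above can be sketched in a few
lines of Python; the incoming signals, weights, and threshold below are hypothetical
values chosen only to illustrate the per-neuron computation:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

def neuron_output(x, w, b):
    """Aggregate incoming signals as sigma = sum_k w_k * x_k - b,
    then 'fire' (output 1) only if the weighted sum exceeds the threshold b."""
    sigma = np.dot(w, x) - b
    return 1.0 if sigma > 0 else 0.0

# Hypothetical incoming signals from three neurons of the previous layer
# and the weights of the corresponding connections (cf. Fig. 1).
x_in = np.array([0.8, 0.2, 0.5])
w_in = np.array([0.9, 0.1, 0.4])
b = 0.5

print(neuron_output(x_in, w_in, b))  # 1.0, since 0.8*0.9 + 0.2*0.1 + 0.5*0.4 = 0.94 > 0.5]]></programlisting>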
<simpara>The weights in artificial neural networks embody the strength of connections
between neurons. They determine the impact of input signals on the final output
of the network. Throughout the training process, these weights are adjusted to
minimize the difference between the network’s predicted and intended output.
The weights govern the information flow within the network, enabling it to
learn and generate precise predictions. Accurately calibrated weights are
crucial for the network to effectively learn patterns and extrapolate its
knowledge to novel input data [<link linkend="Renganathan2019">8</link>].</simpara>
<simpara>For the majority of applications, the weight values <inlineequation><alt><![CDATA[w_{k}]]></alt><mathphrase><![CDATA[w_{k}]]></mathphrase></inlineequation>
constitute the only elements in the ANN structure that are variable and
adjusted by the algorithms during training using the input data. This process
is similar to the biological brain, where learning takes place by strengthening
connections among neurons [<link linkend="wainberg2018deep">9</link>]. However, unlike the
biological brain, the ANNs used for practical data analysis have fixed
connections between neurons and the structure of the neural network remains
unaltered during the process of training and learning to recognize and classify
new data. The last term <emphasis>b</emphasis> in the summation signifies a threshold that must be
surpassed, as in <inlineequation><alt><![CDATA[\sum_{1}^{k}w_{k}*x_{k} > b]]></alt><mathphrase><![CDATA[\sum_{1}^{k}w_{k}*x_{k} > b]]></mathphrase></inlineequation>, to trigger the
activation of a neuron.</simpara>
<simpara>A final step prior to transmitting the neuron’s output value involves the
application of a "logit" function to the summation value that is represented as
<inlineequation><alt><![CDATA[\varphi\left( \sigma_{k} \right)]]></alt><mathphrase><![CDATA[\varphi\left( \sigma_{k} \right)]]></mathphrase></inlineequation>. <inlineequation><alt><![CDATA[\varphi]]></alt><mathphrase><![CDATA[\varphi]]></mathphrase></inlineequation> can be
selected from a range of non-linear functions contingent on the type of input
data and the specific analysis and data classification domain for which the ANN
will be used [<link linkend="li2019deep">5</link>]. The value of the logit function is the output
of the neuron, which is transmitted to its interconnected neurons in the
subsequent layer through outgoing connections, illustrated as an arrow in
<emphasis role="strong">Fig.1</emphasis> and corresponding to the brain cell axons in the biological analogy.
Multiple layers of interconnected neurons (<emphasis role="strong">Fig.1</emphasis>), along with multiple
connections per layer, each having its own weight <inlineequation><alt><![CDATA[w_{k}]]></alt><mathphrase><![CDATA[w_{k}]]></mathphrase></inlineequation>, together
form the framework of the Artificial Neural Network (ANN).</simpara>
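<simpara>To illustrate how the per-neuron functions compose into a network, the following
sketch implements a forward pass through the fully connected 2-3-1 architecture of
<emphasis role="strong">Fig.1</emphasis> in Python, using the logistic (sigmoid) function as the non-linearity and
randomly initialized placeholder weights; it is illustrative only, not a trained model:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

rng = np.random.default_rng(0)

def phi(sigma):
    # Logistic (sigmoid) non-linearity applied to the thresholded sum.
    return 1.0 / (1.0 + np.exp(-sigma))

# Fully connected 2-3-1 network as in Fig. 1: weights and thresholds per layer.
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)   # hidden layer -> output layer

def f(x):
    """The network function: a composition of the per-neuron functions phi(sigma_k)."""
    h = phi(W1 @ x - b1)        # hidden-layer activations
    return phi(W2 @ h - b2)     # output value in (0, 1)

x_i = np.array([50.0, 100.0])
print(f(x_i))                   # network output for this input (untrained, so arbitrary)]]></programlisting>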
<simpara>From a mathematical formalism perspective, a trained ANN is a function
<inlineequation><alt><![CDATA[f]]></alt><mathphrase><![CDATA[f]]></mathphrase></inlineequation> that predicts labels <inlineequation><alt><![CDATA[y_{\text{pre}d_{i}}]]></alt><mathphrase><![CDATA[y_{\text{pre}d_{i}}]]></mathphrase></inlineequation>, which
can include categories such as 'no inhibition', 'yes for toxicity' etc., for
different types of input data <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation>, ranging from histology
images to drug molecules represented as graph data structures. Therefore, the
ANN undertakes data classification by operating as a mapping function
<inlineequation><alt><![CDATA[f\left( x_{i} \right) = y_{\text{pre}d_{i}}]]></alt><mathphrase><![CDATA[f\left( x_{i} \right) = y_{\text{pre}d_{i}}]]></mathphrase></inlineequation>, that connects the
input data to the respective labels. Furthermore, the <inlineequation><alt><![CDATA[f\left( x_{i}
\right)]]></alt><mathphrase><![CDATA[f\left( x_{i}
\right)]]></mathphrase></inlineequation> is a non-linear function, since it is an aggregate composition of the
non-linear functions <inlineequation><alt><![CDATA[\varphi\left( \sigma_{k} \right)]]></alt><mathphrase><![CDATA[\varphi\left( \sigma_{k} \right)]]></mathphrase></inlineequation> of the
individual interconnected neurons within the network [<link linkend="li2019deep">5</link>]. As
a result, the <inlineequation><alt><![CDATA[f\left( x_{i} \right)]]></alt><mathphrase><![CDATA[f\left( x_{i} \right)]]></mathphrase></inlineequation> can successfully classify
labels for data inputs originating from complex data distributions. This fact
enables ANNs to attain heightened analytical capability compared to
conventional statistical learning algorithms [<link linkend="tang2019recent">10</link>]. The
<inlineequation><alt><![CDATA[f\left( x_{i} \right)]]></alt><mathphrase><![CDATA[f\left( x_{i} \right)]]></mathphrase></inlineequation> estimation is carried out by fitting a training
dataset, which establishes correlations between labels <inlineequation><alt><![CDATA[y_{i}]]></alt><mathphrase><![CDATA[y_{i}]]></mathphrase></inlineequation> and
data points <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation>. Since hundreds of papers and monographs have
been written on the technical details of training ANNs, we next briefly
summarize the process and direct the reader to the cited works for
further details [<link linkend="Zou2008a">11</link>].</simpara>
<simpara>As mentioned earlier, the only variable elements in the ANN structure are the
weights <inlineequation><alt><![CDATA[w_{k}]]></alt><mathphrase><![CDATA[w_{k}]]></mathphrase></inlineequation> of neuron connections. Therefore, training an ANN
to classify data involves the estimation of these weights. Furthermore, the
training process entails minimizing the error <inlineequation><alt><![CDATA[E]]></alt><mathphrase><![CDATA[E]]></mathphrase></inlineequation>, which is the
difference between the labels <inlineequation><alt><![CDATA[y_{\text{pre}d_{i}}]]></alt><mathphrase><![CDATA[y_{\text{pre}d_{i}}]]></mathphrase></inlineequation> predicted by
the function <inlineequation><alt><![CDATA[f]]></alt><mathphrase><![CDATA[f]]></mathphrase></inlineequation> and the true labels <inlineequation><alt><![CDATA[y_{i}]]></alt><mathphrase><![CDATA[y_{i}]]></mathphrase></inlineequation>. This
error metric is akin to the true/false positives and negatives (precision and
recall) used in statistics; however, different formulas are used for its
estimation when dealing with multi-label or complex input data for the ANN (for
further details, refer to [<link linkend="kriegeskorte2019neural">12</link>]). The estimation of
neuron connection weights <inlineequation><alt><![CDATA[w_{k}]]></alt><mathphrase><![CDATA[w_{k}]]></mathphrase></inlineequation> is executed by the algorithm
through fitting the network function <inlineequation><alt><![CDATA[f]]></alt><mathphrase><![CDATA[f]]></mathphrase></inlineequation> to a large training
dataset of <inlineequation><alt><![CDATA[\left\{ x_{i},y_{i} \right\}_{i}^{n}]]></alt><mathphrase><![CDATA[\left\{ x_{i},y_{i} \right\}_{i}^{n}]]></mathphrase></inlineequation> pairs of input
data and labels, while the error <inlineequation><alt><![CDATA[E]]></alt><mathphrase><![CDATA[E]]></mathphrase></inlineequation> is calculated by using a
subset of the data for testing and validation purposes. The training algorithm
starts with an initial value of the weights, and then performs multiple cycles,
referred to as "epochs", to estimate the function <inlineequation><alt><![CDATA[f.]]></alt><mathphrase><![CDATA[f.]]></mathphrase></inlineequation> This is
achieved by fitting the data <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation> to the network and calculating
the error <inlineequation><alt><![CDATA[E]]></alt><mathphrase><![CDATA[E]]></mathphrase></inlineequation> by comparison between the predicted
<inlineequation><alt><![CDATA[y_{\text{pre}d_{i}}]]></alt><mathphrase><![CDATA[y_{\text{pre}d_{i}}]]></mathphrase></inlineequation> and the true labels <inlineequation><alt><![CDATA[y_{i}]]></alt><mathphrase><![CDATA[y_{i}]]></mathphrase></inlineequation>. At
the end of each cycle, a process called "backpropagation" is performed
[<link linkend="tang2019recent">10</link>], which involves a gradient descent optimization
algorithm, which fine-tunes the weights of individual neurons to minimize
<inlineequation><alt><![CDATA[E]]></alt><mathphrase><![CDATA[E]]></mathphrase></inlineequation>.</simpara>
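<simpara>The training cycle described above can be illustrated with a deliberately
simplified Python sketch: a single sigmoid neuron fitted by gradient descent over
several epochs, with hypothetical data and learning-rate values. A practical ANN
backpropagates through many layers, but the forward pass, error computation, and
weight update follow the same pattern:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

rng = np.random.default_rng(1)

# Tiny training set of (x_i, y_i) pairs (hypothetical values, rescaled to [0, 1]).
X = np.array([[50.0, 100.0], [12.0, 80.0], [75.0, 40.0], [30.0, 95.0]]) / 100.0
y = np.array([1.0, 0.0, 1.0, 0.0])

w = rng.normal(size=2)   # connection weights, the only adjustable parameters
b = 0.0                  # firing threshold
lr = 0.5                 # gradient-descent step size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):                       # training cycles ("epochs")
    y_pred = sigmoid(X @ w - b)                # forward pass through the neuron
    E = np.mean((y_pred - y) ** 2)             # error between predicted and true labels
    # Backpropagation for this single neuron: gradient of E w.r.t. w and b.
    grad_sigma = 2 * (y_pred - y) * y_pred * (1 - y_pred) / len(y)
    grad_w = X.T @ grad_sigma
    grad_b = -np.sum(grad_sigma)
    w -= lr * grad_w                           # gradient-descent weight update
    b -= lr * grad_b

print("final error:", E, "weights:", w, "threshold:", b)]]></programlisting>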
<simpara>Gradient descent optimization [<link linkend="ruder2016overview">13</link>] examines a large
subset of all possible combinations of weight values, yet as a heuristic
algorithm it minimizes <inlineequation><alt><![CDATA[E]]></alt><mathphrase><![CDATA[E]]></mathphrase></inlineequation> but cannot guarantee zero error. Upon the
completion of multiple training cycles, the training algorithm identifies a set
of weights that best fit the data with minimal error. The ANN settles on the optimal values that
estimate each <inlineequation><alt><![CDATA[\varphi\left( \sigma_{k} \right)]]></alt><mathphrase><![CDATA[\varphi\left( \sigma_{k} \right)]]></mathphrase></inlineequation> function for
<inlineequation><alt><![CDATA[\sigma_{k} = \sum_{1}^{k}w_{k}*x_{k} - b]]></alt><mathphrase><![CDATA[\sigma_{k} = \sum_{1}^{k}w_{k}*x_{k} - b]]></mathphrase></inlineequation>, where
<inlineequation><alt><![CDATA[w_{k}]]></alt><mathphrase><![CDATA[w_{k}]]></mathphrase></inlineequation> is the weight in each interconnected neuron. Consequently,
the overall function <inlineequation><alt><![CDATA[f]]></alt><mathphrase><![CDATA[f]]></mathphrase></inlineequation> represented by the network is also
estimated, as it comprises the composition of the individual
<inlineequation><alt><![CDATA[\varphi\left( \sigma_{k} \right)]]></alt><mathphrase><![CDATA[\varphi\left( \sigma_{k} \right)]]></mathphrase></inlineequation> neuron functions, as mentioned
earlier. Following the completion of the artificial neural network training,
where the most optimal set of weights is determined, the network is ready to be
used for label prediction with new, unknown <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation> data.</simpara>
</section>
<section xml:id="_artificial_intelligence_group_theory_symmetry_and_invariance">
<title>ARTIFICIAL INTELLIGENCE, GROUP THEORY, SYMMETRY AND INVARIANCE</title>
<section xml:id="_data_domains_in_relation_to_group_theory_and_symmetry">
<title>Data domains in relation to group theory and symmetry</title>
<simpara>In the remaining sections, we will examine how the principles of group theory, symmetry,
and invariance provide a foundational framework for comprehending the function
of machine learning algorithms, as well as the classifying power of ANNs, particularly
in relation to statistical variance, transformations, and non-homogeneity in
the input data. In broad terms, symmetry entails the analysis of geometric and
algebraic mathematical structures and finds applications across different
research fields, including physics, molecular biology, and machine learning. A
core concept in symmetry is invariance, which, in our context, means changing data
coordinates, such as relocating a drug molecule in space or shifting the
position of a cancer histology tissue sample, while leaving the shape of
the object unchanged [<link linkend="bronstein2021geometric">3</link>]. Following such an
alteration, which will be formally defined later in this text as an <emphasis>invariant
transformation</emphasis>, it becomes imperative for the machine learning algorithms and
ANNs to be capable of identifying a drug molecule even after rotation or
recognizing cancerous tissue from a shifted histology image.</simpara>
<simpara>In order to link the abstract symmetry concepts with data classification in
machine learning, as per the terminology of Bronstein et al., we consider the
input data <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation> to originate from a symmetry domain denoted as
<inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation>. This <inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation> serves as the foundational
structure upon which the data are based, and it is upon this domain structure
that we train artificial neural networks to undertake classification, employing
the label prediction function <inlineequation><alt><![CDATA[f]]></alt><mathphrase><![CDATA[f]]></mathphrase></inlineequation> as mentioned in the earlier
section. For example, microscopy images are essentially 2-dimensional numerical
grids of <emphasis>n x n</emphasis> pixels (<emphasis role="strong">Fig.2a</emphasis>), with each pixel having an assigned value
corresponding to the light intensity captured when the image was taken.</simpara>
<simpara>In this scenario, the data domain is a grid of integers
(<inlineequation><alt><![CDATA[\mathbb{Z}]]></alt><mathphrase><![CDATA[\mathbb{Z}]]></mathphrase></inlineequation>), represented as <inlineequation><alt><![CDATA[\Omega:\mathbb{Z}_{n}
\times \mathbb{Z}_{n}]]></alt><mathphrase><![CDATA[\Omega:\mathbb{Z}_{n}
\times \mathbb{Z}_{n}]]></mathphrase></inlineequation>. Similarly, for color images, the data domain is
<inlineequation><alt><![CDATA[x_{i}:\Omega \rightarrow \mathbb{Z}_{n}^{3} \times
\mathbb{Z}_{n}^{3}]]></alt><mathphrase><![CDATA[x_{i}:\Omega \rightarrow \mathbb{Z}_{n}^{3} \times
\mathbb{Z}_{n}^{3}]]></mathphrase></inlineequation>, encompassing three overlaid integer grids that
individually represent the green, blue, and red layers composing the color
image [<link linkend="Chartrand2017">14</link>]. In either case, the <inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation> contains
all possible combinations of pixel intensities, while the specific pixel value
combinations of the images in the input data <inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation> are a "signal"
<inlineequation><alt><![CDATA[\text{X}\left( \Omega \right)]]></alt><mathphrase><![CDATA[\text{X}\left( \Omega \right)]]></mathphrase></inlineequation> from the domain. The ANN’s data
classification and label prediction function <inlineequation><alt><![CDATA[y_{\text{pre}d_{i}} =
f\left( x_{i} \right)]]></alt><mathphrase><![CDATA[y_{\text{pre}d_{i}} =
f\left( x_{i} \right)]]></mathphrase></inlineequation> is applied upon the signal <inlineequation><alt><![CDATA[\text{X}\left(
\Omega \right),]]></alt><mathphrase><![CDATA[\text{X}\left(
\Omega \right),]]></mathphrase></inlineequation> which fundamentally constitutes a subset of the domain
<inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation>.</simpara>
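<simpara>As a small illustration of this data domain, the following Python sketch
represents a grayscale image as an <emphasis>n x n</emphasis> grid of integer pixel intensities and a
color image as three overlaid grids; the pixel values are hypothetical:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

n = 4  # a tiny n x n grid for illustration

# One grayscale "signal" X(Omega): an n x n grid of integer pixel intensities (0-255).
gray_image = np.array([[ 12,  40,  40,  12],
                       [ 40, 200, 200,  40],
                       [ 40, 200, 200,  40],
                       [ 12,  40,  40,  12]], dtype=np.uint8)

# A color image overlays three such integer grids (one per color channel).
color_image = np.stack([gray_image, gray_image // 2, gray_image // 4], axis=-1)

print(gray_image.shape)   # (4, 4): the domain Omega is the n x n pixel grid
print(color_image.shape)  # (4, 4, 3): three integer grids composing the color image]]></programlisting>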
<simpara>A <emphasis>symmetry group</emphasis> <inlineequation><alt><![CDATA[G]]></alt><mathphrase><![CDATA[G]]></mathphrase></inlineequation> contains all possible transformations of the
input signal <inlineequation><alt><![CDATA[\text{X}\left( \Omega \right),]]></alt><mathphrase><![CDATA[\text{X}\left( \Omega \right),]]></mathphrase></inlineequation> referred to as
symmetries <inlineequation><alt><![CDATA[g]]></alt><mathphrase><![CDATA[g]]></mathphrase></inlineequation> or <emphasis>group actions</emphasis>. A symmetry transformation
<inlineequation><alt><![CDATA[g]]></alt><mathphrase><![CDATA[g]]></mathphrase></inlineequation> preserves the properties of the data; for instance, it ensures
that objects within an image remain undistorted during rotation. The
constituents of the symmetry group, denoted as <inlineequation><alt><![CDATA[g \in G,]]></alt><mathphrase><![CDATA[g \in G,]]></mathphrase></inlineequation> are the
associations of two or more coordinate points <inlineequation><alt><![CDATA[u,v \in \Omega]]></alt><mathphrase><![CDATA[u,v \in \Omega]]></mathphrase></inlineequation> on
the data domain (grid in our image example). Between these coordinates, the
image can undergo rotation, shifting or other transformations without any
distortion.</simpara>
<simpara>Consequently, the key aspect of the formal mathematical definition
of the group lies in its capacity to safeguard data attributes during object
distortions that frequently occur during the experimental acquisition of
bioinformatics data. The concept of symmetry groups is important for modeling
the performance of machine learning algorithms, particularly for classifying
the data patterns despite the variability inherently present within the input
data.</simpara>
<informalfigure role="left">
<mediaobject>
<imageobject>
<imagedata fileref="Fig2a.svg"/>
</imageobject>
<textobject><phrase>Fig2a</phrase></textobject>
</mediaobject>
</informalfigure>
<informalfigure role="right">
<mediaobject>
<imageobject>
<imagedata fileref="Fig2b.svg"/>
</imageobject>
<textobject><phrase>Fig2b</phrase></textobject>
</mediaobject>
</informalfigure>
<simpara><?asciidoc-hr?></simpara>
<simpara><emphasis role="strong">Figure 2. (a).</emphasis> A <emphasis>grid</emphasis> data structure representing image pixels, is
formally a <emphasis>graph</emphasis> <emphasis role="strong">(b).</emphasis> A <emphasis>graph</emphasis> <inlineequation><alt><![CDATA[G = (V, E)]]></alt><mathphrase><![CDATA[G = (V, E)]]></mathphrase></inlineequation>, is composed of
<emphasis>nodes</emphasis> <inlineequation><alt><![CDATA[V]]></alt><mathphrase><![CDATA[V]]></mathphrase></inlineequation> shown as circles, and <emphasis>edges</emphasis> connecting the nodes and
shown as arrows. It can represent a protein, where the amino acids are the
nodes and the peptide bonds between amino acids are the edges.</simpara>
<simpara><?asciidoc-hr?></simpara>
<simpara>Another important data structure within bioinformatics is a <emphasis>graph</emphasis> denoted as
<inlineequation><alt><![CDATA[G = (V,E)]]></alt><mathphrase><![CDATA[G = (V,E)]]></mathphrase></inlineequation>, composed of <emphasis>nodes</emphasis> <inlineequation><alt><![CDATA[V]]></alt><mathphrase><![CDATA[V]]></mathphrase></inlineequation> that signify
biological entities, and <emphasis>edges</emphasis> representing connections between pairs of
nodes (<emphasis role="strong">Fig.</emphasis> <emphasis role="strong">2b</emphasis>). In a specific instance of a graph corresponding to a
real-world object, the edges are a subset of all possible links between nodes.
An example graph data structure for a biological molecule such as a protein or a
drug would portray the amino acids or atoms as node entities, while the
chemical bonds between each of these entities are captured as edges. These
edges could signify the carbonyl-amino (C-N) peptide bonds between amino acids
and molecular interactions across the peptide chain on the protein structure,
or the chemical bonds between atoms in a drug molecule
[<link linkend="Kriegeskorte2019">15</link>].</simpara>
<simpara>Furthermore, attributes in the molecular data such as, for example, polarity,
amino acid weight, or drug binding properties can be depicted as
<inlineequation><alt><![CDATA[s]]></alt><mathphrase><![CDATA[s]]></mathphrase></inlineequation> - dimensional node attributes, where <emphasis>s</emphasis> represents the
attributes assigned to each node. Similarly, edges or even entire graphs can
have attributes, for experimental data measured on the molecular interactions
represented by the edges, and measurements of the properties of the complete
protein or drug. Finally, from an algorithmic perspective, images can be viewed
as a special case of graphs in which the pixels serve as nodes, interconnected
by edges following a structured pattern that generates a grid formation
(<emphasis role="strong">Fig.2a</emphasis>) representing the adjacent positions of the pixels.</simpara>
</section>
<section xml:id="_group_theory_and_symmetry_principles_applied_to_machine_learning">
<title>Group theory and symmetry principles applied to machine learning</title>
<simpara>Having established the mathematical and algorithmic parallels between graphs
and images, we will now utilize the principles of the <emphasis>symmetry group</emphasis>
<inlineequation><alt><![CDATA[G]]></alt><mathphrase><![CDATA[G]]></mathphrase></inlineequation> to examine the analytical and classification power of machine
learning ANNs, with respect to data variability and transformations. Whether it
involves data types like input images or molecules represented as graphs, which
may undergo shifts or rotations, we introduce the concept of invariance guided
by the principles of group theory and symmetry. These foundational mathematical
and algorithmic formalisms serve as the basis for modeling the performance and
output of machine learning algorithms, specifically ANNs, with regard to the
diversity present in the dataset.</simpara>
<simpara>Consequently, these principles can be extrapolated and generalized to
encompass other types of data beyond graphs and images, for which ANNs are
trained to predict and categorize. While we present the group and symmetry
definitions following a data-centric approach, we will remain consistent with
the mathematical framework, while describing how the group operations can
effect transformations on the input data. Furthermore, different types of data
may have the same symmetry group, and different transformations could be
performed through identical group operations. For example, an image featuring a
triangle, which essentially is a graph with three nodes, might possess the same
rotational symmetry group as a graph with three nodes or a numerical sequence
of three elements.</simpara>
<simpara>When chemical and biological molecules are represented as graphs as described
earlier, the nodes <inlineequation><alt><![CDATA[V]]></alt><mathphrase><![CDATA[V]]></mathphrase></inlineequation> can be in any order depending on how the
data were measured during the experiment. However, this variation does not
change the underlying information contained in the data. As long as the edges
<emphasis role="strong">E,</emphasis> which represent the connections between molecules, remain unchanged, we
maintain an accurate representation of the molecular entity, irrespective of
the sequence of nodes in <emphasis role="strong">V</emphasis>. In cases where two graphs portraying the same
molecule have identical edges but differ in node arrangement, they are called
<emphasis>isomorphic</emphasis>. It is crucial that any machine learning algorithm designed for
pattern recognition on graphs, should not depend on the ordering of nodes. This
ensures that classification using ANNs and artificial intelligence remain
robust against variations in experimental measurement encountered in real-world
data [<link linkend="AgatonovicKustrin2000">16</link>]. This is something that is taken for
granted with human intelligence, where, for example, we can recognize an object
even when a photograph is rotated at an angle.</simpara>
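<simpara>The effect of node reordering can be illustrated with a permutation matrix:
applying it to a toy adjacency matrix produces an isomorphic graph describing the same
molecule, and order-independent quantities such as the edge count are unchanged. The
sketch below uses hypothetical values:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

# A permutation of the node ordering (e.g. the order in which entities were measured).
perm = [2, 0, 3, 1]
P = np.eye(4, dtype=int)[perm]

# The reordered (isomorphic) graph: same molecule, different node labels.
A_perm = P @ A @ P.T

# Order-independent quantities are unchanged, e.g. the number of edges:
print(A.sum() // 2, A_perm.sum() // 2)  # 3 3]]></programlisting>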
</section>
<section xml:id="_invariance_and_the_classification_power_of_artificial_neural_networks">
<title>Invariance and the classification power of artificial neural networks</title>
<simpara>Returning to our earlier formal definitions of ANNs as function estimators
fitted to the data, in order for ANN algorithms to recognize
<emphasis>isomorphic</emphasis> graphs as equivalent, the functions <inlineequation><alt><![CDATA[\varphi\left( \sigma_{k}
\right)]]></alt><mathphrase><![CDATA[\varphi\left( \sigma_{k}
\right)]]></mathphrase></inlineequation> and overall <inlineequation><alt><![CDATA[f\left( x_{i} \right)]]></alt><mathphrase><![CDATA[f\left( x_{i} \right)]]></mathphrase></inlineequation> of the ANN acting on
graph data should be <emphasis>permutation invariant</emphasis>. This implies that for any
permutation of the input dataset, the output values of these functions remain
unchanged, regardless of the ordering of the nodes <emphasis role="strong">V</emphasis>. This concept can be
similarly applied to images, which, as previously mentioned, are specialized
instances of fully connected graphs. Furthermore, these principles can also be
generalized for other data types beyond images or graphs.</simpara>
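<simpara>One simple way a function on graph data can satisfy permutation invariance is to
pool node features with a symmetric operation such as a sum. The following sketch (with
random placeholder features) checks numerically that the pooled output is identical for
any node ordering:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

rng = np.random.default_rng(2)
node_features = rng.normal(size=(5, 3))      # 5 nodes, 3 attributes each

def readout(H):
    # Sum pooling over nodes: invariant to any permutation of the node ordering V.
    return H.sum(axis=0)

perm = rng.permutation(5)
print(np.allclose(readout(node_features), readout(node_features[perm])))  # True]]></programlisting>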
<simpara>To further formalize the concept of invariance, and considering that both image
and graph examples are essentially points on a grid on a two-dimensional
plane, we can use linear algebra. Specifically, by using a matrix we can
represent the data transformations as group actions, denoted by
<inlineequation><alt><![CDATA[g]]></alt><mathphrase><![CDATA[g]]></mathphrase></inlineequation>, within the symmetry group <inlineequation><alt><![CDATA[G]]></alt><mathphrase><![CDATA[G]]></mathphrase></inlineequation>. The use of matrices
enables us to connect the group symmetries with the actual data by performing
matrix multiplications that modify the coordinates of the object and
consecutively represent the data transformations through the multiplication.
The dimensions of the matrix, <inlineequation><alt><![CDATA[n \times n,]]></alt><mathphrase><![CDATA[n \times n,]]></mathphrase></inlineequation> typically are similar
to those of the signal space <inlineequation><alt><![CDATA[\text{X}\left( \Omega \right)]]></alt><mathphrase><![CDATA[\text{X}\left( \Omega \right)]]></mathphrase></inlineequation> for
the data (e.g., <inlineequation><alt><![CDATA[\mathbb{Z}_{n} \times \mathbb{Z}_{n}]]></alt><mathphrase><![CDATA[\mathbb{Z}_{n} \times \mathbb{Z}_{n}]]></mathphrase></inlineequation> images).
The matrix dimensions do not depend on the size of the group (i.e., the number of
possible symmetries) or the dimensionality of the underlying data domain
<inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation>. With this definition in place, we can formalize
symmetries and group actions for modifying data objects, employing matrix and
linear transformations as the foundation for connecting invariance in relation
to variability in the data.</simpara>
<simpara>We will now conclude by establishing the mathematical and linear algebra
formalisms that underlie the resilience of ANNs and machine learning algorithms
in pattern recognition, considering transformations in the data. While our
framework is based on a two-dimensional grid data domain <inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation>,
the formalisms developed here can also be extrapolated to any number of
dimensions or data formats without loss of generality. First, we will connect
matrices to group actions <inlineequation><alt><![CDATA[g]]></alt><mathphrase><![CDATA[g]]></mathphrase></inlineequation> (such as rotations, shifts) within
the symmetry group <inlineequation><alt><![CDATA[g \in G]]></alt><mathphrase><![CDATA[g \in G]]></mathphrase></inlineequation> by defining a function
<inlineequation><alt><![CDATA[\theta]]></alt><mathphrase><![CDATA[\theta]]></mathphrase></inlineequation> that maps the group to a matrix as <inlineequation><alt><![CDATA[\theta:G
\rightarrow \mathbf{M}]]></alt><mathphrase><![CDATA[\theta:G
\rightarrow \mathbf{M}]]></mathphrase></inlineequation>. As mentioned earlier, a matrix <inlineequation><alt><![CDATA[\mathbf{M}
\in R^{n \times n}]]></alt><mathphrase><![CDATA[\mathbf{M}
\in R^{n \times n}]]></mathphrase></inlineequation> consisting of numerical values (integers, fractions,
positive and negative), when multiplied by the coordinate values of an object
on the plane <inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation>, results in rotation or shifts of the
object’s coordinates for the exact amount corresponding to the group action
within the symmetry group.</simpara>
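<simpara>As an illustration of this mapping from group actions to matrices, the sketch
below defines a hypothetical function <literal>theta</literal> that returns the 2 x 2 rotation matrix for a
given angle, applies it to the coordinates of a toy object, and verifies that pairwise
distances, and hence the object's shape, are preserved:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

def theta(angle):
    """Map a rotation group action g (rotation by 'angle') to its 2x2 matrix M."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s],
                     [s,  c]])

# Coordinates of a triangle-shaped object on the plane Omega (one point per column).
triangle = np.array([[0.0, 1.0, 0.5],
                     [0.0, 0.0, 1.0]])

M = theta(np.pi / 2)               # rotation by 90 degrees
rotated = M @ triangle             # matrix multiplication applies the group action

# Pairwise distances (the object's shape) are preserved by the rotation.
d = lambda pts: np.linalg.norm(pts[:, :, None] - pts[:, None, :], axis=0)
print(np.allclose(d(triangle), d(rotated)))  # True]]></programlisting>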
<simpara>With these definitions in place, we will now connect the matrix formalisms with
the neural network estimator function <inlineequation><alt><![CDATA[y_{\text{pre}d_{i}} = f\left(
x_{i} \right)]]></alt><mathphrase><![CDATA[y_{\text{pre}d_{i}} = f\left(
x_{i} \right)]]></mathphrase></inlineequation>, which is identified by adjusting neuron connection weights
during multiple training cycles with the input data. Our goal is to leverage
the mathematical formalisms of group symmetry and invariance to establish the
resilience of ANNs in classifying and assigning labels to new data points
[<link linkend="Eetemadi2019">17</link>]. These data points originate from real-world data that
might contain transformations and distortions. First, we define the estimator
function of the ANN to be <emphasis>invariant</emphasis> if the condition for the input data
holds, i.e. <inlineequation><alt><![CDATA[f(\mathbf{M} \times x_{i}) = f(x_{i})]]></alt><mathphrase><![CDATA[f(\mathbf{M} \times x_{i}) = f(x_{i})]]></mathphrase></inlineequation> for all
matrices <inlineequation><alt><![CDATA[\mathbf{M}]]></alt><mathphrase><![CDATA[\mathbf{M}]]></mathphrase></inlineequation> representing the actions <inlineequation><alt><![CDATA[g \in
G]]></alt><mathphrase><![CDATA[g \in
G]]></mathphrase></inlineequation> within the symmetry group.</simpara>
<simpara>This formula encapsulates the requirement for the neural network function to be
invariant: its output value remains the same whether the input data
<inlineequation><alt><![CDATA[x_{i}]]></alt><mathphrase><![CDATA[x_{i}]]></mathphrase></inlineequation> are transformed or not (e.g., whether or not an image or graph is
rotated on the plane), as represented by the matrix multiplication
<inlineequation><alt><![CDATA[\mathbf{M} \times x_{i}]]></alt><mathphrase><![CDATA[\mathbf{M} \times x_{i}]]></mathphrase></inlineequation>. Therefore, the output values
<inlineequation><alt><![CDATA[y_{\text{pre}d_{i}} = f\left( x_{i} \right)]]></alt><mathphrase><![CDATA[y_{\text{pre}d_{i}} = f\left( x_{i} \right)]]></mathphrase></inlineequation> produced by the ANN,
which essentially represent predicted output labels (e.g.,
<inlineequation><alt><![CDATA[y_{\text{pre}d_{i}}]]></alt><mathphrase><![CDATA[y_{\text{pre}d_{i}}]]></mathphrase></inlineequation> = potent drug / not potent), based on the
input data, exhibit resilience to noisy and deformed real-world data when the
network estimator function is invariant. In a different case, the estimator
function approximated by the ANN can be <emphasis>equivariant</emphasis> and defined as
<inlineequation><alt><![CDATA[f(\mathbf{M} \times x_{i}) = \mathbf{M} \times f(x_{i})]]></alt><mathphrase><![CDATA[f(\mathbf{M} \times x_{i}) = \mathbf{M} \times f(x_{i})]]></mathphrase></inlineequation>. This
signifies that the output of the ANN will be modified, but the label prediction
result will shift equally alongside the shift in the input data.</simpara>
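<simpara>The two definitions can be checked numerically on toy functions: the Euclidean
norm of a point is invariant under rotation, while a simple scaling map is equivariant
because it commutes with the rotation matrix. These example functions are illustrative
stand-ins, not trained ANN estimators:</simpara>
<programlisting language="python"><![CDATA[import numpy as np

def M(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

x = np.array([3.0, 4.0])            # a toy data point on the plane
R = M(0.7)                          # a group action represented as a matrix

# Invariant function: its value ignores the transformation, f(Mx) = f(x).
f_inv = np.linalg.norm
print(np.isclose(f_inv(R @ x), f_inv(x)))              # True

# Equivariant function: its output shifts along with the input, f(Mx) = M f(x).
f_eq = lambda v: 2.0 * v            # simple scaling commutes with rotation
print(np.allclose(f_eq(R @ x), R @ f_eq(x)))           # True]]></programlisting>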
</section>
<section xml:id="_neural_networks_and_group_theory_in_relation_to_continuous_data_transformations">
<title>Neural networks and group theory in relation to continuous data transformations</title>
<simpara>Up to this point, we have exclusively discussed discrete transformations in
linear algebra terms, utilizing matrix multiplications that lead to coordinate
shifts and rigid transformations of the data, like rotating an image or graph
by a specific angle on the grid <inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation>. However, in real-world
data scenarios, we often also encounter continuous, more fine-grained shifts.
In such cases, ANN algorithms should be able to recognize patterns, classify,
and label the data without any loss of performance [<link linkend="Wright2022">18</link>].
Mathematically, continuous transformations are handled analogously through the
invariant and equivariant functions described earlier. For instance, if the
domain <inlineequation><alt><![CDATA[\Omega]]></alt><mathphrase><![CDATA[\Omega]]></mathphrase></inlineequation> contains data with smooth transformations and
shifts, such as moving images (videos) or shifts of molecules and graphs that
maintain <emphasis>continuity</emphasis> in a topological definition
[<link linkend="sutherland2009introduction">19</link>], in this case we deal with a concept
known as <emphasis>homeomorphism</emphasis> instead of <emphasis>invariance</emphasis>.</simpara>
<simpara>Finally, if the rate of continuous transformation of the data is quantifiable,
meaning that the function <inlineequation><alt><![CDATA[\theta,]]></alt><mathphrase><![CDATA[\theta,]]></mathphrase></inlineequation> which maps the group to a
matrix, is <emphasis>differentiable</emphasis>, then the members of the symmetry groups will be
part of a <emphasis>diffeomorphism</emphasis>. As follows from the principles of calculus, in
this case, infinitely many matrices <inlineequation><alt><![CDATA[\mathbf{M}]]></alt><mathphrase><![CDATA[\mathbf{M}]]></mathphrase></inlineequation> will need
to be produced by <inlineequation><alt><![CDATA[\theta]]></alt><mathphrase><![CDATA[\theta]]></mathphrase></inlineequation> for the continuous change of the
data coordinates at every point. Such differentiable data structures are
commonly represented as manifolds, which, for example, could be used to represent proteins
in fine detail. In this case, the molecule would be represented as a cloud with
all atomic forces surrounding the structure, as opposed to the discrete data
structure of nodes and edges in a graph. Finally, if the manifold structure
also includes a metric of <emphasis>distance</emphasis> between its points to further quantify the
data transformations, in this case, we will have an <emphasis>isometry</emphasis> during the
transformation due to a group action from the symmetry group.</simpara>
</section>
</section>
<section xml:id="_applications_of_artificial_intelligence_and_neural_networks_in_bioinformatics">
<title>APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND NEURAL NETWORKS IN BIOINFORMATICS</title>
<simpara>Artificial Intelligence (AI) and Deep Learning have emerged as powerful tools
with diverse applications in the field of bioinformatics, and multiple research
studies have been reported in the literature [<link linkend="pmid37446831">20</link>],
[<link linkend="pmid37189058">21</link>], [<link linkend="pmid37043378">22</link>], highlighting the potential of
the technology to revolutionize healthcare and life sciences. One of the
significant applications is drug discovery, as AI algorithms facilitate the
analysis of large datasets of chemical compounds, predicting their
effectiveness and safety [<link linkend="pmid37479540">23</link>], [<link linkend="pmid37458097">24</link>],
[<link linkend="pmid37454742">25</link>]. These studies have demonstrated that AI can accelerate
the drug discovery process by screening potential candidates and optimizing
their properties, resulting in substantial cost and time savings.</simpara>
<simpara>In the field of genomics, AI algorithms have been applied to the analysis of
DNA sequencing and gene expression data, facilitating, for example, the
identification of disease-causing mutations and enhancing our understanding of
genetic variations [<link linkend="pmid37453366">26</link>], [<link linkend="pmid37446311">27</link>],
[<link linkend="pmid37386009">28</link>], [<link linkend="pmid37370847">29</link>]. Moreover, in these studies, genomic
data analysis with AI algorithms has provided critical insights, which can
assist in the development of personalized medicine approaches and, as a result,
tailor treatments to individual patients. Consequently, the use of AI
algorithms in bioinformatics can contribute to the advancement of precision
medicine. By integratively analyzing other omics data (e.g.,
transcriptomics, proteomics, metabolomics) together with patient data encompassing genetic
information, medical history, and lifestyle factors, AI-driven insights can
lead to improved predictions of drug responses, identification of potential
side effects, and the recommendation of optimal treatment options for
individual patients.</simpara>
<simpara>This personalized medicine approach can also involve enhancing patient care and
treatment outcomes, through disease diagnosis improved by machine learning
analysis of medical images, including computed tomography (CT) and magnetic
resonance imaging (MRI) scans, X-rays, and histopathology images, of diseases
like cancer [<link linkend="pmid37488621">30</link>], [<link linkend="pmid37478073">31</link>], [<link linkend="pmid37474003">32</link>],
[<link linkend="pmid37449611">33</link>]. The AI algorithms can assist pathologists and
radiologists in rendering precise diagnoses, enabling early detection and
diagnosis, and ultimately contributing to overall improvements in patient
outcomes.</simpara>
<simpara>AI can also play a significant role in assisting the development of
bioinformatics tools and software accelerating the process of code development
for the analysis and interpretation of biological data, such as sequence
alignment, protein structure prediction, and functional annotation
[<link linkend="pmid37329982">34</link>], [<link linkend="pmid37463768">35</link>], [<link linkend="pmid37460991">36</link>]. Furthermore,
AI-powered natural language processing techniques have been
employed to analyze scientific literature, patents, and clinical trial reports.
This capability enables researchers to stay updated about the latest
discoveries and facilitates knowledge discovery in the field.</simpara>
<simpara>Finally, in the area of clinical trials, machine learning algorithms have been
applied to mine vast amounts of data from clinical trials. As a result, the
rates of success for new drugs and treatment strategies have improved for
patients participating in the trials [<link linkend="pmid37486997">37</link>],
[<link linkend="pmid37483175">38</link>]. Additional studies have also demonstrated that machine
learning algorithms can result in enhanced optimization of clinical trial
designs, reduction in costs, and an overall acceleration of drug
development pipelines [<link linkend="pmid37479540">23</link>], [<link linkend="pmid37458097">24</link>].</simpara>
<section xml:id="_conclusion">
<title>CONCLUSION</title>
<simpara>The rapid advancements in the fields of Machine Learning and Artificial
Intelligence in recent years have exerted a substantial influence on the field
of Bioinformatics. With these accelerated developments, the opportunity to
systematically categorize algorithms and their corresponding applications,
along with their performance across various types of bioinformatics data, has
diminished. By harnessing the mathematical formalisms of symmetry and group
theory, we can establish the operational principles of Artificial Intelligence
algorithms concerning bioinformatics data. This not only paves the way for a
deeper understanding of their functionality but also provides insights into the
directions for future development in the field.</simpara>
<simpara><emphasis role="strong">Funding Information:</emphasis> This work has been supported by Award Number U54
CA221704(5) from The National Cancer Institute.</simpara>
<simpara><emphasis role="strong">Author Contributions:</emphasis> K. Krampis wrote the manuscript and performed the
research. C. Wultsch provided oversight during the development of the research
and the manuscript. E. Ross, O. Ogunwobi, G. Ma and R. Mazumder contributed to
the development of the research and provided feedback during the development of
the manuscript.</simpara>
<simpara><emphasis role="strong">Conflict of Interest:</emphasis> The authors declare no conflicts of interest.</simpara>
<simpara><emphasis role="strong">Institutional Review Board Statement:</emphasis> Not Applicable.</simpara>
<simpara><emphasis role="strong">Informed Consent Statement:</emphasis> Not Applicable.</simpara>
<simpara><emphasis role="strong">Data Availability Statement:</emphasis> No data were generated as part of the present
review paper.</simpara>
<simpara><emphasis role="strong">Acknowledgments:</emphasis> The authors would like to thank their respective
institutions for supporting their scholarly work.</simpara>
<simpara>[1] K. Katz, O. Shutov, R. Lapoint, M. Kimelman, J. R. Brister, and C.
O’Sullivan, “The sequence read archive: a decade more of explosive growth,”
<emphasis>Nucleic acids research</emphasis>, vol. 50, no. D1, pp. D387–D390, 2022.</simpara>
<simpara>[2] L. Clissa, “Survey of Big Data sizes in 2021.” 2022.</simpara>
<simpara>[3] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, “Geometric deep
learning: Grids, groups, graphs, geodesics, and gauges,” <emphasis>arXiv preprint
arXiv:2104.13478</emphasis>, 2021.</simpara>
<simpara>[4] E. Noether, “Invariante Variationsprobleme,” <emphasis>Nachrichten von der Gesellschaft
der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse</emphasis>, pp. 235–257, 1918.</simpara>
<simpara>[5] Y. Li, C. Huang, L. Ding, Z. Li, Y. Pan, and X. Gao, “Deep learning in
bioinformatics: Introduction, application, and perspective in the big data
era,” <emphasis>Methods</emphasis>, vol. 166, pp. 4–21, 2019.</simpara>
<simpara>[6] T. M. Nair, “Building and Interpreting Artificial Neural Network Models for
Biological Systems.,” <emphasis>Methods in molecular biology (Clifton, N.J.)</emphasis>, vol.
2190, pp. 185–194, 2021, doi: 10.1007/978-1-0716-0826-5_8.</simpara>
<simpara>[7] M. Uzair and N. Jamil, “Effects of hidden layers on the efficiency of
neural networks,” in <emphasis>2020 IEEE 23rd international multitopic conference
(INMIC)</emphasis>, 2020, pp. 1–6.</simpara>
<simpara>[8] V. Renganathan, “Overview of artificial neural network models in the
biomedical domain.,” <emphasis>Bratislavske lekarske listy</emphasis>, vol. 120, no. 7, pp.
536–540, 2019, doi: 10.4149/BLL_2019_087.</simpara>
<simpara>[9] M. Wainberg, D. Merico, A. Delong, and B. J. Frey, “Deep learning in
biomedicine,” <emphasis>Nature biotechnology</emphasis>, vol. 36, no. 9, pp. 829–838, 2018.</simpara>
<simpara>[10] B. Tang, Z. Pan, K. Yin, and A. Khateeb, “Recent advances of deep learning
in bioinformatics and computational biology,” <emphasis>Frontiers in genetics</emphasis>,
vol. 10, p. 214, 2019.</simpara>
<simpara>[11] J. Zou, Y. Han, and S.-S. So, “Overview of artificial neural networks.,”
<emphasis>Methods in molecular biology (Clifton, N.J.)</emphasis>, vol. 458, pp. 15–23, 2008,
doi: 10.1007/978-1-60327-101-1_2.</simpara>
<simpara>[12] N. Kriegeskorte and T. Golan, “Neural network models and deep learning,”
<emphasis>Current Biology</emphasis>, vol. 29, no. 7, pp. R231–R236, 2019.</simpara>
<simpara>[13] S. Ruder, “An overview of gradient descent optimization algorithms,”
<emphasis>arXiv preprint arXiv:1609.04747</emphasis>, 2016.</simpara>
<simpara>[14] G. Chartrand <emphasis>et al.</emphasis>, “Deep Learning: A Primer for Radiologists.,”
<emphasis>Radiographics : a review publication of the Radiological Society of North
America, Inc</emphasis>, vol. 37, no. 7, pp. 2113–2131, 2017, doi: 10.1148/rg.2017170077.</simpara>
<simpara>[15] N. Kriegeskorte and T. Golan, “Neural network models and deep learning.,”
<emphasis>Current biology : CB</emphasis>, vol. 29, no. 7, pp. R231–R236, Apr. 2019, doi:
10.1016/j.cub.2019.02.034.</simpara>
<simpara>[16] S. Agatonovic-Kustrin and R. Beresford, “Basic concepts of artificial
neural network (ANN) modeling and its application in pharmaceutical
research.,” <emphasis>Journal of pharmaceutical and biomedical analysis</emphasis>, vol. 22, no.
5, pp. 717–727, Jun. 2000, doi: 10.1016/s0731-7085(99)00272-1.</simpara>
<simpara>[17] A. Eetemadi and I. Tagkopoulos, “Genetic Neural Networks: an artificial
neural network architecture for capturing gene expression relationships.,”
<emphasis>Bioinformatics (Oxford, England)</emphasis>, vol. 35, no. 13, pp. 2226–2234, Jul. 2019,
doi: 10.1093/bioinformatics/bty945.</simpara>
<simpara>[18] L. G. Wright <emphasis>et al.</emphasis>, “Deep physical neural networks trained with
backpropagation.,” <emphasis>Nature</emphasis>, vol. 601, no. 7894, pp. 549–555, Jan. 2022,
doi: 10.1038/s41586-021-04223-6.</simpara>
<simpara>[19] W. A. Sutherland, <emphasis>Introduction to metric and topological spaces</emphasis>. Oxford
University Press, 2009.</simpara>
<simpara>[20] M. Lee, “Recent Advances in Deep Learning for Protein-Protein Interaction
Analysis: A Comprehensive Review,” <emphasis>Molecules</emphasis>, vol. 28, no. 13, Jul.
2023.</simpara>
<simpara>[21] M. Wysocka, O. Wysocki, M. Zufferey, D. Landers, and A. Freitas, “A
systematic review of biologically-informed deep learning models for
cancer: fundamental trends for encoding and interpreting oncology data,” <emphasis>BMC
Bioinformatics</emphasis>, vol. 24, no. 1, p. 198, May 2023.</simpara>
<simpara>[22] B. Jahanyar, H. Tabatabaee, and A. Rowhanimanesh, “Harnessing Deep
Learning for Omics in an Era of COVID-19,” <emphasis>OMICS</emphasis>, vol. 27, no. 4, pp.
141–152, Apr. 2023.</simpara>
<simpara>[23] F. W. Pun, I. V. Ozerov, and A. Zhavoronkov, “AI-powered therapeutic
target discovery,” <emphasis>Trends Pharmacol Sci</emphasis>, Jul. 2023.</simpara>
<simpara>[24] G. Floresta, C. Zagni, V. Patamia, and A. Rescifina, “How can artificial
intelligence be utilized for de novo drug design against COVID-19
(SARS-CoV-2)?,” <emphasis>Expert Opin Drug Discov</emphasis>, pp. 1–4, Jul. 2023.</simpara>
<simpara>[25] Y. Zhou <emphasis>et al.</emphasis>, “Deep learning in preclinical antibody drug discovery
and development,” <emphasis>Methods</emphasis>, Jul. 2023.</simpara>
<simpara>[26] A. Ramírez-Mena, E. Andrés-León, M. J. Alvarez-Cubero, A. Anguita-Ruiz, L. J.
Martinez-Gonzalez, and J. Alcala-Fdez, “Explainable artificial
intelligence to predict and identify prostate cancer tissue by gene
expression,” <emphasis>Comput Methods Programs Biomed</emphasis>, vol. 240, p. 107719, Jul. 2023.</simpara>
<simpara>[27] W. Wei, Y. Li, and T. Huang, “Using Machine Learning Methods to Study
Colorectal Cancer Tumor Micro-Environment and Its Biomarkers,” <emphasis>Int J Mol
Sci</emphasis>, vol. 24, no. 13, Jul. 2023.</simpara>
<simpara>[28] D. Shigemizu <emphasis>et al.</emphasis>, “Classification and deep-learning-based prediction
of Alzheimer disease subtypes by using genomic data,” <emphasis>Transl Psychiatry</emphasis>,
vol. 13, no. 1, p. 232, Jun. 2023.</simpara>
<simpara>[29] Z. Mirza <emphasis>et al.</emphasis>, “Identification of Novel Diagnostic and Prognostic Gene
Signature Biomarkers for Breast Cancer Using Artificial Intelligence and
Machine Learning Assisted Transcriptomics Analysis,” <emphasis>Cancers (Basel)</emphasis>, vol.
15, no. 12, Jun. 2023.</simpara>
<simpara>[30] R. Adam, K. Dell’Aquila, L. Hodges, T. Maldjian, and T. Q. Duong, “Deep
learning applications to breast cancer detection by magnetic resonance
imaging: a literature review,” <emphasis>Breast Cancer Res</emphasis>, vol. 25, no. 1, p. 87, Jul.
2023.</simpara>
<simpara>[31] Y. Tong <emphasis>et al.</emphasis>, “Prediction of lymphoma response to CAR T cells by deep
learning-based image analysis,” <emphasis>PLoS One</emphasis>, vol. 18, no. 7, p. e0282573,
2023.</simpara>
<simpara>[32] L. R. Archila <emphasis>et al.</emphasis>, “Performance of an Artificial Intelligence Model
for Recognition and Quantitation of Histologic Features of Eosinophilic
Esophagitis on Biopsy Samples,” <emphasis>Mod Pathol</emphasis>, p. 100285, Jul. 2023.</simpara>
<simpara>[33] Q. Li, A. Sandoval, and B. Chen, “Advancing spinal cord injury research
with optical clearing, light sheet microscopy, and artificial
intelligence-based image analysis,” <emphasis>Neural Regen Res</emphasis>, vol. 18, no. 12, pp.
2661–2662, Dec. 2023.</simpara>
<simpara>[34] M. Santorsola and F. Lescai, “The promise of explainable deep learning for
omics data analysis: Adding new discovery tools to AI,” <emphasis>N Biotechnol</emphasis>,
vol. 77, pp. 1–11, Jun. 2023.</simpara>
<simpara>[35] B. Waissengrin <emphasis>et al.</emphasis>, “Artificial intelligence (AI) molecular analysis
tool assists in rapid treatment decision in lung cancer: a case report,”
<emphasis>J Clin Pathol</emphasis>, Jul. 2023.</simpara>
<simpara>[36] F. Hosseini, F. Asadi, H. Emami, and M. Ebnali, “Machine learning
applications for early detection of esophageal cancer: a systematic
review,” <emphasis>BMC Med Inform Decis Mak</emphasis>, vol. 23, no. 1, p. 124, Jul. 2023.</simpara>
<simpara>[37] S. M. Ahmed, R. V. Shivnaraine, and J. C. Wu, “FDA Modernization Act 2.0
Paves the Way to Computational Biology and Clinical Trials in a Dish,”
<emphasis>Circulation</emphasis>, vol. 148, no. 4, pp. 309–311, Jul. 2023.</simpara>
<simpara>[38] A. Aliper <emphasis>et al.</emphasis>, “Prediction of clinical trials outcomes based on
target choice and clinical trial design with multi-modal artificial
intelligence,” <emphasis>Clin Pharmacol Ther</emphasis>, Jul. 2023.</simpara>
</section>
</section>
</section>
</article>