Fix issues: 62 69 66 #70

Merged
merged 5 commits into from Sep 10, 2023
Changes from 3 commits
7 changes: 5 additions & 2 deletions docs/make.jl
@@ -48,13 +48,16 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"Background" => "background.md",
"Background" => [
"Probabilistic Inference" => "probabilisticinference.md",
"Tensor Networks" => "tensornetwork.md",
"UAI file formats" => "uai-file-formats.md"
],
"Examples" => [
"Overview" => "examples-overview.md",
"Asia Network" => "generated/asia/main.md",
"Hard-core Lattice Gas" => "generated/hard-core-lattice-gas/main.md",
],
"UAI file formats" => "uai-file-formats.md",
"Performance tips" => "generated/performance.md",
"API" => [
"Public" => "api/public.md",
24 changes: 24 additions & 0 deletions docs/src/assets/preambles/the-tensor-network.tex
@@ -0,0 +1,24 @@
\usepackage{tikz}
\usepackage{xcolor}
\usetikzlibrary{positioning}

\definecolor{c01}{HTML}{5790fc}
\definecolor{c02}{HTML}{f89c20}
\definecolor{c03}{HTML}{e42536}
\definecolor{c04}{HTML}{964a8b}
\definecolor{c05}{HTML}{9c9ca1}
\definecolor{c06}{HTML}{7a21dd}

\tikzset {
mytensor/.style={
circle,
thick,
fill=white,
draw=black!100,
font=\small,
minimum size=0.5cm
},
myedge/.style={
line width=0.80pt,
}
}
5 changes: 3 additions & 2 deletions docs/src/index.md
@@ -59,9 +59,10 @@ more complex, real-world models.
## Outline
```@contents
Pages = [
"background.md",
"examples-overview.md",
"probabilisticinference.md",
"tensornetwork.md",
"uai-file-formats.md",
"examples-overview.md",
"performance.md",
"api/public.md",
"api/internal.md",
@@ -1,4 +1,4 @@
# Background
# Probabilistic inference

*TensorInference* implements efficient methods to perform Bayesian inference in
*probabilistic graphical models*, such as Bayesian Networks or Markov random
211 changes: 211 additions & 0 deletions docs/src/tensornetwork.md
@@ -0,0 +1,211 @@
# Tensor networks

We now introduce the core ideas of tensor networks, highlighting their
connections with probabilistic graphical models (PGMs) to align the terminology between the two fields.

For our purposes, a **tensor** is equivalent to the concept of a factor
presented above, which we define more formally below.

## What is a tensor?
*Definition*: A tensor $T$ is defined as:
```math
T: \prod_{V \in \bm{V}} \mathcal{D}_{V} \rightarrow \texttt{number}.
```
Here, the function $T$ maps each possible instantiation of the random
variables in its scope $\bm{V}$ to a generic number type. In the context of tensor networks,
a minimum requirement is that the number type is a commutative semiring.
To define a commutative semiring with the addition operation $\oplus$ and the multiplication operation $\odot$ on a set $S$, the following relations must hold for arbitrary elements $a, b, c \in S$.
```math
\newcommand{\mymathbb}[1]{\mathbb{#1}}
\begin{align*}
(a \oplus b) \oplus c = a \oplus (b \oplus c) & \hspace{5em}\text{$\triangleright$ commutative monoid $\oplus$ with identity $\mymathbb{0}$}\\
a \oplus \mymathbb{0} = \mymathbb{0} \oplus a = a &\\
a \oplus b = b \oplus a &\\
&\\
(a \odot b) \odot c = a \odot (b \odot c) & \hspace{5em}\text{$\triangleright$ commutative monoid $\odot$ with identity $\mymathbb{1}$}\\
a \odot \mymathbb{1} = \mymathbb{1} \odot a = a &\\
a \odot b = b \odot a &\\
&\\
a \odot (b\oplus c) = a\odot b \oplus a\odot c & \hspace{5em}\text{$\triangleright$ left and right distributive}\\
(a\oplus b) \odot c = a\odot c \oplus b\odot c &\\
&\\
a \odot \mymathbb{0} = \mymathbb{0} \odot a = \mymathbb{0}
\end{align*}
```
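
For illustration, here is a minimal Julia sketch of one such commutative semiring, the max-product semiring ($\oplus = \max$, $\odot = \times$ over the nonnegative reals); the wrapper type `MaxProd` is hypothetical and not part of *TensorInference*:
```julia
# Max-product semiring: ⊕ = max, ⊙ = *, with identities 𝟘 = 0.0 and 𝟙 = 1.0.
struct MaxProd
    val::Float64
end
Base.:(+)(a::MaxProd, b::MaxProd) = MaxProd(max(a.val, b.val))  # ⊕
Base.:(*)(a::MaxProd, b::MaxProd) = MaxProd(a.val * b.val)      # ⊙
Base.zero(::Type{MaxProd}) = MaxProd(0.0)                       # identity of ⊕
Base.one(::Type{MaxProd}) = MaxProd(1.0)                        # identity of ⊙

# Spot-check distributivity on nonnegative values:
a, b, c = MaxProd(0.2), MaxProd(0.5), MaxProd(0.9)
a * (b + c) == a * b + a * c  # true
```
Contracting tensors over this semiring turns sum-product inference into max-product inference.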
Tensors are represented using multidimensional arrays of nonnegative numbers
with labeled dimensions. These labels correspond to the array's indices, which
in turn represent the set of random variables that the tensor is a function
of. Thus, in this context, the terms **label**, **index**, and
**variable** are synonymous and hence used interchangeably.

## What is a tensor network?
We now turn our attention to defining a **tensor network**.
A tensor network is a mathematical object that represents a multilinear map between tensors. It is widely used in condensed matter physics [^Orus2014][^Pfeifer2014] and quantum simulation [^Markov2008][^Pan2022], and is also a powerful tool for solving combinatorial optimization problems [^Liu2023].
It is important to note that we use a generalized version of the conventional
notation, also known as [einsum](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html), which is widely used in high-performance computing.
Packages that implement the conventional notation include
- [numpy](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html)
- [OMEinsum.jl](https://github.com/under-Peter/OMEinsum.jl)
- [PyTorch](https://pytorch.org/docs/stable/generated/torch.einsum.html)
- [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/einsum)

This approach allows us to represent a more extensive set of sum-product multilinear operations between tensors, meeting the requirements of the PGM field.

*Definition*[^Liu2023]: A tensor network is a multilinear map represented by the triple
$\mathcal{N} = (\Lambda, \mathcal{T}, \bm{\sigma}_0)$, where:
- $\Lambda$ is the set of variables present in the network
$\mathcal{N}$.
- $\mathcal{T} = \{ T^{(k)}_{\bm{\sigma}_k} \}_{k=1}^{M}$ is the set of
input tensors, where each tensor $T^{(k)}_{\bm{\sigma}_k}$ is identified
by a superscript $(k)$ and has an associated scope $\bm{\sigma}_k$.
- $\bm{\sigma}_0$ specifies the scope of the output tensor.

More specifically, each tensor $T^{(k)}_{\bm{\sigma}_k} \in \mathcal{T}$ is
labeled by a string $\bm{\sigma}_k \in \Lambda^{r \left(T^{(k)} \right)}$, where
$r \left(T^{(k)} \right)$ is the rank of $T^{(k)}$. The multilinear map, or
the `contraction`, applied to this triple is defined as
```math
\texttt{contract}(\Lambda, \mathcal{T}, \bm{\sigma}_0) = \sum_{\bm{\sigma}_{\Lambda
\setminus [\bm{\sigma}_0]}} \prod_{k=1}^{M} T^{(k)}_{\bm{\sigma}_k}.
```
Notably, the summation extends over all instantiations of the variables that
are not part of the output tensor.

As an example, matrix multiplication can be specified as a tensor network
contraction
```math
(AB)_{ik} = \texttt{contract}\left(\{i,j,k\}, \{A_{ij}, B_{jk}\}, ik\right),
```
where matrices $A$ and $B$ are input tensors labeled by strings $ij, jk \in
\{i, j, k\}^2$. The output tensor is labeled by string $ik$. The
summation runs over indices $\Lambda \setminus [ik] = \{j\}$. The contraction
corresponds to
```math
\texttt{contract}\left(\{i,j,k\}, \{A_{ij}, B_{jk}\}, ik\right) = \sum_j
A_{ij}B_{jk}.
```
In programming languages, this is equivalent to einsum notation `ij, jk -> ik`.
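
For instance, with [OMEinsum.jl](https://github.com/under-Peter/OMEinsum.jl) (one of the packages listed above), this contraction is a one-liner (a minimal sketch):
```julia
using OMEinsum

A, B = randn(2, 3), randn(3, 4)
C = ein"ij,jk->ik"(A, B)  # sum over the shared index j
C ≈ A * B                 # true: the contraction reproduces matrix multiplication
```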

Diagrammatically, a tensor network can be represented as an *open hypergraph*. In the tensor network diagram, a tensor is mapped to a vertex,
and a variable is mapped to a hyperedge. Tensors are connected by the same
hyperedge for a variable if and only if they share that variable. The diagrammatic
representation of matrix multiplication is shown below.
```@eval
using TikzPictures

tp = TikzPicture(
L"""
\matrix[row sep=0.8cm,column sep=0.8cm,ampersand replacement= \& ] {
\node (1) {}; \&
\node (a) [mytensor] {$A$}; \&
\node (b) [mytensor] {$B$}; \&
\node (2) {}; \&
\\
};
\draw [myedge, color=c01] (1) edge node[below] {$i$} (a);
\draw [myedge, color=c02] (a) edge node[below] {$j$} (b);
\draw [myedge, color=c03] (b) edge node[below] {$k$} (2);
""", options="scale=3.8",
preamble="\\input{" * joinpath(@__DIR__, "assets", "preambles", "the-tensor-network") * "}",
)
save(SVG("the-tensor-network1"), tp)
```

```@raw html
<img src="the-tensor-network1.svg" style="margin-left: auto; margin-right: auto; display:block; width=50%">
```

Here, we use different colors to denote different hyperedges. The hyperedges for
$i$ and $k$ are left open to denote variables in the output string
$\bm{\sigma}_0$. The reason why we need hyperedges rather than regular edges
will be made clear by the following star contraction example.
```math
\texttt{contract}(\{i,j,k,l\}, \{A_{il}, B_{jl}, C_{kl}\}, ijk) = \sum_{l}A_{il}
B_{jl} C_{kl}
```
In programming languages, this is equivalent to einsum notation `il, jl, kl -> ijk`.
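
A corresponding sketch with OMEinsum.jl:
```julia
using OMEinsum

A, B, C = randn(2, 5), randn(3, 5), randn(4, 5)
T = ein"il,jl,kl->ijk"(A, B, C)  # the hyperedge l is summed over
size(T)                          # (2, 3, 4): i, j, k remain open
```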

Among the variables, $l$ is shared by all three tensors, hence the diagram
cannot be represented as a simple graph. The hypergraph representation is shown
below.
```@eval
using TikzPictures

tp = TikzPicture(
L"""
\matrix[row sep=0.4cm,column sep=0.4cm,ampersand replacement= \& ] {
\&
\&
\node[color=c01] (j) {$j$}; \&
\&
\&
\\
\&
\&
\node (b) [mytensor] {$B$}; \&
\&
\&
\\
\node[color=c03] (i) {$i$}; \&
\node (a) [mytensor] {$A$}; \&
\node[color=c02] (l) {$l$}; \&
\node (c) [mytensor] {$C$}; \&
\node[color=c04] (k) {$k$}; \&
\\
};
\draw [myedge, color=c01] (j) edge (b);
\draw [myedge, color=c02] (b) edge (l);
\draw [myedge, color=c03] (i) edge (a);
\draw [myedge, color=c02] (a) edge (l);
\draw [myedge, color=c02] (l) edge (c);
\draw [myedge, color=c04] (c) edge (k);
""", options="",
preamble="\\input{" * joinpath(@__DIR__, "assets", "preambles", "the-tensor-network") * "}",
)
save(SVG("the-tensor-network2"), tp)
```

```@raw html
<img src="the-tensor-network2.svg" style="margin-left: auto; margin-right: auto; display:block; width=50%">
```

As a final remark, repeated indices within the same tensor are not forbidden by
the definition of a tensor network; hence, self-loops are also allowed in tensor
network diagrams.

## Tensor network contraction orders
The performance of a tensor network contraction depends on the order in which
the tensors are contracted. The contraction order is usually specified by a
binary tree, where the leaves are the input tensors and each internal node
represents an intermediate contraction. The root of the tree is the output tensor.

Many algorithms have been proposed to find a good contraction order, including
- Greedy algorithms
- Breadth-first search and dynamic programming [^Pfeifer2014]
- Graph bipartitioning [^Gray2021]
- Local search [^Kalachev2021]

Some of these are already included in the [OMEinsum](https://github.com/under-Peter/OMEinsum.jl) package, as sketched below; please check [Performance Tips](@ref) for more details.
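
The following sketch finds a contraction order with OMEinsum; `GreedyMethod` implements the greedy strategy from the list above, and the exact set of available optimizers and complexity-reporting functions may vary across OMEinsum versions:
```julia
using OMEinsum

code = ein"il,jl,kl->ijk"               # the star contraction from above
sizes = uniformsize(code, 2)            # assume every variable has dimension 2
optcode = optimize_code(code, sizes, GreedyMethod())
contraction_complexity(optcode, sizes)  # report time and space complexity
```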

## References

[^Orus2014]:
Orús R. A practical introduction to tensor networks: Matrix product states and projected entangled pair states[J]. Annals of physics, 2014, 349: 117-158.

[^Markov2008]:
Markov I L, Shi Y. Simulating quantum computation by contracting tensor networks[J]. SIAM Journal on Computing, 2008, 38(3): 963-981.

[^Pfeifer2014]:
Pfeifer R N C, Haegeman J, Verstraete F. Faster identification of optimal contraction sequences for tensor networks[J]. Physical Review E, 2014, 90(3): 033315.

[^Gray2021]:
Gray J, Kourtis S. Hyper-optimized tensor network contraction[J]. Quantum, 2021, 5: 410.

[^Kalachev2021]:
Kalachev G, Panteleev P, Yung M H. Multi-tensor contraction for XEB verification of quantum circuits[J]. arXiv:2108.05665, 2021.

[^Pan2022]:
Pan F, Chen K, Zhang P. Solving the sampling problem of the sycamore quantum circuits[J]. Physical Review Letters, 2022, 129(9): 090502.

[^Liu2023]:
Liu J G, Gao X, Cain M, et al. Computing solution space properties of combinatorial optimization problems via generic tensor networks[J]. SIAM Journal on Scientific Computing, 2023, 45(3): A1239-A1270.
5 changes: 3 additions & 2 deletions examples/asia/main.jl
@@ -60,8 +60,9 @@ tn = TensorNetworkModel(model)

# ---

# Calculate the ``\log_{10}`` partition function
probability(tn) |> first |> log10
# Calculate the partition function.
# Since the factors in this model are normalized, the partition function equals the total probability, $1$.
probability(tn) |> first
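
# As a quick illustration of this fact, the ``\log_{10}`` partition function is therefore approximately zero:
probability(tn) |> first |> log10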

# ---

21 changes: 13 additions & 8 deletions examples/hard-core-lattice-gas/main.jl
@@ -26,19 +26,23 @@ using GenericTensorNetworks.Graphs: edges, nv
graph = unit_disk_graph(vec(sites), blockade_radius)
show_graph(graph; locs=sites, texts=fill("", length(sites)))

# These constraints defines a independent set problem that characterized by the following energy based model.
# Let $G = (V, E)$ be a graph, where $V$ is the set of vertices and $E$ be the set of edges. The energy model for the hard-core lattice gas problem is
# These constraints define an independent set problem characterized by the following energy-based model.
# Let $G = (V, E)$ be a graph, where $V$ is the set of vertices and $E$ is the set of edges.
# The energy model for the hard-core lattice gas problem is
# ```math
# E(\mathbf{n}) = -\sum_{i \in V}w_i n_i + \infty \sum_{(i, j) \in E} n_i n_j
# E(\mathbf{n}) = -\sum_{i \in V}w_i n_i + U \sum_{(i, j) \in E} n_i n_j
# ```
# where $n_i \in \{0, 1\}$ is the number of particles at site $i$, and $w_i$ is the weight associated with it. For unweighted graphs, the weights are uniform.
# The solution space hard-core lattice gas is equivalent to that of an independent set problem. The independent set problem involves finding a set of vertices in a graph such that no two vertices in the set are adjacent (i.e., there is no edge connecting them).
# $U$ is the repulsive interaction strength between two particles.
# To represent the independence constraint, we let $U = \infty$, i.e., the coexistence of two particles on sites connected by an edge is strictly forbidden.
# The solution space of the hard-core lattice gas is equivalent to that of an independent set problem.
# The independent set problem involves finding a set of vertices in a graph such that no two vertices in the set are adjacent (i.e., there is no edge connecting them).
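
# To make the energy model concrete, here is a small sketch that evaluates $E(\mathbf{n})$ directly from the definition above; the helper `hardcore_energy` is hypothetical and not part of any package.
function hardcore_energy(graph, n; w=ones(nv(graph)), U=Inf)
    onsite = -sum(w[i] * n[i] for i in 1:nv(graph))
    overlaps = sum((n[e.src] * n[e.dst] for e in edges(graph)); init=0)
    return overlaps == 0 ? onsite : onsite + U * overlaps  # avoid `Inf * 0 = NaN` when `U = Inf`
end
hardcore_energy(graph, fill(0, nv(graph)))  # the empty configuration has zero energy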
# One can create a tensor-network-based model of an independent set problem with the package [`GenericTensorNetworks.jl`](https://github.com/QuEraComputing/GenericTensorNetworks.jl).
using GenericTensorNetworks
problem = IndependentSet(graph; optimizer=GreedyMethod());

# There has been a lot of discussions related to solution space properties in the `GenericTensorNetworks` [documentaion page](https://queracomputing.github.io/GenericTensorNetworks.jl/dev/generated/IndependentSet/).
# In this example, we show how to use `TensorInference` to use probabilistic inference for understand the finite temperature properties of this statistic physics model.
# There are plenty of discussions related to solution space properties in the `GenericTensorNetworks` [documentation page](https://queracomputing.github.io/GenericTensorNetworks.jl/dev/generated/IndependentSet/).
# In this example, we show how to use `TensorInference` to perform probabilistic inference and understand the finite-temperature properties of this statistical model.
# We use [`TensorNetworkModel`](@ref) to convert a combinatorial optimization problem to a probabilistic model.
# Here, we let the inverse temperature be $\beta = 3$.

@@ -62,7 +66,8 @@ pmodel2 = TensorNetworkModel(problem, β; mars=[[e.src, e.dst] for e in edges(gr
mars = marginals(pmodel2);

# We show the probability that both sites on an edge are not occupied
show_graph(graph; locs=sites, edge_colors=[(b = mars[[e.src, e.dst]][1, 1]; (1-b, 1-b, 1-b)) for e in edges(graph)], texts=fill("", nv(graph)), edge_line_width=5)
show_graph(graph; locs=sites, edge_colors=[(b = mars[[e.src, e.dst]][1, 1]; (1-b, 1-b, 1-b)) for e in edges(graph)], texts=fill("", nv(graph)),
    edge_line_widths=[8 * mars[[e.src, e.dst]][1, 1] for e in edges(graph)])

# ## The most likely configuration
# The MAP and MMAP can be used to get the most likely configuration given an evidence.
@@ -90,5 +95,5 @@ sum(config2)
# One can use [`sample`](@ref) to generate samples from the hard-core lattice gas at finite temperature.
# The return value is a matrix, with columns corresponding to different samples.
configs = sample(pmodel3, 1000)
sizes = sum(configs; dims=1)
sizes = sum.(configs)
[count(==(i), sizes) for i=0:34] # counting sizes