Skip to content

Latest commit

 

History

History
146 lines (109 loc) · 5.67 KB

README.org

File metadata and controls

146 lines (109 loc) · 5.67 KB

Diffusion Maps for Dynamics by Marko Budisic (mailto:[email protected])

This is my implementation of Coifman/Lafon algorithm for efficient embeddings of high-dimensional data sets, specifically geared towards analysis of dynamical systems. The use can best be seen in two examples:

=exampleTorus=
demonstrate that the algorithm successfully embeds a 2-torus
=exampleDynamics=
demonstrate how algorithm is used on trajectories of a dynamical system

References:

A brief overview of functionalities in their calling order is provided below. Each .m function contains commented code, so refer to source for details.

Computation functions

exampleDynamics.m

A file demonstrating workflow for analyzing dynamical systems. The dynamics used is a simple planar double-well potential. It should be easy enough to generalize to a 3d system.

The parameters specifying resolution, length of simulated trajectories, and number of Fourier harmonics are at the beginning of the file. They are set to low numbers to facilitate verifying that everything runs smoothly. You can increase them to get a better resolution, but slower run-time, and see how each of them impacts the final picture.

Note that the exampleDynamics will store computed trajectories, but not the averages, into exampleDynamicsTrajectories.mat. If you change Ngrid or Tmax, erase exampleDynamicsTrajectories.mat to re-compute trajectories.

Computation of trajectories and computation of averages can exploit any parallel threads that might be open. Run

>> matlabpool open

before running the example for a speedup.

computeAverages.m

Computes averages of a Fourier basis along a single state-space trajectory. As a result of this function, each trajectory is “described” using a vector of time averages.

(The code is written so that Matlab Coder can automatically generate MEX file from it, speeding up execution. To generate mex files, run

>> deploytool -build computeAverages.prj

in Matlab. The rest of the code will automatically use MEX if available.)

sobolevMatrix.m

Computes pairwise-distances between vectors of time averages using a Sobolev norm. If the state-space is $D$-dimensional, the recommended Sobolev index is $s = -(D+1)/2$. The space of trajectories (ergodic quotient for infinite averages) can be thought of as a graph, where vertices correspond to trajectories, and edges are Sobolev distances stored in the resulting matrix.

(The code is written so that Matlab Coder can automatically generate MEX file from it, speeding up execution. To generate mex files, run

>> deploytool -build computeAverages.prj

in Matlab. The rest of the code will automatically use MEX if available.)

dist2diff.m

Computes diffusion coordinates on a graph based on the pairwise-distance matrix between vertices, like the one generated by =sobolevMatrix= Typically, only a few diffusion coordinates are needed, but it’s possible to compute as many diffusion coordinates as there are trajectories (dimension of the distance matrix).

The diffusion coordinates algorithm depends on the bandwidth parameter $h$, which models the speed at which diffusion proceeds. For data analysis, this parameter can be determined heuristically using, e.g., =nss=function, or it can be tuned manually. A value that is out of acceptable range will result in “important” diffusion eigenvectors, i.e., those carrying new information, to be relegated to higher indices. If $h$ is too small, the ergodic quotient graph will be artificially disconnected, and as a consequence, first diffusion coordinates will look like indicator functions supported on the disconnected components. If $h$ is too large, the first coordinate will be monotonic over data, but the next ones will look like higher harmonics over the same one-dimensional set. In both cases, higher-index diffusion coordinates would find the “important” geometry of the data set, but it’s not easy to say a priori at which coordinate this would happen.

In any case, the algorithm is tolerant to order-of-magnitude changes in \(h\), so the choice is not that crucial. The heuristic algorithm =nss= gives a good starting point for any tuning that might be needed.

nss.m

Heuristically calculates a suitable diffusion bandwidth based on input data. The bandwidth is set to the minimal number \(h\) such that \(h\)-distance of any vertex contains a selected number of other vertices. This ensures one-step diffusion-connectedness of the graph, in coarse terms.

exampleTorus.m

A sanity-check example. The vertices are described by values of Fourier-harmonics sampled on a torus. The diffusion coordinates should correspond to heat-modes on 2-torus, which are again Fourier harmonics. Therefore, embedding in the first few diffusion modes should resemble a 3d image of a doughnut.

The example calls the same functions as seen before:

Visualization functions

viewHarmonic.m

Visualize a Fourier harmonic on a rectangular domain, based on its wavenumber and domain width/height.

License