Welcome!
This repository contains the implementation for the method WaviFM, a Wavelet-based variationally-inferred Factor Model. It also contains example analysis code that reproduces the figures and results from the thesis concerning WaviFM.
Note: environmental setup described below only describe one setting which works, in practice many of the requirement can be relaxed (especially less relevant ones to the code and compilation, such as the VSCode version, Windows version)
Environmental setup
- Python 3.7.12
- Regarding required packages, I recommend creating a Python environment with the provided environment file
./environment.yml
. For example, this can be done in Anaconda prompt by placing this file in the current directory of Anaconda prompt and runningconda env create -f environment.yml
. The resulting environment is calledpy3712
- Doing so will likely install some redundant packages, but at least will not be missing package.
- However, if one wishes to manually install packages, below lists the main packages that should be installed (alongside their dependencies):
- pybind11
- ipykernel
- ipython
- numpy
- pandas
- matplotlib
- seaborn
- pywt
- scipy
- scikit-learn
- Regarding required packages, I recommend creating a Python environment with the provided environment file
- R 4.4.1
- Required packages (and their dependencies) include:
- BiocManager
- STexampleData
- devtools
- OliverXUZY/waveST (on Github)
- Above packages can be installed by running the following code in R
- Required packages (and their dependencies) include:
- G++ 13.2.0 (Rev6, Built by MSYS2 project)
# Install dependency
install.packages("tidyverse")
install.packages("fields")
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("STexampleData")
install.packages("devtools")
library(devtools)
install_github("OliverXUZY/waveST")
Below gives step by step instructions to reproducing results from individual sections in the thesis. This was tested to work with the provided compiled Python bindings file ./build/WaviFM.cp37-win_amd64.pyd
is present in the repository. Compiling the bindings yourself should also work.
Reproducing section 4.1.3:
- Run every Jupyter notebook in
./examples/wavelet_reconstruction/
. In this directory, the Jupyter notebookwavelet_reconstruction_2M.ipynb
andwavelet_reconstruction_10M.ipynb
reproduces results from the thesis for the data set simulated from 2 and 10 ground truth factors, respectively.
Reproducing section 4.1.4:
- Run every Jupyter notebook in
./examples/use_spatial_info/
. In this directory, the Jupyter notebookpolygons.ipynb
,polygonsPermuted.ipynb
,rectangles.ipynb
,rectanglesPermuted.ipynb
reproduces results from the thesis for WaviFM and PCA applied on their namesake data sets. Note that PCA was still ran for spot-permuted data sets so that comparison with PCA inference from non-spot-permuted data sets enable confirmation that indeed spot permutation does not affect PCA inference (up to minor numerical differences).
Reproducing section 4.1.5:
- Run every Jupyter notebook in
./examples/multilength_robustness/
. In this directory, each Jupyter notebook reproduces the RRMSE values for WaviFM, Smoothed PCA, PCA applied to their namesake data sets in the thesis.
Reproducing section 4.1.6: note that exported results for steps 1, 2 are already included in the repository, so one can skip them if one only wants to reproduce the figures without running inference and other preceding steps again.
- Run
./examples/wavest_comparison/data_simulation_and_wavest_inference.Rmd
. This simulates the data, exports the data, and runs waveST inference on the data. - Run
./examples/wavest_comparison/wavifm_inference.ipynb
. This runs WaviFM inference on the data exported in the previous step and exports the inference result. - Run
./examples/wavest_comparison/comparison_analysis.ipynb
. This generates the figures for this section and compares the WaviFM and waveST inference results.
Reproducing section 4.2.1: note that exported results for steps 1, 2, 3 are already included in the repository, so one can skip them if one only wants to reproduce the figures without running inference and other preceding steps again.
- Run
./examples/human_dlpfc/dlpfc_preprocessing_and_to_csv.ipynb
. This loads the DLPFC data, preprocesses this data, and exports it to CSV format use in ensuing inference. - Run
./examples/human_dlpfc/dlpfc_wavest_inference.Rmd
. This runs waveST inference on the DLPFC data. - Run
./examples/human_dlpfc/dlpfc_wavifm_pca_inference.ipynb
. This runs WaviFM and PCA inference on the DLPFC data. - Run
./examples/human_dlpfc/dlpfc_analysis.ipynb
. This generates the figures for this section and compares the WaviFM, PCA and waveST inference results.
Reproducing Appendix C:
- Run
./examples/wavelet_inference/wavelet_inference.ipynb
.
Environmental setup
- Windows 10 Pro, Version 10.0.19045 Build 19045 (I have not verified on whether compilation works on other operating systems)
- VSCode Version 1.91.0, with the following extensions
- C/C++
- C/C++ Extension Pack
- CMake
- CMake Tools
- CMake Version 3.30
- G++ 13.2.0 (Rev6, Built by MSYS2 project)
- Python 3.7.12
- Regarding required packages, I recommend creating a Python environment with the provided environment file
./python_environment.yml
. For example, this can be done in Anaconda prompt by placing this file in the current directory of Anaconda prompt and runningconda env create -f python_environment.yml
. The resulting environment is calledpy3712
- Doing so will likely install some redundant packages, but at least will not be missing package.
- However, if one wishes to manually install packages, below lists the main packages that should be installed (alongside their dependencies):
- pybind11
- Regarding required packages, I recommend creating a Python environment with the provided environment file
Depending on your environment, there may be additional setups required to get C++ and CMake running with VSCode. The following linked series of tutorials for C++ and CMake integration in VSCode may be of use (https://code.visualstudio.com/docs/cpp/introvideos-cpp).
Steps to compiling Python bindings for CAVI
- Open this repository with VSCode
- Delete the
./build
directory. This helps avoid any conflicts with your compilation and existing compilation results. - Edit
set(PYBIND11_PYTHON_VERSION 3.7.12)
andset(PYTHON_EXECUTABLE "C:/Users/yw/miniconda3/envs/py3712/python.exe")
in./CMakeLists.txt
respectively to match your Python version and the corresponding Python executable path. - Run
CMake:Build
command in VSCode - The Python bindings should appear in
.\build\
with suffix.pyd
and a name that depends on the Python version and environment used for compilation. An example of the full filename of such generated binding isWaviFM.cp37-win_amd64.pyd
.
You may also wish to run the tests for the C++ code. This can be done by running the Cmake:Run Tests
command in VSCode
.\build
: contains CMake generated files, albeit currently only contains WaviFM.cp37-win_amd64.pyd
which is the Python 3.7 bindings (compiled from C++ source code) for the CAVI part of WaviFM. This allows one to use the CAVI part of WaviFM and in turn WaviFM as a whole without needing to compile the C++ source code.
.\data
: contains DLPFC data
.\eigen
: contains eigen library for C++, used for some numerical computations in C++ source code.
.\examples
: contains examples of applications of WaviFM that reproduces the figures and results from the thesis concerning WaviFM.
.\googletest
: contains googletest library for C++, used for automated testing of C++ source code.
.\pybind11
: contains pybind11 library for C++, used for compiling Python bindings (from C++ source code) for the CAVI part of WaviFM.
.\scripts
: contains python scripts with containing various functionalities, including running WaviFM, plotting, etc.
.\src
: contains C++ source code for the CAVI part of WaviFM
.\test
: contains testing code for the C++ source code for the CAVI part of WaviFM