This repository contains notebooks and scripts for analyzing brain imaging data through a multi-stage pipeline using self-organizing maps (SOM) and heatmap generation.
Input Images
     │
     ▼
Preprocess + Register
     │
     ▼
[Neural Network]
     │
     ├─────⮞ score-norms (B, 20) ────⮞ SOM Analysis ────⮞ Prototype Identification
     │                                                               │
     │                                                               ▼
     │                              Gather Behavior Scores ────⮞ Correlate
     │                                                                                       ▲
     └─────⮞ score-images (B,20,H,W,D) ────⮞ Likelihood Model ────⮞ Heatmaps (1,H,W,D) ─────┘
- Input: Raw brain imaging data
- Neural network processes images to generate:
  - Score norms: Batch x 20 dimensional feature vectors
  - Score images: Batch x 20 x Height x Width x Depth tensors
- SOM Analysis (score-norms branch):
  - Takes score-norms as input
  - Trains a Self-Organizing Map for pattern discovery
  - Identifies clusters (prototypes)
  - Each sample is matched to a prototype (see the sketch after this list)
  - Enables visualization of the data distribution
- Heatmap Generation (score-images branch):
  - Takes score-images as input
  - Feeds data through a likelihood estimation model
  - Generates 3D heatmaps (H x W x D) highlighting relevant regions
  - Provides visualization of model attention/focus areas
- Integration:
  - Combines outputs from both branches
  - Correlates behavioral scores with heatmap patterns
  - Enables statistical analysis of relationships between:
    - Brain structural patterns
    - Behavioral metrics
    - Prototype membership
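Prototype matching itself reduces to a nearest-neighbour lookup against the trained SOM node weights. Below is a minimal sketch of that step using plain NumPy; the file names are hypothetical and this is not the `simpsom` API, just the underlying best-matching-unit computation:

```python
import numpy as np

# Hypothetical inputs: per-sample score-norms and the trained SOM node weights
score_norms = np.load("score_norms.npy")   # shape (N, 20)
som_weights = np.load("som_weights.npy")   # shape (num_nodes, 20)

# Best-matching unit (BMU): the node whose weight vector is closest to each sample
dists = np.linalg.norm(score_norms[:, None, :] - som_weights[None, :, :], axis=-1)
prototype_ids = dists.argmin(axis=1)       # (N,) prototype/cluster id per sample
```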
- `braintypicality-scripts` repository - Contains all necessary code for preparing and processing the data
- `sade` package, available at https://github.com/ahsanMah/sade
  - Please install and run the Docker container
- `simpsom` package, available at https://github.com/ahsanMah/simpsom
  - cd /codespace/ && git clone https://github.com/ahsanMah/simpsom
  - cd /codespace/simpsom && python setup.py install --user
- Additionally for plotting, `holoviews` and `hvplot` are used (see the example below)
- Other requirements (seaborn, pandas, etc.) should be covered by the packages above
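As a quick illustration of the plotting stack, `hvplot` registers a `.hvplot` accessor on pandas objects once imported; the CSV and column names below are purely illustrative:

```python
import pandas as pd
import hvplot.pandas  # noqa: F401 -- registers the .hvplot accessor

df = pd.read_csv("som_cluster_assignments.csv")  # hypothetical file
df.groupby("prototype_id").size().hvplot.bar(title="Samples per prototype")
```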
We need to prepare data to be ingested by `sade`. Namely, the images should be two-channel (i.e. 4D) images with T1 and T2 concatenated. Please refer to the `sade` documentation for more details. You may use the `run_preprocessing.py` script to process the data.
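The preprocessing details live in `run_preprocessing.py` and the `sade` docs, but the final packaging step is just stacking the co-registered T1 and T2 volumes along a fourth axis. A minimal sketch with nibabel (paths are illustrative; the actual skull-stripping and registration are handled by the script):

```python
import nibabel as nib
import numpy as np

t1 = nib.load("sub-001_T1w.nii.gz")  # hypothetical, already preprocessed
t2 = nib.load("sub-001_T2w.nii.gz")  # assumed in the same space as the T1

# Concatenate the two channels along a 4th dimension -> one 4D image
data = np.stack([t1.get_fdata(), t2.get_fdata()], axis=-1)
nib.save(nib.Nifti1Image(data, t1.affine), "sub-001_T1T2.nii.gz")
```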
To run inference, `sade` will need a file, e.g. `ibis-inlier.txt`, with the filenames of the images, and a directory where this file is present, as specified by `--config.data.splits_dir`.
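The split file is assumed here to be a plain text listing with one image filename per line; a minimal way to generate it (paths are illustrative):

```python
from pathlib import Path

image_dir = Path("/path/to/preprocessed/ibis")                     # hypothetical
splits_dir = Path("/path/to/braintypicality-scripts/split-keys")   # --config.data.splits_dir

filenames = sorted(p.name for p in image_dir.glob("*.nii.gz"))
(splits_dir / "ibis-inlier.txt").write_text("\n".join(filenames) + "\n")
```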
The inference script of `sade` can extract score-norms and create heatmaps. The outputs will be saved as NumPy files: `<workdir>/experiments/<experiment.id>/<subject-id>.npz`. An example config is given below; a snippet for inspecting the resulting `.npz` files follows it.
python main.py --mode inference \
--config configs/flows/gmm_flow_config.py \
--workdir remote_workdir/cuda_opt/learnable/ \
--config.data.splits_dir /ASD/ahsan_projects/Developer/braintypicality-scripts/split-keys \
--config.eval.checkpoint_num=150 \
--config.eval.experiment.flow_checkpoint_path=remote_workdir/cuda_opt/learnable/flow/psz3-globalpsz17-nb20-lr0.0003-bs32-np1024-kimg300_smin1e-2_smax0.8 \
--config.eval.experiment.train=abcd-train \
--config.eval.experiment.inlier=abcd-val \
--config.eval.experiment.id=default-ckpt-150 \
--config.flow.patch_batch_size=16384 # Increasing this can help speed up inference - keep it powers of 2
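Once inference finishes, the per-subject `.npz` files can be inspected directly. The stored array names may vary between `sade` versions, so it is safest to list them first:

```python
import numpy as np

# Path pattern from above: <workdir>/experiments/<experiment.id>/<subject-id>.npz
npz = np.load("remote_workdir/cuda_opt/learnable/experiments/default-ckpt-150/<subject-id>.npz")

print(npz.files)                     # names of the stored arrays
for name in npz.files:
    print(name, npz[name].shape)     # e.g. the score-norms vector and the heatmap volume
```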
Note: These heatmaps are only rigidly registered to MNI. Any downstream voxel-wise analysis will need them to be deformably registered to the same space.
It is possible to use the `sade_registration.py` script in `braintypicality-scripts` to deformably register them to MNI. If you choose to use `sade_registration.py`, here is an example:
# This will *compute* the registrations for the conte dataset
# and save them to the transforms directory specified in the code.
python sade_registration.py --mode compute \
--config /codespace/sade/sade/configs/ve/biggan_config.py \
--dataset conte
# This will *apply* the registrations from the transforms directory
python sade_registration.py --mode apply \
--config /codespace/sade/sade/configs/ve/biggan_config.py \
--dataset conte \
--load_dir /ASD/ahsan_projects/braintypicality/workdir/cuda_opt/learnable/experiments/reprod-correct/conte \
--save_dir /ASD/ahsan_projects/Developer/ds-analysis/ebds/registered-heatmaps/
The `build-som-plots` notebook can be run to produce the SOM clustering and the CSV of samples alongside their cluster IDs. This CSV is used by the `roi_correlation_analysis` notebook.
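The full analysis lives in that notebook; the core join between cluster assignments and behavioral scores is straightforward with pandas. The file and column names below are illustrative, not the notebook's exact schema:

```python
import pandas as pd

clusters = pd.read_csv("som_cluster_assignments.csv")  # e.g. columns: subject_id, prototype_id
behavior = pd.read_csv("behavior_scores.csv")          # subject_id + behavioral metrics

merged = clusters.merge(behavior, on="subject_id")
# Summarize behavioral metrics per prototype before running correlation tests
print(merged.groupby("prototype_id").mean(numeric_only=True))
```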
Heatmaps are stored as NumPy files, so they may be loaded and plotted using any preferred tools. An example is available in the `voxel-heatmaps` notebook, which plots the average heatmap across the Down Syndrome samples that belong to a given prototype. Recall that computing the average only makes sense if the images are properly registered to the same space.
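A minimal version of that averaging step, assuming the deformably registered heatmaps are available as NIfTI volumes (e.g. from the `--save_dir` used by `sade_registration.py`; paths are illustrative):

```python
from pathlib import Path

import nibabel as nib
import numpy as np

heatmap_dir = Path("registered-heatmaps")        # hypothetical directory for one prototype's samples
paths = sorted(heatmap_dir.glob("*.nii.gz"))

imgs = [nib.load(p) for p in paths]
mean_map = np.mean([img.get_fdata() for img in imgs], axis=0)
nib.save(nib.Nifti1Image(mean_map, imgs[0].affine), "mean_heatmap.nii.gz")
```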
The score-based diffusion model was trained with the following commands. This assumes the script is run from the `sade` folder inside the `sade` repository. All commands were run from inside the Docker container produced by `sade/docker`.
python main.py --project architecture --mode train \
--config configs/ve/biggan_config.py \
--workdir /workdir/cuda_opt/learnable \
--config.data.cache_rate=1.0 \
--config.model.learnable_embedding \
--config.training.batch_size=8 \
--cuda_opt
# Note: the batch size was switched to 16 at ~1 million iterations
The flow model was trained with:
python main.py --project flows --mode flow-train \
--config configs/flows/gmm_flow_config.py \
--workdir /ASD/ahsan_projects/braintypicality/workdir/cuda_opt/learnable/ \
--config.data.cache_rate=1 \
--cuda_opt=0 \
--config.msma.min_timestep=1e-2 \
--config.msma.max_timestep=0.8 \
--config.flow.training_kimg=300
The flow model will be created in `<workdir>/flow/psz3-globalpsz17-nb20-lr0.0003-bs32-np1024-kimg300_smin1e-2_smax0.8/`. Then inference was run with:
python main.py --mode inference \
--config configs/flows/gmm_flow_config.py \
--workdir remote_workdir/cuda_opt/learnable/ \
--config.eval.checkpoint_num=150 \
--config.eval.experiment.flow_checkpoint_path=remote_workdir/cuda_opt/learnable/flow/psz3-globalpsz17-nb20-lr0.0003-bs32-np1024-kimg300_smin1e-2_smax0.8 \
--config.eval.experiment.id=default-ckpt-150 \
--config.eval.experiment.train=abcd-test \
--config.eval.experiment.inlier=ibis-inlier \
--config.eval.experiment.ood=ibis-ds-sa \
--config.flow.patch_batch_size=16384
# Note: different cohorts can be selected for train/inlier/ood
- Inputs:
  - Raw Images: NIfTI format (.nii.gz)
  - Behavioral Scores: CSV files with subject IDs and metrics
- Intermediate outputs:
  - score-norms: NumPy arrays (N x 20)
  - score-images: NumPy arrays (N x 20 x H x W x D)
- Final outputs:
  - SOM prototype assignments: CSV mapping samples to clusters
  - Heatmaps: 3D NIfTI files (.nii.gz)
  - Correlation matrices: CSV files
The pipeline includes visualization tools for:
- SOM topology and clustering
- Heatmap overlays
- Correlation matrices
See the `notebooks/` directory for interactive visualization examples.
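For a quick static overlay outside the notebooks, a heatmap slice can be drawn over an anatomical slice with matplotlib. The template path, heatmap path, and slice index are illustrative, and the heatmap is assumed to be registered to the template:

```python
import matplotlib.pyplot as plt
import nibabel as nib

anat = nib.load("mni_template.nii.gz").get_fdata()   # hypothetical MNI template
heat = nib.load("mean_heatmap.nii.gz").get_fdata()   # e.g. the average map from above

z = anat.shape[2] // 2                               # mid-axial slice
plt.imshow(anat[:, :, z].T, cmap="gray", origin="lower")
plt.imshow(heat[:, :, z].T, cmap="hot", alpha=0.5, origin="lower")
plt.axis("off")
plt.savefig("heatmap_overlay.png", dpi=150)
```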