alexisboukouvalas/ZIFA

 
 


ZIFA

Zero-inflated dimensionality reduction algorithm for single-cell data. Created by Emma Pierson and Christopher Yau.

If you are using count data, we recommend taking the log (i.e., Y = log2(1 + count_data)) prior to using ZIFA.
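The recommended transform can be sketched with numpy as follows. The small `count_data` matrix is made up for illustration, and the layout (rows as samples, columns as genes) is an assumption rather than something stated here.

```python
import numpy as np

# Hypothetical raw count matrix, for illustration only.
# Assumed layout: rows are samples (cells), columns are genes.
count_data = np.array([
    [0, 3, 15],
    [7, 0, 0],
    [1, 1, 255],
])

# log2(1 + count): a count of zero maps to exactly zero,
# so the zero entries of the data are preserved by the transform.
Y = np.log2(1 + count_data)
```

Adding 1 before taking the log keeps zero counts finite and keeps them at zero, which matters for a method that models zero inflation.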

Reference: Dimensionality reduction for zero-inflated single cell gene expression analysis.

Algorithm code is contained in ZIFA.py and block_ZIFA.py. For datasets with more than a few thousand genes, we recommend using block_ZIFA, which subsamples genes in blocks to increase efficiency; it should yield similar results to ZIFA. Runtime for block ZIFA on the full single-cell dataset from Pollen et al., 2014 (~250 samples, ~20,000 genes) is approximately 15 minutes on a quad-core Mac Pro.

Runtime for block ZIFA is roughly linear in the number of samples and the number of genes, and quadratic in the block size. Decreasing the block size may decrease runtime but will also produce less reliable results.
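As a rough illustration of that scaling, the sketch below compares relative costs under the stated proportionality. The function `relative_runtime` is hypothetical and not part of ZIFA; it returns a unitless number and ignores constant factors, so only ratios between configurations are meaningful.

```python
def relative_runtime(n_samples, n_genes, block_size):
    """Unitless cost proportional to n_samples * n_genes * block_size**2.

    Hypothetical helper: it encodes only the rough scaling described
    in the text, not a measured or guaranteed runtime.
    """
    return n_samples * n_genes * block_size ** 2


# Baseline roughly matching the dataset size quoted above.
baseline = relative_runtime(250, 20000, 500)

# Halving the block size cuts the estimated cost to a quarter.
ratio_smaller_blocks = relative_runtime(250, 20000, 250) / baseline

# Doubling the number of samples doubles the estimated cost.
ratio_more_samples = relative_runtime(500, 20000, 500) / baseline
```

Under this model the block size is the most sensitive setting for runtime, which is why it is traded off against reliability of the results.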

See example.py for a full example demonstrating superior performance over factor analysis.

This code requires pylab (part of matplotlib), scipy, numpy, and scikits.learn (now distributed as scikit-learn) for full functionality.

Please contact [email protected] with any questions or comments.

## Installation

Download the code: `git clone https://github.com/epierson9/ZIFA`

Install the package: `cd ZIFA`, then `python setup.py install`

## Sample usage

from ZIFA import ZIFA
from ZIFA import block_ZIFA

To fit ZIFA:

Z, model_params = ZIFA.fitModel(Y, k)

To fit with the block algorithm:

Z, model_params = block_ZIFA.fitModel(Y, k)

or

Z, model_params = block_ZIFA.fitModel(Y, k, n_blocks = desired_n_blocks)

where Y is the observed zero-inflated data, k is the desired number of latent dimensions, Z is the low-dimensional projection, and desired_n_blocks is the number of blocks to divide the genes into. By default, the number of blocks is set to n_genes / 500 (yielding a block size of approximately 500).
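The default block count can be worked out by hand before choosing a custom value. The sketch below uses two hypothetical helpers, `default_n_blocks` and `block_size`, which are not part of the ZIFA API. It assumes the division is rounded down and that at least one block is always used; the library's exact rounding is not stated here.

```python
def default_n_blocks(n_genes, target_block_size=500):
    """Number of blocks under the n_genes / 500 rule.

    Assumptions (not confirmed by the text): integer division,
    and never fewer than one block.
    """
    return max(1, n_genes // target_block_size)


def block_size(n_genes, n_blocks):
    """Average number of genes per block."""
    return n_genes / n_blocks


# About 20,000 genes gives 40 blocks of 500 genes each.
n_blocks_large = default_n_blocks(20000)
size_large = block_size(20000, n_blocks_large)

# 1,200 genes gives 2 blocks of 600 genes each under these assumptions.
n_blocks_small = default_n_blocks(1200)
size_small = block_size(1200, n_blocks_small)
```

Passing a larger `n_blocks` than the default shrinks the block size, which may shorten runtime at the cost of less reliable results.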
