Confounded

Batch adjustment with adversarial autoencoders

Why use batch adjustment?

Data are biased based on the way they are collected. When analyzing data from multiple sources, that bias can mess up the results, so it is often useful to remove source-based bias, or "batch effects."

Why use Confounded?

Most batch adjusters assume that batch effects are linear and that source bias in one variable doesn't affect bias in other variables. However, modern analysis tools like machine learning are really good at learning nonlinear relationships, so if even very small nonlinear effects still exist after correction, modern analysis can still be biased by those effects. (See the paper for more info and a brief comparison to other methods.)

Confounded uses deep neural networks to identify and remove both linear and nonlinear batch effects.

How does it work?

Confounded uses two neural networks to adjust data for batch effects. One network (the discriminator) looks at the data and learns to tell between batches, and the other network (the autoencoder) makes small tweaks to the data in order to "fool" the discriminator. The autoencoder also tries to keep the adjusted data as similar as possible to the original data. This process continues until the discriminator can't distinguish the batches and the autoencoder is faithfully reproducing the data without batch effects.

Quick Start

Instructions for getting started quickly with Confounded can be found on its Docker page.

Installation

The easiest way to install and run Confounded is through its Docker image. If you want to install and run the source, continue reading.

TL;DR:

git clone https://github.com/jdayton3/Confounded.git
wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh # Or the anaconda installer for your system.
bash Anaconda3-5.3.1-Linux-x86_64.sh # Go through the install process.
conda create -n confounded python=3.6 r-tidyverse scikit-learn
source activate confounded
pip install tensorflow # or tensorflow-gpu

Confounded dependencies

Anaconda with Python 3 (Note: the version of h5py that ships with Anaconda may cause some deprecation warnings.)
Tensorflow

Usage

To run Confounded, run the following command:

python -m confounded path/to/input_data.csv

To see other command line options, run:

python -m confounded -h

Data preparation

Data should be a CSV in Tidy Data format. Additionally, the following specifications must be met:

One column is the sample ID and is called "Sample"
One column is the batch ID and is called "Batch"
Any other categorical column (integer or string type) represents other "meta data"
The rest of the columns are numeric features

For an example of properly formatted data, see /data/example/test_data.csv.

Name		Name	Last commit message	Last commit date
Latest commit History 139 Commits
confounded		confounded
data		data
.dockerignore		.dockerignore
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
build_docker		build_docker

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Confounded

Batch adjustment with adversarial autoencoders

Why use batch adjustment?

Why use Confounded?

How does it work?

Quick Start

Installation

Confounded dependencies

Usage

Data preparation

About

Releases

Packages

Contributors 3

Languages

License

jdayton3/Confounded

Folders and files

Latest commit

History

Repository files navigation

Confounded

Batch adjustment with adversarial autoencoders

Why use batch adjustment?

Why use Confounded?

How does it work?

Quick Start

Installation

Confounded dependencies

Usage

Data preparation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages