CarDEC Evaluations

CarDEC (Count adapted regularized Deep Embedded Clustering) is a joint deep learning tool for the analysis of single-cell RNA-seq data. The CarDEC method itself is distributed in a separate repository.

This repository provides the code used to perform all evaluations in the CarDEC paper. It includes the code used to generate results for CarDEC and for every competing method:

scVI
DCA + Combat
MNN
Scanorama
scDeepCluster

General Flow

It is recommended that the user proceed as follows.

Clone this repository to your local machine.
Download the data from Box.
Install all necessary packages.
Run all evaluations.
Run Rscripts to generate final plots.

Clone this repository to your local machine

Clone this repository to your local machine using git clone.

Download the data from Box

Download the data from Box and place the files in the data folder, which is empty in a fresh clone.
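
Before running any notebook, it may help to confirm that the download actually landed in the data folder. The following is a minimal sketch, not part of the repository: the function name `list_data_files` is hypothetical, and the folder path passed to it is an assumption that should be adjusted to match the folder name in your clone.

```python
from pathlib import Path


def list_data_files(data_dir):
    """Return the relative paths of all regular files under data_dir, sorted.

    An empty list means the folder is missing or still empty, i.e. the
    Box download has not been placed there yet.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    # Skip hidden placeholder files such as .gitkeep or .DS_Store.
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and not p.name.startswith(".")
    )
```

Calling `list_data_files("data")` from the repository root and getting an empty list would suggest that the data still needs to be downloaded or moved.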

Install all necessary packages

The user will need to install several components: Anaconda, two conda environments containing many dependencies, and R (version 4.0 or later).

Install Anaconda

First, install Anaconda if you do not already have it, so that you can run conda commands in a terminal.

Set up conda environments

Next, use cardec.yml and cardec_alternatives.yml to set up the "cardec" and "cardec_alternatives" environments respectively.

To do this, cd into the cloned "CarDEC_Codes" repository. Once in this directory, run the following two commands.

$ conda env create -f cardec.yml
$ conda env create -f cardec_alternatives.yml
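
Once both commands finish, one way to confirm that the environments were created is to parse the output of `conda env list --json`. The sketch below assumes that this command prints a JSON object whose `envs` key holds a list of environment paths, and that an environment's name is the last component of its path. The helper names are hypothetical.

```python
import json
import subprocess
from pathlib import PurePath

REQUIRED_ENVS = ("cardec", "cardec_alternatives")


def missing_envs(conda_json, required=REQUIRED_ENVS):
    """Return the required environment names absent from conda's JSON listing."""
    env_paths = json.loads(conda_json).get("envs", [])
    # Assumption: the environment name is the final path component.
    present = {PurePath(path).name for path in env_paths}
    return [name for name in required if name not in present]


def check_conda_envs():
    """Run conda and report which required environments are still missing."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return missing_envs(result.stdout)
```

An empty list returned by `check_conda_envs()` would indicate that both environments are available.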

Install R

Lastly, install R. Version 4.0 or later is highly recommended. Installing RStudio is also recommended, but not required.

Run all evaluations

Next, it is recommended that the user run all of the evaluation notebooks. Activate either the cardec or the cardec_alternatives environment before opening Jupyter to run the Python notebooks. This is necessary because both environments have "nb_conda_kernels" installed, which allows the user to switch between conda environments from within Jupyter. The following command activates the cardec environment.

$ conda activate cardec

Then, open Jupyter. Either Jupyter Notebook or JupyterLab will work. The following command opens JupyterLab.

$ jupyter lab

Run CarDEC Notebooks

It is recommended that the user first run the CarDEC notebooks. Open each of the following notebooks in Jupyter, set the active conda kernel to "cardec", and run all cells. Repeat this for every notebook listed below.

Run MNN R Scripts

Next, it is recommended that the user run all scripts that evaluate MNN. For each file in the list below, open R (or RStudio) and execute the script.

Run Other Method Python Notebooks

In the next step, the user should run the Python notebooks that evaluate all methods other than CarDEC and MNN. Open each of the following notebooks in Jupyter, set the active conda kernel to "cardec_alternatives", and run all cells. Repeat this for every notebook listed below.

Remark: "Competing Methods for monocyte" is a folder. All code for reproducing the results on the monocyte dataset can be found in this folder.

Run CV Score Notebooks

Lastly, the user should run the Python notebooks used to generate the coefficient of variation plots shown in many of the CarDEC paper's figures. Open each of the following notebooks in Jupyter, set the active conda kernel to "cardec", and run all cells. Repeat this for every notebook listed below.
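
For reference, the coefficient of variation (CV) of a set of values is the standard deviation divided by the mean. The sketch below illustrates that definition only. It is not taken from the notebooks, and the choice of the population standard deviation, the function name, and the example values are all assumptions. How the paper groups cells or genes before computing CV is not described here.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """Return std / mean for a sequence of non-negative expression values.

    Assumption: the population standard deviation is used; the notebooks
    may use a different convention.
    """
    mean = fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / mean


# Hypothetical expression values for one gene across eight cells.
expression = [2, 4, 4, 4, 5, 5, 7, 9]
print(coefficient_of_variation(expression))
```

A lower CV for a gene within a group of cells indicates less relative spread in its expression values.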

Run Rscripts to Generate Final Plots

This last step is purely optional, since all analysis was completed in the previous steps. It uses Rscripts to generate the final figures. These Rscripts do not perform any actual analysis; they are only used to produce more polished plots for the paper than the Python output. For example, all UMAP plots in the paper were generated by running all analysis in Python, exporting the computed UMAP coordinates to a csv file, and then reading this csv file into R to build the final UMAP plot with ggplot2.
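
The Python side of this hand-off can be sketched as follows. This is an illustration rather than the repository's code: the column names `UMAP1`, `UMAP2`, and `label`, the function name, and the file name in the usage note are assumptions.

```python
import csv


def write_umap_csv(path, coords, labels):
    """Write one row per cell: two UMAP coordinates plus a label."""
    if len(coords) != len(labels):
        raise ValueError("coords and labels must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["UMAP1", "UMAP2", "label"])  # hypothetical header
        for (x, y), label in zip(coords, labels):
            writer.writerow([x, y, label])
```

For example, `write_umap_csv("umap_coords.csv", [(0.1, 1.2), (-0.5, 0.3)], ["batch1", "batch2"])` writes a three-line file (placeholder values, not real results) that could then be loaded in R with `read.csv` and plotted with ggplot2.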

If the user wishes to generate the final plots, they only need to open each figure folder and run any Rscripts found there. Each Rscript should run in under 30 seconds, since it only reads small csv files and generates UMAP plots. The scripts have names like "figure_make.R", "figure_make_HVGo.R", "figure_make_bybatch.R", etc. A few figure folders do not contain Rscripts, which means that no R postprocessing was done to generate their final figures.
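
Finding and running every script by hand can be tedious, so the search could be automated along these lines. This is a sketch under assumptions: the helper names are hypothetical, `Rscript` is assumed to be on the PATH, and each script is run from its own folder on the assumption that it reads its csv files via relative paths.

```python
import subprocess
from pathlib import Path


def find_r_scripts(root):
    """Return all files ending in .R (any case) under root, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".r"
    )


def run_r_scripts(root):
    """Run each script with Rscript, using the script's folder as the working directory."""
    for script in find_r_scripts(root):
        print(f"Running {script}")
        subprocess.run(["Rscript", script.name], cwd=script.parent, check=True)
```

Calling `run_r_scripts` with the path to the figures folder would run every script in turn and stop at the first one that fails, because `check=True` raises an error on a non-zero exit code.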
