This repository contains a pipeline for analyzing data from scATAC-seq.

loosolab/Datenanalyse_2022_23

Datenanalyse_2022_23

This repository contains all the tools and methods developed specifically for the course “Applied data analysis in bioinformatics” in the master's program “Bioinformatik und Systembiologie” at the Justus-Liebig-University and the Technische Hochschule Mittelhessen in the winter term 2022/2023.

The goal of this course is to develop a pipeline for the Max Planck Institute for Heart and Lung Research that takes data from CATLAS and performs several distinct analyses, based mainly on chromatin accessibility¹.

The pipeline is organized into two separate packages (WP1 and WP2), reflecting how the course was split into groups. A short description of each package is given below:

  • WP1:
    • The first part of the pipeline contains functions for reading .bed files, for computing quality control parameters (e.g. the mean/median of the fragment lengths or an interpretable score), and for plotting them. Additionally, with the help of .gtf files, the fragment distribution around TSS can be calculated.
  • WP2:
    • The second part of the pipeline contains functions for calculating, for each cell barcode, the overlap with a given feature and for visualizing the calculated data with different plots.
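As an illustration of the kind of quality control parameter WP1 computes, the following is a minimal sketch of per-barcode mean/median fragment lengths. It is not the WP1 implementation: the function name `fragment_length_stats` is hypothetical, and the column layout (chromosome, start, end, cell barcode, tab-separated) is an assumption about the input .bed files.

```python
import io
import statistics
from collections import defaultdict


def fragment_length_stats(bed_lines):
    """Compute mean/median fragment length per cell barcode.

    Assumes tab-separated lines with the columns
    chrom, start, end, barcode (an assumed layout, not taken from WP1).
    """
    lengths = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        chrom, start, end, barcode = line.split("\t")[:4]
        # BED intervals are half-open, so the length is end - start
        lengths[barcode].append(int(end) - int(start))
    return {
        barcode: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "n_fragments": len(values),
        }
        for barcode, values in lengths.items()
    }


# Tiny in-memory example standing in for a real .bed file
example = io.StringIO(
    "chr1\t100\t250\tAAAC\n"
    "chr1\t300\t500\tAAAC\n"
    "chr2\t50\t150\tTTTG\n"
)
stats = fragment_length_stats(example)
print(stats["AAAC"])
```

For a real file, the same function could be called with an open file handle instead of the `io.StringIO` object.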

Each package also contains a detailed README that explains all features, their functionality, and how to use them. The slides of the final presentation, held on 1 March 2023, are available in presentation.pdf. To make the whole pipeline easier to understand, we developed a graphical representation, which can be seen below.

In terms of functionality, this representation can be divided between the two packages as follows. The first package starts on the left with reading the .bed files, followed by two interlocking gears: the quality control functions and parameters, and their visualization. The result of these two steps is an AnnData object, which is then used by the second package. There, the object is additionally filled with information from further .bed and .gtf files. Next comes another pair of gears, consisting of a feature overlap calculation and a visualization step. The result of the pipeline is a rich AnnData object ready for further analysis. More details are given in the corresponding subfolders.
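The feature overlap step of the second package can be sketched as follows. This is a simplified, hypothetical example rather than the WP2 code: the function names, the tuple-based input format, and the choice of "fraction of fragments overlapping at least one feature" as the score are all assumptions made for illustration. It uses a naive pairwise comparison, which is easy to read but would be slow on real data.

```python
from collections import defaultdict


def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def feature_overlap_per_barcode(fragments, features):
    """Fraction of each barcode's fragments that overlap any feature.

    fragments: iterable of (chrom, start, end, barcode) tuples (assumed layout)
    features:  iterable of (chrom, start, end) tuples (assumed layout)
    """
    # Group features by chromosome so only relevant intervals are compared
    features_by_chrom = defaultdict(list)
    for chrom, start, end in features:
        features_by_chrom[chrom].append((start, end))

    total = defaultdict(int)
    hits = defaultdict(int)
    for chrom, start, end, barcode in fragments:
        total[barcode] += 1
        candidates = features_by_chrom.get(chrom, [])
        if any(intervals_overlap(start, end, fs, fe) for fs, fe in candidates):
            hits[barcode] += 1

    return {barcode: hits[barcode] / total[barcode] for barcode in total}


# Toy data standing in for parsed .bed content
features = [("chr1", 200, 400)]
fragments = [
    ("chr1", 100, 250, "AAAC"),  # overlaps the feature
    ("chr1", 400, 500, "AAAC"),  # only touches the feature end, no overlap
    ("chr2", 200, 300, "TTTG"),  # no feature on chr2
]
overlap = feature_overlap_per_barcode(fragments, features)
print(overlap)
```

In the pipeline described above, a per-barcode value like this would be stored in the AnnData object alongside the quality control parameters; how WP2 actually does this is documented in its own README.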
