splicing

Splicing analysis with the MAJIQ tool. Still a work in progress.

The purpose of this pipeline is to run MAJIQ using Snakemake. The aim is to make MAJIQ easier to run for non-bioinformaticians, and the pipeline adds extra parsing and annotation of the MAJIQ output.

BEWARE

I am actively developing how this pipeline works. For now it runs in 4 steps:

build
psi
annotate
transcriptome_assembly

After going through several different installation methods for MAJIQ, I found that the easiest and most reliable seems to be installing MAJIQ in a conda environment named "majiq".

Therefore, this pipeline assumes that you have a conda environment called "majiq" with MAJIQ installed in it. As of Mar 02 2023, this pipeline is using majiq 2.4.dev4+gdd43612.

Transcriptome assembly will merge the BAMs, run 2 different transcriptome assembly tools (scallop2 and stringtie2), and then extract the novel exons that match significant junctions called by MAJIQ.

Buyer beware, mileage may vary.

Feel free to email me or open an issue on the repo.

Needed files

Aligned, sorted, and indexed BAM files of RNA-seq. You will need .bam and .bai files for all your samples.
GFF3 and GTF of your species of interest
A formatted sample sheet, see examples and explanation below

Get started

Necessary R packages

If you're just going to run the build + psi workflows you will need:

data.table
tidyverse
optparse
glue

Alternatively, there is an environment provided with the necessary packages.

After you've installed the necessary software (Snakemake, the R libraries, and MAJIQ itself), you will need to do 3 things to get this pipeline going:

Set up a sample sheet
Edit the config/comparisons.yaml
Edit the config/config.yaml

Making a sample sheet

See the example data for the formatting of sample sheets. The following columns are mandatory: sample_name, group, exclude_sample_downstream_analysis

The exclude_sample_downstream_analysis column must be present. To exclude a sample, set it to 1; otherwise leave it blank.

After these 3 mandatory columns, you can include as many additional columns as you like.

Here is an example sample sheet where we have a het, hom, and wt of a mutant

sample_name	group	exclude_sample_downstream_analysis	litter
M323K_HET_1	het		one
M323K_HET_2	het		two
M323K_HET_3	het		three
M323K_HET_4	het		four
M323K_HOM_1	hom		one
M323K_HOM_2	hom		two
M323K_HOM_3	hom		three
M323K_HOM_4	hom		four
M323K_HOM_5	hom		five
M323K_WT_1	wt		one
M323K_WT_2	wt		two
M323K_WT_3	wt		three

My bams are named like this:

M323K_HET_1_unique_rg_fixed.bam

with all bams sharing the _unique_rg_fixed suffix, but I don't include that in the sample_name.
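The naming convention above can be sketched in a few lines of Python. This is only an illustration, not part of the pipeline: the function names, the bam_suffix argument, and the /path/to/bams folder are all hypothetical, and the check accepts either sample.bam.bai or sample.bai as the index name.

```python
from pathlib import Path


def expected_bam_paths(sample_names, bam_dir, bam_suffix="_unique_rg_fixed"):
    """Map each sample_name to the BAM path implied by the shared suffix."""
    bam_dir = Path(bam_dir)
    return {name: bam_dir / f"{name}{bam_suffix}.bam" for name in sample_names}


def missing_inputs(paths):
    """Return the BAMs that are absent or have no index file next to them."""
    missing = []
    for bam in paths.values():
        # Accept both common index names: sample.bam.bai and sample.bai
        has_index = Path(str(bam) + ".bai").exists() or bam.with_suffix(".bai").exists()
        if not bam.exists() or not has_index:
            missing.append(bam)
    return missing


# Hypothetical folder; replace with wherever your BAMs live
paths = expected_bam_paths(["M323K_HET_1", "M323K_WT_1"], "/path/to/bams")
print(paths["M323K_HET_1"].name)  # M323K_HET_1_unique_rg_fixed.bam
```

Running missing_inputs(paths) before submitting the build step is a cheap way to catch a sample whose BAM or index was never copied over.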

I have three groups, which I put in the group column. I don't have any reason to exclude any of the samples, so I leave exclude_sample_downstream_analysis blank.

Please use syntactic names for sample_name and group (no spaces, don't start with a number, use underscores rather than hyphens). I'm not totally sure that non-syntactic names lead to errors, but I would guess they will.

After that, I've included a column saying which litter the mice came from, but I could include as many additional columns as I like.

PLEASE USE SYNTACTIC NAMES

That means NO hyphens and NO periods.

M323K_HOM_2 - GOOD
M323K.HOM.2 - BAD

sample_name	group	exclude_sample_downstream_analysis	litter
M323K_HET_1	het		1.2
M323K_HET_2	het		two_2
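The sample sheet rules above (three mandatory columns, syntactic names, and an exclude flag that is either 1 or blank) can be checked before running anything. The following is a minimal sketch and not something the pipeline ships: check_sample_sheet and its regular expression are hypothetical, and it assumes a tab-delimited sheet, so pass delimiter="," if yours is a CSV.

```python
import csv
import io
import re

REQUIRED = ("sample_name", "group", "exclude_sample_downstream_analysis")
# Letters, digits and underscores only, and not starting with a digit
SYNTACTIC = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_sample_sheet(handle, delimiter="\t"):
    """Return a list of human-readable problems found in a sample sheet."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in REQUIRED if col not in header]
    if problems:
        return problems
    # The header is line 1, so data rows start at line 2
    for line_no, row in enumerate(reader, start=2):
        for col in ("sample_name", "group"):
            if not SYNTACTIC.match(row[col] or ""):
                problems.append(f"line {line_no}: {col} '{row[col]}' is not a syntactic name")
        flag = (row["exclude_sample_downstream_analysis"] or "").strip()
        if flag not in ("", "1"):
            problems.append(f"line {line_no}: exclude flag must be 1 or blank, got '{flag}'")
    return problems


sheet = (
    "sample_name\tgroup\texclude_sample_downstream_analysis\tlitter\n"
    "M323K_HET_1\thet\t\tone\n"
    "M323K.HOM.2\thom\t\ttwo\n"
)
print(check_sample_sheet(io.StringIO(sheet)))
# ["line 3: sample_name 'M323K.HOM.2' is not a syntactic name"]
```

Additional columns such as litter are deliberately not checked, since they can hold whatever values you like.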

Setting up your comparisons

To compare groups, we need to go into config/comparisons.yaml and edit it.

Here's an example from the sample sheet above:

knockdownexperiment:
  column_name:
    - group
  wt:
    - wt
  hom:
    - hom
controlVersusHets:
  column_name:
    - group
  wt:
    - wt
  het:
    - het
litterComparison:
  column_name:
    - litter
  firstLitters:
    - one
    - two
  secondLitters:
    - three
    - four
    - five

Make sure there is a space between the "-" and the value when you're creating the YAML or it won't be a properly formatted YAML list and the pipeline won't work.
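To illustrate the point about the space: a YAML parser reads "-wt" (no space) as the plain string "-wt" rather than a one-item list. The sketch below checks an already-parsed comparisons mapping for that mistake and for values that do not appear in the sample sheet. It is a hypothetical helper, not part of the pipeline, and it assumes each comparison has column_name plus exactly two groups, as in the examples above. In practice the dictionary would come from a YAML parser; it is written out by hand here so the example runs on its own.

```python
def check_comparisons(comparisons, sample_rows):
    """Return problems found in a parsed comparisons mapping.

    comparisons: dict shaped like config/comparisons.yaml after parsing
    sample_rows: list of dicts, one per sample sheet row
    """
    problems = []
    for name, spec in comparisons.items():
        # "-wt" without a space parses as the string "-wt", not a list
        not_lists = [key for key, value in spec.items() if not isinstance(value, list)]
        if not_lists:
            problems.append(f"{name}: {', '.join(not_lists)} is not a YAML list")
            continue
        column = spec.get("column_name")
        groups = {key: value for key, value in spec.items() if key != "column_name"}
        if not column or len(groups) != 2:
            problems.append(f"{name}: need column_name plus exactly two groups")
            continue
        # Every listed value should occur in the chosen sample sheet column
        seen = {row.get(column[0]) for row in sample_rows}
        for values in groups.values():
            for value in values:
                if value not in seen:
                    problems.append(f"{name}: '{value}' not found in column '{column[0]}'")
    return problems


rows = [{"group": "wt"}, {"group": "hom"}]
good = {"knockdownexperiment": {"column_name": ["group"], "wt": ["wt"], "hom": ["hom"]}}
broken = {"knockdownexperiment": {"column_name": ["group"], "wt": "-wt", "hom": ["hom"]}}
print(check_comparisons(good, rows))    # []
print(check_comparisons(broken, rows))  # ['knockdownexperiment: wt is not a YAML list']
```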

Making the config

Final outputs

Outputs are written underneath the folder given by majiq_top_level, for example

majiq_top_level: /SAN/vyplab/alb_projects/data/linked_bams_f210i_brain/majiq/

majiq
├── builder
│   ├── wt_sample1.majiq
│   ├── wt_sample1.sj
│   ├── wt_sample2.majiq
│   ├── wt_sample2.sj
│   ├── mut_sample1.majiq
│   ├── mut_sample1.sj
│   ├── mut_sample2.majiq
│   ├── mut_sample2.sj
│   ├── majiq.log
│   └── splicegraph.sql
├── delta_psi
│   ├── wt_mut.deltapsi.tsv
│   ├── wt_mut.deltapsi.voila
│   └── deltapsi_majiq.log
├── delta_psi_voila_tsv
│   ├── wt_mut.junctions.bed
│   ├── wt_mut.csv
│   ├── wt_mut.gff3
│   ├── wt_mut_parsed_psi.tsv
│   └── wt_mut.psi.tsv
├── run_name_majiqConfig.tsv
├── psi_single
│   ├── wt_sample1.tsv
│   ├── wt_sample1.voila
│   ├── wt_sample2.tsv
│   ├── wt_sample2.voila
│   ├── mut_sample1.tsv
│   ├── mut_sample1.voila
│   ├── mut_sample2.tsv
│   └── mut_sample2.voila
├── psi_voila_tsv_singlehis
│   ├── wt_sample1.tsv
│   ├── wt_sample1.voila
│   ├── wt_sample2.tsv
│   ├── wt_sample2.voila
│   ├── mut_sample1.tsv
│   ├── mut_sample1.voila
│   ├── mut_sample2.tsv
│   └── mut_sample2.voila
└── psi
    ├── wt.psi.tsv
    ├── wt.psi.voila
    ├── mut.psi.tsv
    ├── mut.psi.voila
    └── psi_majiq.log

Submitting on SGE

Build step: source submit.sh build run_name
PSI step: source submit.sh psi run_name
Annotate step: source submit.sh annotate run_name

Replace run_name with whatever run name you'd like.

Submitting on Slurm

Build step: source submit_slurm.sh build run_name
PSI step: source submit_slurm.sh psi run_name

Replace run_name with whatever run name you'd like.

Running without a cluster

If you don't have a cluster, you can run the workflows directly with snakemake:

snakemake -s workflows/build.smk
snakemake -s workflows/psi.smk
snakemake -s workflows/annotate.smk
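If you would rather drive the local run from a script, the three calls can be chained so that a failed step stops the rest. This is a sketch under assumptions: snakemake_commands and run_all are hypothetical helpers, not part of the repo. The optional cores argument adds Snakemake's --cores flag, which recent Snakemake releases expect; leave it as None to reproduce the bare commands above.

```python
import subprocess

STEPS = ("build", "psi", "annotate")


def snakemake_commands(steps=STEPS, cores=None):
    """Build one snakemake command per workflow file, in run order."""
    commands = []
    for step in steps:
        cmd = ["snakemake", "-s", f"workflows/{step}.smk"]
        if cores is not None:
            cmd += ["--cores", str(cores)]
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError, so later steps never start
        subprocess.run(cmd, check=True)


for cmd in snakemake_commands(cores=4):
    print(" ".join(cmd))
# snakemake -s workflows/build.smk --cores 4
# snakemake -s workflows/psi.smk --cores 4
# snakemake -s workflows/annotate.smk --cores 4
```

Call run_all(snakemake_commands(cores=4)) from the repository root to actually execute the steps.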

Annotation of splicing events

Annotation is done with a function taken directly from the dasper source code: https://github.com/dzhang32/dasper/

Please cite Dasper, Snakemake, and of course MAJIQ if you use this pipeline.
