Annotation_dev #9
base: main
This adds some files to assets (to be ignored for now). They will eventually be used for annotating genes and generating web summaries. More importantly, this commit infers the species from the reference genome path and gets the associated genes.gtf file for velocyto to use.
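As a rough illustration of the species inference described above, a minimal sketch might look like the following. The function names, the build-name keys, and the `mmusculus_gene_ensembl` label are assumptions made for this example (only `hsapiens_gene_ensembl` appears in this PR), and the `genes/genes.gtf` location assumes a Cell Ranger style reference directory.

```python
from pathlib import Path

# Hypothetical mapping from a genome build name found in the reference
# directory name to a dataset label. Keys and the mouse label are assumptions.
SPECIES_BY_BUILD = {
    "grch38": "hsapiens_gene_ensembl",
    "mm10": "mmusculus_gene_ensembl",
}


def infer_species(reference_path):
    """Return a dataset label guessed from the reference path, or None."""
    name = Path(reference_path).name.lower()
    for build, species in SPECIES_BY_BUILD.items():
        if build in name:
            return species
    return None  # unsupported reference: caller can skip gene-type annotation


def genes_gtf_path(reference_path):
    """Return where genes.gtf is expected, assuming a Cell Ranger style layout."""
    return Path(reference_path) / "genes" / "genes.gtf"
```

Returning `None` for an unrecognized reference lets the caller decide what to skip, rather than failing the whole run.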
Now, a matrix is
Using gene annotation matrices in assets, the data-matrix is annotated and stored in an AnnData object. It also calculates doublet scores for the matrix. If the reference genome is unsupported, it will just not annotate gene types, but still calculate doublets. RNA-velocity support to be added soon.
Gene annotations
This commit allows the pipeline to annotate the `filtered_feature_bc_matrix.h5` file output by `cellranger`. It also allows for optional RNA velocity analysis with `velocyto`.
This `Nextflow` script will now take `cellranger count` outputs and generate `AnnData` and `Seurat` objects. They are annotated with various QC statistics as well as gene types. The pipeline also generates plots of the QC statistics, while generating an `HTML` document that summarizes the pipeline's outputs. This `HTML` summary is incomplete at the moment and likely to change a lot as the outputs are organized.
src/create_gene_annotations.py
Outdated
# The two currently supported species have different file types
# and formats, so they need to be handled differently
if ds == "hsapiens_gene_ensembl":
this branching behavior feels ugly to me but i don't have a better solution at the moment
This commit creates a nearly fully functional annotation pipeline. After `cellranger count` creates a `filtered_feature_bc_matrix`, `soupx` creates its own from the `cellranger` outputs. The `soupx` matrix should have ambient RNA filtered out. Both matrices are then annotated in parallel for various gene types and doublets, and plots are generated automatically, as well as a web summary including plots. The web summary is as of yet incomplete, as it does not contain a list of the genes annotated, nor does it create the correct directory tree. The flowchart is also incorrect at the moment.
modules/gen_info.nf
Outdated
script:
"""
gen_plots.py
gen_summary.py --summary_dir=${summary_dir} --pubdir=${launchDir / params.pubdir}
the pubdir here might need to be changed
The pre-analysis (annotation) pipeline now works as expected and produces an HTML summary with correct information. It also now has a table of annotated genes.
This commit updates the README to contain information about pre-analysis gene/cell annotation for single cell expression. Additionally, it updates `tools.csv` to contain the correct doublet detection algorithm. Finally, it updates `annotate.py` to make it case-insensitive for Ensembl gene IDs in the reference annotations, in case a user passes in their own files with lowercased gene IDs.
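The case-insensitive matching mentioned above can be sketched as follows. This is not the code in `annotate.py`; the function names and the example IDs and gene types are made up for illustration. The idea is to normalize Ensembl gene IDs to one case on both sides of the lookup.

```python
def build_annotation_lookup(annotations):
    """Map upper-cased Ensembl gene IDs to gene types.

    `annotations` is an iterable of (gene_id, gene_type) pairs, for example
    rows read from a user-supplied reference annotation file.
    """
    return {gene_id.upper(): gene_type for gene_id, gene_type in annotations}


def annotate_gene_ids(gene_ids, lookup, default="unknown"):
    """Return one gene type per ID, matching IDs case-insensitively."""
    return [lookup.get(gene_id.upper(), default) for gene_id in gene_ids]


# Illustrative IDs and types only; the reference side is lower-cased on purpose.
lookup = build_annotation_lookup([
    ("ensg00000000001", "protein_coding"),
    ("ENSG00000000002", "lncRNA"),
])
gene_types = annotate_gene_ids(
    ["ENSG00000000001", "ensg00000000002", "ENSG00000000009"], lookup
)
```

Normalizing once when the lookup is built keeps the per-gene cost to a single dictionary access.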
modules/ambient_rna.nf
Outdated
script:
"""
mimic_cellranger.py --soupx_dir=${tool}
Can the functionality of `mimic_cellranger.py` be accomplished via bash `cp` commands and changing how inputs are staged?
i think this is probably the simplest way to do it, but is this safe? in particular, the `rm -r` step scares me
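One way to make a recursive delete less risky is to refuse it unless the target resolves to a location strictly inside an expected directory, such as the task's work directory. The sketch below is an assumption about how such a guard could look, not what `mimic_cellranger.py` does; `safe_rmtree` and `replace_dir` are hypothetical helper names.

```python
import shutil
from pathlib import Path


def safe_rmtree(target, allowed_root):
    """Delete `target` recursively only if it lies strictly inside `allowed_root`."""
    # resolve() follows symlinks, so a staged symlink pointing outside the
    # allowed root is rejected instead of being followed and deleted.
    target = Path(target).resolve()
    allowed_root = Path(allowed_root).resolve()
    if target == allowed_root or allowed_root not in target.parents:
        raise ValueError(f"refusing to delete {target}: not inside {allowed_root}")
    shutil.rmtree(target)


def replace_dir(src, dst, allowed_root):
    """Replace `dst` with a copy of `src`, removing the old `dst` via the guard."""
    dst = Path(dst)
    if dst.exists():
        safe_rmtree(dst, allowed_root)
    shutil.copytree(src, dst)
```

Because Nextflow typically stages inputs as symlinks, resolving paths before the check matters here: the guard acts on the real location rather than on the link name.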
@@ -14,15 +14,21 @@ nextflow.enable.dsl = 2
params.pubdir = params.getOrDefault("pubdir", "pubdir")
is the `getOrDefault` thing necessary? can't we just do `params.thing = 'value'` for the same effect?
- params.pubdir = params.getOrDefault("pubdir", "pubdir")
+ params.pubdir = "pubdir"
and i mean that for all the parameters, not just the pubdir
Simplify pipeline and make plotting more robust.
…etries in elion profile
Remove `**/*test*` from `.gitignore`
The container is now set to pull from the remote container registry.
This commit allows the pipeline to annotate the `filtered_feature_bc_matrix.h5` file output by `cellranger`. It also allows for optional RNA velocity analysis with `velocyto`.