Annotation_dev #9

ahmed-said-jax · 2023-04-17T18:34:57Z

This commit allows the pipeline to annotate the filtered_feature_bc_matrix.h5 file output by cellranger. It also allows for optional RNA velocity analysis with velocyto.

This adds some files to assets (to be ignored for now). They will eventually be used for annotating genes and generating web summaries. More importantly, this commit infers the species from the reference genome path and gets the associated genes.gtf file for velocyto to use.
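The species inference described above might look roughly like the following sketch. It is not the code from this commit: the keyword table, the function names, the choice of mouse as a second species, and the `genes/genes.gtf` layout inside the reference directory are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# Hypothetical table mapping a substring of the reference directory name
# to a dataset name. The pipeline's real mapping is not shown in this PR.
GENOME_KEYWORDS = {
    "grch38": "hsapiens_gene_ensembl",
    "mm10": "mmusculus_gene_ensembl",  # assumed second species, for illustration
}


def infer_dataset(ref_genome: str) -> Optional[str]:
    """Guess the dataset from the last component of the reference path."""
    dirname = Path(ref_genome).name.lower()
    for keyword, dataset in GENOME_KEYWORDS.items():
        if keyword in dirname:
            return dataset
    return None  # unsupported reference: the caller decides what to skip


def gtf_path(ref_genome: str) -> Path:
    """Assumed location of the annotation file inside the reference directory."""
    return Path(ref_genome) / "genes" / "genes.gtf"
```

Returning `None` for an unrecognized path, rather than raising, would let the rest of the pipeline continue without the species-specific steps.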


Using the gene annotation matrices in assets, the pipeline annotates the data matrix and stores it in an AnnData object. It also calculates doublet scores for the matrix. If the reference genome is unsupported, gene types are not annotated, but doublet scores are still calculated. RNA velocity support will be added soon.

Gene annotations

This commit allows the pipeline to annotate the `filtered_feature_bc_matrix.h5` file output by `cellranger`. It also allows for optional RNA velocity analysis with `velocyto`.

This `Nextflow` script will now take `cellranger count` outputs and generate `AnnData` and `Seurat` objects. They are annotated with various QC statistics as well as gene types. The pipeline also generates plots of the QC statistics, while generating an `HTML` document that summarizes the pipeline's outputs. This `HTML` summary is incomplete at the moment and likely to change a lot as the outputs are organized.

ahmed-said-jax · 2023-04-21T02:06:36Z

Commit 43cefcf has not been tested and is nonfunctional.


ahmed-said-jax · 2023-05-03T19:52:53Z

src/create_gene_annotations.py

+
+    # The two currently supported species have different file types
+    # and formats, so they need to be handled differently
+    if ds == "hsapiens_gene_ensembl":


this branching behavior feels ugly to me but i don't have a better solution at the moment
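One possible alternative to the if/else branching is a small registry keyed by dataset name, sketched below. Only `hsapiens_gene_ensembl` comes from the diff. The second dataset name, the loader functions, and the tab-separated versus comma-separated formats are hypothetical stand-ins for whatever the two species actually require.

```python
import csv
import io
from typing import Callable, Dict, List

Loader = Callable[[str], List[Dict[str, str]]]

# Registry of dataset name -> loader function.
LOADERS: Dict[str, Loader] = {}


def register(dataset: str) -> Callable[[Loader], Loader]:
    """Decorator that records a loader under the given dataset name."""
    def decorator(func: Loader) -> Loader:
        LOADERS[dataset] = func
        return func
    return decorator


@register("hsapiens_gene_ensembl")
def load_tab_separated(text: str) -> List[Dict[str, str]]:
    # Assumed format for illustration: tab-separated with a header row.
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


@register("mmusculus_gene_ensembl")  # hypothetical second dataset
def load_comma_separated(text: str) -> List[Dict[str, str]]:
    # Assumed format for illustration: comma-separated with a header row.
    return list(csv.DictReader(io.StringIO(text)))


def load_annotations(dataset: str, text: str) -> List[Dict[str, str]]:
    """Look up the loader for a dataset and fail clearly if none is registered."""
    try:
        loader = LOADERS[dataset]
    except KeyError:
        raise ValueError(f"unsupported dataset: {dataset!r}") from None
    return loader(text)
```

Supporting a third species would then mean adding one decorated function instead of another branch.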

This commit creates a nearly fully functional annotation pipeline. After `cellranger count` creates a `filtered_feature_bc_matrix`, `soupx` creates its own from the `cellranger` outputs. The `soupx` matrix should have ambient RNA filtered out. Both matrices are then annotated in parallel for various gene types and doublets, and plots are generated automatically, as well as a web summary including plots. The web summary is as of yet incomplete, as it does not contain a list of the genes annotated, nor does it create the correct directory tree. The flowchart is also incorrect at the moment.

bin/filter_ambient_rna.r

bin/annotate.py

ahmed-said-jax · 2023-05-12T05:24:48Z

modules/gen_info.nf

+    script:
+    """
+    gen_plots.py
+    gen_summary.py --summary_dir=${summary_dir} --pubdir=${launchDir / params.pubdir}


the pubdir here might need to be changed

The pre-analysis (annotation) pipeline now works as expected and produces an HTML summary with correct information. It also now has a table of annotated genes.


This commit updates the README to contain information about pre-analysis gene/cell annotation for single cell expression. Additionally, it updates `tools.csv` to contain the correct doublet detection algorithm. Finally, it updates `annotate.py` to make it case-insensitive for Ensembl gene IDs in the reference annotations, in case a user passes in their own files with lowercased gene IDs.
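The case-insensitive matching described above can be illustrated with a minimal sketch. This is not the code in `annotate.py`: the function name and the plain-dictionary representation of the reference annotations are assumptions, and the gene ID and gene type values are illustrative only.

```python
from typing import Dict, Iterable, Optional


def annotate_gene_types(
    matrix_gene_ids: Iterable[str],
    reference: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Map each gene ID in the matrix to a gene type, ignoring letter case.

    `reference` maps Ensembl gene IDs (in any case) to gene types.
    Genes without a match are mapped to None.
    """
    # Normalize the reference keys once so each lookup is a plain dict access.
    lookup = {gene_id.upper(): gene_type for gene_id, gene_type in reference.items()}
    return {gene_id: lookup.get(gene_id.upper()) for gene_id in matrix_gene_ids}


# A user-supplied reference with lowercased IDs still matches.
user_reference = {"ensg00000141510": "protein_coding"}
result = annotate_gene_types(["ENSG00000141510", "ENSG00000000000"], user_reference)
```

Normalizing both sides to one case keeps the original IDs from the matrix as the output keys, so downstream steps see the IDs unchanged.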

main.nf

bin/arg_utils.py

bin/extract_files.py

bin/filter_ambient_rna.r

wflynny · 2023-05-26T19:01:22Z

modules/ambient_rna.nf

+
+    script:
+    """
+    mimic_cellranger.py --soupx_dir=${tool}


Can the functionality of mimic_cellranger.py be accomplished via bash cp commands and changing how inputs are staged?

i think this is probably the simplest way to do it, but is this safe? in particular, the rm -r step scares me

bin/annotate.py

bin/gen_plots.py

assets/summaries/no_rna_velo/overview.csv

ahmed-said-jax · 2023-05-30T16:14:06Z

main.nf

@@ -14,15 +14,21 @@ nextflow.enable.dsl = 2

 params.pubdir = params.getOrDefault("pubdir", "pubdir")


is the getOrDefault thing necessary? can't we just do params.thing = 'value' for the same effect?

Suggested change

-params.pubdir = params.getOrDefault("pubdir", "pubdir")
+params.pubdir = "pubdir"

and i mean that for all the parameters, not just the pubdir

Simplified pipeline and make plotting more robust.


Remove `**/*test*` from `.gitignore`

The container is now set to pull from the remote container registry.

ahmed-said-jax and others added 11 commits April 13, 2023 16:32

Add data-matrix annotating capabilities.

583930e

Now, a matrix is

add comments to bin/annotate.py

ec2f4f3

Merge pull request #8 from ahmed-said-jax:annotation_dev

490f2ac

Gene annotations

test commit to see where this goes

4d6cd0f

random change

c6ba266

removed random comments

5f7eb7c

a random comment to check vscode's source control

6fa3488

remove random comment, checking git version 2.40 with vscode

8b929a1

Add annotation capability and RNA velocity

2f46eb6

This commit allows the pipeline to annotate the `filtered_feature_bc_matrix.h5` file output by `cellranger`. It also allows for optional RNA velocity analysis with `velocyto`.

ahmed-said-jax requested a review from wflynny April 17, 2023 18:36

ahmed-said-jax added 6 commits April 17, 2023 16:14

Change a tuple to a dict for better readability

80899af

change formatting a little

3c62b4f

small formatting changes

b8c335c

Add ability to convert to Seurat and generate a web summary. Untested

b2b946f

Add ambient mRNA filtration using R-packageSoupX

43cefcf

ahmed-said-jax added 3 commits April 25, 2023 19:02

temporary non-functional incomplete commit

20dd8ad

still non-functional and not ready to be reviewed. pushing just as a …

73f3da6

…backup.

another commit to get files from local machine to HPC

dceef67

ahmed-said-jax commented May 3, 2023


ahmed-said-jax commented May 4, 2023


bin/filter_ambient_rna.r (outdated, resolved)

ahmed-said-jax and others added 5 commits May 3, 2023 22:20

Small changes to fix HTML summary + ambient RNA

ce5ac50

forgot to close a quote

73da43e

plots_dir unnecessary

a4c1bd5

plots_dir unnecessary

fe6ec4c

HTML summary now displays plots correctly

1c61cfd

ahmed-said-jax commented May 10, 2023


bin/annotate.py (resolved)

ahmed-said-jax commented May 12, 2023


ahmed-said-jax and others added 6 commits May 23, 2023 09:58

Fix bugs and complete HTML summary (version 1).

3ec6e82

The pre-analysis (annotation) pipeline now works as expected and produces an HTML summary with correct information. It also now has a table of annotated genes.

Change doublet detection tool and removed small-sample parameters in …

f6015bf

…SoupX

Fix README grammar errors

8cdbc92

Fix sentence structure in README

5a4beab

Update README.md to make reference genome directory usage clearer

2a18c77

wflynny requested changes May 30, 2023


ahmed-said-jax commented May 30, 2023


ahmed-said-jax and others added 8 commits May 31, 2023 16:52

Remove extract_files and mimic_cellranger

2685035

Simplified pipeline and make plotting more robust.

Remove the soupx arguments that are for tiny dataset and remove r…

480cf51

…etries in elion profile

Update .gitignore

4288d31

Remove `**/*test*` from `.gitignore`

Update ambient_rna.nf to use container on JAXReg

a91b58a

The container is now set to pull from the remote container registry.

Update seurat.nf to use remote containers

f20c6cb

Update seurat.nf to use remote container registry

716aede

Update annotate.nf to use remote container registry

b0fa605

Sort imports in annotate.py

7d9fac5
