issue #220 and docs #221

Merged
merged 7 commits into from
Jan 27, 2023
9 changes: 6 additions & 3 deletions README.md
@@ -56,11 +56,14 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
3. QC filtering (bulk and single-cell)

- Bulk sequencing filtering:
- Remove chimeric sequences (optional) (`EnchantR`)
- Remove chimeric sequences (optional) (`SHazaM`, `EnchantR`)
- Detect cross-contamination (optional) (`EnchantR`)
- Collapse duplicates (`EnchantR`)
- Collapse duplicates (`Alakazam`, `EnchantR`)
- Single-cell QC filtering (`EnchantR`)
- TODO: explain exactly what is done.
- Removes cells without heavy chains.
- Remove cells with multiple heavy chains.
- Remove sequences in different samples that share the same `cell_id` and nucleotide sequence.
- Modifies `cell_id`s to ensure they are unique in the project.

4. Clonal analysis (bulk and single-cell)
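
The single-cell QC filtering steps listed under step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the `EnchantR` implementation: the record fields `sample_id` and `sequence`, the set of loci treated as heavy chains, the order of the steps, and the `sample_id` prefix used to make `cell_id`s unique are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: loci counted as "heavy" chains (IGH for BCR; TRB/TRD for TCR).
HEAVY_LOCI = {"IGH", "TRB", "TRD"}


def single_cell_qc(records):
    """Apply the four single-cell QC steps to a list of dict records.

    Each record is assumed to have the keys
    'sample_id', 'cell_id', 'locus' and 'sequence'.
    """
    # Drop sequences that appear in different samples with the same
    # cell_id and nucleotide sequence.
    samples_by_key = defaultdict(set)
    for rec in records:
        samples_by_key[(rec["cell_id"], rec["sequence"])].add(rec["sample_id"])
    kept = [
        rec for rec in records
        if len(samples_by_key[(rec["cell_id"], rec["sequence"])]) == 1
    ]

    # Count heavy chains per cell; keep only cells with exactly one,
    # which removes cells without heavy chains and cells with several.
    heavy_counts = Counter(
        (rec["sample_id"], rec["cell_id"])
        for rec in kept
        if rec["locus"] in HEAVY_LOCI
    )
    kept = [
        rec for rec in kept
        if heavy_counts[(rec["sample_id"], rec["cell_id"])] == 1
    ]

    # Make cell_id unique across the project (hypothetical naming scheme).
    return [
        {**rec, "cell_id": f"{rec['sample_id']}_{rec['cell_id']}"}
        for rec in kept
    ]
```

Keying the heavy-chain count on the pair (`sample_id`, `cell_id`) keeps cells from different samples apart even before their identifiers are rewritten.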

8 changes: 4 additions & 4 deletions conf/modules.config
@@ -330,31 +330,31 @@ process {

withName: CHANGEO_CREATEGERMLINES {
publishDir = [
path: { "${params.outdir}/bulk-qc-filtering/01-create-germlines/${meta.id}" },
path: { "${params.outdir}/qc-filtering/bulk-qc-filtering/01-create-germlines/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: REMOVE_CHIMERIC {
publishDir = [
path: { "${params.outdir}/bulk-qc-filtering/02-chimera-filter/${meta.id}" },
path: { "${params.outdir}/qc-filtering/bulk-qc-filtering/02-chimera-filter/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: DETECT_CONTAMINATION {
publishDir = [
path: { "${params.outdir}/bulk-qc-filtering/03-detect_contamination" },
path: { "${params.outdir}/qc-filtering/bulk-qc-filtering/03-detect_contamination" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: COLLAPSE_DUPLICATES {
publishDir = [
path: { "${params.outdir}/bulk-qc-filtering/04-collapse-duplicates/${meta.id}" },
path: { "${params.outdir}/qc-filtering/bulk-qc-filtering/04-collapse-duplicates/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
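
The effect of the `publishDir` settings in this file can be illustrated outside Nextflow. The Python sketch below mimics the two pieces of logic visible in the config: the `saveAs` closure returns `null` for `versions.yml`, which tells Nextflow not to publish that file, and every other file lands under the new `qc-filtering/bulk-qc-filtering/` prefix. The function name `publish_path` and its parameters are hypothetical and exist only for this illustration.

```python
from pathlib import PurePosixPath


def publish_path(outdir, step_dir, filename, sample_id=None):
    """Return where a file would be published, or None if it is skipped.

    Mirrors the config above: 'versions.yml' is never published; other
    files go to <outdir>/qc-filtering/bulk-qc-filtering/<step_dir>/[<sample_id>/].
    """
    if filename == "versions.yml":
        return None  # same role as the closure returning null
    parts = [outdir, "qc-filtering", "bulk-qc-filtering", step_dir]
    if sample_id is not None:
        parts.append(sample_id)  # per-sample steps use ${meta.id}
    return str(PurePosixPath(*parts, filename))


# Example: a per-sample step and the project-wide contamination step.
print(publish_path("results", "02-chimera-filter", "S1_chimera-pass.tsv", "S1"))
print(publish_path("results", "03-detect_contamination", "all_cont-flag.tsv"))
```

The file names in the usage lines are made up; only the directory layout comes from the config.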
83 changes: 47 additions & 36 deletions docs/output.md
@@ -264,7 +264,7 @@ IgBLAST's results are parsed and standardized with [MakeDB](https://changeo.read

</details>

A table is generated that retains sequences with concordant locus in the `v_call` and `locus` fields, with a `sequence_alignment` with a maximum of 10% of Ns and a length of at least 200 informative nucleotides (not `-`, `.` or `N`).
A table is generated that retains sequences with concordant locus in the `v_call` and `locus` fields, with a `sequence_alignment` with a maximum of 10% of Ns and a length of at least 200 informative nucleotides (not `-`, `.` or `N`).
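
The three retention criteria in this paragraph can be written as a small predicate. The Python sketch below is illustrative only and is not the pipeline's code. It makes two assumptions: locus concordance is checked by comparing the first three characters of the first `v_call` entry with `locus`, and the 10% limit on Ns is measured against the full length of `sequence_alignment`.

```python
def informative_length(sequence_alignment):
    """Count positions that are not '-', '.' or 'N'."""
    return sum(1 for base in sequence_alignment.upper() if base not in "-.N")


def passes_filter(row, max_n_fraction=0.10, min_informative=200):
    """Return True if a rearrangement row meets all three criteria."""
    seq = row["sequence_alignment"].upper()
    if not seq:
        return False

    # Assumption: the locus is the three-letter prefix of the V gene call,
    # e.g. 'IGHV1-2*02' -> 'IGH'. Only the first call is inspected.
    first_call = row["v_call"].split(",")[0].strip()
    concordant = first_call[:3].upper() == row["locus"].upper()

    # Assumption: the N fraction is taken over the whole alignment length.
    n_fraction = seq.count("N") / len(seq)

    return (
        concordant
        and n_fraction <= max_n_fraction
        and informative_length(seq) >= min_informative
    )
```

A row with `v_call` set to `IGHV1-2*02`, `locus` set to `IGH` and 250 unambiguous aligned nucleotides passes; the same row with `locus` set to `IGK` does not.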

### Removal of non-productive sequences

@@ -308,10 +308,10 @@ Non-functional sequences identified with IgBLAST are removed with [ParseDb](http
<details markdown="1">
<summary>Output files</summary>

- `bulk-qc-filtering/01-create-germlines/<sampleID>`
- `qc-filtering/bulk-qc-filtering/01-create-germlines/<sampleID>`
- `*log.txt`: Log of the process that will be parsed to generate a report.
- `*germ-pass.tsv`: Rearrangement table in AIRR-C format with an additional
field with the reconstructed germline sequence for each sequence.
field with the reconstructed germline sequence for each sequence.

</details>

@@ -322,10 +322,10 @@ Reconstructing the germline sequences with the [CreateGermlines](https://changeo
<details markdown="1">
<summary>Output files</summary>

- `bulk-qc-filtering/02-chimera-filter/<sampleID>`
- `qc-filtering/bulk-qc-filtering/02-chimera-filter/<sampleID>`
- `*log.txt`: Log of the process that will be parsed to generate a report.
- `*chimera-pass.tsv`: Rearrangement table in AIRR-C format sequences that
passed the chimera removal filter.
passed the chimera removal filter.
- `<sampleID>_chimera_report`: Report with plots showing the mutation patterns

</details>
@@ -338,10 +338,10 @@ the Immcantation R package [SHazaM](https://shazam.readthedocs.io/en/stable/).
<details markdown="1">
<summary>Output files. Optional. </summary>

- `bulk-qc-filtering/03-detect_contamination`
- `qc-filtering/bulk-qc-filtering/03-detect_contamination`
- `*log.txt`: Log of the process that will be parsed to generate a report.
- `*cont-flag.tsv`: Rearrangement table in AIRR-C format with sequences that
passed the chimera removal filter.
passed the chimera removal filter.
- `all_reps_cont_report`: Report.

</details>
@@ -353,11 +353,11 @@ This folder is genereated when `detect_contamination` is set to `true`.
<details markdown="1">
<summary>Output files. </summary>

- `bulk-qc-filtering/04-collapse-duplicates/<sampleID>`
- `qc-filtering/bulk-qc-filtering/04-collapse-duplicates/<sampleID>`
- `*log.txt`: Log of the process that will be parsed to generate a report.
- `*collapse_report/`: Report.
- `repertoires/*collapse-pass.tsv`: Rearrangement table in AIRR-C format with duplicated
sequences removed.
sequences removed.

</details>

@@ -370,7 +370,7 @@ This folder is genereated when `detect_contamination` is set to `true`.
- `*log.txt`: Log of the process that will be parsed to generate a report.
- `*all_reps_scqc_report/`: Report.
- `*scqc-pass.tsv`: Rearrangement table in AIRR-C format with sequences that
passed the quality filtering.
passed the quality filtering.

</details>

@@ -382,73 +382,84 @@ This folder is genereated when `detect_contamination` is set to `true`.
- `clonal_analysis/find-threshold/`
- `*log`: Log of the process that will be parsed to generate a report.
- `all_reps_threshold-mean.tsv`: Mean of all hamming distance thresholds of the
Junction regions as determined by Shazam.
Junction regions as determined by Shazam.
- `all_reps_threshold-summary.tsv`: Thresholds for each group of `--cloneby` samples.
- `all_reps_dist_report`: Report

</details>

Determining the hamming distance threshold of the junction regions for clonal determination using [Shazam](https://shazam.readthedocs.io) when `clonal_threshold` is set to `auto`.
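
For readers unfamiliar with the quantity involved, the sketch below shows a Hamming distance between two junction sequences and the averaging that a file such as `all_reps_threshold-mean.tsv` summarizes. It is a plain Python illustration, not the Shazam algorithm: normalizing the distance by junction length and the function names are assumptions made for this example, and the automatic threshold detection itself is not reproduced.

```python
from statistics import mean


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def normalized_hamming(a, b):
    """Hamming distance divided by sequence length (assumed normalization)."""
    return hamming(a, b) / len(a)


def mean_threshold(thresholds):
    """Average a list of per-group distance thresholds."""
    return mean(thresholds)


# Two hypothetical junctions differing at one of twelve positions.
print(normalized_hamming("TGTGCGAGAGAT", "TGTGCGAGAGAA"))
```

Sequences whose distance falls below the chosen threshold would be candidates for the same clone, which is why a single averaged value can be passed on to the clonal assignment step.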

## TODO updata scoper: Change-O define clones
## SCOPer define clones

### Define clones

<details markdown="1">
<summary>Output files</summary>

- `changeo/06-define_clones/<subjectID>`
- `tab`: Table in AIRR format containing the assigned gene information and an additional field with the clone id.

</details>

Assigning clones to the sequences obtained from IgBlast with the [DefineClones](https://changeo.readthedocs.io/en/version-0.4.5/tools/DefineClones.html?highlight=DefineClones) Immcantation tool.

### Reconstruct germlines

<details markdown="1">
<summary>Output files</summary>
- `clonal_analysis/define_clones/<subjectID>`
- `*log`: Log of the process that will be parsed to generate a report.
- `repertoires/<sampleID>_clone-pass.tsv`: Rearrangement tables in AIRR-C format with sequences that
passed the clonal assignment step. The field `clone_id` contains the clonal clusters identifiers.
- `tables/`: Table in AIRR format containing the assigned gene information and an additional field with the clone id.
- `clonal_abundance.tsv`
- `clonal_diversity.tsv`
- `clone_sizes_table.tsv`
- `num_clones_table_nosingle.tsv`
- `num_clones_table.tsv`
- `ggplots/`: Diversity and abundance plots as `ggplot` objects.
- `figures/`: Clone size, diversity and abundance `png` plots.

- `changeo/07-create_germlines/<subjectID>`
- `tab`: Table in AIRR format contaning the assigned gene information and an additional field with the germline reconstructed gene calls.
A similar output folder `clonal_analysis/define_clones/all_reps_clone_report` is generated for all data.

</details>

Reconstructing the germline sequences with the [CreateGermlines](https://changeo.readthedocs.io/en/version-0.4.5/tools/CreateGermlines.html#creategermlines) Immcantation tool.
Assigning clones to the sequences obtained from IgBlast with the [scoper::hierarchicalClones](https://scoper.readthedocs.io/en/stable/topics/hierarchicalClones/) Immcantation tool.

#

## Lineage reconstruction

<details markdown="1">
<summary>Output files</summary>

- `lineage_reconstruction/`
- `tab`
- `Clones_table_patient.tsv`: contains a summary of the clones found for the patient, and the number of unique and total sequences identified in each clone.
- `Clones_table_patient_filtered_between_3_and_1000.tsv`: contains a summary of the clones found for the patient, and the number of unique and total sequences identified in each clone, filtered by clones of size between 3 and 1000, for which the lineages were reconstructed and the trees plotted.
- `xxx_germ-pass.tsv`: AIRR format table with all the sequences from a patient after the germline annotation step.
- `Clone_tree_plots`: Contains a rooted graphical representation of each of the clones, saved in pdf format.
- `Graphml_trees`: All lineage trees for the patient exported in a GraphML format: `All_graphs_patient.graphml`.
- `clonal_analysis/dowser_lineages/`
- `<sampleID>*log`: Log of the process that will be parsed to generate a report.
- `<sample1ID>_dowser_report`: Report

</details>

Reconstructing clonal linage with the [Alakazam R package](https://alakazam.readthedocs.io/en/stable/) from the Immcantation toolset.
Reconstructing clonal lineage with [IgPhyML](https://igphyml.readthedocs.io/en/stable/) and
[dowser](https://dowser.readthedocs.io/en/stable/topics/getTrees/) from the Immcantation toolset.

## Repertoire comparison

<details markdown="1">
<summary>Output files</summary>

- `repertoire_comparison/`
- `repertoire_analysis/repertoire_comparison/`
- `all_data.tsv`: AIRR format table containing the processed sequence information for all subjects.
- `Abundance`: contains clonal abundance calculation plots and tables.
- `Diversity`: contains diversity calculation plots and tables.
- `V_family`: contains V gene and family distribution calculation plots and tables.
- `Bcellmagic_report.html`: Contains the repertoire comparison results in an html report form: Abundance, Diversity, V gene usage tables and plots. Comparison between treatments and subjects.
- `Airrflow_report.html`: Contains the repertoire comparison results in an html report form: Abundance, Diversity, V gene usage tables and plots. Comparison between treatments and subjects.

</details>

Calculation of several repertoire characteristics (diversity, abundance, V gene usage) for comparison between subjects, time points and cell populations. An Rmarkdown report is generated with the [Alakazam R package](https://alakazam.readthedocs.io/en/stable/).

## Tracking number of reads

<details markdown="1">
<summary>Output files</summary>

- `report_file_size/file_size_report`: Report summarizing the number of sequences after the most important pipeline steps.
- `tables/*tsv`: Tables with the number of sequences at each processing step.

</details>

Parsing the logs from the previous processes. Summary of the number of sequences left after each of the most important pipeline steps.

## Log parsing

<details markdown="1">