diff --git a/README.md b/README.md
index 8b15959b..6f711108 100644
--- a/README.md
+++ b/README.md
@@ -15,8 +15,8 @@
 ## Introduction
 
 **nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. It makes use of the [Immcantation](https://immcantation.readthedocs.io)
-toolset. The input data can be (a) targeted amplicon bulk sequencing data of the V, D, J and C regions
-of the B/T-cell receptor with multiplex PCR or 5' RACE protocol or (b) assembled reads (bulk or single cell).
+toolset. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions
+of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, or assembled reads (bulk or single cell).
 
 ![nf-core/airrflow overview](docs/images/airrflow_workflow_overview.png)
 
@@ -26,14 +26,14 @@ On release, automated continuous integration tests run the pipeline on a full-si
 
 ## Pipeline summary
 
-nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single cell targeted sequencing. Several protocols are supported, please see the [usage documenation](https://nf-co.re/airrflow/usage) for more details on the supported protocols.
+nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single cell targeted sequencing data. Several protocols are supported; please see the [usage documentation](https://nf-co.re/airrflow/usage) for more details on the supported protocols.
 
 ![nf-core/airrflow overview](docs/images/metro-map-airrflow.png)
 
 1. QC and sequence assembly (bulk only)
 
-- Raw read quality control, adapter trimming and clipping (`Fastp`)
-- Filtering sequences by sequencing quality (`pRESTO FilterSeq`).
+- Raw read quality control, adapter trimming and clipping (`Fastp`).
+- Filtering sequences by base quality (`pRESTO FilterSeq`).
 - Mask amplicon primers (`pRESTO MaskPrimers`).
 - Pair read mates (`pRESTO PairSeq`).
 - For UMI-based sequencing:
@@ -45,7 +45,7 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
 
 2. V(D)J annotation and filtering (bulk and single-cell)
 
-- Assigning gene segment alleles with `IgBlast` using the IMGT database (`Change-O AssignGenes`).
+- Assigning gene segments with `IgBlast` using the IMGT database (`Change-O AssignGenes`).
 - Annotate alignments in AIRR format (`Change-O MakeDB`)
 - Filter by alignment quality (locus matching v_call chain, min 200 informative positions, max 10% N nucleotides)
 - Filter productive sequences (`Change-O ParseDB split`)
@@ -66,7 +66,7 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
 
 4. Clonal analysis (bulk and single-cell)
 
-- Find Hamming distance threshold for clone definition (`SHazaM`, `EnchantR`).
+- Find threshold for clone definition (`SHazaM`, `EnchantR`).
 - Create germlines and define clones, repertoire analysis (`Change-O`, `EnchantR`).
 - Build lineage trees (`SCOPer`, `IgphyML`, `EnchantR`).
 
diff --git a/conf/modules.config b/conf/modules.config
index fc21a51c..5e349293 100644
--- a/conf/modules.config
+++ b/conf/modules.config
@@ -89,6 +89,14 @@ process {
         ext.args = '--quiet'
     }
 
+    withName: 'MERGE_UMI' {
+        publishDir = [
+            [
+                enabled: false
+            ]
+        ]
+    }
+
     // -----------------
     // sequence assembly
     // -----------------
@@ -264,6 +272,14 @@ process {
         ]
     }
 
+    withName: 'UNZIP_DB' {
+        publishDir = [
+            [
+                enabled: false
+            ]
+        ]
+    }
+
     withName: CHANGEO_CONVERTDB_FASTA_FROM_AIRR {
         publishDir = [
             path: { "${params.outdir}/vdj_annotation/convert-db/${meta.id}" },
@@ -442,7 +458,7 @@ process {
 
     withName: PARSE_LOGS {
         publishDir = [
-            path: { "${params.outdir}/parsed-logs" },
+            path: { "${params.outdir}/parsed_logs" },
             mode: params.publish_dir_mode,
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
diff --git a/docs/output.md b/docs/output.md
index 29567be9..ca017312 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -10,39 +10,48 @@ The directories listed below will be created in the results directory after the
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
-TODO: update this to add/remove lines
-
-- [FastP](#fastp) - read quality control, adapter trimming and read clipping
-- [pRESTO](#presto) - read pre-processing
-  - [Filter by sequence quality](#filter-by-sequence-quality) - filter sequences by quality
-  - [Mask primers](#mask-primers) - Masking primers
-  - [Pair mates](#pair-mates) - Pairing sequence mates.
+- [QC and sequence assembly (bulk only)](#sequence-assembly)
+  - [FastP](#fastp) - read quality control, adapter trimming and read clipping.
+  - [Filter by sequence quality](#filter-by-sequence-quality) - filter sequences by base quality.
+  - [Mask primers](#mask-primers) - Mask amplicon primers.
+  - [Pair mates](#pair-mates) - Pair read mates.
   - [Cluster sets](#cluster-sets) - Cluster sequences according to similarity.
   - [Build consensus](#build-UMI-consensus) - Build consensus of sequences with the same UMI barcode.
   - [Re-pair mates](#re-pair-mates) - Re-pairing sequence mates.
   - [Assemble mates](#assemble-mates) - Assemble sequence mates.
   - [Remove duplicates](#remove-duplicates) - Remove and annotate read duplicates.
   - [Filter sequences for at least 2 representative](#filter-sequences-for-at-least-2-representative) Filter sequences that do not have at least 2 duplicates.
-- [FastQC](#fastqc) - read quality control post-assembly
-- [Change-O](#change-o) - Assign genes and clonotyping
+  - [FastQC](#fastqc) - read quality control post-assembly
+- [VDJ annotation](#vdj-annotation) - Assign genes and filter sequences
+  - [Convert to fasta](#convert-input-to-fasta-optional)
   - [Assign genes with Igblast](#assign-genes-with-igblast)
   - [Make database from assigned genes](#make-database-from-assigned-genes)
+  - [Quality filter alignments](#quality-filter-alignments)
   - [Removal of non-productive sequences](#removal-of-non-productive-sequences)
-  - [Selection of IGH / TR sequences](#selection-of-IGH-/-TR-sequences)
-  - [Convert database to fasta](#convert-database-to-fasta)
-- [Shazam](#shazam) - Genotyping and Clonal threshold
-  - [Genotyping and hamming distance threshold](#determining-hamming-distance-threshold)
-- [Change-O define clones](#change-o-define-clones)
-  - [Define clones](#define-clones) - Defining clonal B-cell or T-cell groups
-  - [Reconstruct germlines](#reconstruct-germlines) - Reconstruct gene calls of germline sequences
-- [Lineage reconstruction](#lineage-reconstruction) - Clonal lineage reconstruction.
+  - [Removal of sequences with junction length not a multiple of 3](#removal-of-sequences-with-junction-length-not-multiple-of-3)
+  - [Annotate metadata](#annotate-metadata)
+- [Bulk QC filtering](#bulk-qc-filtering)
+  - [Reconstruct germlines](#reconstruct-germlines)
+  - [Chimeric read filtering](#chimeric-read-filtering-optional)
+  - [Detect contamination](#detect-contamination-optional)
+  - [Collapse duplicates](#collapse-duplicates)
+- [Single cell QC](#single-cell-qc)
+- [Clonal analysis](#clonal-analysis)
+  - [Find clonal threshold](#find-clonal-threshold)
+  - [SCOPer define clones](#scoper-define-clones) - Defining clonal B-cell or T-cell groups
+  - [Dowser lineage reconstruction](#dowser-lineage-reconstruction) - Clonal lineage reconstruction.
 - [Repertoire analysis](#repertoire-analysis) - Repertoire analysis and comparison.
+- [Report file size](#report-file-size) - Log parsing.
 - [Log parsing](#log-parsing) - Log parsing.
-- [Databases](#databases)
-- [MultiQC](#MultiQC) - MultiQC
+- [Databases](#databases) - Downloaded databases.
+- [MultiQC](#MultiQC) - MultiQC report.
 - [Pipeline information](#pipeline-information) - Pipeline information
 
-## Fastp
+## Sequence assembly
+
+> **NB:** If using the sans-UMI subworkflow by specifying `--umi_length 0`, the presto directory ordering numbers will differ, e.g., mate pair assembly results will be output to `presto/01-assemblepairs/`, as this will be the first presto step.
+
+### Fastp
 Output files
@@ -57,10 +66,6 @@ TODO: update this to add/remove lines
 
 [fastp](https://doi.org/10.1093/bioinformatics/bty560) gives general quality metrics about your sequenced reads, as well as allows filtering reads by quality, trimming adapters and clipping reads at 5' or 3' ends. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [fastp documentation](https://github.com/OpenGene/fastp).
 
-## presto
-
-> **NB:** If using the sans-UMI subworkflow by specifying `umi_length=0`, the presto directory ordering numbers will differ e.g., mate pair assembly results will be output to `presto/01-assemblepairs/` as this will be the first presto step.
-
 ### Filter by sequence quality
@@ -187,7 +192,7 @@ Remove duplicates using [CollapseSeq](https://presto.readthedocs.io/en/stable/to
 
 Remove sequences which do not have 2 representative using [SplitSeq](https://presto.readthedocs.io/en/stable/tools/SplitSeq.html) from the pRESTO Immcantation toolset.
 
-## FastQC
+### FastQC
 Output files
@@ -209,9 +214,9 @@ Remove sequences which do not have 2 representative using [SplitSeq](https://pre
 
 > **NB:** Two sets of FastQC plots are displayed in the MultiQC report: first for the raw _untrimmed_ and unmated reads and secondly for the assembled and QC filtered reads (but before collapsing duplicates). They may contain adapter sequence and potentially regions with low quality.
 
-## Change-O
+## VDJ annotation
 
-### Convert input to fasta, if needed
+### Convert input to fasta (optional)
 Output files. Optional.
@@ -253,7 +258,7 @@ Assign genes with Igblast, using the IMGT database is performed by the [AssignGe
 
 IgBLAST's results are parsed and standardized with [MakeDB](https://changeo.readthedocs.io/en/stable/examples/igblast.html#processing-the-output-of-igblast) to follow the [AIRR Community standards](https://docs.airr-community.org/en/stable/datarep/rearrangements.html) for rearrangement data.
 
-### Quality filter sequences
+### Quality filter alignments
 Output files
@@ -290,7 +295,7 @@ Non-functional sequences identified with IgBLAST are removed with [ParseDb](http
-### Add metadata
+### Annotate metadata
 Output files
@@ -301,7 +306,7 @@ Non-functional sequences identified with IgBLAST are removed with [ParseDb](http
-## Shazam
+## Bulk QC filtering
 
 ### Reconstruct germlines
@@ -317,7 +322,7 @@ Non-functional sequences identified with IgBLAST are removed with [ParseDb](http
 
 Reconstructing the germline sequences with the [CreateGermlines](https://changeo.readthedocs.io/en/stable/tools/CreateGermlines.html#creategermlines) Immcantation tool.
 
-### Chimera filter
+### Chimeric read filtering (optional)
 Output files
@@ -333,7 +338,7 @@ Reconstructing the germline sequences with the [CreateGermlines](https://changeo
 
 Mutations patterns in different window sizes are analyzed with functions from the Immcantation R package [SHazaM](https://shazam.readthedocs.io/en/stable/).
 
-### Detect contamination
+### Detect contamination (optional)
 Output files. Optional.
@@ -361,7 +366,7 @@ This folder is genereated when `detect_contamination` is set to `true`.
-### Single cell QC
+## Single cell QC
 Output files.
@@ -374,12 +379,14 @@ This folder is genereated when `detect_contamination` is set to `true`.
-### Determining hamming distance threshold
+## Clonal analysis
+
+### Find clonal threshold
 Output files
 
-- `clonal_analysis/find-threshold/`
+- `clonal_analysis/find_threshold/`
   - `*log`: Log of the process that will be parsed to generate a report.
   - `all_reps_threshold-mean.tsv`: Mean of all hamming distance thresholds of the Junction regions as determined by Shazam.
@@ -390,9 +397,7 @@ This folder is genereated when `detect_contamination` is set to `true`.
 
 Determining the hamming distance threshold of the junction regions for clonal determination using [Shazam](https://shazam.readthedocs.io) when `clonal_threshold` is set to `auto`.
 
-## SCOPer define clones
-
-### Define clones
+### SCOPer define clones
 Output files
@@ -416,9 +421,7 @@ A similar output folder `clonal_analysis/define_clones/all_reps_clone_report` is
 
 Assigning clones to the sequences obtained from IgBlast with the [scoper::hierarchicalClones](https://scoper.readthedocs.io/en/stable/topics/hierarchicalClones/) Immcantation tool.
 
-#
-
-## Lineage reconstruction
+### Dowser lineage reconstruction
 Output files
@@ -432,7 +435,7 @@ Assigning clones to the sequences obtained from IgBlast with the [scoper::hierar
 
 Reconstructing clonal lineage with [IgPhyML](https://igphyml.readthedocs.io/en/stable/) and [dowser](https://dowser.readthedocs.io/en/stable/topics/getTrees/) from the Immcantation toolset.
 
-## Repertoire comparison
+## Repertoire analysis
 Output files
@@ -448,7 +451,7 @@ Reconstructing clonal lineage with [IgPhyML](https://igphyml.readthedocs.io/en/s
 
 Calculation of several repertoire characteristics (diversity, abundance, V gene usage) for comparison between subjects, time points and cell populations. An Rmarkdown report is generated with the [Alakazam R package](https://alakazam.readthedocs.io/en/stable/).
 
-## Tracking number of reads
+## Report file size
 Output files
@@ -476,6 +479,8 @@ Parsing the logs from the previous processes. Summary of the number of sequences
 
 Copy of the downloaded IMGT database by the process `fetch_databases`, used for the gene assignment step.
 
+If databases are provided with `--imgtdb_base` and `--igblast_base`, this folder will not be present.
+
 ## MultiQC
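The `docs/usage.md` hunks in this patch list the samplesheet columns that each input type requires. As a rough pre-flight check before calling `nextflow run`, a samplesheet header can be compared against those lists. This is a minimal sketch under stated assumptions. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical helpers and are not part of nf-core/airrflow. The `fastq` and `assembled` keys are assumed to mirror the `--mode` values used in the example commands. The pipeline's own input validation may check more than column presence.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical helper, not part of nf-core/airrflow.
# Required columns copied from the fastq and assembled samplesheet sections of docs/usage.md.
# Assumption: the keys mirror the values passed to --mode.
REQUIRED_COLUMNS: Dict[str, List[str]] = {
    "fastq": [
        "sample_id", "filename_R1", "filename_R2", "subject_id", "species",
        "tissue", "pcr_target_locus", "single_cell", "sex", "age",
        "biomaterial_provider",
    ],
    "assembled": [
        "sample_id", "filename", "subject_id", "species", "tissue",
        "single_cell", "sex", "age", "biomaterial_provider",
    ],
}


def missing_columns(handle: TextIO, mode: str) -> List[str]:
    """Return the required columns absent from a tab-separated samplesheet header."""
    if mode not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(REQUIRED_COLUMNS)}")
    # Only the first row (the header) is read; cell values are not validated.
    header = next(csv.reader(handle, delimiter="\t"), [])
    present = {column.strip() for column in header}
    return [column for column in REQUIRED_COLUMNS[mode] if column not in present]


if __name__ == "__main__":
    # Header plus one row built from the bulk example in docs/usage.md (column order assumed).
    sheet = io.StringIO(
        "sample_id\tfilename\tsubject_id\tspecies\ttissue\tsingle_cell\tsex\tage\t"
        "biomaterial_provider\tpcr_target_locus\n"
        "PGP1\tbulk-Laserson-2014.fasta\tPGP1\thuman\tPBMC\tFALSE\tmale\tNA\t"
        "Laserson-2014\tig\n"
    )
    print(missing_columns(sheet, "assembled"))  # prints []
```

Pass an open handle to `samplesheet.tsv` with mode `"fastq"`. The function returns an empty list when every required fastq column is present. Any names it returns are columns to add before launching the run.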
diff --git a/docs/usage.md b/docs/usage.md
index 194c9f7a..e966cae2 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -4,19 +4,20 @@
 
 > _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._
 
-## Introduction
+# Introduction
 
-The airrflow pipeline allows processing BCR and TCR targeted sequencing data from bulk and single-cell sequencing protocols. It performs V(D)J assignment, clonotyping, lineage reconsctruction and repertoire analysis using the [Immcantation](https://immcantation.readthedocs.io/en/stable/) framework.
+The nf-core/airrflow pipeline allows processing BCR and TCR targeted sequencing data from bulk and single-cell sequencing protocols. It performs sequence assembly, V(D)J assignment, clonotyping, lineage reconstruction and repertoire analysis using the [Immcantation](https://immcantation.readthedocs.io/en/stable/) framework.
 
 ![nf-core/airrflow overview](images/airrflow_workflow_overview.png)
 
-## Running the pipeline
+# Running the pipeline
 
-The typical command for running the pipeline is as follows:
+The typical command for running the pipeline starting from raw bulk fastq files is as follows:
 
 ```bash
 nextflow run nf-core/airrflow \
 -profile docker \
+--mode fastq \
 --input samplesheet.tsv \
 --library_generation_method specific_pcr_umi \
 --cprimers CPrimers.fasta \
@@ -27,6 +28,19 @@ nextflow run nf-core/airrflow \
 --outdir ./results
 ```
 
+The typical command for running the pipeline starting from assembled reads (fasta) or single-cell data (AIRR) is as follows:
+
+```bash
+nextflow run nf-core/airrflow \
+-profile docker \
+--input input_samplesheet.tsv \
+--mode assembled \
+--outdir results \
+--reassign --productive_only --remove_chimeric \
+--collapseby filename \
+--cloneby subject_id
+```
+
 For more information about the parameters, please refer to the [parameters documentation](https://nf-co.re/airrflow/parameters).
 The command above will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
@@ -39,25 +53,9 @@ work # Directory containing the nextflow working files
 # Other nextflow hidden files, eg. history of pipeline runs and old logs.
 ```
 
-## Input metadata
+# Input samplesheet
 
-### Supported AIRR fields
-
-nf-core/airrflow offers full support for the [AIRR standards 1.4](https://docs.airr-community.org/en/stable/datarep/metadata.html) metadata annotation. The minimum metadata fields that are needed by the pipeline are listed in the table below. Other non-mandatory AIRR fields can be provided in the input samplesheet, which will be available for reporting and introducing comparisons among repertoires.
-
-| AIRR field                | Type               | Parameter Name                | Description                                           |
-| ------------------------- | ------------------ | ----------------------------- | ----------------------------------------------------- |
-| sample_id                 | Samplesheet column |                               | Sample ID assigned by submitter, unique within study  |
-| subject_id                | Samplesheet column |                               | Subject ID assigned by submitter, unique within study |
-| species                   | Samplesheet column |                               | Subject species                                       |
-| tissue                    | Samplesheet column |                               | Sample tissue                                         |
-| pcr_target_locus          | Samplesheet column |                               | Designation of the target locus (IG or TR)            |
-| sex                       | Samplesheet column |                               | Subject sex                                           |
-| age                       | Samplesheet column |                               | Subject age                                           |
-| biomaterial_provider      | Samplesheet column |                               | Name of sample biomaterial provider                   |
-| library_generation_method | Parameter          | `--library_generation_method` | Generic type of library generation                    |
-
-### Fastq input samplesheet (bulk)
+## Fastq input samplesheet (bulk)
 
 The required input file for processing raw BCR or TCR bulk targeted sequencing data is a sample sheet in TSV format (tab separated).
 The columns `sample_id`, `filename_R1`, `filename_R2`, `subject_id`, `species`, `tissue`, `pcr_target_locus`, `single_cell`, `sex`, `age` and `biomaterial_provider` are required. An example samplesheet is:
@@ -88,7 +86,7 @@ Other optional columns can be added. These columns will be available when buildi
 
 The metadata specified in the input file will then be automatically annotated in a column with the same header in the tables generated by the pipeline.
 
-### Assembled input samplesheet (bulk or single-cell)
+## Assembled input samplesheet (bulk or single-cell)
 
 The required input file for processing raw BCR or TCR bulk targeted sequencing data is a sample sheet in TSV format (tab separated).
 The columns `sample_id`, `filename`, `subject_id`, `species`, `tissue`, `single_cell`, `sex`, `age` and `biomaterial_provider` are required.
@@ -100,7 +98,26 @@ An example samplesheet is
 | sc5p_v2_mm_c57bl6_splenocyte_1k_b_airr_rearrangement.tsv | mouse | mouse_x | sc5p_v2_mm_c57bl6_splenocyte_1k_b | splenocyte | NA | NA | 10x Genomics | ig | TRUE |
 | bulk-Laserson-2014.fasta | human | PGP1 | PGP1 | PBMC | male | NA | Laserson-2014 | ig | FALSE |
 
-## Supported library generation methods (protocols)
+## Supported AIRR metadata fields
+
+nf-core/airrflow offers full support for the [AIRR standards 1.4](https://docs.airr-community.org/en/stable/datarep/metadata.html) metadata annotation. The minimum metadata fields that are needed by the pipeline are listed in the table below. Other non-mandatory AIRR fields can be provided in the input samplesheet, which will then be available for reporting and for comparisons among repertoires.
+
+| AIRR field                | Type               | Parameter Name                | Description                                           |
+| ------------------------- | ------------------ | ----------------------------- | ----------------------------------------------------- |
+| sample_id                 | Samplesheet column |                               | Sample ID assigned by submitter, unique within study  |
+| subject_id                | Samplesheet column |                               | Subject ID assigned by submitter, unique within study |
+| species                   | Samplesheet column |                               | Subject species                                       |
+| tissue                    | Samplesheet column |                               | Sample tissue                                         |
+| pcr_target_locus          | Samplesheet column |                               | Designation of the target locus (IG or TR)            |
+| sex                       | Samplesheet column |                               | Subject sex                                           |
+| age                       | Samplesheet column |                               | Subject age                                           |
+| biomaterial_provider      | Samplesheet column |                               | Name of sample biomaterial provider                   |
+| library_generation_method | Parameter          | `--library_generation_method` | Generic type of library generation                    |
+
+# Supported bulk library generation methods (protocols)
+
+When processing bulk sequencing data starting from raw `fastq` reads, several sequencing protocols are supported; the protocol is selected with the `--library_generation_method` parameter.
+The following table matches the library generation methods as described in the [AIRR metadata annotation guidelines](https://docs.airr-community.org/en/stable/miairr/metadata_guidelines.html#library-generation-method) to the value that can be provided to the `--library_generation_method` parameter.
 
 | Library generation methods (AIRR) | Description | Name in pipeline | Commercial protocols |
 | --------------------------------- | ------------------------------------------------------------------------------------------ | ---------------- | ----------------------------------------- |
@@ -115,13 +132,13 @@ An example samplesheet is
 | RT(specific+UMI)+TS+PCR | 5’-RACE PCR using transcript- specific primers containing UMIs | Not supported | |
 | RT(specific)+TS | RT-based generation of dsDNA without subsequent PCR. This is used by RNA-seq kits. | Not supported | |
 
-### Multiplex specific PCR (with or without UMI)
+## Multiplex specific PCR (with or without UMI)
 
-This sequencing type requires setting `--library_generation_method specific_pcr_umi` if a UMI barcode was used, or `--library_generation_method specific_pcr` if no UMI barcodes were used (sans-umi). If the option without UMI barcodes is selected, the UMI length will be set automatically to 0.
+This sequencing type requires setting `--library_generation_method specific_pcr_umi` if UMI barcodes were used, or `--library_generation_method specific_pcr` if no UMI barcodes were used (sans-UMI). If the option without UMI barcodes is selected, the UMI length will be set automatically to 0.
 
 It is required to provide the sequences for the V-region primers as well as the C-region primers used in the specific PCR amplification. Some examples of UMI and barcode configurations are provided. Depending on the position of the C-region primer, V-region primers and UMI barcodes, there are several possibilities detailed in the following subsections.
 
-#### R1 read contains C primer (and UMI barcode)
+### R1 read contains C primer (and UMI barcode)
 
 The `--cprimer_position` and `--umi_position` (if UMIs are used) parameters need to be set to R1 (this is the default). If there are extra bases between the UMI barcode and C primer, specify the number of bases with the `--cprimer_start` parameter (default zero). Set `--cprimer_position R1` (this is the default).
@@ -154,7 +171,7 @@ nextflow run nf-core/airrflow -profile docker \
 --outdir ./results
 ```
 
-#### R1 read contains V primer (and UMI barcode)
+### R1 read contains V primer (and UMI barcode)
 
 The `--umi_position` parameter needs to be set to R1 (if UMIs are used), and `--cprimer_position` to `R2`. If there are extra bases between the UMI barcode and V primer, specify the number of bases with the `--vprimer_start` parameter (default zero).
@@ -187,7 +204,7 @@ nextflow run nf-core/airrflow -profile docker \
 --outdir results
 ```
 
-#### R2 read contains C primer (and UMI barcode)
+### R2 read contains C primer (and UMI barcode)
 
 The `--umi_position` and `--cprimer_position` parameters need to be set to R2. If there are extra bases between the UMI barcode and C primer, specify the number of bases with the `--cprimer_start` parameter (default zero).
@@ -207,7 +224,7 @@ nextflow run nf-core/airrflow -profile docker \
 --outdir ./results
 ```
 
-#### UMI barcode is provided in the index file
+### UMI barcode is provided in the index file
 
 If the UMI barcodes are provided in an additional index file, please provide it in the column `filename_I1` in the input samplesheet and additionally set the `--index_file` parameter. Specify the UMI barcode length with the `--umi_length` parameter. You can optionally specify the UMI start position in the index sequence with the `--umi_start` parameter (the default is 0).
@@ -226,11 +243,11 @@ nextflow run nf-core/airrflow -profile docker \
 --outdir ./results
 ```
 
-### dT-Oligo RT and 5'RACE PCR
+## dT-Oligo RT and 5'RACE PCR
 
 This sequencing type requires setting `--library_generation_method race_5p_umi` or `--library_generation_method race_5p_umi` if UMIs are not being employed, and providing sequences for the C-region primers as well as the linker or template switch oligo sequences with the parameter `--race_linker`. Examples are provided below to run airrflow to process amplicons generated with the TAKARA 5'RACE SMARTer Human BCR and TCR protocols (library structure schema shown below).
 
-#### Takara Bio SMARTer Human BCR
+### Takara Bio SMARTer Human BCR
 
 The read configuration when sequenicng with the TAKARA Bio SMARTer Human BCR protocol is the following:
@@ -249,7 +266,7 @@ nextflow run nf-core/airrflow -profile docker \
 --outdir ./results
 ```
 
-#### Takara Bio SMARTer Human TCR v2
+### Takara Bio SMARTer Human TCR v2
 
 The read configuration when sequencing with the Takara Bio SMARTer Human TCR v2 protocol is the following:
@@ -288,7 +305,7 @@ GTTTGGTATGAGGCTGACTTCN
 CATCTGCATCAAGTTGTTTATC
 ```
 
-## UMI barcode handling
+# UMI barcode handling
 
 Unique Molecular Identifiers (UMIs) enable the quantification of BCR or TCR abundance in the original sample by allowing to distinguish PCR duplicates from original sample duplicates. The UMI indices are random nucleotide sequences of a pre-determined length that are added to the sequencing libraries before any PCR amplification steps, for example as part of the primer sequences.
@@ -301,27 +318,6 @@ The UMI barcodes are typically read from an index file but sometimes can be prov
 
 - No UMIs in R1 or R2 reads: if no UMIs are present in the samples, specify `--umi_length 0` to use the sans-UMI subworkflow.
 
-## Experimental features
-
-We are working on a new subworkflow (`reveal`) to analyze bulk and single cell processed reads. The workflow takes as input assembled reads (`.fasta`) or repertoire `.tsv` (example: 10x `airr.tsv`) files and runs quality controls, and generates reports of clonal analysis and lineage trees. The subworkflow (`--subworkflow reveal`) is under active development, and therefore it is not recommended to use in production. Suggestions and feedback are welcome.
-
-This subworkflow can be tested with this command:
-
-```console
- nextflow run nf-core/airrflow -profile docker,test_reveal
-```
-
-An example command to run an analysis:
-
-```
-nextflow run nf-core/airrflow --subworkflow reveal \
---input input_samplesheet.tsv \
---outdir results \
---reassign --productive_only --remove_chimeric \
---collapseby filename \
---cloneby subject_id
-```
-
 ## Updating the pipeline
 
 When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: