- CAFU is a Galaxy-based bioinformatics framework for the comprehensive assembly and functional annotation of unmapped RNA-seq data from single- and mixed-species samples. It integrates many existing NGS analysis tools alongside our own programs, and provides an easy-to-use interface to manage, manipulate and, most importantly, explore large-scale unmapped reads.
- In addition to the common steps of read cleaning, read mapping, unmapped read generation and novel transcript assembly, CAFU optionally offers multiple-level evidence analysis of assembled transcripts, characterization of their sequence and expression features, and functional exploration through gene co-expression analysis and genome-wide association analysis.
- Taking advantage of machine learning (ML) technologies, CAFU also effectively addresses the challenge of classifying species-specific transcripts assembled from the unmapped reads of mixed-species samples.
- The CAFU project is hosted on GitHub (https://github.com/cma2015/CAFU) and can be accessed at http://omicstudio.cloud:4001/. The CAFU Docker image is available at https://hub.docker.com/r/malab/cafu.
- Extraction of unmapped reads
- De novo transcript assembly of unmapped reads
- Evidence support of assembled transcripts
- Species assignment of assembled transcripts
- Sequence characterization of assembled transcripts
- Expression profiles of assembled transcripts
- Functional annotation of assembled transcripts
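
To illustrate the first module listed above, here is a minimal sketch of how unmapped reads can be picked out of an alignment file. It is not CAFU's own implementation: it assumes plain-text SAM input and relies only on the SAM convention that FLAG bit 0x4 marks an unmapped segment. The function names and the example records are hypothetical, and paired-end mate handling is deliberately left out.

```python
from typing import Iterable, Iterator

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_records(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM alignment lines whose FLAG has the unmapped bit set."""
    for line in sam_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines (SAM headers start with '@').
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        # A valid alignment line has at least 11 mandatory fields.
        if len(fields) < 11:
            continue
        if int(fields[1]) & UNMAPPED_FLAG:
            yield line


def to_fastq(sam_line: str) -> str:
    """Convert one SAM alignment line into a four-line FASTQ record."""
    fields = sam_line.split("\t")
    name, seq, qual = fields[0], fields[9], fields[10]
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical example records: one mapped read and one unmapped read.
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF",
]

if __name__ == "__main__":
    for record in unmapped_records(EXAMPLE_SAM):
        print(to_fastq(record), end="")
```

The resulting FASTQ records are the kind of input a de novo assembler would then consume in the next module.
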
- Tutorials for CAFU: https://github.com/cma2015/CAFU/blob/master/Tutorials/User_manual.md
- Test datasets referred to in the tutorials for CAFU: https://github.com/cma2015/CAFU/tree/master/Test_data
- In the function Assemble Unmapped Reads, a parameter "Memory" was added for setting the maximum memory to be used by Trinity (1G by default).
- To run the function Species Assignment of Transcripts, users can now use pre-trained or self-trained models. Currently, one pre-trained model is provided; it was trained on 20,502 and 137,052 mRNAs annotated in the reference genomes of the stripe rust pathogen Puccinia striiformis f. sp. tritici (PST-78 v1) and Chinese Spring wheat (IWGSC RefSeq v1.0), respectively.
- The user tutorial was updated to highlight the importance of the CPUs, Memory and Swap settings when running the CAFU Docker image.
- A function Remove Contamination was added to remove potential contamination sequences using DeconSeq (Schmieder et al., 2011).
- A function Remove Batch Effect was added to remove batch effects using the R package sva (Leek et al., 2012).
- The CAFU source code, web server and Docker image were released for the first time.
- To report bugs or other issues, please leave a message at GitHub issues. We will try our best to deal with all issues as soon as possible.
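
As a rough illustration of the "Memory" parameter mentioned above for Assemble Unmapped Reads, the sketch below shows how such a setting might be forwarded to Trinity through its `--max_memory` option. This is an assumption about the general pattern, not CAFU's actual Galaxy wrapper: the function name, the single-end input, the file names and the memory-format check are all hypothetical. The command is only built, never executed.

```python
import re
from typing import List


def build_trinity_command(reads_fastq: str, output_dir: str,
                          max_memory: str = "1G", cpu: int = 1) -> List[str]:
    """Build (but do not run) a Trinity command line for single-end reads."""
    # Sketch-level check (an assumption): memory is given in gigabytes, e.g. "1G".
    if not re.fullmatch(r"[1-9]\d*G", max_memory):
        raise ValueError(f"max_memory must look like '1G', got {max_memory!r}")
    if cpu < 1:
        raise ValueError("cpu must be a positive integer")
    return [
        "Trinity",
        "--seqType", "fq",           # input reads are in FASTQ format
        "--single", reads_fastq,     # single-end reads (e.g. unmapped reads)
        "--max_memory", max_memory,  # upper bound on memory used by Trinity
        "--CPU", str(cpu),
        "--output", output_dir,
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting command.
    print(" ".join(build_trinity_command("unmapped.fq", "trinity_out")))
```

Raising the value, for example `max_memory="8G"`, only helps if the Docker container itself has been granted at least that much memory.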
Siyuan Chen, Chengzhi Ren, Jingjing Zhai, Jiantao Yu, Xuyang Zhao, Zelong Li, Ting Zhang, Wenlong Ma, Zhaoxue Han, Chuang Ma. CAFU: a Galaxy framework for exploring unmapped RNA-Seq data. Briefings in Bioinformatics, 2020;21:676-686.