Releases: marbl/canu
Canu v2.2
These are release notes for Canu version 2.2, which was released on August 26th, 2021. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology. (2018).
- Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Research. (2020).
Minimum Requirements
- 8GB minimum memory; 16GB strongly suggested
- GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.
Note that the installation directory has changed compared to previous releases.
To install from a binary distribution (recommended):
curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.<OS>-amd64.tar.xz --output canu-2.2.<OS>.tar.xz
tar -xJf canu-2.2.*.tar.xz
replacing <OS> with Darwin or Linux, depending on your platform. Confirm the MD5 matches the expected value.
6bd937d31bb9f5f46bf0f9839889c00f canu-2.2.Darwin.tar.xz
63219165fc45b3dbbeb73ed920a23db5 canu-2.2.Linux.tar.xz
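The checksum comparison can be scripted. The following is a minimal sketch, not part of the Canu distribution: the helper name verify_md5 is hypothetical, and it assumes either md5sum (typical on Linux) or md5 (typical on macOS) is available.

```shell
# Hypothetical helper (not shipped with Canu): compare a file's MD5 digest
# against an expected value, using md5sum if present and md5 otherwise.
verify_md5() {
    file="$1"
    expected="$2"
    if command -v md5sum >/dev/null 2>&1; then
        actual=$(md5sum "$file" | awk '{print $1}')
    else
        actual=$(md5 -q "$file")
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)" >&2
        return 1
    fi
}

# Example for the Linux tarball, using the checksum listed above:
# verify_md5 canu-2.2.Linux.tar.xz 63219165fc45b3dbbeb73ed920a23db5
```

A non-zero return status means the download should be repeated before unpacking.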
For recent versions of macOS (10.15+) you may see an error similar to: "sqStoreCreate" cannot be opened because the developer cannot be verified. If this happens, you can remove the quarantine flags from Canu:
xattr -d com.apple.quarantine ./canu-2.2/bin/*
xattr -d com.apple.quarantine ./canu-2.2/lib/*
Canu will be installed at canu-2.2/bin/canu.
To install from source code (DO NOT download the Source code files provided by GitHub, as these will not compile; use canu-2.2.tar.xz instead):
curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.tar.xz --output canu-2.2.tar.xz
tar -xJf canu-2.2.tar.xz
cd canu-2.2/src
make -j 8
cd ..
Canu will be installed at canu-2.2/build/bin/canu.
Changes
Canu v2.2 IS (expected to be) compatible with assemblies started with Canu v2.1 (and v2.1.1) but NOT with any earlier version. However, we DO NOT recommend mixing versions.
- Tweaks to Overlap Error Adjustment to identify real differences near heterozygous alleles, to ignore differences near read ends, and others, mostly for HiFi data. 1ac9dc3 through cb94432
- Tweaks to Overlap Based Trimming to use only evidence overlaps that have different spans; that is, overlaps that do not pile-up on themselves. e540977
- Read Correction:
- Decrease corErrorRate from 0.50 to 0.30 for Nanopore and from 0.30 to 0.25 for PacBio. For Nanopore data, this results in around a 2/3 reduction in 'falconsense' time. See https://canu.readthedocs.io/en/latest/parameter-reference.html#corerrorrate for details. 741911c
- Pass mhap output (*.mhap files) directly to mhapConvert (.ovb files) using a named pipe, instead of a large intermediate file. Option mhapPipe can be used to switch back to using intermediate files. 4fada27
- Do not convert or load short overlaps into the overlap store during correction. d6b7a1f and b982642
- Pass global filter coverage to generateCorrectionLayouts. When corOutCoverage is changed from the default 40x, the number of reads that can be used to correct another read changes correspondingly. e192966 and 07c0481
- Trim low-quality ends from read-to-template alignments before using them for generating corrected reads. ea2b03d
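As an illustration of the option changes above, the sketch below prints (but does not run) a Canu command line that would restore the earlier Nanopore correction settings. The wrapper name, file names, assembly prefix and output directory are hypothetical, and mhapPipe=false is an assumption about the value that switches back to intermediate files; check the parameter reference before relying on it.

```shell
# Hypothetical wrapper: print a Canu command that restores the pre-2.2
# Nanopore correction error rate and disables the mhap named pipe.
legacy_correction_cmd() {
    reads="$1"      # path to Nanopore reads (hypothetical file name)
    genome="$2"     # genome size estimate, e.g. 4.8m
    # mhapPipe=false is an assumed value, see the note above.
    echo canu -p asm -d asm-legacy "genomeSize=$genome" \
        corErrorRate=0.50 mhapPipe=false \
        -nanopore "$reads"
}

legacy_correction_cmd reads.fastq.gz 4.8m
```

Printing the command first makes it easy to review the overrides before submitting a long-running assembly.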
Bug Fixes
- Filter HiFi reads by their homopolymer compressed length. 258941d
- Show HiFi read length histograms using their uncompressed length. f1eadb3
- Fix crash trying to compute the error profile of unitigs with billions of overlaps. Issue #1355. 69e22c9
- Fix 'Assertion 'mincoord < maxcoord' failed' in findPotentialOrphans(). Issues #1872 and #1831. 2f73439
- Improve detection of grid resources specified in environment variables. Issue #1912. 404540a
- Fix rare crash when placing reads in abnormally short tigs. 2b70735
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, especially for nanopore data, but may produce slightly less contiguous assemblies.
- No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v2.1.1
These are release notes for Canu version 2.1.1, which was released on October 16th, 2020. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology. (2018).
- Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Research. (2020).
Minimum Requirements
- 8GB minimum memory; 16GB strongly suggested
- GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.
Note that the installation directory has changed compared to previous releases.
To install from a binary distribution (recommended):
tar -xJf canu-2.1.1.*.tar.xz
Canu will be installed at canu-2.1.1/bin/canu.
To install from source code (DO NOT download the Source code files provided by GitHub, as these will not compile; use canu-2.1.1.tar.xz instead):
tar -xJf canu-2.1.1.tar.xz
cd canu-2.1.1/src
make -j 8
cd ..
Canu will be installed at canu-2.1.1/build/bin/canu.
Changes
Canu v2.1.1 IS compatible with assemblies started with Canu v2.1.
This minor release adds a small performance enhancement to consensus and fixes two crashes, one in consensus and one in bogart.
- Add multithreading for the final step of consensus, where it aligns the original reads back to the consensus sequence to find the read layout.
- Fix a systematic crash (on some systems) in utgcns: Assertion 'idmap.empty() == true' failed. Issue #1780.
- Fix a crash in bogart (on PacBio HiFi metagenomic datasets): Assertion 'isRepeat == true' failed. Issues #1806 and #1813.
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, especially for nanopore data, but may produce slightly less contiguous assemblies.
- No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v2.1
These are release notes for Canu version 2.1, which was released on August 21st, 2020. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology. (2018).
- Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. bioRxiv. (2020).
Minimum Requirements
- 8GB minimum memory; 16GB strongly suggested
- GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.
Note that the installation directory has changed compared to previous releases.
To install from a binary distribution (recommended):
tar -xJf canu-2.1.*.tar.xz
Canu will be installed at canu-2.1/bin/canu.
To install from source code (DO NOT download the Source code files provided by GitHub as these will not compile, use the canu-2.1.tar.gz instead):
gunzip -dc canu-2.1.tar.gz | tar -xf -
cd canu-2.1/src
make -j 8
cd ..
Canu will be installed at canu-2.1/build/bin/canu.
Changes
Canu v2.1 IS NOT compatible with assemblies started with any previous version.
- Contigs are more correct, but generally smaller - better identification of bad reads, bubbles and ambiguous repeats.
** Avoid labeling true repeats as bubbles. Some contigs we previously flagged as bubbles are now flagged as repeats and are allowed to break contigs.
** Improve sensitivity of bubble detection. Some contigs we didn't flag before are now flagged as bubbles and will not break contigs.
** Break repeats at the read end suspected to be incorrectly assembled, instead of at the boundary of the repeat.
** Merge unambiguous small contigs into larger contigs correctly in tandem repeat regions.
- Auto-increase maximum allowed overlap error when defaults are too restrictive. This applies to all datatypes but is particularly prevalent in HiFi datasets.
** Fix an esoteric error in picking the best overlap between a pair of reads that would sometimes fail to pick the longest overlap when all overlaps are at 100% identity.
- Improve detection of circular contigs and output the coordinates of the non-redundant contig in the FASTA header line.
- Add a report of the quality of overlaps used when building contigs to 'asm.report'.
- Improve consensus quality in repetitive regions.
- Remove support for having read files in spec files; it only worked in limited cases, and would be hard to fix.
- Remove OSTYPE-MACHINETYPE (e.g., Linux-amd64) from the installation path. This quirk has been present since (almost) the first release of Celera Assembler. It was needed to support runs on a heterogeneous grid consisting of Intel 32-bit compute nodes (with 2 CPUs and 2 GB memory) and a "high memory" DEC Alpha node with 4 CPUs and 32 GB.
- Change ovlStore file names to be POSIX compliant. Old names should be silently updated. Issue #1732.
Bug Fixes
- Fix "Modification of non-creatable array value attempted" crash after "Meryl finished successfully." Issue #1632.
- Fix crash in splitReads "Assertion w->clrBgn >= w->iniBgn failed." Issue #1655.
- Fix failure running meryl-configure.sh on PBSPro. Issue #1740.
- Fix underestimate of memory needed for consensus. Issue #1750.
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
- No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v2.0
These are release notes for Canu version 2.0, which was released on March 18th, 2020. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology. (2018).
- Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. bioRxiv. (2020).
Minimum Requirements
- 8GB minimum memory; 16GB strongly suggested
- GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.
To install from a binary distribution (recommended installation method):
tar -xJf canu-2.0.*.tar.xz
To install from source code (the file can be named either canu-v2.0.tar.gz or just v2.0.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v2.0.tar.gz | tar -xf -
cd canu-2.0/src
make -j 8
cd ..
In both cases, canu is installed in directory canu-2.0/<OSTYPE>-<MACHINETYPE>, for example, canu-2.0/Linux-amd64. You can run the assembler with:
canu-2.0/*/bin/canu
Changes
This release introduces support for PacBio HiFi assembly and includes several major bug fixes.
Canu v2.0 IS NOT compatible with assemblies started with any previous version.
- Support for HiFi data using option '-pacbio-hifi'. Full details in the preprint HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.
- Numerous improvements to contig construction that make longer, more correct contigs:
** Detect bubbles during contig construction and prevent them from shattering heterozygous genomes.
** Detect and remove short branches during contig construction.
** Detect reads that are not fully covered by overlaps and exclude them from contigs.
- Option 'stopOnReadQuality' is enabled by default, but no longer aborts if there are too many short reads.
- Option 'minInputCoverage' will stop the assembly if the input read coverage is below this value, default 10. This supplements 'stopOnLowCoverage', which stops if read coverage is below some value after input, after correction or after trimming.
- Option 'maxInputCoverage', default 200, will randomly down-sample input reads to this coverage. It replaces option 'readSamplingCoverage' ('readSamplingBias' still exists).
- Write intermediate Mhap outputs to the stageDirectory if it is set.
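The effect of 'minInputCoverage' and 'maxInputCoverage' can be illustrated with a little arithmetic. This is only a sketch of the thresholds described above, not Canu's actual implementation: the function name coverage_action is hypothetical, coverage is taken as total input bases divided by genome size, and the "keep" fraction assumes uniform random sampling (ignoring 'readSamplingBias').

```shell
# Hypothetical helper: classify an input read set by coverage.
# Usage: coverage_action TOTAL_BASES GENOME_SIZE [MIN] [MAX]
# MIN and MAX default to the 2.0 defaults named above (10 and 200).
coverage_action() {
    awk -v bases="$1" -v genome="$2" -v min="${3:-10}" -v max="${4:-200}" 'BEGIN {
        cov = bases / genome
        if (cov < min)
            printf "stop coverage=%.1f\n", cov
        else if (cov > max)
            printf "downsample coverage=%.1f keep=%.3f\n", cov, max / cov
        else
            printf "proceed coverage=%.1f\n", cov
    }'
}

coverage_action 1200000000 4800000   # 250x input on a 4.8 Mbp genome
coverage_action 24000000 4800000     # 5x input on a 4.8 Mbp genome
```

With the defaults, 250x input would be sampled down to roughly 80% of its bases, while 5x input falls below the minimum and the assembly would stop.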
Bug Fixes
- Multiple fixes to read positioning during contig construction (Assertion 'cnt > 0' failed.)
- Possibly fix a weird error reading overlapper output that resulted in out of memory errors (terminate called after throwing an instance of 'std::bad_alloc').
- A variety of bug fixes that nobody will really care about (unless your assembly crashed, in which case you already know it's fixed) and will be tedious to list, so they aren't listed.
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
- No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v1.9
These are release notes for Canu version 1.9, which was released on November 4th, 2019. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology. (2018).
Minimum Requirements
- 8GB minimum memory; 16GB strongly suggested
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.
To install from a binary distribution (recommended installation method):
tar -xJf canu-1.9.*.tar.xz
To install from source code (the file can be named either canu-v1.9.tar.gz or just v1.9.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v1.9.tar.gz | tar -xf -
cd canu-1.9/src
make -j 8
cd ..
In both cases, canu is installed in directory canu-1.9/<OSTYPE>-<MACHINETYPE>, for example, canu-1.9/Linux-amd64. You can run the assembler with:
canu-1.9/*/bin/canu
Changes
This release includes several major bug fixes and improves repeat separation and consensus quality for assemblies.
Canu v1.9 IS NOT compatible with assemblies started with any previous version.
- Preliminary support for HiFi data using option '-pacbio-hifi'. This will skip the correction and trimming phases and set options for high-quality reads.
- Improved detection of indel errors in overlaps used for creating contigs. Fix several errors that all but disabled detection of errors in these overlaps.
- Fix an error in consensus generation that was effectively disabling consensus on large contigs.
- Significantly improve speed of reading overlaps during, for example, trimming.
- Trim 'N' bases at either end of a read (as they tended to obscure true overlaps), and treat 'N' bases in the middle of a read as don't-care matches during consensus.
- Support for the DNAnexus platform.
- Output file 'contigs.gfa' was removed because it was misleading.
- Parameter 'saveOverlaps': By default, the 'correction' and 'trimming' overlap stores are removed when they are no longer needed. Set saveOverlaps=true to retain them.
- Parameter 'purgeOverlaps': Controls when to remove intermediate overlap data: never, normal (when all overlaps are loaded into an overlap store, default), aggressive (as soon as safely possible), dangerous (as soon as possible, even if it's unsafe).
- Parameter 'gridEngineResourceOption': A combination of gridEngineThreadsOption and gridEngineMemoryOption, useful for grid schedulers that use one option for requesting both memory and CPUs.
- Parameter 'hapUnknownFraction': Don't include 'unassigned' reads in the haplotype assemblies if they amount to less than some fraction of the total reads. Default 0.05 (5%).
- Option '-haplotype': Will stop Canu after haplotyped reads are generated. No assemblies will be started.
Bug Fixes
- A variety of bug fixes that nobody will really care about (unless your assembly crashed, in which case you already know it's fixed) and will be tedious to list, so they aren't listed.
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
- No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v1.8
These are release notes for Canu version 1.8, which was released on October 22nd, 2018. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology. (2018).
Minimum Requirements
- 8GB minimum memory; 16GB strongly suggested
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- GCC 4.5 (for compilation only); GCC 6 recommended
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.
To install from source code (the file can be named either canu-v1.8.tar.gz or just v1.8.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v1.8.tar.gz | tar -xf -
cd canu-1.8/src
make -j 8
cd ..
To install from a binary distribution (recommended installation method):
tar -xJf canu-1.8.*.tar.xz
In both cases, canu is installed in directory canu-1.8/<OSTYPE>-<MACHINETYPE>, for example, canu-1.8/Linux-amd64. You can run the assembler with:
canu-1.8/*/bin/canu
Changes
This release adds support for trio-binning (Nature Biotechnology), a reimplementation of the meryl kmer counter and processor, and improved support for object storage.
Note, however, that while object storage is supported, there are no methods to run tasks on, e.g., Amazon Web Services or Azure.
Canu v1.8 IS NOT compatible with assemblies started with any previous version.
- The Canu executive now fully supports trio-binning. Specifying parental haplotypes with the -haplotype* options enables trio binning. After the reads are binned into haplotypes, each haplotype assembly is automagically launched.
- The 'meryl' kmer counter was reimplemented for improved performance when counting kmers in reads, and better utilization of grid architectures. The method for deciding which kmers to ignore when computing overlaps changed, resulting, generally, in more kmers being ignored and thus lower run times for computing overlaps.
- The overlap store was largely reimplemented to reduce file counts and sizes during construction, and to allow the data-parallel store construction method to run without a grid. It works with object stores now, too. The sequential construction method runs as its own job, not part of the Canu executive, letting it use more resources than before.
- Decrease the default maximum error rate allowed when finding overlaps in corrected Nanopore reads from 14.4% to 12.0%. With the over-occurring kmer changes mentioned previously, run times for finding overlaps in Nanopore reads should decrease by 5 to 10 fold.
- Options 'executiveMemory' and 'executiveThreads' can be used to increase the size of the executive task. If this job is large enough, tasks that would previously run as individual grid jobs will be run from within the executive task, avoiding a submit/execute/submit cycle on heavily loaded grids.
- Options 'readSamplingCoverage' and 'readSamplingBias' can be used to down sample read coverage before starting correction or assembly.
- Option 'stopOnReadQuality', which seemed to just annoy people, was disabled, but option 'stopOnLowCoverage' was added to stop an assembly if read coverage is too low, 10 by default.
- Option 'gnuplotTested' was removed. Failure to find or run gnuplot is now handled automagically. Issues #1084 and #1129.
- Better file staging in seqStore and ovlStore when object storage is used.
Bug Fixes
- Various tweaks to job sizes. overlapInCore overlap jobs are generally larger now.
- Fix truncation of consensus sequence in large contigs due to mis-aligned reads leaving consensus bases with no read coverage.
- Fix correction failures caused by non-ACGT bases in input reads.
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
- Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v1.7.1
These are release notes for Canu version 1.7.1, which was released on June 18th, 2018. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. Complete assembly of parental haplotypes with trio binning. bioRxiv. (2018).
Minimum Requirements
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- GCC 4.5 (for compilation only); GCC 6 recommended
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.
To install from source code (the file can be named either canu-v1.7.1.tar.gz or just v1.7.1.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v1.7.1.tar.gz | tar -xf -
cd canu-1.7.1/src
make -j 8
cd ..
To install from a binary distribution:
xz -dc canu-1.7.1.*.tar.xz | tar -xf -
In both cases, canu is installed in the directory canu-1.7.1/<OS>-<arch>, for example, canu-1.7.1/Linux-amd64. You can run the assembler with:
canu-1.7.1/*/bin/canu
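The platform directory in the path above appears to combine an operating system name with a machine architecture. As a minimal sketch, the binary can be located without the glob, assuming the two parts correspond to `uname -s` and `uname -m` and that `x86_64` is written as `amd64`. Both are assumptions inferred from the Linux-amd64 example, and `canu_platform_dir` is a hypothetical helper, not part of Canu.

```shell
#!/bin/sh
# Hypothetical helper: build the "<OS>-<arch>" directory name.
# Assumption: x86_64 is spelled amd64, as in the Linux-amd64 example.
canu_platform_dir() {
    os=$1
    arch=$2
    case "$arch" in
        x86_64) arch=amd64 ;;
    esac
    printf '%s-%s\n' "$os" "$arch"
}

# Usage: resolve the expected binary path for the current machine.
canu_bin="canu-1.7.1/$(canu_platform_dir "$(uname -s)" "$(uname -m)")/bin/canu"
echo "$canu_bin"
```

If the printed path does not exist after unpacking, list the canu-1.7.1 directory to see the name actually used.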
Changes
This release contains only bug fixes made since Canu v1.7 was released. No features were added or removed.
Canu v1.7.1 is compatible with assemblies started with Canu v1.7.
Canu v1.7 and v1.7.1 ARE NOT compatible with assemblies started with Canu v1.6.
Bug Fixes
- Fix many bogart issues, including the dreaded "Assertion `cnt > 0' failed". Issues #930, #874, #873, #844, #718, #546. Backported from 6f3c375.
- Fix Read Error Detection (RED) configuration to prevent single-read jobs. Issues #935, #854, #831, #815. Backported from eeef601.
- Fix excessive memory usage when loading evalues into the ovlStore. Issues #956, #758, #755. Backported from 858eff8.
- Fix a (potential) performance problem when computing overlaps for large assemblies: don't set a one-size-fits-all ovlHashBits, base it on the genome size. Backported from a580131.
- Fix a compilation error with GCC 8. Issue #927. Backported from f251336.
Known Issues
- Downloads before 22 June 2018 incorrectly reported the version as "1.7".
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
- TrioCanu is not yet optimized for memory usage; as a result, it requires more than the default memory for large genomes. The options gridOptionsExecutive="--mem=250g" gridOptionsMeryl='--partition=largemem --mem=1000g' (or the equivalent memory request on your grid) should be sufficient for a 3 Gbp genome.
- Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v1.7
These are release notes for Canu version 1.7, which was released on February 27th, 2018. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. Complete assembly of parental haplotypes with trio binning. Biorxiv. (2018).
Minimum Requirements
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- GCC 4.5 (for compilation only)
- OS X 10.10 (for binaries only)
- gnuplot (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.
To install from source code (the file can be named either canu-v1.7.tar.gz or just v1.7.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v1.7.tar.gz | tar -xf -
cd canu-1.7/src
make -j 8
cd ..
To install from a binary distribution:
xz -dc canu-1.7.*.tar.xz | tar -xf -
In both cases, canu is installed in the directory canu-1.7/<OS>-<arch>, for example, canu-1.7/Linux-amd64. You can run the assembler with:
canu-1.7/*/bin/canu
Changes
This release was originally planned to only include changes to read correction, but we opportunistically added: improved support for plasmids via read rescue; an initial implementation of trio binning; a 'fast mode' for Nanopore reads (though not automatic); and sneaked in some major changes to the gkpStore/tigStore read/contig database for future use. So much for the plan.
Assemblies started in Canu v1.6 ARE NOT compatible with Canu v1.7.
- Ensure that every raw read is either corrected or used as evidence for correcting some other raw read. This serves to rescue short plasmids in high coverage datasets, and it is no longer necessary to set corOutCoverage to achieve the same result.
- Initial support of TrioCanu (biorxiv) added.
- Add a '-fast' option for using a faster (but still not rigorously validated) overlap method. Useful for long Nanopore reads.
- In anticipation of future features, all reads - raw, corrected and trimmed versions - are stored in a single gkpStore in the root assembly directory.
- Read correction was almost completely re-engineered.
- Stability of the computation was increased by removing multiple processes communicating through a pipe.
- Layouts of the raw reads used to correct a read are saved for future use (e.g., during consensus). With the gkpStore change above, it is now possible to track a raw read through to the final contig outputs.
- Only a single corrected read is generated for each raw read. Previously, PacBio reads containing multiple sub-reads could create multiple (redundant) corrected reads.
- Overlap Error Detection (RED and OEA) memory usage when configuring compute jobs has been reduced.
- Overlap Error Detection (RED and OEA) job sizes were increased to reduce disk contention.
- overlapInCore (OBTOVL and UTGOVL) job sizes were increased to reduce disk contention and to take advantage of generally larger memory sizes available.
- The ovlRefBlockSize parameter was removed; use ovlRefBlockLength instead.
- Update to Snappy v1.1.7.
- Add basic support for RNA by translating input U bases to T bases. Output files are NOT translated back to U bases.
- Restrict the parallel overlap store creation method to grid runs. ovsMethod=forceparallel was added to force the usage of the parallel method on non-grid runs.
- Add the preExec option to allow a single command to run before any Canu program is run. Useful for, e.g., loading a Canu module.
- Use more standard locations for installing binaries and perl modules.
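The U-to-T translation mentioned in the list above can be pictured as a small filter over FASTA input. This is only an illustration of the transformation, not Canu's implementation, and the function name `rna_to_dna` is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration: translate U bases to T in FASTA sequence lines,
# leaving header lines (those starting with '>') untouched.
rna_to_dna() {
    awk '/^>/ { print; next }
         { gsub(/U/, "T"); gsub(/u/, "t"); print }'
}

# Usage: the header keeps its "U"; the sequence lines do not.
printf '>readU1\nACGUUGCA\nuuac\n' | rna_to_dna
```

As the release note says, nothing translates the bases back, so any downstream tool that expects U bases would need its own conversion.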
Bug Fixes
- In non-grid mode, Canu was running too many jobs concurrently and exhausting memory.
- Memory needed for consensus jobs is now set based on the largest contig.
- The VN tag in GFA outputs was incorrectly set to the name of the program creating the file. It now reflects the format version of the GFA file.
- Numerous not-very-exciting pedantic coding errors resolved. Stuff like failing to close a single input file, failing to release a block of memory, failing to check if an operation successfully completed, et cetera, that were technically incorrect but not significantly so.
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- The Overlap Error Adjustment step does not properly configure its memory usage; include redMemory=8 oeaMemory=8 as a workaround.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
- TrioCanu is not yet optimized for memory usage; as a result, it requires more than the default memory for large genomes. The options gridOptionsExecutive="--mem=250g" gridOptionsMeryl='--partition=largemem --mem=1000g' (or the equivalent memory request on your grid) should be sufficient for a 3 Gbp genome.
- Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
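The Overlap Error Adjustment memory workaround above is a pair of ordinary key=value options placed on the Canu command line. The sketch below only prints such a command for inspection rather than running it. The prefix `asm`, the directory `asm-dir`, the genome size, and the read file are placeholder values, and `canu_with_oea_workaround` is a hypothetical helper.

```shell
#!/bin/sh
# Dry-run sketch: print a Canu command line with the memory workaround applied.
# "asm", "asm-dir", genomeSize=4.8m and the read file are placeholders.
canu_with_oea_workaround() {
    reads=$1
    echo canu -p asm -d asm-dir genomeSize=4.8m \
        redMemory=8 oeaMemory=8 \
        -nanopore-raw "$reads"
}

canu_with_oea_workaround reads.fastq
```

The TrioCanu grid options from the same list would be appended in the same way, using whatever memory request syntax your scheduler expects.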
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v1.6
These are release notes for Canu version 1.6, which was released on August 14th, 2017. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
Minimum Requirements
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- GCC 4.5 (for compilation only)
- OS X 10.10 (for binaries only)
- gnuplot (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.
To install from source code (the file can be named either canu-v1.6.tar.gz or just v1.6.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v1.6.tar.gz | tar -xf -
cd canu-1.6/src
make -j 8
cd ..
To install from a binary distribution:
xz -dc canu-1.6.*.tar.xz | tar -xf -
In both cases, canu is installed in the directory canu-1.6/<OS>-<arch>, for example, canu-1.6/Linux-amd64. You can run the assembler with:
canu-1.6/*/bin/canu
Changes
- Improved detection of unitig and contig edges in GFA outputs.
- Repeats that are confirmed correct no longer form unitigs. This increases unitig length and greatly simplifies the unitig GFA.
- Small plasmids are no longer flagged as 'unassembled' sequences. Note that the contigFilter option values have changed and old values run the risk of filtering incorrectly.
- Improved contig consensus accuracy (longer alignments to reference).
- Added a unitig to contig mapping via a BED output.
- Better memory management in bogart should reduce memory footprint slightly and run slightly faster.
- Remove the ovlStore for correction and trimming when those stages are finished. saveOverlaps=stores will retain them. The correction overlaps are usually the single largest consumer of disk space during the assembly.
- Remove the partitioned gkpStore copy when consensus is finished.
- Use file names with five digits, instead of four, for overlap error adjustment.
- Options minMemory and minThreads are now implemented.
- Use all overlaps, not just the best, to position reads in unitigs/contigs, resulting in more accurate repeat and edge detection.
- Implement the 'suggestCircular' flag in contigs and unitigs. It is set to 'true' if the single sequence can be circularized. Note: the flag is 'false' if two or more contigs are needed to form the circular chromosome.
- Stability improvements to overlap store building when ovsMethod=parallel (the default for large genomes).
- Easier restarts: if restarted from within the assembly directory, the -p, -d and read files can be omitted.
- Improved logging: citations are output at the start of the run for any included software within Canu.
Bug Fixes
- Fixed CIGAR multithreading bug in unitig and contig graphs which dropped some true edges.
- Fix invalid characters in corrected reads due to out of bounds array access.
- Fix useGrid=remote which failed to output commands when multiple jobs needed to be submitted.
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- When running each step (correct/trim/assemble) by hand, the assemble step will use corrected, not trimmed, reads when all steps are run with the same -d option. Run with different -d options as a workaround.
- Large memory usage during unitig consensus calling on unitigs over 100 Mbp in size; a 140 Mbp contig required approximately 75 GB.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The options overlapper=mhap utgReAlign=true are significantly faster but may produce slightly less contiguous assemblies on genomes >200 Mbp in size.
- Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
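The workaround for the first issue in the list above, a different -d for each manually run step, might look like the following dry run, which prints the three commands instead of executing them. The stage flags, the -pacbio-raw and -pacbio-corrected read options, and the correctedReads/trimmedReads output file names are assumptions to verify against the documentation for your version. The prefix, directories, genome size, and read file are placeholders, and `print_staged_commands` is a hypothetical helper.

```shell
#!/bin/sh
# Dry-run sketch: one assembly directory per stage, so the assemble step is
# given the trimmed reads explicitly instead of finding the corrected ones.
print_staged_commands() {
    prefix=$1
    reads=$2
    echo "canu -correct -p ${prefix} -d ${prefix}-correct genomeSize=4.8m -pacbio-raw ${reads}"
    echo "canu -trim -p ${prefix} -d ${prefix}-trim genomeSize=4.8m -pacbio-corrected ${prefix}-correct/${prefix}.correctedReads.fasta.gz"
    echo "canu -assemble -p ${prefix} -d ${prefix}-assemble genomeSize=4.8m -pacbio-corrected ${prefix}-trim/${prefix}.trimmedReads.fasta.gz"
}

print_staged_commands ecoli pacbio.fastq
```

Each stage reads the previous stage's output from its own directory, so no stage can silently pick up the wrong read set.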
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.
Canu v1.5
These are release notes for Canu version 1.5, which was released on April 17th, 2017. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
Minimum Requirements
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- GCC 4.5 (for compilation only)
- OS X 10.10 (for binaries only)
- gnuplot (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.
To install from source code (the file can be named either canu-v1.5.tar.gz or just v1.5.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v1.5.tar.gz | tar -xf -
cd canu-1.5/src
make -j 8
cd ..
To install from a binary distribution:
xz -dc canu-1.5.*.tar.xz | tar -xf -
In both cases, canu is installed in the directory canu-1.5/<OS>-<arch>, for example, canu-1.5/Linux-amd64. You can run the assembler with:
canu-1.5/*/bin/canu
Changes
- Add preliminary support for object storage.
- Paths used in the various shell scripts and the diagnostic output are no longer full paths.
- Use Edlib for read alignments during correction and consensus, which is both faster and generates higher quality results compared to the previous alignment algorithms.
- Add options rawErrorRate and correctedErrorRate, both specifying the expected error in an alignment of two reads. The previous errorRate option is still accepted, and is equivalent to 1/3 * correctedErrorRate. Details are in the tutorial.
- Add experimental options overlapper=mhap and utgReAlign=true, which are significantly faster on ultra-long sequences. Both options need to be supplied. This mode has had limited testing and is run at your own risk. On large genomes (>200 Mbp) it can produce a less contiguous assembly than the default.
- The GFA output now has correct CIGAR strings for all links.
- Support staging of some data on local disk for greatly improved performance during read correction.
- Significantly better support for PBSPro and LSF. Many thanks to the users that helped us work through problems.
- Fix error when more than 10,000 jobs were created using the ovsMethod=parallel overlap store creation algorithm.
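The relationship between the older errorRate option and correctedErrorRate stated in the Changes list above is simple arithmetic. A small sketch, with hypothetical helper names and an illustrative input value:

```shell
#!/bin/sh
# errorRate = 1/3 * correctedErrorRate, so correctedErrorRate = 3 * errorRate.
to_corrected_error_rate() {
    awk -v e="$1" 'BEGIN { printf "%.3f\n", 3 * e }'
}

to_legacy_error_rate() {
    awk -v c="$1" 'BEGIN { printf "%.3f\n", c / 3 }'
}

to_corrected_error_rate 0.015   # prints 0.045
to_legacy_error_rate 0.045      # prints 0.015
```

A command line that previously used errorRate=0.015 would therefore correspond to correctedErrorRate=0.045 under the new option.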
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage during unitig consensus calling on unitigs over 100 Mbp in size; a 140 Mbp contig required approximately 75 GB.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment.
- Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.