Releases: KirillKryukov/naf
NAF v1.3.0
Highlights
- Added --long option to ennaf. It enables zstd's long distance matching (for the sequence stream only, not for ids, names, lengths, mask or quality), increasing compression ratio and memory consumption. Now you can use -22 --long 31 for maximum compression of large data. Note that decompressing such data will also need more memory, so be careful with this option if you plan to share compressed files with others. Compatibility note: Older versions of unnaf won't be able to decompress data compressed with a --long value larger than 27.
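To get a rough feel for the memory cost, the sketch below assumes that the --long value N is a zstd window log, meaning a window of 2^N bytes. These notes do not state this, but it is consistent with the limit of 27 above, which is zstd's default decoder window limit. The function names are hypothetical, and the ennaf command in the final comment assumes the "INPUT -o OUTPUT" argument form.

```shell
# Assumption: the --long value N is a zstd window log, so the window is 2^N bytes.
long_window_mib() {
    # Print the window size in MiB for a given --long value (20 or more).
    echo $(( (1 << $1) / 1048576 ))
}

# Per the compatibility note above, values larger than 27 need a recent unnaf.
needs_new_unnaf() {
    [ "$1" -gt 27 ]
}

long_window_mib 27   # prints 128
long_window_mib 31   # prints 2048

# Maximum compression of a large file with the options named above
# (file names are placeholders):
#   ennaf -22 --long 31 genome.fa -o genome.naf
```

Under this assumption, --long 31 corresponds to a 2 GiB window, which is why both compression and decompression need noticeably more memory.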
Other changes
- zstd is updated to version 1.5.0, bringing some speed and compactness improvements.
- Added support for empty sequences. Pathological data containing empty sequences (0 bp long) can now be compressed into NAF format and unpacked into a file identical to the original.
- Added --binary option to unnaf, as a shortcut to --binary-stdout --binary-stderr.
NAF v1.2.0
This is a minor update. It adds the --sequences output option to unnaf, updates zstd, and improves overall quality (testability, compatibility).
Highlights
- Added --sequences output to unnaf. It prints sequences without names or qualities, one sequence per line. This is useful when looking for patterns (grepping) in sequence data. Previously you had to run unnaf --fasta --line-length 0 FILE.naf | grep -v ">" | grep "AACCGGTT". Now you can instead run unnaf --sequences FILE.naf | grep "AACCGGTT".
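The reason one sequence per line matters for grepping is that a motif can straddle a line break in wrapped FASTA. The sketch below imitates the described output shape on a plain FASTA stream with awk. It is only a stand-in for illustration and says nothing about how unnaf works internally; the function names and sample data are made up.

```shell
# Stand-in for illustration only: join each FASTA record's sequence lines
# into one line and drop the '>' name lines.
fasta_sequences() {
    awk '/^>/ { if (seq != "") print seq; seq = ""; next }
         { seq = seq $0 }
         END { if (seq != "") print seq }' "$@"
}

# Made-up sample: in record "b" the motif AACCGGTT spans a line break.
sample_fasta() {
    printf '>a\nACGT\nTTGA\n>b\nGGAACC\nGGTTCC\n'
}

# One sequence per line: the motif is found.
sample_fasta | fasta_sequences | grep -c "AACCGGTT"   # prints 1

# Wrapped FASTA: the same motif is missed.
sample_fasta | grep -c "AACCGGTT" || true             # prints 0
```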
Other changes
- Updated zstd to 1.4.5.
- Fixed compilation with MinGW.
- Added --binary-stderr option to both ennaf and unnaf. It allows running the test suite on Windows.
- Added --binary-stdout option to unnaf. It is useful for piping unnaf output to md5sum on Windows.
NAF v1.1.0
Highlights
- Added support for RNA, protein and text sequences. Expected input can be specified with ennaf's new switches: --dna, --rna, --protein and --text. DNA is the default. Sequence type is stored in the compressed file, so unnaf will restore the correct data automatically. NAF format specification was updated to include sequence type information. This change is backward compatible - new unnaf will continue to work with old .naf files.
- ennaf became a bit more efficient - it no longer loads each entire input sequence into memory, and no longer creates temporary files for small inputs.
- unnaf received a new switch --charcount for counting sequence characters in a .naf file.
Other changes
- Added report for number of unknown characters at the end of compression.
- Added strict compression mode (--strict switch). In this mode ennaf fails on any unexpected input character.
- Added -o and -c arguments to unnaf.
- Added test suite.
- Incorporated zstd as submodule to simplify installation from source.
- Fixed streaming mode in MinGW builds.
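As an illustration of how the sequence type switches and --strict might be combined in a script, here is a minimal sketch. The wrapper name is hypothetical, and it assumes that ennaf accepts "INPUT -o OUTPUT" and returns a non-zero exit status when strict mode rejects the input; neither detail is confirmed by these notes.

```shell
# Hypothetical wrapper: compress INPUT to OUTPUT as the given sequence type
# (dna, rna, protein or text) in strict mode.
# Assumes ennaf takes "INPUT -o OUTPUT" and exits non-zero on failure.
compress_strict() {
    seqtype=$1
    input=$2
    output=$3
    case $seqtype in
        dna|rna|protein|text) ;;
        *) echo "unknown sequence type: $seqtype" >&2; return 2 ;;
    esac
    if ! ennaf "--$seqtype" --strict "$input" -o "$output"; then
        echo "ennaf rejected $input as $seqtype" >&2
        rm -f "$output"   # do not leave a partial file behind
        return 1
    fi
}

# Usage (file names are placeholders):
#   compress_strict protein proteome.fa proteome.naf
```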
NAF v1.0.0 - the initial release
The initial release
The initial release of NAF tools. Tested on Windows, Linux and Mac. Tested by compressing and decompressing over 200,000 genomes (2.5 TB), and numerous other datasets.
Provides basic functionality:
- Compresses a FASTA or FASTQ file, or an input stream, autodetecting format.
- Decompresses into same format by default.
- Autodetects, stores and recovers line length.
- Extracts and stores sequence mask, with the option to ignore it, for both compression and decompression.
- Supports alignments (sequences with gaps marked as '-').
- Supports N and other ambiguous IUPAC nucleotide codes (R, Y, S, W, K, M, B, D, H, V).
- Can pipe all input and output, enabling use in pipelines.
- Has partial decompression options for saving time: concatenated DNA sequence, accession numbers, sequence names, lengths, mask, 4-bit encoded sequence.
- Very fast on low compression levels, while still providing useful compression.
- Provides state of the art compression strength on high compression levels.
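A round-trip check of the kind described above (compress, decompress, compare) could be scripted roughly as follows. This is a sketch with a hypothetical function name, assuming ennaf and unnaf both accept "INPUT -o OUTPUT". A byte-identical result should only be expected for well-formed input with consistent line length.

```shell
# Hypothetical helper: compress a file, decompress it again, and check that
# the restored file is byte-identical to the original.
# Assumes ennaf and unnaf both take "INPUT -o OUTPUT".
naf_roundtrip_ok() {
    input=$1
    workdir=$(mktemp -d) || return 2
    status=1
    if ennaf "$input" -o "$workdir/data.naf" &&
       unnaf "$workdir/data.naf" -o "$workdir/restored"; then
        cmp -s "$input" "$workdir/restored" && status=0
    fi
    rm -rf "$workdir"
    return "$status"
}

# Usage (file name is a placeholder):
#   naf_roundtrip_ok genome.fa && echo "round trip OK"
```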