Releases: KirillKryukov/naf

NAF v1.3.0

17 May 08:27
042a210

Highlights

  • Added --long option to ennaf. It enables zstd's long distance matching (for sequence stream only, not for ids, names, lengths, mask or quality), increasing compression ratio and memory consumption. Now you can use -22 --long 31 for maximum compression of large data. Note that decompressing such data will also need more memory, so be careful with this option if you plan to share compressed files with others. Compatibility note: Older versions of unnaf won't be able to decompress data compressed with --long value larger than 27.
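
To give a feel for the memory cost mentioned above, here is a minimal arithmetic sketch. It assumes the value passed to --long is a zstd window log (the base-2 logarithm of the match window size in bytes), as it is for zstd's own --long option. The helper names are illustrative, and the real memory use of ennaf and unnaf will be larger than the window alone.

```python
# Assumption: the --long value is a zstd window log, i.e. log2 of the
# long-distance matching window size in bytes.
def window_bytes(window_log: int) -> int:
    """Return the match window size in bytes for a given window log."""
    return 1 << window_log


def format_size(n: float) -> str:
    """Format a byte count using binary units, up to GiB."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024 or unit == "GiB":
            return f"{n:g} {unit}"
        n /= 1024


# 27 is the largest value older unnaf versions can decode; 31 is the
# value suggested for maximum compression.
for log in (27, 31):
    print(f"--long {log}: window of {format_size(window_bytes(log))}")
```

Under that assumption, the jump from 27 to 31 grows the window from 128 MiB to 2 GiB, which is why the recipient of a file also needs enough memory to decompress it.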

Other changes

  • zstd is updated to version 1.5.0, bringing some speed and compactness improvements.
  • Added support for empty sequences. Pathological data containing empty sequences (0 bp long) can now be compressed to NAF format and decompressed into a file identical to the original.
  • Added --binary option to unnaf, as a shortcut to --binary-stdout --binary-stderr.

NAF v1.2.0

02 Sep 02:10
357c79f

This is a minor update. It adds --sequences output option to unnaf, updates zstd, and improves overall quality (testability, compatibility).

Highlights

  • Added --sequences output to unnaf. It prints sequences without names or qualities, one sequence per line. This is useful when looking for patterns (grepping) in sequence data. Previously you had to run unnaf --fasta --line-length 0 FILE.naf | grep -v ">" | grep "AACCGGTT". Now you can instead run unnaf --sequences FILE.naf | grep "AACCGGTT".
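
The sketch below illustrates why one sequence per line matters for pattern searches: in wrapped FASTA a match can straddle a line break, so each record has to be joined before searching. This is an illustration of the idea in Python, not unnaf's implementation; the function name and example data are made up.

```python
def sequences_only(fasta_text: str) -> list[str]:
    """Return one string per FASTA record, with headers dropped and
    wrapped sequence lines joined together."""
    sequences = []
    current = None  # parts of the record being read, or None before the first header
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if current is not None:
                sequences.append("".join(current))
            current = []
        elif current is not None:
            current.append(line.strip())
    if current is not None:
        sequences.append("".join(current))
    return sequences


# The pattern AACCGGTT is split across two lines in the first record.
example = ">seq1\nAACC\nGGTT\n>seq2\nACGT\n"
hits = [s for s in sequences_only(example) if "AACCGGTT" in s]
print(hits)
```

A line-by-line search of the wrapped input would miss the match in seq1, while the joined form finds it.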

Other changes

  • Updated zstd to 1.4.5.
  • Fixed compilation with MinGW.
  • Added --binary-stderr option to both ennaf and unnaf. It allows running the test suite on Windows.
  • Added --binary-stdout option to unnaf. It is useful for piping unnaf output to md5sum on Windows.

NAF v1.1.0

01 Oct 10:44
a6e641a

Highlights

  • Added support for RNA, protein and text sequences. Expected input can be specified with ennaf's new switches: --dna, --rna, --protein and --text. DNA is the default. Sequence type is stored in the compressed file, so unnaf will restore the correct data automatically. NAF format specification was updated to include sequence type information. This change is backward compatible: the new unnaf continues to work with old .naf files.
  • ennaf became a bit more efficient: it no longer loads each entire input sequence into memory, and no longer creates temporary files for small inputs.
  • unnaf received a new switch --charcount for counting sequence characters in a .naf file.

Other changes

  • Added a report of the number of unknown characters, printed at the end of compression.
  • Added strict compression mode (--strict switch). In this mode ennaf fails on any unexpected input character.
  • Added -o and -c arguments to unnaf.
  • Added a test suite.
  • Incorporated zstd as a submodule to simplify installation from source.
  • Fixed streaming mode in MinGW builds.
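
The difference between the default unknown-character report and strict mode can be sketched as follows. This is a hypothetical illustration, not ennaf's code: the alphabet, the function name and the error message are assumptions made for the example.

```python
from collections import Counter

# Hypothetical DNA alphabet for illustration only; the set of characters
# ennaf actually accepts is defined by the tool, not by this sketch.
DNA_ALPHABET = set("ACGTRYSWKMBDHVN-")


def check_sequence(seq: str, strict: bool = False) -> Counter:
    """Count characters outside the expected alphabet.

    With strict=True, raise on the first unexpected character instead
    of counting it."""
    unknown = Counter()
    for position, ch in enumerate(seq):
        if ch.upper() not in DNA_ALPHABET:
            if strict:
                raise ValueError(
                    f"unexpected character {ch!r} at position {position}"
                )
            unknown[ch] += 1
    return unknown


report = check_sequence("ACGTXXz")
print(sum(report.values()), "unknown characters:", dict(report))
```

In the non-strict path the input is still processed and the count is reported afterwards; in the strict path the first unexpected character stops the run.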

NAF v1.0.0 - the initial release

17 Jan 09:12

The initial release

The initial release of NAF tools. Tested on Windows, Linux and Mac. Tested by compressing and decompressing over 200,000 genomes (2.5 TB), and numerous other datasets.

Provides basic functionality:

  • Compresses a FASTA or FASTQ file, or an input stream, autodetecting format.
  • Decompresses into the same format by default.
  • Autodetects, stores and recovers line length.
  • Extracts and stores sequence mask, with the option to ignore it, for both compression and decompression.
  • Supports alignments (sequences with gaps marked as '-').
  • Supports N and other ambiguous IUPAC nucleotide codes (R, Y, S, W, K, M, B, D, H, V).
  • Can pipe all input and output, enabling use in pipelines.
  • Has partial decompression options for saving time: concatenated DNA sequence, accession numbers, sequence names, lengths, mask, 4-bit encoded sequence.
  • Very fast at low compression levels, while still providing useful compression.
  • Provides state-of-the-art compression strength at high compression levels.
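
The list above mentions a 4-bit encoded sequence alongside support for gaps and ambiguous IUPAC codes. The four bases, N, the ten other ambiguity codes and the gap character add up to 16 symbols, which is exactly what 4 bits can hold. The sketch below packs two such symbols per byte. The code table and nibble order are hypothetical choices for illustration; the actual NAF encoding is defined by the format specification and may differ.

```python
# Hypothetical code table: the real NAF assignment of 4-bit codes to
# symbols is defined by the format specification, not by this sketch.
SYMBOLS = "-ACGTRYSWKMBDHVN"  # 16 symbols, codes 0..15
CODE = {symbol: i for i, symbol in enumerate(SYMBOLS)}


def pack(seq: str) -> bytes:
    """Pack two symbols per byte, first symbol in the low nibble."""
    out = bytearray()
    for i in range(0, len(seq), 2):
        low = CODE[seq[i]]
        # An odd-length sequence leaves the final high nibble as padding.
        high = CODE[seq[i + 1]] if i + 1 < len(seq) else 0
        out.append(low | (high << 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover `length` symbols from packed bytes, discarding padding."""
    chars = []
    for byte in data:
        chars.append(SYMBOLS[byte & 0x0F])
        chars.append(SYMBOLS[byte >> 4])
    return "".join(chars[:length])


seq = "ACGTN-RYA"
packed = pack(seq)
print(len(seq), "symbols ->", len(packed), "bytes")
assert unpack(packed, len(seq)) == seq
```

The sequence length has to be stored separately, because the padding nibble of an odd-length sequence is otherwise indistinguishable from a real symbol.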