Skip to content

bwt: An ANSI C implementation of the Burrows-Wheeler transformation (BWT)

License

LGPL-3.0, GPL-3.0 licenses found

Licenses found

LGPL-3.0
COPYING.LESSER
GPL-3.0
COPYING
Notifications You must be signed in to change notification settings

MichaelDipperstein/bwt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DESCRIPTION
-----------
This archive contains a simple and readable ANSI C implementation of the
Burrows-Wheeler transformation (BWT) and its reverse transformation.  The
code for this implementation is derived directly from  "A Block-sorting
Lossless Data Compression Algorithm" by M. Burrows and D.J. Wheeler.

A PDF copy of the document may be found at:
ftp://apotheca.hpl.hp.com/pub/dec/SRC/research-reports/SRC-124.pdf

More information on Burrows-Wheeler Transform may be found at:
https://michaeldipperstein.github.io/bwt.html

FILES
-----
bwxform.c       - Library of Burrows-Wheeler transform (BWT) routines.
bwxform.h       - Header containing prototypes for library functions.
COPYING         - Rules for copying and distributing GPL software
COPYING.LESSER  - Rules for copying and distributing LGPL software
Makefile        - makefile for this project (assumes gcc compiler and GNU make)
README          - this file
sample.c        - Demonstration of how to use BWT library functions
optlist/        - Subtree containing optlist command line option parser library

BUILDING
--------
To build these files with GNU make and gcc, simply enter "make" from the
command line.  The executable will be named sample (or sample.exe).

GIT NOTE: Updates to the subtree optlist don't get pulled by "git pull"
Use the following commands to pull their updates:
git subtree pull --prefix optlist https://github.com/MichaelDipperstein/optlist.git master --squash


USAGE
-----
Usage: sample <options>

options:
  -c : Encode input file to output file.
  -d : Decode input file to output file.
  -m : Perform the Move-to-Front coding.
  -i <filename> : Name of input file.
  -o <filename> : Name of output file.
  -h|?  : Print out command line options.

-c      Generate a probability range list for the specified input file
        (see -i) then use arithmetic coding compresses the file using the
        range list to the specified output file (see -o).

-d      Decompresses the specified input file (see -i) writing the results to
        the specified output file (see -o).  Only files compressed by this
        program may be decompressed.

-m      Perform move to front encoding/decoding on each block.

-i <filename>   The name of the input file.  There is no valid usage of this
                program without a specified input file.

-o <filename>   The name of the output file.  If no file is specified, stdout
                will be used.  NOTE: Sending compressed output to stdout may
                produce undesirable results.

LIBRARY API
-----------
Transforming Data:
int BWXform(FILE *fpIn, FILE *fpOut, const xform_t method);
fpIn
    The file stream to be transformed.  It must non-NULL and opened.
fpOut
    The file stream receiving the transformed data.  It must non-NULL and
    opened.
method
    xform_t type value indicating whether indicate whether or not MTF is used.
Return Value
    Zero for success, non-zero for failure.

Reverse Transforming Data:
int BWXform(FILE *fpIn, FILE *fpOut, const xform_t method);
fpIn
    The file stream to be reverse transformed.  It must non-NULL and opened.
fpOut
    The file stream receiving the reverse transformed data.  It must non-NULL
    and opened.
method
    xform_t type value indicating whether indicate whether or not MTF is used.
Return Value
    Zero for success, non-zero for failure.

HISTORY
-------
08/20/04  - Initial Release
08/25/04  - Don't prefix blocks with block size.  Use value returned
            by fread() to recognize partial blocks instead.
05/02/05  - Allocating large arrays on heap instead of stack so that
            gcc can handle larger blocks.
11/03/05  - Sort algorithm is now similar to the "faster method" in the
            Burrows and Wheeler paper.
08/29/07  - Explicitly licensed under LGPL version 3.
          - Replaces getopt() with optlist library
10/29/14  - Changed the API so that encode and decode routines accept opened
            file streams instead of file names.
          - Upgrade to latest Oplist library
          - Tighter adherence to Michael Barr's "Top 10 Bug-Killing Coding
            Standard Rules" (http://www.barrgroup.com/webinars/10rules).
07/15/17  - Directory stucture changes for ease of use with GitHub
          - Included a test script that I've always used to test things
09/19/19  - Update e-mail address
          - pull the latest optlist

TODO
----
- Use "known" efficient Move-To-Front algorithm
- Investigate suffix tree based sorting solutions
  - There are several papers claiming speed improvements

AUTHOR
------
Michael Dipperstein ([email protected])
https://michaeldipperstein.github.io/

About

bwt: An ANSI C implementation of the Burrows-Wheeler transformation (BWT)

Resources

License

LGPL-3.0, GPL-3.0 licenses found

Licenses found

LGPL-3.0
COPYING.LESSER
GPL-3.0
COPYING

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published