GitHub - simone-pignotti/crazydoc: Read DNA sequences from colourful Microsoft Word documents


Crazydoc is a Python library to parse one of the most common DNA representation formats: the joyfully coloured and stylishly annotated MS-Word document.

Crazydoc returns Biopython records of the sequences contained in an MS-Word document, with record features corresponding to the various sequence highlightings (background color, boldness, italics, case change, etc.). The records can be saved as GenBank files or easily plotted.

Motivation

While other standards such as FASTA or GenBank are better supported by modern sequence editors, none enjoys the same popularity among molecular biologists as MS-Word's .docx format, which is limited only by the sophistication and creativity of the user.

Relying on a loose syntax and unclear specifications, this format has however suffered from a lack of support in the developer community and is generally incompatible with mainstream software pipelines. This library converts MS-Word DNA sequences to more computing-friendly formats: Biopython records, FASTA, or annotated GenBank files.

Usage

To obtain all sequences contained in a .docx file as annotated Biopython records:

from crazydoc import CrazydocParser
parser = CrazydocParser(['highlight_color', 'bold', 'underline'])
biopython_records = parser.parse_doc_file("./example.docx")
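
To make the idea of "styles become features" concrete, here is a minimal standard-library sketch of how styled text runs could be turned into a sequence plus feature spans. It is not crazydoc's actual implementation: the function `runs_to_features`, the `(text, style_dict)` run representation, and the tuple layout of the features are all assumptions made for illustration.

```python
def runs_to_features(runs, styles=("highlight_color", "bold", "underline")):
    """Convert styled runs into (sequence, features).

    `runs` is a list of (text, style_dict) pairs (a hypothetical stand-in
    for the styled runs of a Word paragraph). Each feature is a tuple
    (start, end, style_name, value) with 0-based, end-exclusive coordinates.
    Adjacent runs sharing the same value for a style are merged.
    """
    sequence = "".join(text for text, _ in runs)
    features = []
    for style in styles:
        position = 0
        current = None  # (feature_start, value) of the span being extended
        for text, run_styles in runs:
            value = run_styles.get(style)
            # Close the open span when the style value changes or stops.
            if current is not None and current[1] != value:
                features.append((current[0], position, style, current[1]))
                current = None
            # Open a new span when a styled run starts.
            if current is None and value:
                current = (position, value)
            position += len(text)
        if current is not None:
            features.append((current[0], position, style, current[1]))
    return sequence, features


example_runs = [
    ("ATGC", {"bold": True}),
    ("GGTA", {"bold": True, "highlight_color": "yellow"}),
    ("CCAT", {}),
]
sequence, features = runs_to_features(example_runs)
```

In this example the two bold runs are merged into a single feature covering positions 0 to 8, while the yellow highlight yields a separate feature covering positions 4 to 8.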

You can then plot the obtained records:

from crazydoc import CrazydocSketcher
sketcher = CrazydocSketcher()
for record in biopython_records:
    sketch = sketcher.translate_record(record)
    ax, _ = sketch.plot()
    ax.set_title(record.id)
    ax.figure.savefig('%s.png' % record.id)

To write the sequences to GenBank files, with annotations:

from crazydoc import records_to_genbank
records_to_genbank(biopython_records)
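
FASTA is also mentioned above as a target format. With real Biopython records, the usual route is `Bio.SeqIO.write(records, handle, "fasta")`. The sketch below only illustrates what that serialization amounts to, using the standard library; `to_fasta` and `StandInRecord` are hypothetical names, and the function assumes each record exposes `id` and `seq` attributes.

```python
from collections import namedtuple


def to_fasta(records, line_width=60):
    """Format records as FASTA text: a '>' header line, then wrapped sequence."""
    lines = []
    for record in records:
        sequence = str(record.seq)
        lines.append(">" + record.id)
        # Split the sequence into fixed-width lines.
        lines.extend(
            sequence[i:i + line_width]
            for i in range(0, len(sequence), line_width)
        )
    return "\n".join(lines) + "\n"


# Stand-in for a Biopython record, used only to keep the sketch self-contained.
StandInRecord = namedtuple("StandInRecord", ["id", "seq"])
fasta_text = to_fasta([StandInRecord("seq1", "ATGCGGTACCAT")], line_width=8)
```

Note that FASTA keeps only the identifier and the sequence, so the style-derived features are lost; use GenBank output when the annotations matter.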

Installation

(soon) You can install crazydoc with pip:

sudo pip install crazydoc

Alternatively, you can unzip the sources in a folder and type:

sudo python setup.py install

License = MIT

Crazydoc is open-source software originally written at the Edinburgh Genome Foundry by Zulko and released on GitHub under the MIT licence (copyright Edinburgh Genome Foundry).

Everyone is welcome to contribute!

More biology software


Crazydoc is part of the EGF Codons synthetic biology software suite for DNA design, manufacturing and validation.
