This repository has been archived by the owner on Oct 28, 2022. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Adding in Doxygen capabilities for generating UML.
Adding UML diagram.
Specifying how the UML diagram can be generated.
Automating Avrodoc build with a script.
Adding proper Beacon stuff to the UML
Updating UML to drop AlleleResource
Adding a Graph Mode FAQ
It would be good to have the answers to people's questions about graph mode all in one place.
Moving and renaming documentation
All the extra Markdowns should go in doc/, and should not have spaces in the filenames.
Adding an SVG of the UML to the repo.
Make the FAQ make sense with the side graph changes.
Showing 7 changed files with 25,615 additions and 0 deletions.
@@ -0,0 +1,124 @@
#!/usr/bin/env python2.7
"""
avdlDoxyFilter.py: hack Avro IDL files into vaguely C++-like files that Doxygen
can read.
Re-uses sample code and documentation from
<http://users.soe.ucsc.edu/~karplus/bme205/f12/Scaffold.html>
"""

import argparse, sys, os, itertools, re

def parse_args(args):
    """
    Takes in the command-line arguments list (args), and returns a nice argparse
    result with fields for all the options.
    Borrows heavily from the argparse documentation examples:
    <http://docs.python.org/library/argparse.html>
    """

    # The command line arguments start with the program name, which we don't
    # want to treat as an argument for argparse. So we remove it.
    args = args[1:]

    # Construct the parser (which is stored in parser)
    # Module docstring lives in __doc__
    # See http://python-forum.com/pythonforum/viewtopic.php?f=3&t=36847
    # And a formatter class so our examples in the docstring look good. Isn't it
    # convenient how we already wrapped it to 80 characters?
    # See http://docs.python.org/library/argparse.html#formatter-class
    parser = argparse.ArgumentParser(description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)

    # Now add all the options to it
    parser.add_argument("avdl", type=argparse.FileType('r'),
        help="the AVDL file to read")

    return parser.parse_args(args)


def main(args):
    """
    Parses command line arguments, and does the work of the program.
    "args" specifies the program arguments, with args[0] being the executable
    name. The return value should be used as the program's exit code.
    """

    options = parse_args(args) # This holds the nicely-parsed options object

    # Are we in a comment?
    in_comment = False

    # What level of braces are we in?
    brace_level = 0

    for line in options.avdl:
        # For every line of Avro

        # See if it's a comment start or end.
        comment_starter = line.rfind("/*")
        comment_ender = line.rfind("*/")

        if (comment_starter != -1 and (comment_ender == -1 or
            comment_ender < comment_starter)):
            # We have entered a multiline comment

            in_comment = True
        elif comment_ender != -1:
            # We have ended a multiline comment and not started another one.
            in_comment = False

        if in_comment:
            # Just pass comments as-is
            print(line.rstrip())
            continue

        # How many unbalanced braces do we have outside comments?
        brace_change = line.count("{") - line.count("}")

        if line.lstrip().startswith("protocol"):
            # It's a protocol, so make it a Module and an Interface.

            # Grab the protocol name (raw string so \s is a regex escape)
            name = re.search(r'protocol\s+(\S+)', line).group(1)

            # Make the open lines
            print("namespace {} {{".format(name))
            #print("interface {} {{".format(name))

        elif line.lstrip().startswith("record"):
            # It's a record, so make it a Struct.

            # Grab the record name
            name = re.search(r'record\s+(\S+)', line).group(1)

            print("struct {} {{".format(name))

        elif line.lstrip().startswith("union"):
            # We need to fix up the union with semicolons.

            # Parse out the union (assumes it opens and closes on this line)
            match = re.search(r"union\s*{(.*)}(.*)", line)

            # What got unioned?
            unioned = match.group(1)

            # What's the rest of the line?
            rest = match.group(2)

            # Make the union a template as far as Doxygen knows.
            print("union<{}>{}".format(unioned, rest))


        elif line.rstrip().endswith("}"):
            # The line is closing something, so it needs a semicolon.
            print("{};".format(line.rstrip()))
        else:
            # Pass other lines
            print(line.rstrip())

        # Change the brace level.
        brace_level += brace_change

if __name__ == "__main__":
    sys.exit(main(sys.argv))
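The rewrite rules in the filter are easier to see on a small input. The following is a minimal sketch, not the script itself: `convert_line` is a hypothetical helper that mirrors only the non-comment branches (comment tracking is left out), and the sample Avro IDL is made up for illustration.

```python
import re


def convert_line(line):
    """Hypothetical helper: rewrite one Avro IDL line roughly the way the
    filter's non-comment branches do."""
    stripped = line.rstrip()
    body = stripped.lstrip()
    if body.startswith("protocol"):
        # protocol Foo {  ->  namespace Foo {
        name = re.search(r"protocol\s+(\S+)", body).group(1)
        return "namespace {} {{".format(name)
    if body.startswith("record"):
        # record Foo {  ->  struct Foo {
        name = re.search(r"record\s+(\S+)", body).group(1)
        return "struct {} {{".format(name)
    if body.startswith("union"):
        # union { a, b } x;  ->  union< a, b > x;
        match = re.search(r"union\s*\{(.*)\}(.*)", body)
        if match:
            return "union<{}>{}".format(match.group(1), match.group(2))
    if stripped.endswith("}"):
        # Closing braces get a semicolon so the result parses as C++.
        return stripped + ";"
    return stripped


# Made-up sample input, not taken from the repository's schemas.
SAMPLE = """protocol Example {
  record Thing {
    string id;
    union { null, string } name = null;
  }
}"""

converted = [convert_line(line) for line in SAMPLE.splitlines()]
print("\n".join(converted))
```

Like the script, the sketch drops the indentation of rewritten lines, which does not appear to matter for Doxygen's parsing.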
@@ -0,0 +1,47 @@
#!/usr/bin/env bash

# Script to make the avrodoc documentation. Run from the repository root (or from inside contrib):
# $ contrib/make_avrodoc.sh
# Depends on avrodoc already being on the PATH.
# Downloads the Avro command-line tools jar itself if it is missing.

if [ -d contrib ]
then
    # Make sure we are in the contrib directory.
    cd contrib
fi

if [ ! -f avro-tools.jar ]
then

    # Download the Avro tools
    curl -o avro-tools.jar http://www.us.apache.org/dist/avro/avro-1.7.7/java/avro-tools-1.7.7.jar
fi

# Make a directory for all the .avpr files
mkdir -p ../target/schemas

# Make a place to put the documentation
mkdir -p ../target/documentation

for AVDL_FILE in ../src/main/resources/avro/*.avdl
do
    # Make each AVDL file into a JSON AVPR file.

    # Get the name of the AVDL file without its extension or path
    SCHEMA_NAME=$(basename "$AVDL_FILE" .avdl)

    # Decide what AVPR file it will become.
    AVPR_FILE="../target/schemas/${SCHEMA_NAME}.avpr"

    # Compile the AVDL to the AVPR
    java -jar avro-tools.jar idl "${AVDL_FILE}" "${AVPR_FILE}"

    # Use Avrodoc to make a per-API documentation file.
    HTML_FILE="../target/documentation/${SCHEMA_NAME}.html"
    avrodoc "${AVPR_FILE}" > "${HTML_FILE}"

done
@@ -0,0 +1,66 @@
# Graph Mode FAQ

This document holds frequently asked questions about the new graph mode, and how various tasks can be accomplished in graph mode and in classic mode.

If you have a relevant question, please add it to this document in a pull request.

## What does a SNP look like in graph versus classic mode?

In "classic" mode, a SNP is represented by a `Variant`, with `referenceBases` set to one base, and `alternateBases` set to the other.

In "graph" mode, a SNP exists as a single-base `Sequence` with the alternate base, joined with two `Join`s onto the `Sequence` with the original base, like this:

```
       -G-
       / \
--A--C--T--G--C--A--
```

To express the genotype of this SNP, a variant caller will need to emit a pair of `Allele`s, one of which follows a single-base path through the original base, and one of which follows a single-base path through the alternate base. It would then emit `AlleleCall`s noting the copy number of each `Allele` in each `CallSet`.

The variant caller may additionally emit a `Variant` tying the two `Allele`s together, and giving genotypes in more traditional notation.

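As a rough illustration of the two representations, here is a minimal sketch that models the SNP with toy Python objects. The class fields, the IDs, and the choice of the `T` at index 2 as the original base are assumptions made for the example; they are not the schema's actual record definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """Toy stand-in for a graph-mode Sequence: an ID plus its bases."""
    id: str
    bases: str


@dataclass(frozen=True)
class Join:
    """Toy stand-in for a Join linking a base on one Sequence to a base on another."""
    src_id: str
    src_pos: int
    dst_id: str
    dst_pos: int


reference = Sequence("ref", "ACTGCA")
snp = Sequence("snp", "G")
SNP_POS = 2  # hypothetical: the SNP replaces the T at 0-based index 2

# Two Joins attach the single-base Sequence around the original base.
joins = [
    Join("ref", SNP_POS - 1, "snp", 0),  # leave the reference just before the SNP
    Join("snp", 0, "ref", SNP_POS + 1),  # rejoin the reference just after the SNP
]

# Classic mode: one Variant-like record holding both bases.
classic_variant = {
    "referenceBases": reference.bases[SNP_POS],
    "alternateBases": snp.bases,
}

# Graph mode: two single-base Alleles, plus AlleleCall-like copy numbers
# for a heterozygous sample (one copy of each).
allele_bases = {
    "reference_allele": reference.bases[SNP_POS],
    "alternate_allele": snp.bases,
}
allele_calls = {"reference_allele": 1, "alternate_allele": 1}


def spell_alternate_haplotype():
    """Read the reference up to the first Join, the SNP base, then the rest."""
    leave, rejoin = joins
    return (reference.bases[:leave.src_pos + 1]
            + snp.bases
            + reference.bases[rejoin.dst_pos:])


print(classic_variant)
print(spell_alternate_haplotype())
```

The point of the sketch is that the graph stores the alternate base once, as its own `Sequence`, and the genotype is carried separately as copy numbers per `Allele`.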
## What does a short indel look like in graph versus classic mode?

In "classic" mode, an indel is represented by a `Variant`, with `referenceBases` set to "" (for an insertion) or some bases (for a deletion), and `alternateBases` set to the inserted bases (for an insertion) or "" (for a deletion).

In "graph" mode, an insertion exists as a `Sequence` with the inserted bases, joined onto the modified `Sequence` with `Join`s such that it connects the endpoints of the indel, like this:

```
Insertion:
        -C--A-
       / ____/
      / /
      ||
      /\
--A--C--T--G--C--A--
```

A deletion is represented by a single `Join` skipping the deleted bases, like this:

```
Deletion:
--A--C--T--G--C--A--
   \_________/
```

To express the genotype of an indel, a variant caller will need to emit a pair of `Allele`s, one of which follows the path with the extra bases, and one of which follows the 0-length path consisting of the adjacency broken by the insertion or created by the deletion. The caller would then emit `AlleleCall`s noting the copy number of each `Allele` in each `CallSet`.

The variant caller may additionally emit a `Variant` tying the two `Allele`s together, and giving genotypes in more traditional notation.

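The following sketch spells out the haplotypes that the insertion and deletion pictures describe, next to the corresponding classic-mode fields. The helper names and the 0-based coordinates are hypothetical, chosen to match a six-base reference `ACTGCA`; only `referenceBases` and `alternateBases` come from the text above.

```python
REFERENCE = "ACTGCA"


def spell_insertion(reference, left_index, inserted):
    """Follow a Join from reference[left_index] onto the inserted Sequence,
    then a second Join back onto reference[left_index + 1]."""
    return reference[:left_index + 1] + inserted + reference[left_index + 1:]


def spell_deletion(reference, left_index, right_index):
    """Follow a single Join from reference[left_index] straight to
    reference[right_index], skipping everything in between."""
    return reference[:left_index + 1] + reference[right_index:]


# Insertion of "CA" between the C at index 1 and the T at index 2.
insertion_variant = {"referenceBases": "", "alternateBases": "CA"}
inserted_haplotype = spell_insertion(
    REFERENCE, 1, insertion_variant["alternateBases"])

# Deletion joining the A at index 0 to the C at index 4.
deleted_bases = REFERENCE[1:4]
deletion_variant = {"referenceBases": deleted_bases, "alternateBases": ""}
deleted_haplotype = spell_deletion(REFERENCE, 0, 4)

print(inserted_haplotype)
print(deleted_haplotype)
```

In both cases one `Allele` covers real bases (the inserted or the deleted ones) and the other covers only an adjacency, which is why the text calls it a 0-length path.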
## How do I walk the graph to find all variants within 10 kbp of a specific position?

In "classic" mode, one can issue a `searchVariants()` call interrogating the range 10 kbp upstream and downstream of the position of interest. All `Variant`s overlapping that range would be returned.

In "graph" mode, the situation is more complicated. You want to perform a recursive search of the graph out to a distance of 10 kbp from your start position, following all possible paths.

You can use `searchJoins()` to get information about all the `Sequence`s that are attached to the `Sequence` containing your position of interest, within a 10 kbp window around that position, and attached such that it is possible to read into them in the direction you are traversing the parent. You would then have to recurse into each such attached `Sequence` (retrieved with `getSequence()`), work out how far in from the joined end you can get with whatever is left of your 10 kbp window after walking out to where the join is, and recursively search that region for more children.

Once you have determined all the ranges on all the `Sequence`s that are "within 10 kbp" of your starting position, you can make a `searchAlleles()` call on each of them to get all `Allele` objects involving any bases within 10 kbp of your start position. If any are associated with `Variant` objects, you can use the `getVariant()` call to retrieve those `Variant`s by ID.

If you are only interested in `Variant` objects with reference `Allele`s overlapping your chosen ranges, you can use `searchVariants()` calls instead of `searchAlleles()` calls. This will ignore `Allele`s which are not part of `Variant`s, or which are not the reference `Allele`s for their `Variant`s.
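The recursive walk can be sketched as a budget-limited traversal. This is only an illustration over an in-memory toy graph: `search_joins` here is a local stand-in rather than the real `searchJoins()` request, the sequence names, lengths, and coordinates are made up, and the walk only reads in one direction (the opposite direction would be handled symmetrically).

```python
from collections import defaultdict

# Toy graph; all names, lengths and coordinates are made up for illustration.
SEQUENCE_LENGTHS = {"ref": 30, "alt": 4}

# Each join is (source sequence, source position, destination sequence,
# destination position): after reading the source base you may continue
# reading at the destination base.
JOINS = [
    ("ref", 9, "alt", 0),
    ("alt", 3, "ref", 12),
]


def search_joins(sequence_id, start, end):
    """Local stand-in for a searchJoins() request: joins leaving
    sequence_id from a position in [start, end]."""
    return [j for j in JOINS if j[0] == sequence_id and start <= j[1] <= end]


def collect_ranges(sequence_id, position, budget, ranges):
    """Record every (start, end) range reachable within `budget` bases of
    `position`, recursing across joins with whatever budget is left."""
    end = min(SEQUENCE_LENGTHS[sequence_id] - 1, position + budget)
    ranges[sequence_id].add((position, end))
    for _, src_pos, dst_id, dst_pos in search_joins(sequence_id, position, end):
        # Walking to the join costs (src_pos - position); crossing it costs 1.
        remaining = budget - (src_pos - position) - 1
        if remaining >= 0:
            collect_ranges(dst_id, dst_pos, remaining, ranges)
    return ranges


# A window of 10 bases stands in for the 10 kbp window in the text.
ranges = collect_ranges("ref", 5, 10, defaultdict(set))
for sequence_id, spans in sorted(ranges.items()):
    print(sequence_id, sorted(spans))
```

Each collected range is what you would then pass to a `searchAlleles()` (or `searchVariants()`) call. The budget shrinks by at least one on every join crossed, so the recursion terminates even if the graph contains cycles, though a real implementation would probably also merge overlapping ranges before querying.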