This repository has been archived by the owner on Oct 28, 2022. It is now read-only.

Merge pull request #634 from lshift/protobuf-sphinx-json
Update the docs build process to use Protobuf
kozbo authored Jun 20, 2016
2 parents d2fd0ae + 9b1484d commit 7a37d81
Showing 30 changed files with 517 additions and 5,409 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -2,7 +2,9 @@
target
*~
#*
doc/source/schemas/*.avpr
doc/source/schemas/*.proto.rst
doc/source/schemas/build.rst
doc/source/_build/
build

#********** windows template**********
@@ -73,3 +75,4 @@ target/
#********** IntelliJ files ******
*.iml


9 changes: 1 addition & 8 deletions Makefile
@@ -23,13 +23,9 @@ help:
# doc/source is the root of the rst files; the ../.. components effectively
# counter the cd doc/source to place the docs at the schemas root
.PHONY: docs
docs: docs-schemas
docs:
	cd doc/source; sphinx-build -b html -d ../../${BUILD_DIR}/doctrees . ../../${BUILD_DIR}/html

#=> docs-schema -- generate rst files from avdl
docs-schemas:
	make -C doc/source/schemas default

.PHONY: package
package:
	mvn package
@@ -38,10 +34,7 @@ package:
.PHONY: clean cleaner cleanest
clean:
	find . -regex '.*\(~\|\.bak\)' -print0 | xargs -0r /bin/rm -v
	make -C doc/source/schemas $@
cleaner: clean
	make -C doc/source/schemas $@
cleanest: cleaner
	find . -regex '.*\(\.orig\)' -print0 | xargs -0r /bin/rm -v
	rm -fr target
	make -C doc/source/schemas $@
2 changes: 1 addition & 1 deletion doc/source/api/metadata.rst
@@ -24,7 +24,7 @@ data-provider-specified collection of related data of multiple types.
Logically, it's akin to a folder, where it's up to the provider what
goes into the folder. Individual data objects are linked by
`datasetId` fields to `Dataset objects
<../schemas/metadata.html#avro.Dataset>`_.
<../schemas/metadata.proto.html#protobuf.Dataset>`_.

Since the grouping of content in a dataset is determined by the data
provider, users should not make semantic assumptions about that data.
51 changes: 25 additions & 26 deletions doc/source/api/reads.rst
@@ -17,48 +17,48 @@ specific genomic regions instead.

The model has the following data types:

============================== ============================================ ==================
Record                         | Description                                SAM/BAM rough equivalent
============================== ============================================ ==================
:avro:record:`ReadAlignment`   | One alignment for one read                 A single line in a file
:avro:record:`ReadGroup`       | A group of read alignments                 A single RG tag
:avro:record:`ReadGroupSet`    | Collecton of ReadGroups that map to the    Single SAM/BAM file
                               | same genome
:avro:record:`Program`         | Software version and parameters that were  PN, CL tags in SAM header
                               | used to align reads to the genome
:avro:record:`ReadStats`       | Counts of aligned and unaligned reads      Samtools flagstats on a file
                               | for a ReadGroup or ReadGroupSet
============================== ============================================ ==================
==================================== =========================================== ========================
Record                               Description                                 SAM/BAM rough equivalent
==================================== =========================================== ========================
:protobuf:message:`ReadAlignment`    One alignment for one read                  A single line in a file
:protobuf:message:`ReadGroup`        A group of read alignments                  A single RG tag
:protobuf:message:`ReadGroupSet`     Collecton of ReadGroups that map to the     Single SAM/BAM file
                                     same genome
:protobuf:message:`Program`          Software version and parameters that were   PN, CL tags in SAM header
                                     used to align reads to the genome
:protobuf:message:`ReadStats`        Counts of aligned and unaligned reads       Samtools flagstats on a file
                                     for a ReadGroup or ReadGroupSet
==================================== =========================================== ========================

The relationships are mostly one to many (e.g. each
:avro:record:`ReadAlignment` is part of exactly one
:avro:record:`ReadGroup`), with the exception that a
:avro:record:`ReadGroup` is allowed to be part of more than one
:avro:record:`ReadGroupSet`.
:protobuf:message:`ReadAlignment` is part of exactly one
:protobuf:message:`ReadGroup`), with the exception that a
:protobuf:message:`ReadGroup` is allowed to be part of more than one
:protobuf:message:`ReadGroupSet`.

:avro:record:`Dataset` --< :avro:record:`ReadGroupSet` >--< :avro:record:`ReadGroup` --< :avro:record:`ReadAlignment`
:protobuf:message:`Dataset` --< :protobuf:message:`ReadGroupSet` >--< :protobuf:message:`ReadGroup` --< :protobuf:message:`ReadAlignment`

* A :avro:record:`Dataset` is a general-purpose container, defined in
* A :protobuf:message:`Dataset` is a general-purpose container, defined in
  metadata.avdl.
* A :avro:record:`ReadGroupSet` is a logical collection of ReadGroups,
* A :protobuf:message:`ReadGroupSet` is a logical collection of ReadGroups,
  as determined by the data owner. Typically one
  :avro:record:`ReadGroupSet` represents all the Reads from one
  :protobuf:message:`ReadGroupSet` represents all the Reads from one
  experimental sample, which traditionally would be stored in a single
  BAM file.
* A :avro:record:`ReadGroup` is all the data that's processed the same
* A :protobuf:message:`ReadGroup` is all the data that's processed the same
  way by the sequencer. There are typically 1-10 ReadGroups in a
  :avro:record:`ReadGroupSet`.
* A :avro:record:`ReadAlignment` object is a flattened representation
  :protobuf:message:`ReadGroupSet`.
* A :protobuf:message:`ReadAlignment` object is a flattened representation
  of several layers of bioinformatics hierarchy, including fragments,
  reads, and alignments, stored in one object for easy access.


ReadAlignment: detailed discussion
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

One :avro:record:`ReadAlignment` object represents the following
One :protobuf:message:`ReadAlignment` object represents the following
logical hierarchy. See the field definitions in the
:avro:record:`ReadAlignment` object for more details.
:protobuf:message:`ReadAlignment` object for more details.

.. image:: /_static/read_alignment_diagrams.png

@@ -88,4 +88,3 @@ identified by that ID. Records are represented by blue rectangles;
dotted lines indicate records defined in other schemas.

.. image:: /_static/reads_schema.png
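
The relationships described in the reads.rst text above (Dataset --< ReadGroupSet >--< ReadGroup --< ReadAlignment) can be illustrated with a small sketch. This is not the Protobuf schema from the repository: the class fields and the helper alignments_in_set are hypothetical, chosen only to mirror the cardinalities stated in the text (one ReadGroup per ReadAlignment, possibly several ReadGroupSets per ReadGroup).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReadGroupSet:
    id: str
    dataset_id: str  # hypothetical field: each ReadGroupSet sits in one Dataset


@dataclass
class ReadGroup:
    id: str
    # Hypothetical field: a ReadGroup may be part of more than one ReadGroupSet.
    read_group_set_ids: List[str] = field(default_factory=list)


@dataclass
class ReadAlignment:
    id: str
    read_group_id: str  # each alignment is part of exactly one ReadGroup


def alignments_in_set(set_id, groups, alignments):
    """Return the alignments whose ReadGroup belongs to the given ReadGroupSet."""
    group_ids = {g.id for g in groups if set_id in g.read_group_set_ids}
    return [a for a in alignments if a.read_group_id in group_ids]
```

Because the ReadGroupSet/ReadGroup link is many-to-many, the sketch stores set IDs on the group and resolves membership by lookup rather than by nesting objects.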

14 changes: 7 additions & 7 deletions doc/source/api/variants.rst
@@ -24,20 +24,20 @@ constitute the genotype matrix.

The lowest-level entity is a Call:

* a :avro:record:`Call` encodes the genotype of an individual with
* a :protobuf:message:`Call` encodes the genotype of an individual with
  respect to a variant, as determined by some analysis of
  experimental data.

The other entities can be thought of as collections of Calls that have
something in common:

* a :avro:record:`VariantSet` supports working with a collection
* a :protobuf:message:`VariantSet` supports working with a collection
  of Calls intended to be analyzed together.
* a :avro:record:`Variant` supports working with the subset of
* a :protobuf:message:`Variant` supports working with the subset of
  Calls in a VariantSet that are at the same site and are
  described using the same set of alleles. The Variant entity
  contains:

  * a variant description: a potential difference between
    experimental DNA and a reference sequence, including the
    site (position of the difference) and alleles (how the bases
@@ -46,17 +46,17 @@ something in common:
    evidence for actual instances of that difference, as seen in
    analyses of experimental data

* a :avro:record:`CallSet` supports working with the subset of
* a :protobuf:message:`CallSet` supports working with the subset of
  Calls in a VariantSet that were generated by the same analysis
  of the same sample. The CallSet includes information about which
  sample was analyzed and how it was analyzed, and is linked to
  information about what differences were found.

The following diagram shows the relationship of these four entities to
each other and to other GA4GH API entities. It shows which entities
contain other entities (such as :avro:record:`VariantSetMetadata`),
contain other entities (such as :protobuf:message:`VariantSetMetadata`),
and which contain IDs that can be used to get information from other
entities (such as :avro:record:`Variant`'s ``variantSetId``). The
entities (such as :protobuf:message:`Variant`'s ``variantSetId``). The
arrow points *from* the entity that contains the ID *to* the entity
that can be identified by that ID.
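
The changes to metadata.rst, reads.rst and variants.rst shown above are a mechanical rename: the role :avro:record: becomes :protobuf:message:, and the anchor metadata.html#avro.Dataset becomes metadata.proto.html#protobuf.Dataset. A minimal sketch of how such a rename could be scripted follows. Nothing in the commit indicates it was done this way; the function name migrate_roles is hypothetical, and the link pattern is generalised from the single metadata.rst example, which is an assumption.

```python
import re

# Matches the Avro-domain role only where it introduces a `target`.
ROLE_RE = re.compile(r":avro:record:(?=`)")
# Matches "<page>.html#avro." in hand-written links (assumed pattern).
LINK_RE = re.compile(r"(\w+)\.html#avro\.")


def migrate_roles(text):
    """Rewrite Avro-domain roles and anchors to their Protobuf-domain forms."""
    text = ROLE_RE.sub(":protobuf:message:", text)
    return LINK_RE.sub(r"\1.proto.html#protobuf.", text)


print(migrate_roles("* A :avro:record:`ReadGroup` is all the data"))
print(migrate_roles("<../schemas/metadata.html#avro.Dataset>`_."))
```

Applied to the two sample lines, the sketch reproduces the new-side text of the corresponding hunks; prose such as "defined in metadata.avdl" is left untouched, as it was in the commit.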

29 changes: 27 additions & 2 deletions doc/source/conf.py
@@ -14,11 +14,13 @@

import sys
import os
import subprocess

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath('../../tools/sphinx'))
sphinx_path = '../../tools/sphinx'
sys.path.insert(0, os.path.abspath(sphinx_path))

# -- General configuration ------------------------------------------------

@@ -33,9 +35,32 @@
    'sphinx.ext.intersphinx',
    'sphinx.ext.todo',
    'sphinx.ext.coverage',
    'avrodomain',
    'protobufdomain',
]

base_dir = "../../src/main/proto"
json_dir = os.path.join("_build", "json-temp")
if not os.path.exists(json_dir):
    os.makedirs(json_dir)
schema_dir = base_dir
for root, dirs, files in os.walk(schema_dir):
    for f in files:
        if not f.endswith(".proto"):
            continue
        fullpath = os.path.join(root, f)
        json_file = f + ".json"
        cmd = "protoc --proto_path %s --plugin=protoc-gen-custom=%s --custom_out=%s %s" % (base_dir, os.path.join(sphinx_path, "protobuf-json-docs.py"), json_dir, fullpath)
        print cmd
        subprocess.check_call(cmd, shell=True)

for root, dirs, files in os.walk(json_dir):
    for f in files:
        if not f.endswith(".json"):
            continue
        cmd = "python %s %s/%s schemas/" %(os.path.join(sphinx_path, "protodoc2rst.py"), root, f)
        print cmd
        subprocess.check_call(cmd, shell=True)

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

16 changes: 16 additions & 0 deletions doc/source/environment.yml
@@ -0,0 +1,16 @@
channels:
- ioos
dependencies:
- protobuf=3.0.0b2.post2=py27_3
- openssl=1.0.2h=1
- pip=8.1.2=py27_0
- python=2.7.11=0
- readline=6.2=2
- setuptools=23.0.0=py27_0
- six=1.10.0=py27_0
- sqlite=3.13.0=0
- tk=8.5.18=0
- wheel=0.29.0=py27_0
- zlib=1.2.8=3
- pip:
  - protobuf==3.0.0b3
47 changes: 0 additions & 47 deletions doc/source/schemas/Makefile

This file was deleted.
