This repository has been archived by the owner on Oct 28, 2022. It is now read-only.

Update GraphModeFAQ.md
Make it make sense with the side graph changes.
adamnovak committed Apr 3, 2015
1 parent 12f5b93 commit 0ee32ea
Showing 1 changed file with 8 additions and 6 deletions.
14 changes: 8 additions & 6 deletions doc/GraphModeFAQ.md
@@ -8,7 +8,7 @@ If you have a relevant question, please add it to this document in a pull reques

In "classic" mode, a SNP is represented by a `Variant`, with `referenceBases` set to one base, and `alternateBases` set to the other.

In "graph" mode, a SNP exists as a single-base `Reference` (or novel sequence in a `VariantSet`) with the alternate base, joined onto the `Reference` with the original base, like this:
In "graph" mode, a SNP exists as a single-base `Sequence` with the alternate base, joined with two `Join`s onto the `Sequence` with the original base, like this:

```
  -G-
@@ -24,7 +24,7 @@ The variant caller may additionally emit a `Variant` tying the two `Allele`s tog

In "classic" mode, an indel is represented by a `Variant`, with `referenceBases` set to "" (for an insertion) or some bases (for a deletion), and `alternateBases` set to the inserted bases (for an insertion) or "" (for a deletion).

In "graph" mode, an indel exists as a `Reference` (or novel sequence in a `VariantSet`) with the inserted bases (or no bases for a deletion), joined onto the `Reference` such that it connects the endpoints of the indel, like this:
In "graph" mode, an insertion exists as a `Sequence` with the inserted bases, joined onto the modified `Sequence` with `Join`s such that it connects the endpoints of the indel, like this:

```
Insertion:
@@ -35,7 +35,11 @@ Insertion:
||
/\
--A--C--T--G--C--A--
```

A deletion is represented by a single `Join` skipping the deleted bases, like this:

```
Deletion:
--A--C--T--G--C--A--
@@ -52,11 +56,9 @@ In "classic" mode, one can issue a `searchVariants()` call interrogating the ran

In "graph" mode, the situation is more complicated. You want to perform a recursive search of the graph out to a distance of 10kb from your start position, following all possible paths.

You can use `searchReferences()` and `searchVariantSetSequences()` to get information about all the children of the sequence carrying the position you are interested in, and see if any of their join locations on your sequence of interest are within a 10kb window around your position of interest, and attached such that it is possible to read into them in the direction you are traversing the parent. In that case, you would have to recurse down into each such child, work out how far in from the joined end you can get with whatever is left of your 10kb window size after walking out to where the join is, and recursively search that region for more children.

If you come to the end of a sequence in your search, you will need to see if it is joined onto a parent (using `getReference()` or `getVariantSetSequence()` to get its end joins), then explore the parent out from the join position in the appropriate direction, searching for more children.
You can use `searchJoins()` to find all the `Sequence`s attached to the `Sequence` carrying your position of interest, keeping only those joined within a 10kb window around that position and attached such that it is possible to read into them in the direction you are traversing the parent. For each such attached `Sequence` (retrieved with `getSequence()`), work out how far in from the joined end you can get with whatever is left of your 10kb window after walking out to the join, then recursively search that region for further attached `Sequence`s.

Once you have determined all the ranges on all the sequences that are "within 10kb" of your starting position, you can make a `searchAlleles()` call on each of them to get all `Allele` objects involving any bases within 10kb of your start position. If any are associated with `Variant` objects, you can use the `getVariant()` call to retrieve those `Variant`s by ID.
Once you have determined all the ranges on all the `Sequence`s that are "within 10kb" of your starting position, you can make a `searchAlleles()` call on each of them to get all `Allele` objects involving any bases within 10kb of your start position. If any are associated with `Variant` objects, you can use the `getVariant()` call to retrieve those `Variant`s by ID.

If you are only interested in `Variant` objects with reference `Allele`s overlapping your chosen ranges, you can use `searchVariants()` calls instead of `searchAlleles()` calls. This will ignore `Allele`s which are not part of `Variant`s, or which are not the reference `Allele`s for their `Variant`s.
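
The windowed traversal described above can be sketched on a small in-memory model. This is only an illustration under stated assumptions, not the API itself: `forward_ranges`, `merge_ranges`, the `lengths` dictionary, and the tuple form of a join are all hypothetical, and the sketch reads in the forward direction only and ignores strand. In a real client, the `outgoing` lookup would be filled from `searchJoins()` results, and each returned range would become one `searchAlleles()` (or `searchVariants()`) query.

```python
from collections import defaultdict


def merge_ranges(intervals):
    """Merge overlapping or adjacent inclusive (first, last) ranges."""
    merged = []
    for first, last in sorted(intervals):
        if merged and first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def forward_ranges(lengths, joins, start_seq, start_pos, window):
    """Return {sequence_id: [(first, last), ...]} for every base that can be
    reached by reading forward from (start_seq, start_pos) in at most
    `window` steps.

    Hypothetical model: `lengths` maps sequence id -> length in bases, and
    each join is ((src_seq, src_pos), (dst_seq, dst_pos)), meaning that after
    reading base src_pos you may continue at base dst_pos.
    """
    # Index joins by the sequence they leave from.
    outgoing = defaultdict(list)
    for (src_seq, src_pos), (dst_seq, dst_pos) in joins:
        outgoing[src_seq].append((src_pos, dst_seq, dst_pos))

    ranges = defaultdict(list)
    best_budget = {}  # entry point -> largest remaining window seen so far
    stack = [(start_seq, start_pos, window)]
    while stack:
        seq, pos, budget = stack.pop()
        if best_budget.get((seq, pos), -1) >= budget:
            continue  # already explored from here with at least this much window
        best_budget[(seq, pos)] = budget

        # Walk along this sequence until the window or the sequence runs out.
        last = min(pos + budget, lengths[seq] - 1)
        ranges[seq].append((pos, last))

        # Follow every join that leaves from inside the walked range.
        for src_pos, dst_seq, dst_pos in outgoing[seq]:
            if pos <= src_pos <= last:
                steps = src_pos - pos + 1  # walk to the join, then cross it
                if steps <= budget:
                    stack.append((dst_seq, dst_pos, budget - steps))

    return {seq: merge_ranges(found) for seq, found in ranges.items()}


# Toy SNP graph: a 6-base reference plus a 1-base alternate that bypasses
# reference base 2 (joined in after base 1 and back out before base 3).
lengths = {"ref": 6, "snp": 1}
joins = [(("ref", 1), ("snp", 0)), (("snp", 0), ("ref", 3))]
print(forward_ranges(lengths, joins, "ref", 0, 3))
```

Because every join crossing uses up at least one step of the window, the search terminates even if the graph contains cycles. Searching backward from the start position would follow the same pattern with the joins indexed by their destination instead.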
