Coverage_files
Filename: allele_base_coverage.json
This file contains per base coverage counts for alleles.
Consider a read which maps exactly to the PRG and overlaps some number of allele bases. The count for each overlapped allele base is incremented once for every such read. This file contains a separate count for each allele base.
The following example consists of two sites. The first site consists of two alleles and the second site consists of three alleles.
{
"allele_base_counts": [
[
[0, 0, 0],
[1, 1, 0]
],
[
[0, 0, 1],
[2, 2, 0],
[2, 2, 0, 1, 3]
]
]
}
The third allele of the second site consists of five bases and therefore five counts:
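import json
with open("allele_base_coverage.json") as handle:  # path is an assumption
    data = json.load(handle)
sites = data["allele_base_counts"]
first_site = sites[0]
second_site = sites[1]
assert second_site[2] == [2, 2, 0, 1, 3]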
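As a further illustration, the per-base counts can be summarised per allele. The following is a minimal sketch, not part of the tool itself: the helper name `mean_base_coverage` is hypothetical, and the JSON is an inline copy of the example above rather than a real output file.

```python
import json

# Inline copy of the example above; a real file would be read with json.load().
raw = """
{"allele_base_counts": [
    [[0, 0, 0], [1, 1, 0]],
    [[0, 0, 1], [2, 2, 0], [2, 2, 0, 1, 3]]
]}
"""
data = json.loads(raw)


def mean_base_coverage(allele_base_counts):
    """Hypothetical helper: mean per-base count for each allele of each site."""
    return [
        [sum(bases) / len(bases) if bases else 0.0 for bases in site]
        for site in allele_base_counts
    ]


means = mean_base_coverage(data["allele_base_counts"])
for site_index, site_means in enumerate(means):
    print(site_index, site_means)
```

The result keeps the same site and allele ordering as the input, so `means[1][2]` refers to the third allele of the second site.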
Filename: grouped_allele_counts.json
Consider a single read which maps exactly to the PRG multiple times. Let's refer to each distinct mapping of a single read as a "mapping instance". When two different mapping instances overlap a common site, the overlapped alleles are grouped together. Mapping coverage counts are then aggregated for each allele group.
{
"grouped_allele_counts": {
"site_counts": [
{
"0": 10,
"1": 3,
"14": 10
},
{
"3": 30,
"2": 2,
"14": 1
}
],
"allele_groups": {
"0": [0, 2],
"1": [0, 2, 3],
"2": [0, 2, 4],
"3": [2, 5],
"14": [7, 8]
}
}
}
{
"grouped_allele_counts": {
"site_counts": [
{
"<allele_group_id>": <count>,
...
},
<site_index>,
...
],
"allele_groups": {
"<allele_group_id>": [<allele_id>, ...],
...
}
}
}
grouped_allele_counts = data["grouped_allele_counts"]
sites = grouped_allele_counts["site_counts"]
allele_groups = grouped_allele_counts["allele_groups"]
site = sites[0]
for allele_group_id, count in site.items():
    allele_ids = allele_groups[allele_group_id]
    print(allele_ids, count)
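The same lookup can be applied to every site at once. The sketch below is an illustration under assumptions: the helper name `resolve_groups` is hypothetical, and the JSON is an inline copy of the example above rather than a real output file.

```python
import json

# Inline copy of the grouped_allele_counts example above.
raw = """
{"grouped_allele_counts": {
    "site_counts": [
        {"0": 10, "1": 3, "14": 10},
        {"3": 30, "2": 2, "14": 1}
    ],
    "allele_groups": {
        "0": [0, 2], "1": [0, 2, 3], "2": [0, 2, 4],
        "3": [2, 5], "14": [7, 8]
    }
}}
"""
data = json.loads(raw)


def resolve_groups(grouped):
    """Hypothetical helper: replace each allele group ID with its allele IDs."""
    allele_groups = grouped["allele_groups"]
    resolved = []
    for site in grouped["site_counts"]:
        # Each site maps an allele group ID (a string key) to a count.
        resolved.append(
            [(allele_groups[group_id], count) for group_id, count in site.items()]
        )
    return resolved


resolved = resolve_groups(data["grouped_allele_counts"])
for site_index, groups in enumerate(resolved):
    for allele_ids, count in groups:
        print(site_index, allele_ids, count)
```

Note that the group IDs are JSON object keys and therefore strings, so they are used as-is to index `allele_groups`.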
Filename: allele_sum_coverage
This file contains coverage information for each allele within the PRG.
Each row (line) represents a variant site within the PRG. Each column (space separated within a single line) represents allele coverage counts.
0 0 0
1 0
0 3
This example describes the coverage information for three sites. The first site consists of three alleles. The second and third sites both consist of two alleles each. Read mapping instances have overlapped the first allele of the second site once (hence: 1). Similarly, read mapping instances have overlapped the second allele of the third site three times.
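The row and column layout described above can be read into nested lists. This is a minimal sketch, assuming plain space-separated integers with one site per line; the function name `parse_allele_sum_coverage` is hypothetical, and the text is an inline copy of the example rather than a real file.

```python
def parse_allele_sum_coverage(text):
    """Hypothetical helper: one list of allele counts per non-empty line (site)."""
    return [
        [int(count) for count in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]


# Inline copy of the example above.
example = "0 0 0\n1 0\n0 3\n"
coverage = parse_allele_sum_coverage(example)

# First allele of the second site, and second allele of the third site.
print(coverage[1][0], coverage[2][1])
```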