Commit

docs: Update docs
ChristianMichelsen committed Oct 27, 2022
1 parent b73b141 commit be56114
Showing 4 changed files with 18 additions and 23 deletions.
10 changes: 5 additions & 5 deletions docs/source/dashboard.md
@@ -24,9 +24,9 @@ When you open the dashboard, you will see the following:
:align: center
```

In the middle of the page, we see the overview plot. This shows the amount of damage, $D_\text{max}$, on the y-axis and the fit quality, $z$, on the x-axis. In this case, we see three blue dots; one single dot per tax ID.
In the middle of the page, we see the overview plot. This shows the amount of damage, $D$, on the y-axis and the significance, $Z$, on the x-axis. In this case, we see three blue dots; one single dot per tax ID.

In general, we want a large amount of damage along with a high fit quality to believe that the related data is significantly ancient. In this case, the two points in the upper right seem like good potential candidates.
In general, we want a large amount of damage along with a high significance to believe that the related data is significantly ancient. In this case, the two points in the upper right seem like good potential candidates.

## Hover info

@@ -38,9 +38,9 @@ To extract more concrete information about the individual tax IDs (points in the
:align: center
```

Here we have hovered the mouse on the tax ID 711 which is the Lactobacillales order. Below the tax information, we can see the fit results, in particular the position-dependent damage-rate, $q$, the fit concentration $\phi$ and the correlation between $A$ and $c$, $\rho_{Ac}$, along with the fit quality and damage amount.
Here we have hovered the mouse on the tax ID 711 which is the Lactobacillales order. Below the tax information, we can see the fit results, in particular the position-dependent damage-rate, $q$, the fit concentration $\phi$ and the correlation between $A$ and $c$, $\rho_{Ac}$, along with the significance and damage amount.

In smaller analyses, the full Bayesian model can be used (which is recommended). In the more general case where this would be too time consuming, we would only have the MAP results. In this case, the fit quality is measured by the likelihood ratio $\lambda_\text{LR}$.
In smaller analyses, the full Bayesian model can be used (which is recommended). In the more general case where this would be too time consuming, we would only have the MAP results. In this case, the significance is measured by the MAP significance $Z_\text{MAP}$.

At the bottom of the hover square, we see the count information. This shows that this tax ID consisted of $22.5 \times 10^3$ individual reads with the same number of alignments (indicating very little overlap between the references in this case). The `k sum total` is the total number of $C \rightarrow T$ transitions across all reads at all positions (and $G \rightarrow A$ for the reverse strand). `N sum total` is the total number of $C$ to anything, $C \rightarrow X$, across all reads at all positions (and $G \rightarrow X$ for the reverse strand).

@@ -106,7 +106,7 @@ If we hide the `Filters` pane by clicking on it and instead show the `Styling` p
:align: center
```

This allows to go deeper into the fit results and explore more advanced relationships between the variables. If one e.g. changes the x-axis to show `D_max`, it is possible to visually compare the two estimates of $D_\text{max}$; the MAP one and the Bayesian one.
This allows to go deeper into the fit results and explore more advanced relationships between the variables. If one e.g. changes the x-axis to show `MAP_damage`, it is possible to visually compare the two estimates of $D$; the MAP one and the Bayesian one.
All of the points in the dashboard are sized according to the number of reads. This can be changed in the `Variable` dropdown. By default, they are sized according to the square root of this variable; this can also be changed in the `Function` dropdown. Finally, a relative scaling is also possible with the `Scale` slider.
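
The sizing rule described here can be sketched in a few lines of Python. This is only an illustration of the default behaviour (square root of the number of reads, multiplied by a relative scale). The function name `marker_size` and its parameters are hypothetical and are not taken from the dashboard's code.

```python
import math


def marker_size(n_reads, scale=1.0, function=math.sqrt):
    """Hypothetical helper: point size as scale * function(variable).

    `function` mirrors the `Function` dropdown (square root by default)
    and `scale` mirrors the `Scale` slider.
    """
    return scale * function(n_reads)


# Made-up read counts for three tax IDs.
sizes = [marker_size(n) for n in (100, 2_500, 22_500)]
print(sizes)

# Doubling the relative scale doubles every point size.
print(marker_size(2_500, scale=2.0))
```

Taking the square root keeps tax IDs with very many reads from dominating the plot, while still letting larger samples stand out.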

## Export CSV
15 changes: 5 additions & 10 deletions docs/source/results.md
@@ -11,20 +11,15 @@ The column names in the results and their explanation:
- `N_alignments`: The number of alignments. int64.

## Fit related parameters
- `lambda_LR`: The likelihood ratio, $\lambda_\text{LR}$, between the null model and the ancient damage model. This can be interpreted as the fit certainty, where higher values means higher certainty. float64.
- `lambda_LR_P`: The likelihood ratio expressed as a probability. float64.
- `lambda_LR_z`: The likelihood ratio expressed as number of $\sigma$. float64.
- `D_max`: The estimated damage, $D_\text{max}$. This can be interpreted as the amount of damage in the specific taxa. float64.
- `damage`: The estimated damage in the specific taxa, $D$. float64.
- `significance`: The number of sigmas that the damage is away from 0, i.e. how certain one should be about there being non-zero damage. float64.
- `q`: The damage decay rate. float64.
- `A`: The background independent damage. float64.
- `c`: The background. float64.
- `phi`: The concentration for a beta binomial distribution (parametrised by $\mu$ and $\phi$). float64.
- `rho_Ac`: The correlation between $A$ and $c$, $\rho_{Ac}$. High values of this are often a sign of a bad fit. float64.
- `valid`: Wether or not the fit is valid (defined by [iminuit](https://iminuit.readthedocs.io/en/stable/)). bool.
- `asymmetry`: An estimate of the asymmetry of the forward and reverse fits. See below for more information. float64.
- `XXX_std`: the uncertainty (standard deviation) of the variable `XXX` for $D_\text{max}$, $A$, $q$, $c$, and $\phi$.
- `forward__XXX`: The same description as above for variable `XXX`, but only for the forward read.
- `reverse__XXX`: The same description as above for variable `XXX`, but only for the reverse read.
- `XXX_std`: the uncertainty (standard deviation) of the variable `XXX` for $D$, $A$, $q$, $c$, and $\phi$.
- `MAP_valid`: Whether or not the MAP fit is valid (defined by [iminuit](https://iminuit.readthedocs.io/en/stable/)). bool.
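
The description of `significance` as "the number of sigmas that the damage is away from 0" can be read as a simple ratio. The sketch below assumes that the uncertainty is the standard deviation stored under the `XXX_std` pattern (so `damage_std` for `damage`) and that the significance is exactly damage divided by that uncertainty. Both are assumptions made for illustration, and the function name is hypothetical.

```python
def significance(damage, damage_std):
    """Number of standard deviations that `damage` lies away from zero.

    Assumed relation for illustration: Z = D / sigma_D.
    """
    if damage_std <= 0:
        raise ValueError("damage_std must be positive")
    return damage / damage_std


# Hypothetical values, not taken from the example data.
z = significance(damage=0.06, damage_std=0.01)
print(f"Z = {z:.1f}")
```

Under this reading, a large damage estimate with a large uncertainty can be less convincing than a smaller estimate with a small uncertainty.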

## Read related parameters
- `mean_L`: The mean read length of all the individual, unique reads that map to the specific taxa. float64.
@@ -47,5 +42,5 @@ The column names in the results and their explanation:
- `k-i`: Same as above, but for the reverse direction. int64.
- `N+i`: The number of _"trials"_, $N$ at position $x=i$: $N(x=i)$ in the forward direction. int64.
- `N-i`: Same as above, but for the reverse direction. int64.
- `f+i`: The fraction between $k$ and $N$, $f = k / N$, at position $x=i$ in the forward direction. int64.
- `f+i`: The damage frequency, $f$, given $k$ and $N$: $f(x) = k(x) / N(x)$, at position $x=i$ in the forward direction. int64.
- `f-i`: Same as above, but for the reverse direction. int64.
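
As a small illustration of the relation $f(x) = k(x) / N(x)$, the following sketch computes the damage frequency from the `k+i` and `N+i` columns of a single result row. The row is a plain dictionary with made-up counts, and the helper name `damage_frequency` is hypothetical.

```python
# Hypothetical counts for one tax ID; the numbers are made up for illustration.
row = {
    "k+1": 85, "N+1": 1000,
    "k+2": 60, "N+2": 1000,
    "k+3": 40, "N+3": 1000,
}


def damage_frequency(row, position, direction="+"):
    """Return f(x) = k(x) / N(x) at one position, or None if N(x) is zero."""
    k = row[f"k{direction}{position}"]
    n = row[f"N{direction}{position}"]
    return k / n if n > 0 else None


frequencies = [damage_frequency(row, i) for i in (1, 2, 3)]
print(frequencies)
```

The same helper works for the reverse direction by passing `direction="-"`, which reads the `k-i` and `N-i` columns instead.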
12 changes: 6 additions & 6 deletions docs/source/tutorial.md
@@ -92,9 +92,9 @@ The command above will open a page in web browser automatically where you will s
:align: center
```

In the middle of the page, we see the overview plot. This shows the amount of damage, $D_\text{max}$, on the y-axis and the fit quality, $z$, on the x-axis. In this case, we see three blue dots; one single dot per tax ID.
In the middle of the page, we see the overview plot. This shows the amount of damage, $D$, on the y-axis and the significance, $Z$, on the x-axis. In this case, we see three blue dots; one single dot per tax ID.

In general, we want a large amount of damage along with a high fit quality to believe that the related data is significantly ancient. In this case, the two points in the upper right seem like good potential candidates.
In general, we want a large amount of damage along with a high significance to believe that the related data is significantly ancient. In this case, the two points in the upper right seem like good potential candidates.

#### Hover info

@@ -106,9 +106,9 @@ To extract more concrete information about the individual tax IDs (points in the
:align: center
```

Here we have hovered the mouse on the tax ID 711 which is the Lactobacillales order. Below the tax information, we can see the fit results, in particular the position-dependent damage-rate, $q$, the fit concentration $\phi$ and the correlation between $A$ and $c$, $\rho_{Ac}$, along with the fit quality and damage amount.
Here we have hovered the mouse on the tax ID 711 which is the Lactobacillales order. Below the tax information, we can see the fit results, in particular the position-dependent damage-rate, $q$, the fit concentration $\phi$ and the correlation between $A$ and $c$, $\rho_{Ac}$, along with the significance and damage amount.

In this small analyses, we ran the full Bayesian model. In the more general case where this would be too time consuming, we would only have the MAP results. These are show below. In this case, the fit quality is measured by the likelihood ratio $\lambda_\text{LR}$. Note the strong correspondence between the Bayesian fit results and the approximate MAP results.
In this small analysis, we ran the full Bayesian model. In the more general case where this would be too time consuming, we would only have the MAP results. These are shown below. In this case, the significance is measured by the MAP significance $Z_\text{MAP}$. Note the strong correspondence between the Bayesian fit results and the approximate MAP results.

At the bottom of the hover square, we see the count information. This shows that this tax ID consisted of $22.5 \times 10^3$ individual reads with the same number of alignments (indicating very little overlap between the references in this case). The `k sum total` is the total number of $C \rightarrow T$ transitions across all reads at all positions (and $G \rightarrow A$ for the reverse strand). `N sum total` is the total number of $C$ to anything, $C \rightarrow X$, across all reads at all positions (and $G \rightarrow X$ for the reverse strand).

@@ -123,7 +123,7 @@ So now we have managed to extract the fit results of the specific tax ID. Howeve
```
In this plot, the position-dependent frequency of the $C \rightarrow T$ transitions is shown as blue dots and that of the $G \rightarrow A$ transitions as red dots. The green curve is the fit and the dashed area shows the $1\sigma$ (68%) confidence interval of the fit.

We see that damage frequency starts at around 0.085 and then drops to about 0.025. This is an elevated amount of damage of about 0.06 in the beginning of the read compared to the asymptotic value, which is exactly what $D_\text{max}$ explains. Similarly see a quite clear trend in the data; the visual appearence of the data matches the quantitative fit results.
We see that the damage frequency starts at around 0.085 and then drops to about 0.025. This is an elevated amount of damage of about 0.06 at the beginning of the read compared to the asymptotic value, which is exactly what $D$ describes. Similarly, we see a quite clear trend in the data; the visual appearance of the data matches the quantitative fit results.
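
The arithmetic behind these numbers can be sketched as follows. The snippet assumes a geometric decay of the excess damage towards a constant background, which is one plausible reading of the parameters $A$, $q$ and $c$ but is not stated on this page. The value of `q` is hypothetical, while `A` and `c` are the approximate values read off the plot.

```python
def damage_curve(x, A, q, c):
    """Assumed form for illustration: excess damage A decays geometrically
    with rate q towards the background c as the position x increases."""
    return A * (1 - q) ** (x - 1) + c


# A and c are read off the plot (about 0.06 and 0.025); q is hypothetical.
A, q, c = 0.06, 0.3, 0.025

start = damage_curve(1, A, q, c)   # frequency at the first position
far = damage_curve(30, A, q, c)    # frequency far into the read
excess = start - c                 # elevated damage at the start of the read

print(f"start = {start:.3f}, far = {far:.3f}, excess = {excess:.3f}")
```

Under this assumed form, the excess at the first position equals $A$, which corresponds to the roughly 0.06 of elevated damage described in the text.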

We can compare this to the data in the bottom left of the overview plot.

@@ -133,7 +133,7 @@ We can compare this to the data in the bottom left of the overview plot.
:align: center
```

Here we see that there does seem to be some damage in this tax ID, although the data is a lot more noisy and scattered around. This is also why the fit quality, $z$, is a lot smaller than it is for the previous tax ID.
Here we see that there does seem to be some damage in this tax ID, although the data is a lot more noisy and scattered around. This is also why the significance, $Z$, is a lot smaller than it is for the previous tax ID.

#### Export CSV

4 changes: 2 additions & 2 deletions src/metaDMG/viz/figures.py
@@ -252,7 +252,7 @@ def plot_group(
font_size=30,
)

# add D-max as single errorbar
# add damage as single errorbar
if D_info is not None:

D, D_low, D_high = D_info
@@ -271,7 +271,7 @@ def plot_group(
color="black",
),
mode="markers",
name="D-max",
name="Damage",
marker_color="black",
# hovertemplate=viz_results.hovertemplate_D,
hoverinfo="skip",