diff --git a/vignettes/tutorial.R b/vignettes/tutorial.R
index 643f30d..32df7e8 100644
--- a/vignettes/tutorial.R
+++ b/vignettes/tutorial.R
@@ -29,10 +29,6 @@ library(rats)
# Show the first rows of the table corresponding to one sample, from simulated data.
head(sim_boot_data()[[2]][[1]])
-## ------------------------------------------------------------------------
-# Show the first rows of the table corresponding to one condition, from simulated data.
-head(sim_count_data()[[2]])
-
## ------------------------------------------------------------------------
# Show the first rows of the table corresponding to the annotation, from simulated data.
head(sim_count_data()[[1]])
@@ -97,6 +93,9 @@ print( names(ids) )
# DTU positive genes.
print( ids[["dtu-genes"]] )
+## ------------------------------------------------------------------------
+print( names(mydtu) )
+
## ------------------------------------------------------------------------
# Parameter list's elements.
print( names(mydtu$Parameters) )
diff --git a/vignettes/tutorial.Rmd b/vignettes/tutorial.Rmd
index aab212d..980069e 100644
--- a/vignettes/tutorial.Rmd
+++ b/vignettes/tutorial.Rmd
@@ -1,7 +1,7 @@
---
title: "RATs Tutorial"
author: "Kimon Froussios"
-date: "`r date()`"
+date: "23 AUG 2016"
output:
html_document:
fig_caption: no
@@ -113,14 +113,14 @@ library(rats)
2. Generic bootstrapped abundance estimates.
3. Generic abundance estimates.
-1. From a [Sleuth](http://pachterlab.github.io/sleuth/) object `rats` extracts the bootstrapped abundance estimates. See the
+From a [Sleuth](http://pachterlab.github.io/sleuth/) object `rats` extracts the bootstrapped abundance estimates. See the
[introduction to Sleuth](http://rawgit.com/pachterlab/sleuth/master/inst/doc/intro.html) pages for details on how
to load the transcript abundance estimate data from [Kallisto](http://pachterlab.github.io/kallisto/) into a Sleuth
object, and see the [wasabi](http://github.com/COMBINE-lab/wasabi) tool for how to load the transcript abundance
estimate data from [Sailfish](http://github.com/kingsfordgroup/sailfish) or [Salmon](https://github.com/COMBINE-lab/salmon)
into a sleuth object.
-2. Bootstrapped abundance estimates obtained by other means can be input as `list`s of `data.table`s. Two lists are needed, one per condition.
+Bootstrapped abundance estimates obtained by other means can be input as `list`s of `data.table`s. Two lists are needed, one per condition.
Each data table should contain the transcript identifiers in the first column, followed by columns containing the estimates from the bootstrap iterations:
```{r}
@@ -128,13 +128,9 @@ Each data table should contain the transcript identifiers in the first column, f
head(sim_boot_data()[[2]][[1]])
```
-3. Generic abundance estimates, without bootstrapping information, can be input simply as two `data.table`s, one per condition. The first column should
-contain the transcript identifiers, followed by columns listing the abundance per sample:
-
-```{r}
-# Show the first rows of the table corresponding to one condition, from simulated data.
-head(sim_count_data()[[2]])
-```
+Generic abundance estimates, without bootstrapping information, can be input simply as two `data.table`s, one per condition. The first column should
+contain the transcript identifiers, followed by columns listing the abundance per sample. The format of each table is identical to the one shown above;
+only the meaning of the numbers changes.
**Regardless of data format**, `rats` also needs an annotation `data.frame` that matches transcript identifiers to gene identifiers. This looks like this:
@@ -301,6 +297,13 @@ print( ids[["dtu-genes"]] )
## Output structure
+The output of `rats` is a list containing 3 elements:
+
+```{r}
+print( names(mydtu) )
+```
+
+
### Parameters
`Parameters` is a list that contains information about the data and the settings.
diff --git a/vignettes/tutorial.html b/vignettes/tutorial.html
index a305204..d14e08f 100644
--- a/vignettes/tutorial.html
+++ b/vignettes/tutorial.html
@@ -11,6 +11,7 @@
+
RATs Tutorial
@@ -127,7 +128,7 @@
RATs Tutorial
Kimon Froussios
-Mon Aug 22 22:10:19 2016
+23 AUG 2016
Abstract
The rats package is aimed at people analysing gene expression and transcript abundance, particularly with RNA-Seq type of data in mind. This R Markdown vignette constitutes the main documentation for the package.
@@ -248,10 +249,10 @@
Input formats
- A Sleuth object.
- Generic bootstrapped abundance estimates.
-Generic abundance estimates.
-From a Sleuth object rats extracts the bootstrapped abundance estimates. See the introduction to Sleuth pages for details on how to load the transcript abundance estimate data from Kallisto into a Sleuth object, and see the wasabi tool for how to load the transcript abundance estimate data from Sailfish or Salmon into a sleuth object.
-Bootstrapped abundance estimates obtained by other means can be input as lists of data.tables. Two lists are needed, one per condition. Each data table should contain the transcript identifiers in the first column, followed by columns containing the estimates from the bootstrap iterations:
+- Generic abundance estimates.
+
From a Sleuth object rats extracts the bootstrapped abundance estimates. See the introduction to Sleuth pages for details on how to load the transcript abundance estimate data from Kallisto into a Sleuth object, and see the wasabi tool for how to load the transcript abundance estimate data from Sailfish or Salmon into a sleuth object.
+
Bootstrapped abundance estimates obtained by other means can be input as lists of data.tables. Two lists are needed, one per condition. Each data table should contain the transcript identifiers in the first column, followed by columns containing the estimates from the bootstrap iterations:
# Show the first rows of the table corresponding to one sample, from simulated data.
head(sim_boot_data()[[2]][[1]])
## target_id V1 V2 V3
@@ -261,18 +262,7 @@ Input formats
## 4: 1D1C:two 76 80 72
## 5: 1B1C.1 0 0 0
## 6: 1B1C.2 52 55 50
-
-- Generic abundance estimates, without bootstrapping information, can be input simply as two data.tables, one per condition. The first column should contain the transcript identifiers, followed by columns listing the abundance per sample:
-
-
# Show the first rows of the table corresponding to one condition, from simulated data.
-head(sim_count_data()[[2]])
-
## target_id V1 V2 V3
-## 1: NIB.1 0 0 0
-## 2: 1A1N-2 20 21 18
-## 3: 1D1C:one 0 0 0
-## 4: 1D1C:two 76 80 72
-## 5: 1B1C.1 0 0 0
-## 6: 1B1C.2 52 55 50
+
Generic abundance estimates, without bootstrapping information, can be input simply as two data.tables, one per condition. The first column should contain the transcript identifiers, followed by columns listing the abundance per sample. The format of each table is identical to the one shown above; only the meaning of the numbers changes.
Regardless of data format, rats also needs an annotation data.frame that matches transcript identifiers to gene identifiers. This looks like this:
# Show the first rows of the table corresponding to the annotation, from simulated data.
head(sim_count_data()[[1]])
@@ -392,6 +382,9 @@
Quick results
Output structure
+
The output of rats is a list containing 3 elements:
+
print( names(mydtu) )
+
## [1] "Parameters" "Genes" "Transcripts"
Parameters
Parameters is a list that contains information about the data and the settings.
@@ -556,22 +549,22 @@
Example results
## 3: NA NA NA NA NA
## 4: NA NA NA NA NA
## 5: NA NA NA NA NA
-## 6: 0.83 0.002297371 0.0006862716 0.002867174 0.0009853328
+## 6: 0.82 0.002913862 0.0009343038 0.004617752 0.001553497
## 7: NA NA NA NA NA
-## 8: 1.00 0.000000000 0.0000000000 0.000000000 0.0000000000
+## 8: 1.00 0.000000000 0.0000000000 0.000000000 0.000000000
## 9: NA NA NA NA NA
-## 10: 0.00 0.784661756 0.7036691880 0.139088825 0.1925920036
+## 10: 0.00 0.776276584 0.6948216461 0.142400556 0.196181482
## boot_minAB boot_minBA boot_maxAB boot_maxBA boot_na
## 1: NA NA NA NA NA
## 2: NA NA NA NA NA
## 3: NA NA NA NA NA
## 4: NA NA NA NA NA
## 5: NA NA NA NA NA
-## 6: 5.030128e-05 1.550451e-05 0.01321885 0.00480727 0
+## 6: 5.030128e-05 1.550451e-05 0.0154553 0.00480727 0
## 7: NA NA NA NA NA
-## 8: 0.000000e+00 0.000000e+00 0.00000000 0.00000000 0
+## 8: 0.000000e+00 0.000000e+00 0.0000000 0.00000000 0
## 9: NA NA NA NA NA
-## 10: 5.143483e-01 3.311102e-01 0.99150834 0.98871978 0
+## 10: 5.143483e-01 3.311102e-01 0.9915083 0.98871978 0
There are 10 genes in the annotation used. Here are some possible scenarios:
1A1N has only one known transcript and is thus not eligible. If you look in myslo$kal[[1]]$bootstrap[[1]], there are actually two recorded transcripts for this gene, but only one of them is recorded in the annotation. rats uses the annotation as the basis for managing information. Any transcripts/genes present in the sleuth data, but missing from the annotation, will be ignored completely.
@@ -666,19 +659,19 @@ Example results
## 6: NA NA NA NA NA NA
## 7: NA NA NA NA NA NA
## 8: NA NA NA NA NA NA
-## 9: TRUE 0.73 3.420637e-02 2.555941e-02 6.605471e-03 1.095659e-01
-## 10: TRUE 0.73 3.420637e-02 2.555941e-02 6.605471e-03 1.095659e-01
+## 9: TRUE 0.71 3.517732e-02 3.203049e-02 6.605471e-03 1.095659e-01
+## 10: TRUE 0.71 3.517732e-02 3.203049e-02 6.605471e-03 1.095659e-01
## 11: NA NA NA NA NA NA
-## 12: FALSE 0.09 5.871506e-01 3.743486e-01 1.380439e-02 1.000000e+00
-## 13: TRUE 1.00 2.253061e-78 4.627819e-78 5.067939e-84 2.129120e-77
-## 14: TRUE 1.00 7.211996e-49 1.620455e-48 6.413781e-53 6.559376e-48
-## 15: TRUE 0.00 1.658158e-21 2.826097e-21 1.066067e-24 8.294307e-21
-## 16: TRUE 1.00 1.147936e-44 1.666157e-44 1.516132e-46 7.296158e-44
+## 12: FALSE 0.02 6.565423e-01 3.300701e-01 1.380439e-02 1.000000e+00
+## 13: TRUE 1.00 1.754386e-78 4.381734e-78 5.067939e-84 2.129120e-77
+## 14: TRUE 1.00 3.094905e-49 7.881022e-49 6.413781e-53 6.559376e-48
+## 15: TRUE 0.00 1.950232e-21 3.008824e-21 1.066067e-24 8.294307e-21
+## 16: TRUE 1.00 1.372119e-44 1.970160e-44 1.516132e-46 7.296158e-44
## 17: NA NA NA NA NA NA
-## 18: FALSE 0.00 7.338926e-01 1.546231e-01 3.567434e-01 9.643718e-01
+## 18: FALSE 0.00 7.186030e-01 1.667606e-01 3.567434e-01 9.643718e-01
## 19: NA NA NA NA NA NA
-## 20: FALSE 0.00 9.501605e-01 6.529350e-02 7.948790e-01 1.000000e+00
-## 21: FALSE 0.00 9.501605e-01 6.529350e-02 7.948790e-01 1.000000e+00
+## 20: FALSE 0.00 9.408455e-01 7.414637e-02 7.948790e-01 1.000000e+00
+## 21: FALSE 0.00 9.408455e-01 7.414637e-02 7.948790e-01 1.000000e+00
## sig boot_freq boot_mean boot_stdev boot_min boot_max
## boot_na
## 1: NA
@@ -747,11 +740,11 @@ Plots of overall run
As mentioned before, bootstrapping provides a measure of confidence. This confidence is not taken into account for the DTU calls. It is therefore useful to plot its potential influence. The following two plots show how different confidence thresholds would affect the number of DTU positive calls.
# Transcript-level confidence threshold VS. number of DTU positive calls.
plot_overview(mydtu, type="transc_conf")
-
+
And this is what it looks like on a larger dataset:
# Gene-level confidence threshold VS. number of DTU positive calls.
plot_overview(mydtu, type="gene_conf")
-
+
And this is what it looks like on a larger dataset:
diff --git a/vignettes/tutorial.md b/vignettes/tutorial.md
index 6ef04d9..5a9feec 100644
--- a/vignettes/tutorial.md
+++ b/vignettes/tutorial.md
@@ -1,6 +1,6 @@
# RATs Tutorial
Kimon Froussios
-`r date()`
+23 AUG 2016
# Relative Abundance of Transcripts (RATs).
@@ -99,14 +99,14 @@ library(rats)
2. Generic bootstrapped abundance estimates.
3. Generic abundance estimates.
-1. From a [Sleuth](http://pachterlab.github.io/sleuth/) object `rats` extracts the bootstrapped abundance estimates. See the
+From a [Sleuth](http://pachterlab.github.io/sleuth/) object `rats` extracts the bootstrapped abundance estimates. See the
[introduction to Sleuth](http://rawgit.com/pachterlab/sleuth/master/inst/doc/intro.html) pages for details on how
to load the transcript abundance estimate data from [Kallisto](http://pachterlab.github.io/kallisto/) into a Sleuth
object, and see the [wasabi](http://github.com/COMBINE-lab/wasabi) tool for how to load the transcript abundance
estimate data from [Sailfish](http://github.com/kingsfordgroup/sailfish) or [Salmon](https://github.com/COMBINE-lab/salmon)
into a sleuth object.
-2. Bootstrapped abundance estimates obtained by other means can be input as `list`s of `data.table`s. Two lists are needed, one per condition.
+Bootstrapped abundance estimates obtained by other means can be input as `list`s of `data.table`s. Two lists are needed, one per condition.
Each data table should contain the transcript identifiers in the first column, followed by columns containing the estimates from the bootstrap iterations:
@@ -125,24 +125,9 @@ head(sim_boot_data()[[2]][[1]])
## 6: 1B1C.2 52 55 50
```
-3. Generic abundance estimates, without bootstrapping information, can be input simply as two `data.table`s, one per condition. The first column should
-contain the transcript identifiers, followed by columns listing the abundance per sample:
-
-
-```r
-# Show the first rows of the table corresponding to one condition, from simulated data.
-head(sim_count_data()[[2]])
-```
-
-```
-## target_id V1 V2 V3
-## 1: NIB.1 0 0 0
-## 2: 1A1N-2 20 21 18
-## 3: 1D1C:one 0 0 0
-## 4: 1D1C:two 76 80 72
-## 5: 1B1C.1 0 0 0
-## 6: 1B1C.2 52 55 50
-```
+Generic abundance estimates, without bootstrapping information, can be input simply as two `data.table`s, one per condition. The first column should
+contain the transcript identifiers, followed by columns listing the abundance per sample. The format of each table is identical to the one shown above;
+only the meaning of the numbers changes.
**Regardless of data format**, `rats` also needs an annotation `data.frame` that matches transcript identifiers to gene identifiers. This looks like this:
@@ -347,6 +332,18 @@ print( ids[["dtu-genes"]] )
## Output structure
+The output of `rats` is a list containing 3 elements:
+
+
+```r
+print( names(mydtu) )
+```
+
+```
+## [1] "Parameters" "Genes" "Transcripts"
+```
+
+
### Parameters
`Parameters` is a list that contains information about the data and the settings.
@@ -578,22 +575,22 @@ print( mydtu$Genes )
## 3: NA NA NA NA NA
## 4: NA NA NA NA NA
## 5: NA NA NA NA NA
-## 6: 0.83 0.002297371 0.0006862716 0.002867174 0.0009853328
+## 6: 0.82 0.002913862 0.0009343038 0.004617752 0.001553497
## 7: NA NA NA NA NA
-## 8: 1.00 0.000000000 0.0000000000 0.000000000 0.0000000000
+## 8: 1.00 0.000000000 0.0000000000 0.000000000 0.000000000
## 9: NA NA NA NA NA
-## 10: 0.00 0.784661756 0.7036691880 0.139088825 0.1925920036
+## 10: 0.00 0.776276584 0.6948216461 0.142400556 0.196181482
## boot_minAB boot_minBA boot_maxAB boot_maxBA boot_na
## 1: NA NA NA NA NA
## 2: NA NA NA NA NA
## 3: NA NA NA NA NA
## 4: NA NA NA NA NA
## 5: NA NA NA NA NA
-## 6: 5.030128e-05 1.550451e-05 0.01321885 0.00480727 0
+## 6: 5.030128e-05 1.550451e-05 0.0154553 0.00480727 0
## 7: NA NA NA NA NA
-## 8: 0.000000e+00 0.000000e+00 0.00000000 0.00000000 0
+## 8: 0.000000e+00 0.000000e+00 0.0000000 0.00000000 0
## 9: NA NA NA NA NA
-## 10: 5.143483e-01 3.311102e-01 0.99150834 0.98871978 0
+## 10: 5.143483e-01 3.311102e-01 0.9915083 0.98871978 0
```
There are 10 genes in the annotation used. Here are some possible scenarios:
@@ -713,19 +710,19 @@ print( mydtu$Transcripts )
## 6: NA NA NA NA NA NA
## 7: NA NA NA NA NA NA
## 8: NA NA NA NA NA NA
-## 9: TRUE 0.73 3.420637e-02 2.555941e-02 6.605471e-03 1.095659e-01
-## 10: TRUE 0.73 3.420637e-02 2.555941e-02 6.605471e-03 1.095659e-01
+## 9: TRUE 0.71 3.517732e-02 3.203049e-02 6.605471e-03 1.095659e-01
+## 10: TRUE 0.71 3.517732e-02 3.203049e-02 6.605471e-03 1.095659e-01
## 11: NA NA NA NA NA NA
-## 12: FALSE 0.09 5.871506e-01 3.743486e-01 1.380439e-02 1.000000e+00
-## 13: TRUE 1.00 2.253061e-78 4.627819e-78 5.067939e-84 2.129120e-77
-## 14: TRUE 1.00 7.211996e-49 1.620455e-48 6.413781e-53 6.559376e-48
-## 15: TRUE 0.00 1.658158e-21 2.826097e-21 1.066067e-24 8.294307e-21
-## 16: TRUE 1.00 1.147936e-44 1.666157e-44 1.516132e-46 7.296158e-44
+## 12: FALSE 0.02 6.565423e-01 3.300701e-01 1.380439e-02 1.000000e+00
+## 13: TRUE 1.00 1.754386e-78 4.381734e-78 5.067939e-84 2.129120e-77
+## 14: TRUE 1.00 3.094905e-49 7.881022e-49 6.413781e-53 6.559376e-48
+## 15: TRUE 0.00 1.950232e-21 3.008824e-21 1.066067e-24 8.294307e-21
+## 16: TRUE 1.00 1.372119e-44 1.970160e-44 1.516132e-46 7.296158e-44
## 17: NA NA NA NA NA NA
-## 18: FALSE 0.00 7.338926e-01 1.546231e-01 3.567434e-01 9.643718e-01
+## 18: FALSE 0.00 7.186030e-01 1.667606e-01 3.567434e-01 9.643718e-01
## 19: NA NA NA NA NA NA
-## 20: FALSE 0.00 9.501605e-01 6.529350e-02 7.948790e-01 1.000000e+00
-## 21: FALSE 0.00 9.501605e-01 6.529350e-02 7.948790e-01 1.000000e+00
+## 20: FALSE 0.00 9.408455e-01 7.414637e-02 7.948790e-01 1.000000e+00
+## 21: FALSE 0.00 9.408455e-01 7.414637e-02 7.948790e-01 1.000000e+00
## sig boot_freq boot_mean boot_stdev boot_min boot_max
## boot_na
## 1: NA
diff --git a/vignettes/tutorial_files/figure-html/unnamed-chunk-25-1.png b/vignettes/tutorial_files/figure-html/unnamed-chunk-25-1.png
index 8acf444..bdf0dc0 100644
Binary files a/vignettes/tutorial_files/figure-html/unnamed-chunk-25-1.png and b/vignettes/tutorial_files/figure-html/unnamed-chunk-25-1.png differ
diff --git a/vignettes/tutorial_files/figure-html/unnamed-chunk-26-1.png b/vignettes/tutorial_files/figure-html/unnamed-chunk-26-1.png
index ddbde80..8cd57ad 100644
Binary files a/vignettes/tutorial_files/figure-html/unnamed-chunk-26-1.png and b/vignettes/tutorial_files/figure-html/unnamed-chunk-26-1.png differ