update

tidyomics · Apr 18, 2024 · 3d54f1a
1 parent 49395a2
commit 3d54f1a

Showing 18 changed files with 53 additions and 3,167 deletions.
diff --git a/docs/figures/tidyomics_community.png b/docs/figures/tidyomics_community.png
diff --git a/docs/figures/woodjoin.png b/docs/figures/woodjoin.png
diff --git a/docs/search.json b/docs/search.json
@@ -14,11 +14,18 @@
     "text": "An open, open-source project spanning multiple R packages, and developers from around the world. Organized as a GitHub organization with GitHub Projects. For more:\n\nhttps://github.com/tidyomics\nhttps://www.biorxiv.org/content/10.1101/2023.09.10.557072v2\ntidiness_in_bioc channel in Bioconductor Slack"
   },
   {
-    "objectID": "tidy-intro-talk.html#diagram-of-the-tidyomics-project",
-    "href": "tidy-intro-talk.html#diagram-of-the-tidyomics-project",
+    "objectID": "tidy-intro-talk.html#diagram-of-tidyomics-workflows",
+    "href": "tidy-intro-talk.html#diagram-of-tidyomics-workflows",
     "title": "Tidy Intro Talk",
-    "section": "Diagram of the tidyomics project",
-    "text": "Diagram of the tidyomics project"
+    "section": "Diagram of tidyomics workflows",
+    "text": "Diagram of tidyomics workflows"
+  },
+  {
+    "objectID": "tidy-intro-talk.html#international-development-team",
+    "href": "tidy-intro-talk.html#international-development-team",
+    "title": "Tidy Intro Talk",
+    "section": "International development team",
+    "text": "International development team"
   },
   {
     "objectID": "tidy-intro-talk.html#objects-keep-data-organized",
@@ -88,7 +95,7 @@
     "href": "tidy-intro-talk.html#genomic-overlap-as-a-join",
     "title": "Tidy Intro Talk",
     "section": "Genomic overlap as a join",
-    "text": "Genomic overlap as a join\n\nlibrary(plyranges)\nx\n\nGRanges object with 40 ranges and 1 metadata column:\n       seqnames    ranges strand |     score\n          &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;\n   [1]        1   114-115      * |   4.88780\n   [2]        1   129-130      * |   4.40817\n   [3]        1   154-155      * |   5.18773\n   [4]        1   195-196      * |   5.81901\n   [5]        1   200-201      * |   4.14720\n   ...      ...       ...    ... .       ...\n  [36]        1   898-899      * |   6.55006\n  [37]        1   922-923      * |   6.46796\n  [38]        1   956-957      * |   4.93079\n  [39]        1   957-958      * |   4.34292\n  [40]        1   966-967      * |   4.53976\n  -------\n  seqinfo: 1 sequence from an unspecified genome\n\ny\n\nGRanges object with 3 ranges and 1 metadata column:\n      seqnames    ranges strand |          id\n         &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt;\n  [1]        1   101-300      * |           a\n  [2]        1   451-650      * |           b\n  [3]        1  801-1000      * |           c\n  -------\n  seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nx |&gt; join_overlap_inner(y)\n\nGRanges object with 30 ranges and 2 metadata columns:\n       seqnames    ranges strand |     score          id\n          &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;character&gt;\n   [1]        1   114-115      * |   4.88780           a\n   [2]        1   129-130      * |   4.40817           a\n   [3]        1   154-155      * |   5.18773           a\n   [4]        1   195-196      * |   5.81901           a\n   [5]        1   200-201      * |   4.14720           a\n   ...      ...       ...    ... .       ...         
...\n  [26]        1   898-899      * |   6.55006           c\n  [27]        1   922-923      * |   6.46796           c\n  [28]        1   956-957      * |   4.93079           c\n  [29]        1   957-958      * |   4.34292           c\n  [30]        1   966-967      * |   4.53976           c\n  -------\n  seqinfo: 1 sequence from an unspecified genome\n\n\nMany options, directed, within, maxgap, minoverlap, etc.\n\n# chaining operations\nx |&gt;\n  filter(score &gt; 3.5) |&gt;\n  join_overlap_inner(y) |&gt;\n  group_by(id) |&gt;\n  summarize(ave_score = mean(score), n = n())\n\nDataFrame with 3 rows and 3 columns\n           id ave_score         n\n  &lt;character&gt; &lt;numeric&gt; &lt;integer&gt;\n1           a   5.00465        10\n2           b   5.43353        11\n3           c   5.45538         7\n\n\n\n# pipe to plot\nx |&gt;\n  filter(score &gt; 3.5) |&gt;\n  join_overlap_inner(y) |&gt;\n  as_tibble() |&gt;\n  ggplot(aes(x = id, y = score)) + \n  geom_violin() + geom_jitter(width=.1)\n\n\n\n\n\n# many convenience functions\ny |&gt; \n  anchor_5p() |&gt; # 5', 3', start, end center\n  mutate(width=2) |&gt;\n  join_nearest(x, distance=TRUE)\n\nGRanges object with 3 ranges and 3 metadata columns:\n      seqnames    ranges strand |          id     score  distance\n         &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt; &lt;numeric&gt; &lt;integer&gt;\n  [1]        1   101-102      * |           a   4.88780        11\n  [2]        1   451-452      * |           b   3.99047         1\n  [3]        1   801-802      * |           c   6.49877        21\n  -------\n  seqinfo: 1 sequence from an unspecified genome; no seqlengths"
+    "text": "Genomic overlap as a join\n\nlibrary(plyranges)\nx\n\nGRanges object with 40 ranges and 1 metadata column:\n       seqnames    ranges strand |     score\n          &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;\n   [1]        1   114-115      * |   4.88780\n   [2]        1   129-130      * |   4.40817\n   [3]        1   154-155      * |   5.18773\n   [4]        1   195-196      * |   5.81901\n   [5]        1   200-201      * |   4.14720\n   ...      ...       ...    ... .       ...\n  [36]        1   898-899      * |   6.55006\n  [37]        1   922-923      * |   6.46796\n  [38]        1   956-957      * |   4.93079\n  [39]        1   957-958      * |   4.34292\n  [40]        1   966-967      * |   4.53976\n  -------\n  seqinfo: 1 sequence from an unspecified genome\n\ny\n\nGRanges object with 3 ranges and 1 metadata column:\n      seqnames    ranges strand |          id\n         &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt;\n  [1]        1   101-300      * |           a\n  [2]        1   451-650      * |           b\n  [3]        1  801-1000      * |           c\n  -------\n  seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\n\nx |&gt; join_overlap_inner(y)\n\nGRanges object with 30 ranges and 2 metadata columns:\n       seqnames    ranges strand |     score          id\n          &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;character&gt;\n   [1]        1   114-115      * |   4.88780           a\n   [2]        1   129-130      * |   4.40817           a\n   [3]        1   154-155      * |   5.18773           a\n   [4]        1   195-196      * |   5.81901           a\n   [5]        1   200-201      * |   4.14720           a\n   ...      ...       ...    ... .       ...         
...\n  [26]        1   898-899      * |   6.55006           c\n  [27]        1   922-923      * |   6.46796           c\n  [28]        1   956-957      * |   4.93079           c\n  [29]        1   957-958      * |   4.34292           c\n  [30]        1   966-967      * |   4.53976           c\n  -------\n  seqinfo: 1 sequence from an unspecified genome\n\n\nMany options, directed, within, maxgap, minoverlap, etc.\n\n# chaining operations\nx |&gt;\n  filter(score &gt; 3.5) |&gt;\n  join_overlap_inner(y) |&gt;\n  group_by(id) |&gt;\n  summarize(ave_score = mean(score), n = n())\n\nDataFrame with 3 rows and 3 columns\n           id ave_score         n\n  &lt;character&gt; &lt;numeric&gt; &lt;integer&gt;\n1           a   5.00465        10\n2           b   5.43353        11\n3           c   5.45538         7\n\n\n\n# pipe to plot\nx |&gt;\n  filter(score &gt; 3.5) |&gt;\n  join_overlap_inner(y) |&gt;\n  as_tibble() |&gt;\n  ggplot(aes(x = id, y = score)) + \n  geom_violin() + geom_jitter(width=.1)\n\n\n\n\n\n# many convenience functions\ny |&gt; \n  anchor_5p() |&gt; # 5', 3', start, end center\n  mutate(width=2) |&gt;\n  join_nearest(x, distance=TRUE)\n\nGRanges object with 3 ranges and 3 metadata columns:\n      seqnames    ranges strand |          id     score  distance\n         &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt; &lt;numeric&gt; &lt;integer&gt;\n  [1]        1   101-102      * |           a   4.88780        11\n  [2]        1   451-452      * |           b   3.99047         1\n  [3]        1   801-802      * |           c   6.49877        21\n  -------\n  seqinfo: 1 sequence from an unspecified genome; no seqlengths"
   },
   {
     "objectID": "tidy-intro-talk.html#nullranges",
@@ -123,14 +130,14 @@
     "href": "tidy-intro-talk.html#limitations",
     "title": "Tidy Intro Talk",
     "section": "Limitations",
-    "text": "Limitations\n\npackage code and non-standard evaluation\noptimized code, e.g. matrix operations\n\n\nlibrary(tidySummarizedExperiment)\nse_test\n\n# A SummarizedExperiment-tibble abstraction: 100,000 × 4\n# \u001b[90mFeatures=1000 | Samples=100 | Assays=counts\u001b[0m\n   .feature .sample counts     x\n   &lt;chr&gt;    &lt;chr&gt;    &lt;int&gt; &lt;int&gt;\n 1 1        1           94     1\n 2 2        1           79     2\n 3 3        1           96     3\n 4 4        1           94     4\n 5 5        1           99     5\n 6 6        1          130     6\n 7 7        1          115     7\n 8 8        1          100     8\n 9 9        1           91     9\n10 10       1           85    10\n# ℹ 40 more rows\n\n# looping over 1000 genes, averaging 100 samples\nse_test |&gt;\n  group_by(.feature) |&gt;\n  summarize(ave_count = mean(counts))\n\ntidySummarizedExperiment says: A data frame is returned for independent data analysis.\n\n\n# A tibble: 1,000 × 2\n   .feature ave_count\n   &lt;chr&gt;        &lt;dbl&gt;\n 1 1             99.3\n 2 10           102. \n 3 100          101. \n 4 1000          99.8\n 5 101          101. \n 6 102          100. \n 7 103          101. \n 8 104          102. \n 9 105           99.4\n10 106          100. \n# ℹ 990 more rows\n\n\n\nlibrary(microbenchmark)\nmb &lt;- microbenchmark(tidy = tidy_version(se_test), \n                     baseR = rowMeans(assay(se_test)), times = 5)\n\nWarning in microbenchmark(tidy = tidy_version(se_test), baseR = rowMeans(assay(se_test)), : less\naccurate nanosecond times to avoid potential integer overflows\n\nprint(mb, unit=\"s\", signif=2)\n\nUnit: seconds\n  expr   min     lq   mean median     uq   max neval\n  tidy 0.033 0.0340 0.0340 0.0340 0.0340 0.036     5\n baseR 0.001 0.0013 0.0014 0.0013 0.0014 0.002     5"
+    "text": "Limitations\n\npackage code and non-standard evaluation\noptimized code, e.g. matrix operations\n\n\nlibrary(tidySummarizedExperiment)\nse_test\n\n# A SummarizedExperiment-tibble abstraction: 100,000 × 4\n# \u001b[90mFeatures=1000 | Samples=100 | Assays=counts\u001b[0m\n   .feature .sample counts     x\n   &lt;chr&gt;    &lt;chr&gt;    &lt;int&gt; &lt;int&gt;\n 1 1        1           94     1\n 2 2        1           79     2\n 3 3        1           96     3\n 4 4        1           94     4\n 5 5        1           99     5\n 6 6        1          130     6\n 7 7        1          115     7\n 8 8        1          100     8\n 9 9        1           91     9\n10 10       1           85    10\n# ℹ 40 more rows\n\n# looping over 1000 genes, averaging 100 samples\nse_test |&gt;\n  group_by(.feature) |&gt;\n  summarize(ave_count = mean(counts))\n\ntidySummarizedExperiment says: A data frame is returned for independent data analysis.\n\n\n# A tibble: 1,000 × 2\n   .feature ave_count\n   &lt;chr&gt;        &lt;dbl&gt;\n 1 1             99.3\n 2 10           102. \n 3 100          101. \n 4 1000          99.8\n 5 101          101. \n 6 102          100. \n 7 103          101. \n 8 104          102. \n 9 105           99.4\n10 106          100. \n# ℹ 990 more rows\n\n\n\nlibrary(microbenchmark)\nmb &lt;- microbenchmark(tidy = tidy_version(se_test), \n                     baseR = rowMeans(assay(se_test)), times = 5)\n\nWarning in microbenchmark(tidy = tidy_version(se_test), baseR = rowMeans(assay(se_test)), : less\naccurate nanosecond times to avoid potential integer overflows\n\nprint(mb, unit=\"s\", signif=2)\n\nUnit: seconds\n  expr   min     lq   mean median     uq   max neval\n  tidy 0.033 0.0330 0.0330 0.0330 0.0340 0.034     5\n baseR 0.001 0.0013 0.0014 0.0013 0.0013 0.002     5"
   },
   {
     "objectID": "tidy-intro-talk.html#outro",
     "href": "tidy-intro-talk.html#outro",
     "title": "Tidy Intro Talk",
     "section": "Outro",
-    "text": "Outro\nRecommend genomic data analysts are always checking:\n\nmain contributions to variance (e.g. PCA, see plotPCA for bulk and OSCA for sc)\ncolumn and row densities (tidySE allows directly plotting geom_density of rows/columns, or geom_violin)\nknown positive features, feature-level plots (filter to feature, pipe to geom_point etc.)"
+    "text": "Outro\nRecommend genomic data analysts are always checking:\n\nmain contributions to variance (e.g. PCA, see plotPCA for bulk and OSCA for sc)\ncolumn and row densities (tidySE allows directly plotting geom_density of rows/columns, or geom_violin)\nknown positive features, feature-level plots (filter to feature, pipe to geom_point etc.)\n\nIf you’re interested in more complicated use cases of tidyomics see this online book:\n\nTidy ranges tutorial"
   },
   {
     "objectID": "tidy-intro-talk.html#contributors",

diff --git a/docs/tidy-intro-talk.html b/docs/tidy-intro-talk.html
@@ -139,9 +139,13 @@ <h2 class="anchored" data-anchor-id="tidyomics-project">Tidyomics project</h2>
 <li><p><code>tidiness_in_bioc</code> channel in Bioconductor Slack</p></li>
 </ul>
 </section>
-<section id="diagram-of-the-tidyomics-project" class="level2">
-<h2 class="anchored" data-anchor-id="diagram-of-the-tidyomics-project">Diagram of the tidyomics project</h2>
-<p><img src="figures/figure2.png" class="img-fluid"></p>
+<section id="diagram-of-tidyomics-workflows" class="level2">
+<h2 class="anchored" data-anchor-id="diagram-of-tidyomics-workflows">Diagram of tidyomics workflows</h2>
+<p><img src="figures/figure2.png" class="img-fluid" alt="Diagram of how packages share a similar grammar to operate on data objects. From top to bottom, the data are analyzed, manipulated, and made into plots."></p>
+</section>
+<section id="international-development-team" class="level2">
+<h2 class="anchored" data-anchor-id="international-development-team">International development team</h2>
+<p><img src="figures/tidyomics_community.png" class="img-fluid" alt="Diagram of tidyomics community, with users and developers interacting. On the top are users with arrows coming from developers and packages. On the bottom is the extended community, including Bioconductor."></p>
 </section>
 <section id="objects-keep-data-organized" class="level2">
 <h2 class="anchored" data-anchor-id="objects-keep-data-organized">Objects keep data organized</h2>
@@ -351,7 +355,7 @@ <h2 class="anchored" data-anchor-id="enabling-dplyr-verbs-for-omics">Enabling dp
 </div>
 <p>What does this mean “<em>SE-tibble abstraction</em>”?</p>
 <p>Essentially this is an API, we can use our familiar verbs and interact with the native object.</p>
-<p><img src="figures/counter.png" class="img-fluid"></p>
+<p><img src="figures/counter.png" class="img-fluid" alt="Picture of a counter with a menu and a bell"></p>
 </section>
 <section id="still-a-standard-bioc-object" class="level2">
 <h2 class="anchored" data-anchor-id="still-a-standard-bioc-object">Still a standard Bioc object</h2>
@@ -559,6 +563,9 @@ <h2 class="anchored" data-anchor-id="genomic-overlap-as-a-join">Genomic overlap
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths</code></pre>
 </div>
+</div>
+<p><img src="figures/woodjoin.png" class="img-fluid" alt="Picture of two stacks of wood being interleaved"></p>
+<div class="cell">
 <div class="sourceCode cell-code" id="cb45"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb45-1"><a href="#cb45-1" aria-hidden="true" tabindex="-1"></a>x <span class="sc">|&gt;</span> <span class="fu">join_overlap_inner</span>(y)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
 <pre><code>GRanges object with 30 ranges and 2 metadata columns:
@@ -633,7 +640,7 @@ <h2 class="anchored" data-anchor-id="nullranges"><code>nullranges</code></h2>
 <section id="bootstrapping-ranges" class="level2">
 <h2 class="anchored" data-anchor-id="bootstrapping-ranges">Bootstrapping ranges</h2>
 <p>Statistical papers from the ENCODE project noted that <em>block bootstrapping</em> genomic data preserves important spatial patterns (Bickel <em>et al.</em> 2010).</p>
-<p><img src="figures/boot.png" class="img-fluid"></p>
+<p><img src="figures/boot.png" class="img-fluid" alt="Diagram of block bootstrapping genomic ranges. Blocks are resampled from original data and arranged to form new range sets."></p>
 <div class="cell">
 <div class="sourceCode cell-code" id="cb52"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb52-1"><a href="#cb52-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(nullranges)</span>
 <span id="cb52-2"><a href="#cb52-2" aria-hidden="true" tabindex="-1"></a>boot <span class="ot">&lt;-</span> <span class="fu">bootRanges</span>(x, <span class="at">blockLength=</span><span class="dv">10</span>, <span class="at">R=</span><span class="dv">20</span>)</span>
@@ -675,7 +682,7 @@ <h2 class="anchored" data-anchor-id="bootstrapping-ranges">Bootstrapping ranges<
 <section id="matching-ranges" class="level2">
 <h2 class="anchored" data-anchor-id="matching-ranges">Matching ranges</h2>
 <p>Matching on covariates from a large pool allows for more focused hypothesis testing.</p>
-<p><img src="figures/match.png" class="img-fluid"></p>
+<p><img src="figures/match.png" class="img-fluid" alt="Diagram of matching genomic ranges. A pool of different colored ranges are drawn from to match the warmer colors of a focal set of ranges."></p>
 <div class="cell">
 <div class="sourceCode cell-code" id="cb55"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><a href="#cb55-1" aria-hidden="true" tabindex="-1"></a>xprime <span class="ot">&lt;-</span> x <span class="sc">|&gt;</span></span>
 <span id="cb55-2"><a href="#cb55-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">filter</span>(score <span class="sc">&gt;</span> <span class="dv">5</span>) <span class="sc">|&gt;</span></span>
@@ -825,8 +832,8 @@ <h2 class="anchored" data-anchor-id="limitations">Limitations</h2>
 <div class="cell-output cell-output-stdout">
 <pre><code>Unit: seconds
   expr   min     lq   mean median     uq   max neval
-  tidy 0.033 0.0340 0.0340 0.0340 0.0340 0.036     5
- baseR 0.001 0.0013 0.0014 0.0013 0.0014 0.002     5</code></pre>
+  tidy 0.033 0.0330 0.0330 0.0330 0.0340 0.034     5
+ baseR 0.001 0.0013 0.0014 0.0013 0.0013 0.002     5</code></pre>
 </div>
 </div>
 </section>
@@ -838,6 +845,10 @@ <h2 class="anchored" data-anchor-id="outro">Outro</h2>
 <li>column and row densities (<code>tidySE</code> allows directly plotting <code>geom_density</code> of rows/columns, or <code>geom_violin</code>)</li>
 <li>known positive features, feature-level plots (<code>filter</code> to feature, pipe to <code>geom_point</code> etc.)</li>
 </ul>
+<p>If you’re interested in more complicated use cases of <code>tidyomics</code> see this online book:</p>
+<ul>
+<li><a href="https://tidyomics.github.io/tidy-ranges-tutorial">Tidy ranges tutorial</a></li>
+</ul>
 </section>
 <section id="contributors" class="level2">
 <h2 class="anchored" data-anchor-id="contributors">Contributors</h2>