update
mikelove committed Apr 18, 2024
1 parent 49395a2 commit 3d54f1a
Showing 18 changed files with 53 additions and 3,167 deletions.
Binary file added docs/figures/tidyomics_community.png
Binary file added docs/figures/woodjoin.png
21 changes: 14 additions & 7 deletions docs/search.json
@@ -14,11 +14,18 @@
"text": "An open, open-source project spanning multiple R packages, and developers from around the world. Organized as a GitHub organization with GitHub Projects. For more:\n\nhttps://github.com/tidyomics\nhttps://www.biorxiv.org/content/10.1101/2023.09.10.557072v2\ntidiness_in_bioc channel in Bioconductor Slack"
},
{
-"objectID": "tidy-intro-talk.html#diagram-of-the-tidyomics-project",
-"href": "tidy-intro-talk.html#diagram-of-the-tidyomics-project",
+"objectID": "tidy-intro-talk.html#diagram-of-tidyomics-workflows",
+"href": "tidy-intro-talk.html#diagram-of-tidyomics-workflows",
"title": "Tidy Intro Talk",
-"section": "Diagram of the tidyomics project",
-"text": "Diagram of the tidyomics project"
+"section": "Diagram of tidyomics workflows",
+"text": "Diagram of tidyomics workflows"
},
+{
+"objectID": "tidy-intro-talk.html#international-development-team",
+"href": "tidy-intro-talk.html#international-development-team",
+"title": "Tidy Intro Talk",
+"section": "International development team",
+"text": "International development team"
+},
{
"objectID": "tidy-intro-talk.html#objects-keep-data-organized",
@@ -88,7 +95,7 @@
"href": "tidy-intro-talk.html#genomic-overlap-as-a-join",
"title": "Tidy Intro Talk",
"section": "Genomic overlap as a join",
-"text": "Genomic overlap as a join\n\nlibrary(plyranges)\nx\n\nGRanges object with 40 ranges and 1 metadata column:\n seqnames ranges strand | score\n <Rle> <IRanges> <Rle> | <numeric>\n [1] 1 114-115 * | 4.88780\n [2] 1 129-130 * | 4.40817\n [3] 1 154-155 * | 5.18773\n [4] 1 195-196 * | 5.81901\n [5] 1 200-201 * | 4.14720\n ... ... ... ... . ...\n [36] 1 898-899 * | 6.55006\n [37] 1 922-923 * | 6.46796\n [38] 1 956-957 * | 4.93079\n [39] 1 957-958 * | 4.34292\n [40] 1 966-967 * | 4.53976\n -------\n seqinfo: 1 sequence from an unspecified genome\n\ny\n\nGRanges object with 3 ranges and 1 metadata column:\n seqnames ranges strand | id\n <Rle> <IRanges> <Rle> | <character>\n [1] 1 101-300 * | a\n [2] 1 451-650 * | b\n [3] 1 801-1000 * | c\n -------\n seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nx |> join_overlap_inner(y)\n\nGRanges object with 30 ranges and 2 metadata columns:\n seqnames ranges strand | score id\n <Rle> <IRanges> <Rle> | <numeric> <character>\n [1] 1 114-115 * | 4.88780 a\n [2] 1 129-130 * | 4.40817 a\n [3] 1 154-155 * | 5.18773 a\n [4] 1 195-196 * | 5.81901 a\n [5] 1 200-201 * | 4.14720 a\n ... ... ... ... . ... ...\n [26] 1 898-899 * | 6.55006 c\n [27] 1 922-923 * | 6.46796 c\n [28] 1 956-957 * | 4.93079 c\n [29] 1 957-958 * | 4.34292 c\n [30] 1 966-967 * | 4.53976 c\n -------\n seqinfo: 1 sequence from an unspecified genome\n\n\nMany options, directed, within, maxgap, minoverlap, etc.\n\n# chaining operations\nx |>\n filter(score > 3.5) |>\n join_overlap_inner(y) |>\n group_by(id) |>\n summarize(ave_score = mean(score), n = n())\n\nDataFrame with 3 rows and 3 columns\n id ave_score n\n <character> <numeric> <integer>\n1 a 5.00465 10\n2 b 5.43353 11\n3 c 5.45538 7\n\n\n\n# pipe to plot\nx |>\n filter(score > 3.5) |>\n join_overlap_inner(y) |>\n as_tibble() |>\n ggplot(aes(x = id, y = score)) + \n geom_violin() + geom_jitter(width=.1)\n\n\n\n\n\n# many convenience functions\ny |> \n anchor_5p() |> # 5', 3', start, end center\n mutate(width=2) |>\n join_nearest(x, distance=TRUE)\n\nGRanges object with 3 ranges and 3 metadata columns:\n seqnames ranges strand | id score distance\n <Rle> <IRanges> <Rle> | <character> <numeric> <integer>\n [1] 1 101-102 * | a 4.88780 11\n [2] 1 451-452 * | b 3.99047 1\n [3] 1 801-802 * | c 6.49877 21\n -------\n seqinfo: 1 sequence from an unspecified genome; no seqlengths"
+"text": "Genomic overlap as a join\n\nlibrary(plyranges)\nx\n\nGRanges object with 40 ranges and 1 metadata column:\n seqnames ranges strand | score\n <Rle> <IRanges> <Rle> | <numeric>\n [1] 1 114-115 * | 4.88780\n [2] 1 129-130 * | 4.40817\n [3] 1 154-155 * | 5.18773\n [4] 1 195-196 * | 5.81901\n [5] 1 200-201 * | 4.14720\n ... ... ... ... . ...\n [36] 1 898-899 * | 6.55006\n [37] 1 922-923 * | 6.46796\n [38] 1 956-957 * | 4.93079\n [39] 1 957-958 * | 4.34292\n [40] 1 966-967 * | 4.53976\n -------\n seqinfo: 1 sequence from an unspecified genome\n\ny\n\nGRanges object with 3 ranges and 1 metadata column:\n seqnames ranges strand | id\n <Rle> <IRanges> <Rle> | <character>\n [1] 1 101-300 * | a\n [2] 1 451-650 * | b\n [3] 1 801-1000 * | c\n -------\n seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\n\nx |> join_overlap_inner(y)\n\nGRanges object with 30 ranges and 2 metadata columns:\n seqnames ranges strand | score id\n <Rle> <IRanges> <Rle> | <numeric> <character>\n [1] 1 114-115 * | 4.88780 a\n [2] 1 129-130 * | 4.40817 a\n [3] 1 154-155 * | 5.18773 a\n [4] 1 195-196 * | 5.81901 a\n [5] 1 200-201 * | 4.14720 a\n ... ... ... ... . ... ...\n [26] 1 898-899 * | 6.55006 c\n [27] 1 922-923 * | 6.46796 c\n [28] 1 956-957 * | 4.93079 c\n [29] 1 957-958 * | 4.34292 c\n [30] 1 966-967 * | 4.53976 c\n -------\n seqinfo: 1 sequence from an unspecified genome\n\n\nMany options, directed, within, maxgap, minoverlap, etc.\n\n# chaining operations\nx |>\n filter(score > 3.5) |>\n join_overlap_inner(y) |>\n group_by(id) |>\n summarize(ave_score = mean(score), n = n())\n\nDataFrame with 3 rows and 3 columns\n id ave_score n\n <character> <numeric> <integer>\n1 a 5.00465 10\n2 b 5.43353 11\n3 c 5.45538 7\n\n\n\n# pipe to plot\nx |>\n filter(score > 3.5) |>\n join_overlap_inner(y) |>\n as_tibble() |>\n ggplot(aes(x = id, y = score)) + \n geom_violin() + geom_jitter(width=.1)\n\n\n\n\n\n# many convenience functions\ny |> \n anchor_5p() |> # 5', 3', start, end center\n mutate(width=2) |>\n join_nearest(x, distance=TRUE)\n\nGRanges object with 3 ranges and 3 metadata columns:\n seqnames ranges strand | id score distance\n <Rle> <IRanges> <Rle> | <character> <numeric> <integer>\n [1] 1 101-102 * | a 4.88780 11\n [2] 1 451-452 * | b 3.99047 1\n [3] 1 801-802 * | c 6.49877 21\n -------\n seqinfo: 1 sequence from an unspecified genome; no seqlengths"
},
{
"objectID": "tidy-intro-talk.html#nullranges",
@@ -123,14 +130,14 @@
"href": "tidy-intro-talk.html#limitations",
"title": "Tidy Intro Talk",
"section": "Limitations",
-"text": "Limitations\n\npackage code and non-standard evaluation\noptimized code, e.g. matrix operations\n\n\nlibrary(tidySummarizedExperiment)\nse_test\n\n# A SummarizedExperiment-tibble abstraction: 100,000 × 4\n# \u001b[90mFeatures=1000 | Samples=100 | Assays=counts\u001b[0m\n .feature .sample counts x\n <chr> <chr> <int> <int>\n 1 1 1 94 1\n 2 2 1 79 2\n 3 3 1 96 3\n 4 4 1 94 4\n 5 5 1 99 5\n 6 6 1 130 6\n 7 7 1 115 7\n 8 8 1 100 8\n 9 9 1 91 9\n10 10 1 85 10\n# ℹ 40 more rows\n\n# looping over 1000 genes, averaging 100 samples\nse_test |>\n group_by(.feature) |>\n summarize(ave_count = mean(counts))\n\ntidySummarizedExperiment says: A data frame is returned for independent data analysis.\n\n\n# A tibble: 1,000 × 2\n .feature ave_count\n <chr> <dbl>\n 1 1 99.3\n 2 10 102. \n 3 100 101. \n 4 1000 99.8\n 5 101 101. \n 6 102 100. \n 7 103 101. \n 8 104 102. \n 9 105 99.4\n10 106 100. \n# ℹ 990 more rows\n\n\n\nlibrary(microbenchmark)\nmb <- microbenchmark(tidy = tidy_version(se_test), \n baseR = rowMeans(assay(se_test)), times = 5)\n\nWarning in microbenchmark(tidy = tidy_version(se_test), baseR = rowMeans(assay(se_test)), : less\naccurate nanosecond times to avoid potential integer overflows\n\nprint(mb, unit=\"s\", signif=2)\n\nUnit: seconds\n expr min lq mean median uq max neval\n tidy 0.033 0.0340 0.0340 0.0340 0.0340 0.036 5\n baseR 0.001 0.0013 0.0014 0.0013 0.0014 0.002 5"
+"text": "Limitations\n\npackage code and non-standard evaluation\noptimized code, e.g. matrix operations\n\n\nlibrary(tidySummarizedExperiment)\nse_test\n\n# A SummarizedExperiment-tibble abstraction: 100,000 × 4\n# \u001b[90mFeatures=1000 | Samples=100 | Assays=counts\u001b[0m\n .feature .sample counts x\n <chr> <chr> <int> <int>\n 1 1 1 94 1\n 2 2 1 79 2\n 3 3 1 96 3\n 4 4 1 94 4\n 5 5 1 99 5\n 6 6 1 130 6\n 7 7 1 115 7\n 8 8 1 100 8\n 9 9 1 91 9\n10 10 1 85 10\n# ℹ 40 more rows\n\n# looping over 1000 genes, averaging 100 samples\nse_test |>\n group_by(.feature) |>\n summarize(ave_count = mean(counts))\n\ntidySummarizedExperiment says: A data frame is returned for independent data analysis.\n\n\n# A tibble: 1,000 × 2\n .feature ave_count\n <chr> <dbl>\n 1 1 99.3\n 2 10 102. \n 3 100 101. \n 4 1000 99.8\n 5 101 101. \n 6 102 100. \n 7 103 101. \n 8 104 102. \n 9 105 99.4\n10 106 100. \n# ℹ 990 more rows\n\n\n\nlibrary(microbenchmark)\nmb <- microbenchmark(tidy = tidy_version(se_test), \n baseR = rowMeans(assay(se_test)), times = 5)\n\nWarning in microbenchmark(tidy = tidy_version(se_test), baseR = rowMeans(assay(se_test)), : less\naccurate nanosecond times to avoid potential integer overflows\n\nprint(mb, unit=\"s\", signif=2)\n\nUnit: seconds\n expr min lq mean median uq max neval\n tidy 0.033 0.0330 0.0330 0.0330 0.0340 0.034 5\n baseR 0.001 0.0013 0.0014 0.0013 0.0013 0.002 5"
},
{
"objectID": "tidy-intro-talk.html#outro",
"href": "tidy-intro-talk.html#outro",
"title": "Tidy Intro Talk",
"section": "Outro",
-"text": "Outro\nRecommend genomic data analysts are always checking:\n\nmain contributions to variance (e.g. PCA, see plotPCA for bulk and OSCA for sc)\ncolumn and row densities (tidySE allows directly plotting geom_density of rows/columns, or geom_violin)\nknown positive features, feature-level plots (filter to feature, pipe to geom_point etc.)"
+"text": "Outro\nRecommend genomic data analysts are always checking:\n\nmain contributions to variance (e.g. PCA, see plotPCA for bulk and OSCA for sc)\ncolumn and row densities (tidySE allows directly plotting geom_density of rows/columns, or geom_violin)\nknown positive features, feature-level plots (filter to feature, pipe to geom_point etc.)\n\nIf you’re interested in more complicated use cases of tidyomics see this online book:\n\nTidy ranges tutorial"
},
{
"objectID": "tidy-intro-talk.html#contributors",
27 changes: 19 additions & 8 deletions docs/tidy-intro-talk.html
@@ -139,9 +139,13 @@ <h2 class="anchored" data-anchor-id="tidyomics-project">Tidyomics project</h2>
<li><p><code>tidiness_in_bioc</code> channel in Bioconductor Slack</p></li>
</ul>
</section>
-<section id="diagram-of-the-tidyomics-project" class="level2">
-<h2 class="anchored" data-anchor-id="diagram-of-the-tidyomics-project">Diagram of the tidyomics project</h2>
-<p><img src="figures/figure2.png" class="img-fluid"></p>
+<section id="diagram-of-tidyomics-workflows" class="level2">
+<h2 class="anchored" data-anchor-id="diagram-of-tidyomics-workflows">Diagram of tidyomics workflows</h2>
+<p><img src="figures/figure2.png" class="img-fluid" alt="Diagram of how packages share a similar grammar to operate on data objects. From top to bottom, the data are analyzed, manipulated, and made into plots."></p>
</section>
+<section id="international-development-team" class="level2">
+<h2 class="anchored" data-anchor-id="international-development-team">International development team</h2>
+<p><img src="figures/tidyomics_community.png" class="img-fluid" alt="Diagram of tidyomics community, with users and developers interacting. On the top are users with arrows coming from developers and packages. On the bottom is the extended community, including Bioconductor."></p>
+</section>
<section id="objects-keep-data-organized" class="level2">
<h2 class="anchored" data-anchor-id="objects-keep-data-organized">Objects keep data organized</h2>
Expand Down Expand Up @@ -351,7 +355,7 @@ <h2 class="anchored" data-anchor-id="enabling-dplyr-verbs-for-omics">Enabling dp
</div>
<p>What does this mean “<em>SE-tibble abstraction</em>”?</p>
<p>Essentially this is an API, we can use our familiar verbs and interact with the native object.</p>
-<p><img src="figures/counter.png" class="img-fluid"></p>
+<p><img src="figures/counter.png" class="img-fluid" alt="Picture of a counter with a menu and a bell"></p>
</section>
<section id="still-a-standard-bioc-object" class="level2">
<h2 class="anchored" data-anchor-id="still-a-standard-bioc-object">Still a standard Bioc object</h2>
Expand Down Expand Up @@ -559,6 +563,9 @@ <h2 class="anchored" data-anchor-id="genomic-overlap-as-a-join">Genomic overlap
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths</code></pre>
</div>
+</div>
+<p><img src="figures/woodjoin.png" class="img-fluid" alt="Picture of two stacks of wood being interleaved"></p>
+<div class="cell">
<div class="sourceCode cell-code" id="cb45"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb45-1"><a href="#cb45-1" aria-hidden="true" tabindex="-1"></a>x <span class="sc">|&gt;</span> <span class="fu">join_overlap_inner</span>(y)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>GRanges object with 30 ranges and 2 metadata columns:
@@ -633,7 +640,7 @@ <h2 class="anchored" data-anchor-id="nullranges"><code>nullranges</code></h2>
<section id="bootstrapping-ranges" class="level2">
<h2 class="anchored" data-anchor-id="bootstrapping-ranges">Bootstrapping ranges</h2>
<p>Statistical papers from the ENCODE project noted that <em>block bootstrapping</em> genomic data preserves important spatial patterns (Bickel <em>et al.</em> 2010).</p>
-<p><img src="figures/boot.png" class="img-fluid"></p>
+<p><img src="figures/boot.png" class="img-fluid" alt="Diagram of block bootstrapping genomic ranges. Blocks are resampled from original data and arranged to form new range sets."></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb52"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb52-1"><a href="#cb52-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(nullranges)</span>
<span id="cb52-2"><a href="#cb52-2" aria-hidden="true" tabindex="-1"></a>boot <span class="ot">&lt;-</span> <span class="fu">bootRanges</span>(x, <span class="at">blockLength=</span><span class="dv">10</span>, <span class="at">R=</span><span class="dv">20</span>)</span>
@@ -675,7 +682,7 @@ <h2 class="anchored" data-anchor-id="bootstrapping-ranges">Bootstrapping ranges<
<section id="matching-ranges" class="level2">
<h2 class="anchored" data-anchor-id="matching-ranges">Matching ranges</h2>
<p>Matching on covariates from a large pool allows for more focused hypothesis testing.</p>
-<p><img src="figures/match.png" class="img-fluid"></p>
+<p><img src="figures/match.png" class="img-fluid" alt="Diagram of matching genomic ranges. A pool of different colored ranges are drawn from to match the warmer colors of a focal set of ranges."></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb55"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><a href="#cb55-1" aria-hidden="true" tabindex="-1"></a>xprime <span class="ot">&lt;-</span> x <span class="sc">|&gt;</span></span>
<span id="cb55-2"><a href="#cb55-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">filter</span>(score <span class="sc">&gt;</span> <span class="dv">5</span>) <span class="sc">|&gt;</span></span>
@@ -825,8 +832,8 @@ <h2 class="anchored" data-anchor-id="limitations">Limitations</h2>
<div class="cell-output cell-output-stdout">
<pre><code>Unit: seconds
expr min lq mean median uq max neval
-tidy 0.033 0.0340 0.0340 0.0340 0.0340 0.036 5
-baseR 0.001 0.0013 0.0014 0.0013 0.0014 0.002 5</code></pre>
+tidy 0.033 0.0330 0.0330 0.0330 0.0340 0.034 5
+baseR 0.001 0.0013 0.0014 0.0013 0.0013 0.002 5</code></pre>
</div>
</div>
</section>
@@ -838,6 +845,10 @@ <h2 class="anchored" data-anchor-id="outro">Outro</h2>
<li>column and row densities (<code>tidySE</code> allows directly plotting <code>geom_density</code> of rows/columns, or <code>geom_violin</code>)</li>
<li>known positive features, feature-level plots (<code>filter</code> to feature, pipe to <code>geom_point</code> etc.)</li>
</ul>
+<p>If you’re interested in more complicated use cases of <code>tidyomics</code> see this online book:</p>
+<ul>
+<li><a href="https://tidyomics.github.io/tidy-ranges-tutorial">Tidy ranges tutorial</a></li>
+</ul>
</section>
<section id="contributors" class="level2">
<h2 class="anchored" data-anchor-id="contributors">Contributors</h2>
