Enhance documentation (#95)

BiocPy · Jun 14, 2024 · 52171ea
1 parent 594bd00
Showing 1 changed file with 47 additions and 37 deletions.
diff --git a/docs/tutorial.md b/docs/tutorial.md
@@ -16,8 +16,9 @@ Moreover, the package also provides a `SeqInfo` class to update or modify sequen
 
 The `GenomicRanges` class is designed to seamlessly operate with upstream packages like `RangeSummarizedExperiment` or `SingleCellExperiment` representations, providing consistent and stable functionality.
 
+:::{note}
 These classes follow a functional paradigm for accessing or setting properties, with further details discussed in the [functional paradigm](https://biocpy.github.io/tutorial/chapters/philosophy.html#functional-discipline) section.
-
+:::
 
 ## Installation
 
@@ -31,39 +32,6 @@ pip install genomicranges
 
 We support multiple ways to initialize a `GenomicRanges` object.
 
-## Preferred way
-
-To construct a `GenomicRanges` object, we need to provide sequence information and genomic coordinates. This is achieved through the combination of the `seqnames` and `ranges` parameters. Additionally, you have the option to specify the `strand`, represented as a list of "+" (or 1) for the forward strand, "-" (or -1) for the reverse strand, or "*" (or 0) if the strand is unknown. You can also provide a NumPy vector that utilizes either the string or numeric representation to specify the `strand`. Optionally, you can use the `mcols` parameter to provide additional metadata about each genomic region.
-
-```{code-cell}
-from genomicranges import GenomicRanges
-from iranges import IRanges
-from biocframe import BiocFrame
-from random import random
-
-gr = GenomicRanges(
-    seqnames=[
-        "chr1",
-        "chr2",
-        "chr3",
-        "chr2",
-        "chr3",
-    ],
-    ranges=IRanges([x for x in range(101, 106)], [11, 21, 25, 30, 5]),
-    strand=["*", "-", "*", "+", "-"],
-    mcols=BiocFrame(
-        {
-            "score": range(0, 5),
-            "GC": [random() for _ in range(5)],
-        }
-    ),
-)
-
-print(gr)
-```
-
-The input for `mcols` is expected to be a `BiocFrame` object and will be converted to a `BiocFrame` in case a pandas `DataFrame` is supplied.
-
 ## From Bioinformatic file formats
 
 ### From `biobear`
@@ -89,8 +57,6 @@ print(len(gg), len(df))
 
 You can also import genomes from UCSC or load a genome annotation from a GTF file. This requires installing the additional packages **pandas** and **joblib**, which are used to parse and extract attributes from the GTF file.
 
-A future version of this package might implement or take advantage of existing genomic parser packages in Python to support various file formats.
-
 ```python
 import genomicranges
 
@@ -102,11 +68,49 @@ human_gr = genomicranges.read_ucsc(genome="hg19")
 print(human_gr)
 ```
 
+
+## Preferred way
+
+To construct a `GenomicRanges` object, we need to provide sequence information and genomic coordinates. This is achieved through the combination of the `seqnames` and `ranges` parameters. Additionally, you have the option to specify the `strand`, represented as a list of "+" (or 1) for the forward strand, "-" (or -1) for the reverse strand, or "*" (or 0) if the strand is unknown. You can also provide a NumPy vector that utilizes either the string or numeric representation to specify the `strand`. Optionally, you can use the `mcols` parameter to provide additional metadata about each genomic region.
+
+```{code-cell}
+from genomicranges import GenomicRanges
+from iranges import IRanges
+from biocframe import BiocFrame
+from random import random
+
+gr = GenomicRanges(
+    seqnames=[
+        "chr1",
+        "chr2",
+        "chr3",
+        "chr2",
+        "chr3",
+    ],
+    ranges=IRanges([x for x in range(101, 106)], [11, 21, 25, 30, 5]),
+    strand=["*", "-", "*", "+", "-"],
+    mcols=BiocFrame(
+        {
+            "score": range(0, 5),
+            "GC": [random() for _ in range(5)],
+        }
+    ),
+)
+
+print(gr)
+```
+
+:::{note}
+The input for `mcols` is expected to be a `BiocFrame` object; if a pandas `DataFrame` is supplied, it is converted to a `BiocFrame`.
+:::
+
 ## Pandas `DataFrame`
 
 If your genomic coordinates are represented as a pandas `DataFrame`, you can convert it into a `GenomicRanges` object, provided it contains the necessary columns.
 
+::: {important}
 The `DataFrame` must contain columns `seqnames`, `starts` and `ends` to represent genomic coordinates. The rest of the columns are considered metadata and will be available in the `mcols` slot of the `GenomicRanges` object.
+:::
 
 ```{code-cell}
 from genomicranges import GenomicRanges
@@ -193,7 +197,9 @@ print(gr.mcols)
 
 ### Setters
 
-All property-based setters are `in_place` operations, with further details discussed in [functional paradigm](../philosophy.qmd#functional-discipline) section.
+:::{important}
+All property-based setters are `in_place` operations, with further details discussed in the [functional paradigm](https://biocpy.github.io/tutorial/chapters/philosophy.html#functional-discipline) section.
+:::
 
 ```{code-cell}
 modified_mcols = gr.mcols.set_column("score", range(1,6))
@@ -420,7 +426,9 @@ binned_avg_gr = subject.binned_average(bins=bins_gr, scorename="score", outname=
 print(binned_avg_gr)
 ```
 
+::: {tip}
+Now you might wonder how to generate these ***bins***.
+:::
 
 # Generate tiles or bins
 
@@ -544,7 +552,9 @@ query_hits = gr.follow(find_regions)
 print(query_hits)
 ```
 
+::: {note}
 Similar to `IRanges` operations, these methods typically return a list of indices from `subject` for each interval in `query`.
+:::
 
 # Comparison, rank and order operations