Minor fixes to the docs.
LTLA committed May 15, 2024
1 parent e713ed1 commit 1c266de
Showing 5 changed files with 13 additions and 9 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -18,11 +18,11 @@ The **tatami_chunked** library implements some common functionality for **tatami
Given a rectangular grid of chunks that make up a chunked matrix,
we define a "slab" as the set of chunks that overlap a single row or column (or some subset/contiguous block thereof).
We typically want to load and cache an entire slab at once, ensuring that future requests to adjacent row/columns can just use the cached values rather than re-reading or decompressing the same chunks.
- The **tatami_chunked** library provides the `LruSlabCache` and `OracleSlabCache` classes to facilitate caching of the slabs in `tatami::Matrix` extractors.
+ The **tatami_chunked** library provides the `LruSlabCache` and `OracularSlabCache` classes to facilitate caching of the slabs in `tatami::Matrix` extractors.
The `TypicalSlabCacheWorkspace` class allows developers to easily switch between caching strategies, depending on whether an oracle is provided to predict the future access pattern.

The `CustomDenseChunkedMatrix` and `CustomSparseChunkedMatrix` classes implement the `tatami::Matrix` interface on top of a matrix of custom chunks.
- These classes automatically perform slab caching given a set of options including the maximum cache size (see the `CustomDenseChunkedOptions` and `CustomSparseChunkedOptions` classes).
+ These classes automatically perform slab caching given a set of options including the maximum cache size (see the `CustomDenseChunkedMatrixOptions` and `CustomSparseChunkedMatrixOptions` classes).
Developers can use this to quickly create matrix representations with arbitrary chunk compression schemes that can reduce the memory footprint, e.g., DEFLATE, run length encodings.
Obviously, this comes at the cost of speed whereby the chunks must be unpacked to extract the relevant data -
developers are expected to define an appropriate extraction method for dense/sparse chunks.
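
To make the slab caching idea in the README hunk above more concrete, here is a minimal sketch of a least-recently-used cache keyed by slab identifier. It is an illustration only: `ToyLruSlabCache`, its `find()` signature, and the `load` callback are assumptions made for this example and do not reproduce the actual `LruSlabCache` interface in **tatami_chunked**.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only; this is NOT the tatami_chunked LruSlabCache.
template<typename Id_, class Slab_>
class ToyLruSlabCache {
public:
    explicit ToyLruSlabCache(std::size_t max_slabs) : max_slabs_(max_slabs) {
        assert(max_slabs > 0);
    }

    // Returns the cached slab for 'id', calling 'load(id)' only on a cache miss.
    template<class Load_>
    const Slab_& find(Id_ id, Load_ load) {
        auto it = lookup_.find(id);
        if (it != lookup_.end()) {
            // Hit: move the entry to the front, marking it as most recently used.
            entries_.splice(entries_.begin(), entries_, it->second);
            return it->second->second;
        }

        if (entries_.size() == max_slabs_) {
            // Cache is full: evict the least recently used slab at the back.
            lookup_.erase(entries_.back().first);
            entries_.pop_back();
        }

        // Miss: load (e.g., read and decompress) the slab and cache it.
        entries_.emplace_front(id, load(id));
        lookup_[id] = entries_.begin();
        return entries_.front().second;
    }

private:
    typedef std::list<std::pair<Id_, Slab_> > EntryList;
    std::size_t max_slabs_;
    EntryList entries_;
    std::unordered_map<Id_, typename EntryList::iterator> lookup_;
};
```

With a capacity of two slabs, repeated requests for rows in the same slab trigger a single load, which is the behavior the README describes for adjacent row/column accesses.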
5 changes: 3 additions & 2 deletions include/tatami_chunked/CustomDenseChunkedMatrix.hpp
@@ -19,7 +19,7 @@
namespace tatami_chunked {

/**
- * @brief Options for custom dense chunk extraction.
+ * @brief Options for data extraction from a `CustomDenseChunkedMatrix`.
*/
struct CustomDenseChunkedMatrixOptions {
/**
@@ -260,7 +260,8 @@ struct DenseIndex : public tatami::DenseExtractor<oracle_, Value_, Index_>, publ
* @tparam Chunk_ Class of the chunk, implementing either the `MockSimpleDenseChunk` or `MockSubsetDenseChunk` interfaces.
*
* Implements a `Matrix` subclass where data is contained in dense rectangular chunks.
- * These chunks are typically compressed in some manner to reduce memory usage; on access, each chunk is decompressed and the desired values are extracted.
+ * These chunks are typically compressed in some manner to reduce memory usage compared to, e.g., a `tatami::DenseMatrix`.
+ * On access, the relevant chunks are decompressed and the desired values are extracted.
* Each dimension should be divided into chunk boundaries at regular intervals starting from zero;
* this partitions the matrix according to a regular grid where each grid entry is a single chunk of the same size.
* The exception is for chunks at the non-zero boundaries of the matrix dimensions, which may be truncated.
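
The regular grid described in this hunk can be sketched with a little integer arithmetic: the number of chunks along a dimension is the ceiling of the dimension extent over the chunk length, and only the last chunk may be truncated. The helper names `num_chunks` and `chunk_extent` below are hypothetical and chosen for this example; they are not taken from **tatami_chunked**.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of chunks needed to cover one dimension.
template<typename Index_>
Index_ num_chunks(Index_ dim_extent, Index_ chunk_length) {
    assert(chunk_length > 0);
    // Ceiling division, written to avoid overflow in 'dim_extent + chunk_length'.
    return dim_extent / chunk_length + (dim_extent % chunk_length != 0 ? 1 : 0);
}

// Hypothetical helper: actual length of chunk 'chunk_id' along one dimension.
// Every chunk has length 'chunk_length' except possibly the last one,
// which is truncated at the end of the dimension.
template<typename Index_>
Index_ chunk_extent(Index_ chunk_id, Index_ dim_extent, Index_ chunk_length) {
    Index_ start = chunk_id * chunk_length;
    assert(start < dim_extent);
    return std::min(chunk_length, static_cast<Index_>(dim_extent - start));
}
```

For instance, a dimension of 25 rows with a chunk length of 10 is covered by three chunks, the last of which holds only 5 rows.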
5 changes: 3 additions & 2 deletions include/tatami_chunked/CustomSparseChunkedMatrix.hpp
@@ -18,7 +18,7 @@
namespace tatami_chunked {

/**
- * @brief Options for custom sparse chunk extraction.
+ * @brief Options for data extraction from a `CustomSparseChunkedMatrix`.
*/
struct CustomSparseChunkedMatrixOptions {
/**
@@ -433,7 +433,8 @@ struct DensifiedIndex : public tatami::DenseExtractor<oracle_, Value_, Index_>,
* @tparam Chunk_ Class of the chunk, implementing either the `MockSimpleSparseChunk` or `MockSubsetSparseChunk` interfaces.
*
* Implements a `Matrix` subclass where data is contained in sparse rectangular chunks.
- * These chunks are typically compressed in some manner to reduce memory usage; on access, each chunk is decompressed and the desired values are extracted.
+ * These chunks are typically compressed in some manner to reduce memory usage compared to, e.g., a `tatami::CompressedSparseMatrix`.
+ * On access, the relevant chunks are decompressed and the desired values are extracted.
* Each dimension should be divided into chunk boundaries at regular intervals starting from zero;
* this partitions the matrix according to a regular grid where each grid entry is a single chunk of the same size.
* The exception is for chunks at the non-zero boundaries of the matrix dimensions, which may be truncated.
1 change: 1 addition & 0 deletions include/tatami_chunked/OracularSlabCache.hpp
@@ -109,6 +109,7 @@ class OracularSlabCache {
* This should return a pair containing:
* 1. An `Id_`, the identifier of the slab containing `i`.
* This is typically defined as the index of the slab on the target dimension.
+ * For example, if each chunk takes up 10 rows, attempting to access row 21 would require retrieval of slab 2.
* 2. An `Index_`, the index of row/column `i` inside that slab.
* For example, if each chunk takes up 10 rows, attempting to access row 21 would yield an offset of 1.
* @param create Function that accepts no arguments and returns a `Slab_` object with sufficient memory to hold a slab's contents when used in `populate()`.
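
As a rough illustration of the `identify` contract documented in this hunk, the sketch below maps a row/column index to a slab identifier and an offset within that slab using integer division and remainder. The `identify_slab` name and its `chunk_length` parameter are assumptions made for this example; the library only requires a callable that returns such a pair.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper, not part of tatami_chunked: maps index 'i' on the
// target dimension to (slab identifier, offset of 'i' inside that slab),
// assuming every slab spans 'chunk_length' consecutive rows/columns.
template<typename Id_, typename Index_>
std::pair<Id_, Index_> identify_slab(Index_ i, Index_ chunk_length) {
    assert(chunk_length > 0);
    return std::pair<Id_, Index_>(
        static_cast<Id_>(i / chunk_length),    // which slab contains 'i'
        static_cast<Index_>(i % chunk_length)  // position of 'i' inside the slab
    );
}
```

With a chunk length of 10, row 21 maps to slab 2 at offset 1, matching the example in the documentation comment.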
7 changes: 4 additions & 3 deletions include/tatami_chunked/OracularSubsettedSlabCache.hpp
@@ -141,7 +141,7 @@ struct OracularSubsettedSlabCacheSelectionDetails {
*
* Implement an oracle-aware cache for slab subsets.
* Each slab is defined as the set of chunks required to read an element of the target dimension (or a contiguous block/indexed subset thereof) from a `tatami::Matrix`.
- * This cache is similar to the `OracleSlabCache` except that it remembers the subset of elements on the target dimension that were requested for each slab.
+ * This cache is similar to the `OracularSlabCache` except that it remembers the subset of elements on the target dimension that were requested for each slab.
* Slab extractors can use this information to optimize slab loading by ignoring unneeded elements.
*/
template<typename Id_, typename Index_, class Slab_>
@@ -239,14 +239,15 @@ class OracularSubsettedSlabCache {
* This should return a pair containing:
* 1. An `Id_`, the identifier of the slab containing `i`.
* This is typically defined as the index of the slab on the target dimension.
+ * For example, if each chunk takes up 10 rows, attempting to access row 21 would require retrieval of slab 2.
* 2. An `Index_`, the index of row/column `i` inside that slab.
* For example, if each chunk takes up 10 rows, attempting to access row 21 would yield an offset of 1.
* @param create Function that accepts no arguments and returns a `Slab_` object with sufficient memory to hold a slab's contents when used in `populate()`.
* This may also return a default-constructed `Slab_` object if the allocation is done dynamically per slab in `populate()`.
* @param populate Function that accepts a `std::vector<std::tuple<Id_, Slab_*, OracularSubsettedSlabCacheSelectionDetails<Index_>*> >&` specifying the slabs to be populated.
* The first `Id_` element of each tuple contains the slab identifier, i.e., the first element returned by the `identify` function.
- * The second `Slab_*` element specifies the object which to store the contents of each slab.
- * The third `OracularSubsettedSlabCacheSelectionDetails<Index_>*` element contains information about the subset of each slab that is required.
+ * The second `Slab_*` element specifies the object in which to store the contents of the slab.
+ * The third `OracularSubsettedSlabCacheSelectionDetails<Index_>*` element contains information about the desired subset of elements on the target dimension of the slab.
* This function should iterate over the vector and populate the desired subset of each slab.
* Note that the vector is not guaranteed to be sorted.
*