Update all of the C++ code to use the latest versions of all libraries. (#90)

Most user-visible functionality is unchanged, with some exceptions:

- Replaced the ADT normalization code with the CLRm1 method.
- Removed medianSizeFactors() and groupedSizeFactors(), as these were only needed for ADTs.
- Renamed testFeatureSetEnrichment() to testGeneSetEnrichment().
- Rotation matrices are now reported from runPca().
- Block weighting is supported in modelGeneVariances().
- Renamed clusterSnnGraph() to clusterGraph().
- Renamed scoreFeatureSet() to scoreGsdecon().
- Replaced PCA-partitioning with variance partitioning in clusterKmeans().
- Exposed options in clusterKmeans() for the choice of refinement algorithm and the number of iterations.
- All Suggest*FilterResults::filter() methods now return the cells to **keep**.
- filterCells() now accepts a vector of cells to **keep**.
- Renamed some methods to avoid unnecessary plurals, e.g., sum() instead of sums().
- Added isBlocked() methods to *Results classes with methods that accept block IDs.
- Renamed all of the SingleR-related functions, for consistency.
- Added an asTypedArray= option to various methods and functions, allowing users to
  choose between returning a TypedArray or a TypedWasmArray.
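The CLRm1 item above replaces the previous ADT normalization. The sketch below is a minimal, standalone illustration of a CLRm1-style size factor, assuming it is defined as the geometric mean of (count + 1) for each cell, minus 1. The functions `clrm1_size_factor()` and `center_size_factors()` are hypothetical and are not the API of the underlying C++ library, which should be consulted for the exact definition.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (assumed definition, not a library API):
// size factor = exp(mean(log(count + 1))) - 1 for one cell's ADT counts.
double clrm1_size_factor(const std::vector<double>& counts) {
    assert(!counts.empty());
    double sum_log = 0;
    for (double c : counts) {
        assert(c >= 0);
        sum_log += std::log1p(c); // log(count + 1)
    }
    // Geometric mean of (count + 1), then subtract the pseudo-count of 1.
    return std::expm1(sum_log / static_cast<double>(counts.size()));
}

// Hypothetical helper: scale size factors so that their mean is 1,
// leaving them untouched if the mean is not positive.
std::vector<double> center_size_factors(std::vector<double> factors) {
    double mean = 0;
    for (double f : factors) {
        mean += f;
    }
    if (!factors.empty()) {
        mean /= static_cast<double>(factors.size());
    }
    if (mean > 0) {
        for (double& f : factors) {
            f /= mean;
        }
    }
    return factors;
}
```

Counts would then be divided by the centered size factors before log-transformation, as with any other size factor.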
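testGeneSetEnrichment() presumably relies on a hypergeometric test, given the `hypergeometric_test.cpp` source and the `phyper` dependency in the CMake diff. Since that library's interface is not shown here, the following is a standalone sketch of the upper-tail hypergeometric p-value computed directly from log-binomial coefficients. `hypergeometric_upper_tail()` and its parameter names are hypothetical, and a production implementation would take more care with precision for very small p-values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// log of the binomial coefficient C(n, k), via the log-gamma function.
double log_choose(double n, double k) {
    return std::lgamma(n + 1) - std::lgamma(k + 1) - std::lgamma(n - k + 1);
}

// Hypothetical helper: P(X >= overlap), where X is the number of gene-set
// members among `num_markers` genes drawn without replacement from a
// universe of `universe` genes, of which `set_size` belong to the set.
double hypergeometric_upper_tail(std::int32_t overlap, std::int32_t set_size,
                                 std::int32_t num_markers, std::int32_t universe) {
    assert(overlap >= 0 && set_size >= 0 && num_markers >= 0);
    assert(set_size <= universe && num_markers <= universe);

    // The support of X is [max(0, num_markers - (universe - set_size)), min(set_size, num_markers)].
    const std::int32_t lower = std::max<std::int32_t>(
        overlap, std::max<std::int32_t>(0, num_markers - (universe - set_size)));
    const std::int32_t upper = std::min(set_size, num_markers);
    const double log_total = log_choose(universe, num_markers);

    // Sum the probability mass function over the upper tail.
    double p = 0;
    for (std::int32_t k = lower; k <= upper; ++k) {
        p += std::exp(log_choose(set_size, k) +
                      log_choose(universe - set_size, num_markers - k) - log_total);
    }
    return std::min(p, 1.0); // guard against rounding slightly above 1
}
```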
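Block weighting in modelGeneVariances() presumably controls how per-block statistics are combined into a single value per gene. As a minimal sketch under our own assumptions, the function below averages one gene's per-block variances using either equal weights or weights proportional to block size. The `BlockWeighting` enum and `combine_block_variances()` are hypothetical and do not reflect the actual options exposed by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical weighting schemes, for illustration only.
enum class BlockWeighting { EQUAL, BY_SIZE };

// Combine per-block variance estimates for one gene into a weighted mean.
// Blocks with fewer than 2 cells are skipped, as their variance is undefined.
double combine_block_variances(const std::vector<double>& variances,
                               const std::vector<std::size_t>& block_sizes,
                               BlockWeighting weighting) {
    assert(variances.size() == block_sizes.size());
    double weighted_sum = 0, total_weight = 0;
    for (std::size_t b = 0; b < variances.size(); ++b) {
        if (block_sizes[b] < 2) {
            continue;
        }
        const double w = (weighting == BlockWeighting::EQUAL)
                             ? 1.0
                             : static_cast<double>(block_sizes[b]);
        weighted_sum += w * variances[b];
        total_weight += w;
    }
    if (total_weight == 0) {
        return std::numeric_limits<double>::quiet_NaN(); // no usable blocks
    }
    return weighted_sum / total_weight;
}
```

Equal weighting stops a single large block from dominating the combined statistic, while size-based weighting treats every cell equally.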
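For clusterKmeans(), PCA partitioning and variance partitioning are, to our understanding, both deterministic schemes for choosing the initial cluster centers. The sketch below illustrates the general variance-partitioning idea in a simplified form: repeatedly split the cluster with the largest within-cluster sum of squares at its mean, along its highest-variance dimension. It is not the implementation used by the underlying k-means library, and `variance_partition_init()` is a hypothetical name. Points are stored row-major, one point per row.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Per-dimension mean of the points in `members`; `data` is row-major.
std::vector<double> member_mean(const std::vector<double>& data, std::size_t ndim,
                                const std::vector<std::size_t>& members) {
    std::vector<double> mean(ndim, 0.0);
    for (std::size_t i : members) {
        for (std::size_t j = 0; j < ndim; ++j) {
            mean[j] += data[i * ndim + j];
        }
    }
    for (double& m : mean) {
        m /= static_cast<double>(members.size());
    }
    return mean;
}

// Returns up to k initial centers (row-major); fewer if no cluster can be split.
std::vector<double> variance_partition_init(const std::vector<double>& data,
                                            std::size_t ndim, std::size_t k) {
    assert(ndim > 0 && data.size() % ndim == 0);
    const std::size_t npts = data.size() / ndim;
    assert(k >= 1 && k <= npts);

    std::vector<std::vector<std::size_t>> clusters(1);
    for (std::size_t i = 0; i < npts; ++i) {
        clusters[0].push_back(i);
    }

    while (clusters.size() < k) {
        // Find the cluster with the largest sum of squares and its top dimension.
        double best_ss = 0, split_point = 0;
        std::size_t best_cluster = 0, best_dim = 0;
        for (std::size_t c = 0; c < clusters.size(); ++c) {
            const auto mean = member_mean(data, ndim, clusters[c]);
            std::vector<double> ss(ndim, 0.0);
            for (std::size_t i : clusters[c]) {
                for (std::size_t j = 0; j < ndim; ++j) {
                    const double d = data[i * ndim + j] - mean[j];
                    ss[j] += d * d;
                }
            }
            double total = 0;
            std::size_t top = 0;
            for (std::size_t j = 0; j < ndim; ++j) {
                total += ss[j];
                if (ss[j] > ss[top]) top = j;
            }
            if (total > best_ss) {
                best_ss = total;
                best_cluster = c;
                best_dim = top;
                split_point = mean[top];
            }
        }
        if (best_ss == 0) break; // every cluster contains identical points

        // Split the chosen cluster at its mean along the chosen dimension.
        std::vector<std::size_t> left, right;
        for (std::size_t i : clusters[best_cluster]) {
            (data[i * ndim + best_dim] <= split_point ? left : right).push_back(i);
        }
        if (left.empty() || right.empty()) break; // degenerate split from rounding
        clusters[best_cluster] = std::move(left);
        clusters.push_back(std::move(right));
    }

    std::vector<double> centers;
    for (const auto& members : clusters) {
        const auto mean = member_mean(data, ndim, members);
        centers.insert(centers.end(), mean.begin(), mean.end());
    }
    return centers;
}
```

The resulting centers would then be handed to the chosen refinement algorithm for the configured number of iterations.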
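To illustrate the **keep** convention now used by the Suggest*FilterResults::filter() methods and filterCells(), here is a minimal sketch that subsets a dense column-major matrix (genes in rows, cells in columns) with a keep vector. `filter_columns()` and `keep_from_discard()` are hypothetical helpers, not part of the scran.js API; code that previously worked with a vector of cells to discard would need to invert it first, as `keep_from_discard()` shows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: keep only the columns (cells) whose `keep` entry is non-zero.
// `matrix` is dense and column-major with `nrow` rows and keep.size() columns.
std::vector<double> filter_columns(const std::vector<double>& matrix, std::size_t nrow,
                                   const std::vector<std::uint8_t>& keep) {
    assert(matrix.size() == nrow * keep.size());
    std::vector<double> output;
    for (std::size_t c = 0; c < keep.size(); ++c) {
        if (keep[c]) {
            auto start = matrix.begin() + static_cast<std::ptrdiff_t>(c * nrow);
            output.insert(output.end(), start, start + static_cast<std::ptrdiff_t>(nrow));
        }
    }
    return output;
}

// Hypothetical helper: convert a discard vector (non-zero = discard) into a keep vector.
std::vector<std::uint8_t> keep_from_discard(const std::vector<std::uint8_t>& discard) {
    std::vector<std::uint8_t> keep(discard.size());
    for (std::size_t i = 0; i < discard.size(); ++i) {
        keep[i] = static_cast<std::uint8_t>(discard[i] == 0);
    }
    return keep;
}
```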

We enforce the use of int32_t instead of int to avoid surprises later. We also add more
tests for SingleR, especially with regard to the intersection of the feature space.
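As a small illustration of one likely motivation for int32_t: a JavaScript Int32Array always holds 32-bit signed integers, whereas C++ only guarantees that `int` is at least 16 bits wide, so code that reads such buffers is more explicit with a fixed-width type. The `count_per_block()` helper below is hypothetical and only sketches that pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: count the cells in each block, given block IDs that
// arrive as 32-bit signed integers (e.g., from a JavaScript Int32Array).
// std::int32_t makes the element width explicit instead of relying on `int`.
std::vector<std::int32_t> count_per_block(const std::int32_t* ids, std::size_t n,
                                          std::int32_t nblocks) {
    assert(nblocks >= 0);
    std::vector<std::int32_t> counts(static_cast<std::size_t>(nblocks), 0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(ids[i] >= 0 && ids[i] < nblocks);
        ++counts[static_cast<std::size_t>(ids[i])];
    }
    return counts;
}
```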
LTLA authored Oct 11, 2024
1 parent ecd2b28 commit f14bbe5
Showing 115 changed files with 4,172 additions and 4,665 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,5 +1,6 @@
extern/installed
build_*
extern/include
build_*/
docs/html/
docs/latex/
docs/built/
83 changes: 60 additions & 23 deletions CMakeLists.txt
@@ -16,43 +16,52 @@ add_subdirectory(extern)
add_executable(
scran_wasm

src/read_matrix_market.cpp
src/read_hdf5_matrix.cpp
src/NumericMatrix.cpp
src/cbind.cpp
src/subset.cpp
src/delayed.cpp

src/initialize_from_arrays.cpp
src/initialize_from_rds.cpp
src/initialize_from_mtx.cpp
src/initialize_from_hdf5.cpp

src/rds_utils.cpp
src/hdf5_utils.cpp
src/write_sparse_matrix_to_hdf5.cpp
src/initialize_sparse_matrix.cpp

src/quality_control_rna.cpp
src/quality_control_adt.cpp
src/quality_control_crispr.cpp
src/filter_cells.cpp

src/log_norm_counts.cpp
src/median_size_factors.cpp
src/grouped_size_factors.cpp
src/normalize_counts.cpp
src/compute_clrm1_factors.cpp

src/model_gene_variances.cpp

src/run_pca.cpp
src/run_tsne.cpp
src/run_umap.cpp
src/mnn_correct.cpp
src/scale_by_neighbors.cpp
src/cluster_snn_graph.cpp

src/NeighborIndex.cpp

src/run_tsne.cpp
src/run_umap.cpp

src/build_snn_graph.cpp
src/cluster_graph.cpp
src/cluster_kmeans.cpp

src/score_markers.cpp

src/run_singlepp.cpp
src/NumericMatrix.cpp
src/NeighborIndex.cpp
src/cbind.cpp
src/subset.cpp
src/delayed.cpp
src/get_error_message.cpp
src/rds_utils.cpp
src/initialize_from_rds.cpp

src/score_feature_set.cpp
src/score_gsdecon.cpp
src/hypergeometric_test.cpp

src/aggregate_across_cells.cpp

src/get_error_message.cpp
)

target_compile_options(
@@ -63,18 +72,46 @@ target_compile_options(

target_link_libraries(
scran_wasm
scran

tatami_hdf5
tatami_mtx
tatami_layered
mnncorrect

scran_qc
scran_norm
scran_variances
scran_pca
scran_aggregate
scran_markers

knncolle
knncolle_annoy

qdtsne
umappp
hdf5-static
hdf5_cpp-static

mumosa

mnncorrect

igraph::igraph
scran_graph_cluster

singlepp
singlepp_loaders

hdf5-static
hdf5_cpp-static
rds2cpp

phyper
gsdecon
)

target_include_directories(
scran_wasm
PRIVATE
extern/include
)

target_link_options(scran_wasm PRIVATE
2 changes: 1 addition & 1 deletion README.md
@@ -54,7 +54,7 @@ After that, you can run the remaining steps synchronously - for example, using t
// Reading in the count matrix.
import * as fs from "fs";
let buffer = fs.readFileSync("matrix.mtx.gz");
let mat = scran.initializeSparseMatrixFromMatrixMarketBuffer(buffer);
let mat = scran.initializeSparseMatrixFromMatrixMarket(buffer);
```

## Basic analyses
3 changes: 2 additions & 1 deletion build.sh
@@ -50,7 +50,8 @@ then
-B $builddir \
-DCOMPILE_NODE=${node_flag} \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_PREFIX_PATH=extern/installed
-DCMAKE_PREFIX_PATH=extern/installed \
-DTATAMI_HDF5_FIND_HDF5=OFF
fi

cd $builddir
8 changes: 4 additions & 4 deletions docs/related/developer_notes.md
@@ -6,14 +6,14 @@ We use WebAssembly (Wasm) to enable efficient client-side execution of common st
Code to perform each step is written in C++ and compiled to Wasm using the [Emscripten toolchain](https://emscripten.org/).
Some of the relevant C++ libraries are listed below:

- [libscran](https://github.com/LTLA/libscran) provides C++ implementations of key functions in **scran** and its fellow packages **scater** and **scuttle**.
- [libscran](https://github.com/libscran) provides C++ implementations of key functions in **scran** and its fellow packages **scater** and **scuttle**.
This includes quality control, normalization, feature selection, PCA, clustering and dimensionality reduction.
- [tatami](https://github.com/tatami-inc/tatami) provides an abstract interface to different matrix classes, focusing on row and column extraction.
- [knncolle](https://github.com/LTLA/knncolle) wraps a number of nearest neighbor detection methods in a consistent interface.
- [knncolle](https://github.com/knncolle-inc/knncolle) wraps a number of nearest neighbor detection methods in a consistent interface.
- [CppIrlba](https://github.com/LTLA/CppIrlba) contains a C++ port of the IRLBA algorithm for approximate PCA.
- [CppKmeans](https://github.com/LTLA/CppKmeans) contains C++ ports of the Hartigan-Wong and Lloyd algorithms for k-means clustering.
- [qdtsne](https://github.com/LTLA/qdtsne) contains a refactored C++ implementation of the Barnes-Hut t-SNE dimensionality reduction algorithm.
- [umappp](https://github.com/LTLA/umappp) contains a refactored C++ implementation of the UMAP dimensionality reduction algorithm.
- [qdtsne](https://github.com/libscran/qdtsne) contains a refactored C++ implementation of the Barnes-Hut t-SNE dimensionality reduction algorithm.
- [umappp](https://github.com/libscran/umappp) contains a refactored C++ implementation of the UMAP dimensionality reduction algorithm.

For each step, we use Emscripten to compile the associated C++ functions into Wasm and generate Javascript-visible bindings.
We can then load the Wasm binary into a web application and call the desired functions on user-supplied data.