New
- Added the `initializeSparseMatrixFromHdf5DenseArray()` and `initializeSparseMatrixFromHdf5SparseMatrix()` functions, which expose the internals used by the `initializeSparseMatrixFromHdf5()` function. These new functions give the user more control over the construction of sparse matrices from HDF5 files.
Changes
- Protected `cbindWithNames()` from weird names. Null names are ignored, allowing us to safely handle missing gene symbols. When names are duplicated, we explicitly keep only the first occurrence; this is more intuitive than the old (and undocumented) behavior of keeping only the last occurrence. The order of names from the first matrix is preserved in the final intersection, which should reduce the amount of reordering.
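The naming rules above can be illustrated with a small standalone sketch. The `firstOccurrences()` and `intersectNames()` helpers are hypothetical names introduced only for this example; they are not part of scran.js and do not claim to reproduce its internals, only the three rules described (nulls ignored, first occurrence kept, order of the first matrix preserved).

```javascript
// Hypothetical helper (not part of scran.js): map each usable name to the
// index of its first occurrence, ignoring null/undefined names.
function firstOccurrences(names) {
    const seen = new Map();
    names.forEach((name, index) => {
        if (name !== null && name !== undefined && !seen.has(name)) {
            seen.set(name, index);
        }
    });
    return seen;
}

// Hypothetical helper: intersect names across matrices, preserving the
// order in which they appear in the first matrix.
function intersectNames(allNames) {
    const maps = allNames.map(firstOccurrences);
    const [first, ...rest] = maps;
    const common = [];
    for (const name of first.keys()) {
        if (rest.every(m => m.has(name))) {
            common.push(name);
        }
    }
    // For each matrix, the row index to extract for each common name.
    const indices = maps.map(m => common.map(name => m.get(name)));
    return { names: common, indices };
}

const result = intersectNames([
    ["A", null, "B", "A", "C"],
    ["C", "B", "B", null, "D"]
]);
console.log(result.names);   // [ 'B', 'C' ]
console.log(result.indices); // [ [ 2, 4 ], [ 1, 0 ] ]
```

Here, the duplicated `"B"` in the second vector resolves to index 1 (its first occurrence), and the output order follows the first vector rather than the second.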
Changes
- Updated to a more recent version of the underlying tatami library for some bugfixes and improvements.
Changes
- Setting `layered=true` in the various `initializeSparseMatrix*()` functions no longer reorders the rows, due to the use of the new tatami_layered library. As a result, all `initializeSparseMatrix*()` functions no longer need to return an array of `row_ids` and instead return the `ScranMatrix` directly, where the rows are in the same order as defined in the file. This greatly simplifies downstream management of the row order.
- Renamed all functions to use PascalCase for acronyms, e.g., `HDF5` is now `Hdf5`, `PCA` is now `Pca`, `SNN` is now `Snn`. This gives us a consistent naming scheme across the package that aligns with the underlying C++ code.
- Renamed `modelGeneVar` to `modelGeneVariances` (similarly for its `*Results` class) for easier reading.
  - Also changed the weighting scheme so that we do not allow equal contributions from very small blocks (< 1000 cells) with less stable statistics. Large blocks will still be equally weighted, regardless of their actual size.
- Modified `scoreMarkers()` in response to libscran changes.
  - Added a weighting scheme identical to the one used by `modelGeneVariances()`. This now equalizes contributions from large blocks to each per-cluster statistic (> 1000 cells in the relevant cluster).
  - Removed the `block=` option from the methods of the corresponding `Results` object, as averages are now automatically returned for the means and detected proportions.
- Changed how blocking is handled in `runPca()`:
  - Added `blockMethod="project"` to enable projection of cells onto rotation vectors defined from residuals.
  - Added `blockWeights=` to determine whether to equalize the contribution of large blocks of different size, using the same scheme in `scoreMarkers()` and `modelGeneVariances()`. Equal weighting only applies once blocks reach a certain size (1000 cells), otherwise the weight of each block is proportional to its size.
  - Removed `blockMethod = "weight"`, which is replaced by `blockMethod = "none"` with `blockWeights = true`.
- Overhauled the interface for t-SNE and UMAP calculations:
  - Renamed the output of `initializeTsne()` and `initializeUmap()` to `TsneStatus` and `UmapStatus`, respectively.
  - Added a `run()` method to the status objects, which runs them for the specified time/number of iterations. This is more ergonomic than initializing the status objects and passing them to `runTsne()` or `runUmap()`.
  - Changed `runTsne()` and `runUmap()` to directly return the final t-SNE and UMAP coordinates, respectively, from the nearest neighbor input. This avoids exposing the status objects for basic use cases.
- Removed many of the `empty*Results()` functions, as these are not necessary for regular uses of this package. The exception is `emptySuggest*QcFiltersResults()`, as this can be used to perform filtering with custom thresholds.
- Removed the `consume=` option from `initializeSparseMatrixFromRds()`, as the potential damage from pass-by-reference mutations is too high for the minor improvement in performance.
- Added an `allowNonFinite=` option to the `logNormCounts()` function to handle infinite and missing size factors.
- Added `allowZeros=` and `allowNonFinite=` options to the `groupedSizeFactors()` function to handle infinite and missing size factors. Both options are automatically set to `true` when calling `groupedSizeFactors()` from the `quickAdtSizeFactors()` function.
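The block weighting described above for `modelGeneVariances()`, `scoreMarkers()` and `runPca()` can be sketched as follows. This is only an illustration of the stated rule (weights proportional to block size up to 1000 cells, equal beyond that); `blockWeight()` and `weightedAverage()` are hypothetical helpers, and the actual calculation lives in the underlying C++ library.

```javascript
// Hypothetical helper: relative weight of a block given its number of cells.
// Below the cap, the weight is proportional to the block size; at or above
// the cap, all blocks receive the same weight of 1.
function blockWeight(numCells, cap = 1000) {
    return Math.min(numCells, cap) / cap;
}

// Hypothetical helper: combine per-block statistics into a weighted average.
function weightedAverage(stats, sizes, cap = 1000) {
    let total = 0;
    let sumWeights = 0;
    for (let i = 0; i < stats.length; i++) {
        const w = blockWeight(sizes[i], cap);
        total += w * stats[i];
        sumWeights += w;
    }
    return total / sumWeights;
}

const sizes = [100, 1000, 5000];
const weights = sizes.map(s => blockWeight(s));
console.log(weights); // [ 0.1, 1, 1 ]

// The small block contributes little; the two large blocks contribute equally.
const average = weightedAverage([2, 4, 6], sizes);
console.log(average);
```

With this rule, a block of 5000 cells does not dominate a block of 1000 cells, while a block of 100 cells is down-weighted because its statistics are less stable.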
Changes
- Added a `cacheSize=` option to `initializeSparseMatrixFromHDF5()`, mostly to increase the cache size for awkward chunk sizes in dense HDF5 datasets.
Changes
- Ignore all MGI identifiers in `guessFeatures()` to avoid confusing them with human gene symbols.
Changes
- Ignore all VEGA identifiers in `guessFeatures()` to avoid confusing them with human gene symbols.
Changes
- Added the `subsetRow=` and `subsetColumn=` options to enable loading of a subset of rows/columns from `initializeSparseMatrixFromHDF5()`.
Changes
- Ignore non-string values in `features=` for `guessFeatures()`.
Changes
- Preserve placeholder entries for factor indices in `resetLevels()`.
New
- Added a `resetLevels()` function to change the levels of an existing factor.
Changes
- Improved the predictability of level ordering in `convertToFactor()`. All-string/all-number levels that are inferred from the array are now sorted. Users may also pass in their own `levels`.
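As a rough illustration of the level ordering described above, the following sketch converts an array into integer codes plus levels. The `toFactor()` helper is hypothetical and is not the scran.js implementation; in particular, leaving mixed-type levels in order of first appearance is an assumption made for this example.

```javascript
// Hypothetical sketch of factor conversion (not the scran.js implementation).
function toFactor(values, levels = null) {
    if (levels === null) {
        const unique = Array.from(new Set(values));
        if (unique.every(x => typeof x === "number")) {
            unique.sort((a, b) => a - b); // numeric order, not lexicographic
        } else if (unique.every(x => typeof x === "string")) {
            unique.sort();                // default string ordering
        }
        // Assumption: mixed types are left in order of first appearance.
        levels = unique;
    }

    const lookup = new Map(levels.map((lev, i) => [lev, i]));
    const ids = Int32Array.from(values, x => {
        const id = lookup.get(x);
        if (id === undefined) {
            throw new Error("value '" + String(x) + "' not found in levels");
        }
        return id;
    });
    return { ids, levels };
}

const numeric = toFactor([10, 2, 33, 2]);
console.log(numeric.levels);          // [ 2, 10, 33 ]
console.log(Array.from(numeric.ids)); // [ 1, 0, 2, 0 ]

// User-supplied levels are used as given, without sorting.
const custom = toFactor(["b", "a"], ["b", "a"]);
console.log(Array.from(custom.ids));  // [ 0, 1 ]
```

Sorting the inferred levels means that the same set of values yields the same codes regardless of the order in which they appear in the input.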
Changes
- Improved the intersection of feature identifiers in `labelCells()` and `integrateCellLabels()`. Reference features may now contain synonyms, and if feature identifiers are duplicated, only the first occurrence is used.
New
- Added an `aggregateAcrossCells()` function to aggregate expression values across groups of cells. This is typically used to obtain cluster-level summaries for plotting or per-cluster analyses.
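To make the idea of aggregation concrete, here is a minimal sketch that sums a dense, column-major genes-by-cells array within each group. The `aggregateByGroup()` helper is hypothetical, and the choice of summation and of a dense column-major layout are assumptions for this example; it does not reflect the signature or internals of `aggregateAcrossCells()`.

```javascript
// Hypothetical sketch: sum a column-major dense matrix (genes x cells)
// within each group of cells. 'groups' holds one group index per cell.
function aggregateByGroup(values, numGenes, groups, numGroups) {
    const sums = new Float64Array(numGenes * numGroups);
    for (let c = 0; c < groups.length; c++) {
        const offsetIn = c * numGenes;
        const offsetOut = groups[c] * numGenes;
        for (let g = 0; g < numGenes; g++) {
            sums[offsetOut + g] += values[offsetIn + g];
        }
    }
    return sums; // column-major, genes x groups
}

// Two genes and three cells; cells 0 and 2 belong to group 0.
const values = [1, 2, 3, 4, 5, 6];
const sums = aggregateByGroup(values, 2, [0, 1, 0], 2);
console.log(Array.from(sums)); // [ 6, 8, 3, 4 ]
```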
Changes
- Provide more details (scores, fine-tuning deltas) in `labelCells()` and `integrateCellLabels()`. These functions now return full-fledged objects that need to be explicitly freed after use.
New
- Added a `factorize()` function to convert an arbitrary array into an R-style factor. This provides a superset of the functionality of the `convertBlock()` function.
Changes
- `convertBlock()` now raises a warning upon detecting `null` or `NaN` values.
Changes
- Reduce the impact of duplicated feature identifiers in `guessFeatures()`. This avoids treating strings like "Chr1" as mouse identifiers.
Changes
- Greatly expanded the range of species that can be guessed in `guessFeatures()`. Also added the ability to force the function to report taxonomy IDs instead of common names for human/mouse.
New
- Added `perCellCrisprQcMetrics()` and `suggestCrisprQcFilters()`, to compute the QC metrics and filters for CRISPR guide count data.
- Added `scoreFeatureSet()` to compute per-cell scores for a feature set's activity.
- Added `hypergeometricTest()`, `testFeatureSetEnrichment()` and `remapFeatureSets()`, to compute simple enrichment p-values for the top markers in each feature set.
- Added `computeTopThreshold()` to more easily identify the top markers for
- Added `writeSparseMatrixToHdf5()` to dump a `ScranMatrix` back into a HDF5 file.
- Added `centerSizeFactors()` to allow users to center the size factors manually.
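For readers unfamiliar with hypergeometric enrichment, the calculation can be sketched in a few lines. The `hypergeometricUpperTail()` function below is a hypothetical standalone helper, not the `hypergeometricTest()` API; the use of the upper tail (probability of observing at least the given overlap) is an assumption about what "enrichment p-value" means here.

```javascript
// log(n!) via a running sum; adequate for the small counts in this sketch.
function logFactorial(n) {
    let out = 0;
    for (let i = 2; i <= n; i++) {
        out += Math.log(i);
    }
    return out;
}

function logChoose(n, k) {
    return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
}

// Hypothetical helper: P(X >= overlap), where X is the number of top markers
// that fall inside a feature set of size 'setSize', when 'numMarkers' markers
// are drawn without replacement from 'universe' features.
function hypergeometricUpperTail(overlap, numMarkers, setSize, universe) {
    const denom = logChoose(universe, numMarkers);
    const maxOverlap = Math.min(numMarkers, setSize);
    let p = 0;
    for (let x = overlap; x <= maxOverlap; x++) {
        // Skip impossible configurations: not enough features outside the set.
        if (numMarkers - x > universe - setSize) {
            continue;
        }
        p += Math.exp(
            logChoose(setSize, x) +
            logChoose(universe - setSize, numMarkers - x) -
            denom
        );
    }
    return Math.min(p, 1);
}

// 10 features in total, a set of 3, and 2 top markers with 1 in the set.
const pvalue = hypergeometricUpperTail(1, 2, 3, 10);
console.log(pvalue); // approximately 24/45
```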
Changes
- Renamed `perCell*QcFilters()` functions to `suggest*QcFilters()`. These now return a `Suggest*QcFiltersResults` object containing filter thresholds but not the discard vector itself. Instead, the `filter()` method can be called with a `PerCell*QcMetricsResults()` object to generate a discard vector, either for the same dataset or for a related set of cells. The filter thresholds themselves can also be adjusted by the application before calling `filter()`. All in all, this provides greater flexibility for applications to perform quality control.
- Renamed `perCellQCMetrics()` to `perCellRnaQcMetrics()` (similarly for the name of the corresponding result class). This is more consistent with the naming of the QC functions for the other modalities.
- Getters for empty results will now return `null` if the corresponding field has not been filled, either using a dedicated setter or by extracting a memory view with `fillable: true`. This allows applications to fail gracefully upon encountering an object where the required fields have not been filled.
- Added a `leidenModularityObjective` option to `clusterSNNGraph()`, to use the modularity as the objective function. This allows for a more stable interpretation of the magnitude of the resolution.
- Separated resolution arguments into `multiLevelResolution` and `leidenResolution` for `clusterSNNGraph()`, allowing them to have different defaults. This is especially relevant when `leidenModularityObjective = false`.
- Removed the `updateRowIdentities()` function, as this has little relation with other functions in scran.js.
- Updated to the latest version of libscran (and thus igraph, which changes some of the clustering outputs).
- Added a `minimum=` argument to `chooseHVGs()` to avoid choosing HVGs with negative residuals.
- Modified `summary=` argument to accept a string in `ScoreMarkerResults`, which is more interpretable.
- Support calculation of median and maximum effect sizes in `scoreMarkers()`.
- Pass along `block=` to the internal PCA in `quickAdtSizeFactors()`.
- Allow size factor centering to be turned off in `logNormCounts()`, in case the input size factors are already centered.
- Ignore `null`s in the feature ID vectors in `buildLabelledReference()`.
- Removed deprecated functionality from previous version:
  - Removed `ScranMatrix.isPermuted()`.
  - `clusterSNNGraph()` no longer accepts integer arguments for `scheme=`.
  - Removed `initializeSparseMatrixFromMatrixMarketBuffer()`.
  - `runPCA()` no longer accepts `blockMethod="block"`.
  - Removed `safeFree()`.
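The separation of thresholds from the discard vector, as described for the `suggest*QcFilters()` functions above, follows a pattern that can be sketched in plain JavaScript. The `DemoFilterThresholds` class and its field names (`minSum`, `minDetected`) are hypothetical and chosen only for illustration; they do not mirror the actual `Suggest*QcFiltersResults` interface.

```javascript
// Hypothetical threshold container; field names are illustrative only.
class DemoFilterThresholds {
    constructor(minSum, minDetected) {
        this.minSum = minSum;
        this.minDetected = minDetected;
    }

    // Returns 1 for cells to discard and 0 for cells to keep.
    filter(metrics) {
        const n = metrics.sums.length;
        const discard = new Uint8Array(n);
        for (let i = 0; i < n; i++) {
            if (metrics.sums[i] < this.minSum || metrics.detected[i] < this.minDetected) {
                discard[i] = 1;
            }
        }
        return discard;
    }
}

const thresholds = new DemoFilterThresholds(500, 200);
const metrics = { sums: [1000, 300, 800], detected: [400, 350, 100] };
const before = thresholds.filter(metrics);
console.log(Array.from(before)); // [ 0, 1, 1 ]

// The application can adjust a threshold and filter again,
// or apply the same thresholds to a related set of cells.
thresholds.minDetected = 50;
const after = thresholds.filter(metrics);
console.log(Array.from(after)); // [ 0, 1, 0 ]
```

Keeping the thresholds as a separate object is what allows them to be reused across datasets or tuned before any cells are discarded.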
New
- Added more `empty*()` functions to construct empty instances of various result objects. This is useful for mimicking the output of functions without actually running them.
Changes
- Added more methods and options for the `ClusterSNNGraph*Results` classes, mostly to facilitate filling of empty objects.
Changes
- Added the `lfcThreshold` and `computeAuc` options to the `scoreMarkers()` function. In particular, skipping the AUCs can improve speed and memory efficiency if they are not required.
- Switched the default `referencePolicy` to `"max-rss"` in the `mnnCorrect()` function. This favors the use of more heterogeneous batches as the initial reference.
- Actually exported the `ScranMatrix` class.
Changes
- Added a `forceInteger` option to (almost) all matrix initialization functions. Setting this to `false` will preserve any floating-point representations, e.g., for normalized expression data. This defaults to `true` for back-compatibility, where floats are coerced to integer by truncation.
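The effect of integer coercion by truncation can be seen in a small sketch. The `prepareValues()` helper is hypothetical and only imitates the described behavior of a `forceInteger`-style switch on a plain array; it is not how the matrix initialization functions are implemented.

```javascript
// Hypothetical helper imitating a forceInteger-style switch on a plain array.
function prepareValues(values, forceInteger = true) {
    if (forceInteger) {
        // Truncation drops the fractional part, moving values towards zero.
        return Int32Array.from(values, x => Math.trunc(x));
    }
    // Otherwise, keep the floating-point representation.
    return Float64Array.from(values);
}

const truncated = prepareValues([1.9, -1.9, 3]);
console.log(Array.from(truncated)); // [ 1, -1, 3 ]

const preserved = prepareValues([1.9, -1.9, 3], false);
console.log(Array.from(preserved)); // [ 1.9, -1.9, 3 ]
```

Note that truncation is not rounding: `1.9` becomes `1`, so normalized or otherwise non-integer data should be loaded with the option set to `false`.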
Changes
- Fixed `readFile()` to actually return content in the browser.
- Removed the not-to-be-used virtual file system utilities.
New
- Added `chooseTemporaryPath()` to obtain a temporary file path on both browsers and Node.js.
Changes
- All file-related utilities (`writeFile()`, `removeFile()`, `readFile()` and `fileExists()`) now operate as expected on Node.js.
New
- Added `realizeFile()` to prepare a file for reading into other functions, regardless of whether the call is in a Node.js or browser context. For browsers, this creates a file on the virtual file system; for Node.js, it either uses the supplied path or it creates a temporary file.
- Added `extractHdf5MatrixDetails()` to preview the format and dimensions of a HDF5-based matrix.
Changes
- Removed `quickSliceArray()`. Users should instead use the `SLICE()` function from the bioconductor package.
- Removed all array collection-related functions. Users should instead use the `DataFrame` class and related methods from the bioconductor package.
- Removed `splitByFactor()`. Users should instead use the `presplitFactor()` function from the bioconductor package.
New
- Support parsing and inspection of data in RDS files (generated by R's `saveRDS()` function) via the new `readRds()` function.
- Added the `initializeSparseMatrixFromRds()` function, which does exactly as advertised.
Changes
- Updated the underlying C++ libraries to their latest versions. This should improve memory efficiency.
New
- HDF5 handles now support reading and writing of attributes via `readAttribute()` and `writeAttribute()` methods. An extra `attributes` member is available for listing the available attributes.
- ScranMatrix objects can be used in more delayed operations via the new `delayedArithmetic()`, `delayedMath()`, `rbind()` and `transpose()` functions. All of these operations can be performed in place or can generate a new ScranMatrix.
- Subsetting of a ScranMatrix via `subsetRows()` or `subsetColumns()` can now be done in place with the new `inPlace=` option.
- Added a `quickSliceArray()` function to slice a (Typed)Array while preserving its type.
Changes
- The `initializeSparseMatrix*()` functions now return an object with a `row_ids=` array. This makes it more explicit that a reorganization of the row identities was performed.
- The `identities()` and `isReorganized()` methods for a ScranMatrix have been soft-deprecated; the latter will now always return `false`. This simplifies downstream operations, which no longer need to preserve consistency in the identities to produce a valid ScranMatrix.
- `updateRowIdentities()` now requires a row identity vector instead of a ScranMatrix in its first argument.
- `matchVectorToRowIdentities()` has been removed, along with other deprecated functions based on manipulation of row identities.
New
- Added a `maximumThreads()` function to query the maximum number of threads specified at module initialization.
- All parallelizable functions now accept a `numberOfThreads=` option to control the number of threads.
New
- Added a `layered=` option in various `initializeSparseMatrix*()` functions. This enables direct loading of sparse matrices without row reorganization, for simplicity at the cost of memory efficiency.
New
- Added the `extractMatrixMarketDimensions()` function to easily get dimensions without loading the entire file.
Changes
- Renamed the MatrixMarket reader to `initializeSparseMatrixFromMatrixMarket()`. This function now supports file path inputs, which avoids the need to buffer the entire file in Node.js.
- Functions that accept an optional `buffer =` argument will now return it directly if `buffer` is non-`null` and the output type is a WasmArray. Otherwise, if the output type is a TypedArray, functions will now return a TypedArray view on a non-null input `buffer`. This aims to provide some kind of sensible output value rather than just `undefined`.
New
- Exported the `free()` function as a replacement for `safeFree()`.
- Added the `subsetBlock()` function for general subsetting of the blocking factor.
- Added an `allowZeros=` option to gracefully handle size factors of zero in `logNormCounts()`.
Changes
- Renamed some classes for consistency with the `*Results` naming scheme.
- All `*Results` and `*Matrix` instances will automatically free their memory upon garbage collection if `free()` has not already been called.
- `subsetArrayCollection()` will now check for correct array length before subsetting.
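Automatic freeing upon garbage collection, as mentioned above for `*Results` and `*Matrix` instances, is typically built on JavaScript's `FinalizationRegistry`. The sketch below shows the general pattern under that assumption; the `DemoResults` class and the `liveAllocations` set standing in for Wasm heap memory are hypothetical, and the actual scran.js mechanism may differ.

```javascript
// Hypothetical stand-in for a Wasm heap: track live allocations in a Set.
const liveAllocations = new Set();
let nextId = 0;

function allocate() {
    const id = nextId++;
    liveAllocations.add(id);
    return id;
}

function release(id) {
    liveAllocations.delete(id);
}

// The callback runs at some point after a registered object is collected.
// The timing is up to the engine, so it is only a safety net.
const registry = new FinalizationRegistry(id => release(id));

class DemoResults {
    #id;

    constructor() {
        this.#id = allocate();
        // Third argument is the token used to unregister on manual free().
        registry.register(this, this.#id, this);
    }

    // Explicit freeing remains the deterministic option; calling it twice is safe.
    free() {
        if (this.#id !== null) {
            release(this.#id);
            registry.unregister(this);
            this.#id = null;
        }
    }
}

const res = new DemoResults();
console.log(liveAllocations.size); // 1
res.free();
console.log(liveAllocations.size); // 0
res.free(); // no-op
```

Because finalizers run at an unspecified time, calling `free()` explicitly is still preferable when memory pressure matters.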
New
- Exported the `safeFree()` function for fail-safe freeing of scran.js objects.
- Added a `listMito()` function to list mitochondrial genes in mouse or human.
- Added a `validateArrayCollection()` function to validate equilength array collections.
- `subsetArrayCollection()` can now be used to subset based on a filtering vector.
Changes
- Functions involving the scran.js virtual filesystem (e.g., `writeFile()`) now throw errors when attempted in a Node context.
- The setting of `localFile=` in `initialize()` is ignored outside of a Node context.
- `combineArrayCollections()` will attempt to preserve TypedArray types across input collections.
New
- Added a MultiMatrix class to handle memory management in multi-modal scenarios.
Changes
- `subsetRows()` can now directly return a MultiMatrix.
Changes
- Updated libscran dependency for `scoreMarkers()` bugfix when group and block are confounded.
Changes
- `blockMethod: "block"` has been renamed to `blockMethod: "regress"` in `runPCA()`, for clarity.
- More checks for valid `blockMethod=` being passed to `runPCA()`.
New
- Exposed a reference policy option in `mnnCorrect()` for choosing the reference batch.
- Support more community detection methods in `clusterSNNGraph()`.
Changes
- Any attempt to save `null`s via the HDF5 writers will now raise an error.
- Any attempt to save non-strings as strings in the HDF5 writers will raise an error.
New
- Added `scaleByNeighbors()` for combining embeddings across multiple modalities.
- Added `computePerCellAdtQcMetrics()` and `computePerCellAdtQcFilters()` for quality control on the ADT count matrix.
- Added `quickAdtSizeFactors()` to support normalization of ADT counts.
- Added `splitRows()` to split a ScranMatrix along its rows.
- Added `subsetArrayCollection()`, `splitArrayCollection()` and `combineArrayCollections()` for working with collections of parallel arrays.
- Added `createBlock()`, `convertBlock()`, `filterBlock()` and `dropUnusedBlock()` for creating and manipulating the blocking factor.
Changes
- `runPCA()` will automatically adjust the number of requested PCs to be no greater than the number of available PCs.
- Added a `numberOfDims()` method to retrieve the number of dimensions from a NeighborIndex object.
- `computePerCellQcMetrics()` now accepts an array of Uint8WasmArrays in `subsets=` rather than a single concatenated array.
Changes
- Deprecated `ScranMatrix::isPermuted()` for `ScranMatrix::isReorganized()`.
- Deprecated `permuteVector()` for `matchVectorToRowIdentities()`.
- Deprecated `updatePermutation()` for `updateRowIdentities()`.
- Deprecated `permuteFeatures()` for `matchFeatureAnnotationToRowIdentities()`.
- Switched to `identities()` to keep track of the identities of the rows in the in-memory `ScranMatrix`. This is an improvement over `permutation()` as the `identities()` can naturally accommodate subsetting.
First series of releases. Didn't keep track of all the changes here, so let's just treat these releases as prehistory. Check out the commit history for more details.