We expect a combine_embeddings HDF5 group at the root of the file, describing how embeddings are combined across multiple modalities.
The group itself contains the parameters and results subgroups.
Only a single embedding was generated prior to version 2.0 of the format, so the combine_embeddings group may be absent in pre-v2.0 files.
No concept of multi-modality exists in earlier versions, so downstream steps should use the PCs directly from the pca step.
Definitions:
modalities
: an array of names of modalities with available embeddings. This is based on the availability of embeddings in the pca and adt_pca steps.
num_cells
: number of cells remaining after QC filtering. This is typically determined from the cell_filtering step.
total_dims
: the sum of the number of dimensions of the embedding across all modalities in modalities.
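As a rough illustration of the total_dims definition, the sketch below sums per-modality dimension counts. The function name, the dims_by_modality dictionary, and the modality names "RNA" and "ADT" are hypothetical; this specification does not prescribe how the per-modality dimensions are read from the file.

```python
def compute_total_dims(dims_by_modality, modalities):
    """Sum the number of embedding dimensions over the listed modalities.

    dims_by_modality: hypothetical dict mapping a modality name to the
        number of dimensions in that modality's embedding.
    modalities: names of modalities with available embeddings.
    """
    missing = [name for name in modalities if name not in dims_by_modality]
    if missing:
        raise KeyError(f"no embedding dimensions known for: {missing}")
    return sum(dims_by_modality[name] for name in modalities)


# Hypothetical example: two modalities with 20 and 15 dimensions.
example_total = compute_total_dims({"RNA": 20, "ADT": 15}, ["RNA", "ADT"])
```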
parameters should contain:
approximate
: an integer scalar to be interpreted as a boolean, indicating whether an approximate neighbor search was used to compute the per-embedding scaling factors.
weights
: a group containing the (scaling) weights to apply to each modality. If empty, weights are implicitly assumed to be equal to unity for all modalities. Otherwise, the group should contain a float scalar dataset named after each modality in modalities, containing the weight to be applied to each modality.
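The handling of these two parameters can be sketched as follows. This is a minimal sketch that assumes the values have already been read from the HDF5 file into plain Python objects: the function names are hypothetical, and treating any non-zero integer as true is an assumption about how approximate is interpreted.

```python
def interpret_approximate(value):
    """Interpret the integer scalar as a boolean (assumption: non-zero is true)."""
    return int(value) != 0


def resolve_weights(weights, modalities):
    """Return one weight per modality.

    weights: mapping from modality name to a float, standing in for the
        contents of the weights group; an empty mapping means unit weights.
    modalities: names of modalities with available embeddings.
    """
    if len(weights) == 0:
        # Empty group: weights are implicitly equal to unity.
        return {name: 1.0 for name in modalities}
    missing = [name for name in modalities if name not in weights]
    if missing:
        raise KeyError(f"weights group lacks entries for: {missing}")
    return {name: float(weights[name]) for name in modalities}
```

A reader built on an HDF5 library would first extract each scalar dataset's value before passing it to a helper like this.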
If multiple modalities are present in modalities, results should contain:
combined
: a 2-dimensional float dataset containing the combined embeddings in a row-major layout. Each row corresponds to a cell and each column corresponds to a dimension, with num_cells rows and total_dims columns in total.
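To make the row-major layout concrete, the sketch below computes where the value for a given cell and dimension would sit if the dataset were flattened into a single array. The helper name is hypothetical and the code only illustrates the general row-major convention, in which all dimensions of one cell are stored contiguously.

```python
def flat_offset(cell, dim, num_cells, total_dims):
    """Offset of (cell, dim) in a flattened row-major num_cells x total_dims array."""
    if not 0 <= cell < num_cells:
        raise IndexError("cell index out of range")
    if not 0 <= dim < total_dims:
        raise IndexError("dimension index out of range")
    # Each row (cell) occupies total_dims consecutive values.
    return cell * total_dims + dim
```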
Otherwise, combined may be missing, in which case it is assumed that only a single embedding exists in the analysis and no combining is necessary.
Downstream steps should instead use the coordinates from the PCA group of the available modality in the pca or adt_pca step.
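The rules for locating the embedding that downstream steps should use can be sketched as a small decision function. This is an illustration under stated assumptions, not a reference implementation: the file is modelled as nested mappings (any object supporting in and key lookup would do), and the function name, the step_for_modality mapping, and the modality names "RNA" and "ADT" are hypothetical.

```python
def embedding_source(root, modalities, step_for_modality):
    """Return the path (as a tuple of names) that holds the embedding to use.

    root: nested mapping standing in for the groups at the root of the file.
    modalities: names of modalities with available embeddings.
    step_for_modality: hypothetical mapping from a modality name to the
        step that stores its PCs, e.g. {"RNA": "pca", "ADT": "adt_pca"}.
    """
    if "combine_embeddings" not in root:
        # Pre-v2.0 file: no multi-modality, use the PCs from the pca step.
        return ("pca",)
    results = root["combine_embeddings"]["results"]
    if "combined" in results:
        return ("combine_embeddings", "results", "combined")
    if len(modalities) != 1:
        raise ValueError("combined is missing but several modalities are present")
    # Single embedding: fall back to that modality's own PCA step.
    return (step_for_modality[modalities[0]],)
```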
Added in version 2.0.