We expect a crispr_pca HDF5 group at the root of the file, describing how the PCA was performed on the CRISPR data.
The group itself contains the parameters and results subgroups.
Definitions:
crispr_available: whether CRISPR data is present in the dataset. This is typically determined by examining the inputs step.
num_cells: number of cells remaining after QC filtering. This is typically determined from the cell_filtering step.
parameters should contain:
num_pcs: a scalar integer containing the number of PCs to compute.
block_method: a scalar string specifying the method to use when dealing with multiple blocks in the dataset. This may be "none", "regress" or "weight".
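The checks on parameters can be sketched as follows. This is a minimal illustration, not the actual validator: the function name check_parameters is hypothetical, and a plain Python mapping stands in for the HDF5 group, on the assumption that the scalar datasets have already been read into native Python values.

```python
# Hypothetical sketch: a plain mapping stands in for the 'parameters' HDF5 group.
VALID_BLOCK_METHODS = ("none", "regress", "weight")


def check_parameters(parameters):
    """Check 'parameters' and return (num_pcs, block_method)."""
    for name in ("num_pcs", "block_method"):
        if name not in parameters:
            raise ValueError(f"'parameters' is missing '{name}'")

    num_pcs = parameters["num_pcs"]
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(num_pcs, bool) or not isinstance(num_pcs, int):
        raise ValueError("'num_pcs' should be a scalar integer")

    block_method = parameters["block_method"]
    if not isinstance(block_method, str):
        raise ValueError("'block_method' should be a scalar string")
    if block_method not in VALID_BLOCK_METHODS:
        raise ValueError(f"unrecognized 'block_method': {block_method!r}")

    return num_pcs, block_method
```

For example, check_parameters({"num_pcs": 20, "block_method": "weight"}) returns the two values unchanged, while an unlisted method such as "average" raises a ValueError.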
If crispr_available = false, results should be empty.
Otherwise, results should contain:
pcs: a 2-dimensional float dataset containing the PC coordinates in a row-major layout. Each row corresponds to a cell (after QC filtering), with num_cells rows in total. Each column corresponds to a PC, with at most num_pcs columns in total (possibly fewer). PCs may be computed with block-specific weights or regression, depending on block_method.
var_exp: a float dataset of length equal to the number of PCs, containing the percentage of variance explained by each PC.
Added in version 3.0.
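The constraints on results can be sketched in the same spirit. This is an illustrative check only: check_results is a hypothetical name, nested Python lists stand in for the HDF5 datasets (pcs as a list of rows, one per cell), and "the number of PCs" for var_exp is interpreted here as the number of columns actually stored in pcs, which is an assumption.

```python
# Hypothetical sketch: a mapping of nested lists stands in for the 'results' group.
def check_results(results, crispr_available, num_cells, num_pcs):
    """Raise ValueError if 'results' does not match the description above."""
    if not crispr_available:
        if len(results) != 0:
            raise ValueError("'results' should be empty without CRISPR data")
        return

    for name in ("pcs", "var_exp"):
        if name not in results:
            raise ValueError(f"'results' is missing '{name}'")
    pcs = results["pcs"]
    var_exp = results["var_exp"]

    # One row per cell that survived QC filtering.
    if len(pcs) != num_cells:
        raise ValueError("'pcs' should have num_cells rows")

    # All rows must have the same number of columns (one per PC).
    widths = {len(row) for row in pcs}
    if len(widths) > 1:
        raise ValueError("rows of 'pcs' differ in length")
    # With zero cells there are no rows to measure, so fall back on var_exp.
    found_pcs = widths.pop() if widths else len(var_exp)

    if found_pcs > num_pcs:
        raise ValueError("'pcs' should have at most num_pcs columns")
    # Assumption: var_exp has one entry per stored PC column.
    if len(var_exp) != found_pcs:
        raise ValueError("'var_exp' should have one entry per PC")

    values = [x for row in pcs for x in row] + list(var_exp)
    if not all(isinstance(x, float) for x in values):
        raise ValueError("'pcs' and 'var_exp' should hold floats")
```

With three cells and two stored PCs, a call such as check_results({"pcs": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], "var_exp": [60.0, 25.0]}, True, 3, 5) passes silently, since two columns is within the limit of num_pcs = 5.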