Overview

We expect a batch_correction HDF5 group at the root of the file, containing the parameters and results of batch correction across samples. The group itself contains the parameters and results subgroups.

A separate batch correction step was not used prior to version 2.0 of the format, so the batch_correction group may be absent in pre-v2.0 files. In such cases, we can check for the presence of a corrected dataset in the pca step.
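The lookup order described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names are hypothetical, and the paths `batch_correction/results/corrected` and `pca/results/corrected` assume that each step keeps its datasets under a `results` subgroup, which this document only confirms for batch_correction. The functions accept any mapping-like handle that supports `in` and indexing by name, so an open `h5py.File` or a nested dict would both work.

```python
def _descend(node, *names):
    """Walk down nested groups by name; return None if any level is missing."""
    for name in names:
        if name not in node:
            return None
        node = node[name]
    return node


def find_corrected_path(handle):
    """Return the path of the corrected embedding, or None if there is none.

    v2.0+ files are expected to carry a batch_correction group at the root.
    For older files, fall back to a corrected dataset in the pca step
    (the exact location inside pca is an assumption made for this sketch).
    """
    candidates = [
        ("batch_correction", "results", "corrected"),  # v2.0 and later
        ("pca", "results", "corrected"),               # assumed pre-v2.0 layout
    ]
    for parts in candidates:
        if _descend(handle, *parts) is not None:
            return "/".join(parts)
    return None
```

A return value of `None` means no corrected coordinates were found in either location, in which case the caller would treat the correction as a no-op.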

Definitions:

num_samples: the number of samples in the dataset. This is typically determined from the inputs step.
num_cells: the number of cells remaining after quality control. This is typically determined from the cell_filtering step.
num_dims: the expected number of dimensions in the original (uncorrected) embedding. This is typically determined from the combine_embeddings step.

Parameters

parameters should contain:

num_neighbors: a scalar integer specifying the number of neighbors to use for mutual nearest neighbors correction.
approximate: a scalar integer to be treated as a boolean, indicating whether an approximate neighbor search was used.
method: a scalar string specifying the correction method to use, either "none" or "mnn".
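A validator for these three fields might look like the sketch below. It assumes the scalar values have already been read out of the parameters group into plain Python or NumPy scalars (for example with `dataset[()]` in h5py); the function name `check_parameters` and the normalized return value are illustrative choices, not part of the format.

```python
import numbers

ALLOWED_METHODS = ("none", "mnn")


def check_parameters(params):
    """Validate the batch_correction parameters and return normalized values.

    `params` is a mapping from field name to an already-read scalar value.
    """
    required = ("num_neighbors", "approximate", "method")
    missing = [key for key in required if key not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))

    # Both numeric fields must be integers; approximate is read as a boolean.
    for key in ("num_neighbors", "approximate"):
        if not isinstance(params[key], numbers.Integral):
            raise TypeError(key + " should be a scalar integer")

    # HDF5 strings may come back as bytes depending on how they were read.
    method = params["method"]
    if isinstance(method, bytes):
        method = method.decode("utf-8")
    if method not in ALLOWED_METHODS:
        raise ValueError('method should be "none" or "mnn", got ' + repr(method))

    return {
        "num_neighbors": int(params["num_neighbors"]),
        "approximate": bool(params["approximate"]),
        "method": method,
    }
```

For instance, `check_parameters({"num_neighbors": 15, "approximate": 1, "method": "mnn"})` returns the same fields with `approximate` converted to `True`.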

Results

If method = "mnn" and num_samples > 1, results should contain:

corrected: a 2-dimensional float dataset containing the corrected PCs in a row-major layout. Each row corresponds to a cell and each column corresponds to a dimension.

Otherwise, correction is assumed to be a no-op and results may be empty. Downstream steps should instead fetch coordinates from the combined dataset in the combine_embeddings step.
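The rule for when corrected is required, and where downstream steps should read coordinates from, can be expressed as a small check. The sketch below is hedged in the same way as the text: the function names are hypothetical, and `result_shapes` is assumed to be a mapping from dataset names in results to their shapes, gathered by the caller. It checks only shapes; verifying the float type is left out.

```python
def correction_applied(method, num_samples):
    """True when the format expects a corrected dataset in results."""
    return method == "mnn" and num_samples > 1


def check_results(result_shapes, method, num_samples, num_cells, num_dims):
    """Validate results and return the step that holds the coordinates to use.

    Returns "batch_correction" when a corrected embedding is expected and
    valid, otherwise "combine_embeddings" (correction was a no-op).
    """
    if not correction_applied(method, num_samples):
        # results may be empty here, so nothing further is checked.
        return "combine_embeddings"

    if "corrected" not in result_shapes:
        raise ValueError("results should contain a corrected dataset")

    # Rows are cells and columns are dimensions.
    expected = (num_cells, num_dims)
    actual = tuple(result_shapes["corrected"])
    if actual != expected:
        raise ValueError(
            "corrected has shape %r, expected %r" % (actual, expected)
        )
    return "batch_correction"
```

With method = "mnn" and a single sample, the function returns "combine_embeddings" without inspecting results, mirroring the no-op case above.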

History

Added in version 2.0.
