An inputs
group should be present at the root of the HDF5 file, containing all information relating to the data inputs.
This includes the definition of the input files as well as summary statistics like the number of cells and features.
The group itself contains the parameters
and results
subgroups.
In this section, a "matrix" refers to one or more files describing a single (count) matrix. This should be exactly one file for HDF5-based formats, or multiple files for MatrixMarket formats, e.g., to include feature information - see below for details.
Definitions:
embedded
: whether the input files were embedded in the*.kana
file. Iffalse
, the input files are assumed to be linked instead.
parameters
should contain:
-
format
: a scalar string specifying the file format for a single matrix. This is usually either"MatrixMarket"
, for a MatrixMarket file with possible feature/barcode annotations;"10X"
, for the 10X Genomics HDF5 matrix format; or"H5AD"
, for the H5AD format. Other values are allowed but their interpretation is implementation-defined (e.g., for custom resources). For multiple matrices,format
should instead be a 1-dimensional string dataset of length equal to the number of uploads. Each element of the dataset is usually one of"MatrixMarket"
,"10X"
or"H5AD"
; different values can be present for mixed input formats.} -
files
: a group of groups representing an array of input file information. Each inner group is named by their positional index in the array and contains information about a file in an upload. Each inner group should contain:type
: a scalar string specifying the type of the file.- If
format = "MatrixMarket"
, there should be exactly onetype = "mtx"
corresponding to the (possibly Gzipped)*.mtx
file. There may be zero or onetype = "genes"
, containing a (possibly Gzipped) TSV file with the Ensembl and gene symbols for each row. There may be zero or onetype = "annotations"
, containing a (possibly Gzipped) TSV file with the annotations for each column. - If
format = "10X"
or"H5AD"
, there should be exactly onetype = "h5"
. - For other
format
s, anytype
can be used, typically for custom resources.
- If
name
: a scalar string specifying the file name as it was provided to kana.
If
embedded = true
, we additionally expect:offset
: a scalar integer specifying where the file starts as an offset from the start of the remaining bytes section. The offset for the first file should be zero, and entries infiles
should be ordered by increasingoffset
.size
: a non-negative scalar integer specifying the number of bytes in the file. The offset of each file should be equal to the sum ofsize
andoffset
for the preceding file.
If
embedded = false
, we expect:id
: a scalar string containing some unique identifier for this file. The interpretation ofid
is application-specific but usually refers to some cache or database.
results
should contain:
dimensions
: an integer dataset of length 2, containing the number of features and the number of cells in the loaded dataset.permutation
: an integer dataset of length equal to the number of cells, describing the permutation to be applied to the per-gene results to recover the original row order.
Added in version 1.0.