Commit

[SQUASH ME] Added additional text explaining 31-bit vs 32-bit size limits.

The main tables should describe sign and on-disk sizes, which they now
do, but on-disk size is not the same thing as the range of valid values.

I feel this is clearer than the alternative option of keeping some as
int32_t and needing to define their valid ranges.
jkbonfield committed Feb 5, 2020
1 parent 3355653 commit 535dc6b
Showing 1 changed file with 14 additions and 1 deletion.
15 changes: 14 additions & 1 deletion SAMv1.tex
@@ -947,6 +947,14 @@ \subsection{The BAM format}
the default when the corresponding information is not available; an
underlined word in uppercase denotes a field in the SAM format.
Note that the field types below specify whether a value may be
negative and the number of bytes it occupies on disk, but do not necessarily
dictate its appropriate maximum value. Care should be taken with
{\tt uint32\_t} fields to avoid values that need more than 31 bits
(approximately 2 billion and above), as larger values can cause problems in
some implementation languages or conflict with the constraints of related
data types (e.g. {\tt tlen} needs 1 bit more than the maximum {\tt l\_ref}).
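
The 31-bit guidance above can be illustrated with a minimal sketch. It assumes a writer that validates values before storing them in {\tt uint32\_t} fields; the helper names `check_31_bit` and `tlen_fits_int32` are hypothetical and are not part of the specification or of any existing library.

```python
# Largest value representable in 31 bits (2,147,483,647, roughly 2 billion).
INT32_MAX = 2**31 - 1


def check_31_bit(name: str, value: int) -> int:
    """Return value unchanged if it fits in 31 bits, else raise ValueError.

    Hypothetical helper: a writer could call this before storing a value
    in a uint32_t field such as l_ref.
    """
    if not 0 <= value <= INT32_MAX:
        raise ValueError(f"{name}={value} is outside the range 0..{INT32_MAX}")
    return value


def tlen_fits_int32(l_ref: int) -> bool:
    """True if both +l_ref and -l_ref fit in a signed 32-bit integer.

    A template length can be negative, so covering the span -l_ref..+l_ref
    takes one bit more than l_ref itself.
    """
    return -(2**31) <= -l_ref and l_ref <= INT32_MAX


if __name__ == "__main__":
    print(tlen_fits_int32(2**31 - 1))  # 31-bit reference length
    print(tlen_fits_int32(2**32 - 1))  # full unsigned 32-bit reference length
```

With a reference length capped at 31 bits, every possible signed template length still fits in 4 bytes; with the full unsigned 32-bit range it would not.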
\begin{table}[ht]
\centering
{\small
@@ -1076,7 +1084,7 @@ \subsubsection{Auxiliary data encoding}\label{sec:aux-type-codes}
\newcommand*{\arraytagfield}[3]{\tagfield{B}{\bytebox{1}{\tt #1}\bytebox{4}{\em count}\byteboxvector{#2}{#3}}}
The representation of a `{\tt B}' array field starts with a sub-type character
similar to the numeric field types above and an {\tt uint32\_t} \emph{count}
similar to the numeric field types above and a {\tt uint32\_t} \emph{count}
giving the number of elements in the array.
The array elements follow, encoded as binary integers or IEEE floats sized
according to the sub-type:
@@ -1220,6 +1228,11 @@ \subsection{The BAI index format for BAM files}
\end{tabular}}
\end{table}
As with the BAM format, the {\tt uint32\_t} fields indicate unsigned
values occupying 4 bytes and do not imply that the full range of values is
appropriate. Practical implementations may limit these to 31 bits or
fewer.
The index file may optionally contain additional metadata providing a summary
of the number of mapped and unmapped read-segments per reference sequence,
and of any unplaced unmapped read-segments.\footnote{By \emph{placed unmapped
