Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DCM incorrect handling of SQ groups in header #5606

Closed
gagern opened this issue Sep 30, 2022 · 2 comments
Closed

DCM incorrect handling of SQ groups in header #5606

gagern opened this issue Sep 30, 2022 · 2 comments

Comments

@gagern
Copy link

gagern commented Sep 30, 2022

ImageMagick version

7.1.0-48-22-g7a3f3f1bd

Operating system

Linux

Operating system, version and so on

Debian Linux 5.18.0-4-amd64

Description

The handling of nested data (value representation SQ) is broken. This leads to improper image header errors reported in

ThrowDCMException(CorruptImageError,"ImproperImageHeader")

Problem 1:

if ((info.scale != (Quantum *) NULL) && (strcmp(explicit_vr,"SQ") == 0))
only performs the group entry operations if info.scale != NULL. I believe this depends on which attributes have been parsed up to this point, so this can lead to not entering a group which then causes problems when the group is exited in
if ((group == 0xFFFE) && (element == 0xE0DD))

Problem 2:

I don't see any code which would treat elements with unknown tag and unknown length as nested data encoded. https://dicom.nema.org/medical/dicom/current/output/html/part05.html#sect_6.2.2 writes:

The length field of the Value Representation of UN may contain the value of Undefined Length, in which case the contents can be assumed to be encoded with implicit VR. See Section 7.5.1 to determine how to parse Data Elements with an Undefined Length.

It is unclear to me whether any such data inside an unknown tag should be treated as though it were occurring at top level, or just parsed for length and otherwise discarded. The latter is probably the safer thing to do.

Problem 3:

if ((info.scale != (Quantum *) NULL) && (strcmp(explicit_vr,"SQ") == 0))
is only using the explicit VR to detect grouping. This fails when an implicit encoding is being used.

It might make sense to check whether the explicit VR matches what the implicit VR would allow (assuming that the encoding is explicit), to throw an error if there is a VR mismatch, and to just continue with a single consistent VR after that check instead of having some of the code use the explicit and some the implicit one.

Problem 4:

ImageMagick/coders/dcm.c

Lines 3211 to 3217 in 7a3f3f1

/*
Check for "explicitness", but meta-file headers always explicit.
*/
if ((explicit_file == MagickFalse) && (group != 0x0002))
explicit_file=(isupper((int) ((unsigned char) *explicit_vr)) != 0) &&
(isupper((int) ((unsigned char) *(explicit_vr+1))) != 0) ?
MagickTrue : MagickFalse;
appears to be a fairly hackish way to detect whether explicit syntax is being used. It works by checking whether the two bytes at the position of the value representation code are both upper case letters, but with implicit representation these bytes would be the least significant bytes of a four-byte data length encoding. So it is not unlikely that these might just happen to also encode upper-case letters.

I believe it would be better to examine (0x0002,0x0010) transfer syntax information. If that is 1.2.840.10008.1.2 then implicit encoding is used, if it is 1.2.840.10008.1.2.1 then it's explicit encoding instead. I'm still looking for the relevant part of the specification to list all these UIDs; the best I have so far is https://dicom.nema.org/medical/dicom/current/output/html/part02.html#sect_B.4.2.1.3.1.2 which makes that distinction.

ImageMagick/coders/dcm.c

Lines 3452 to 3457 in 7a3f3f1

type=1;
subtype=0;
if (strlen(transfer_syntax) > 17)
{
count=(ssize_t) sscanf(transfer_syntax+17,".%d.%d",&type,
&subtype);
is already parsing the type but since type = 1 is also the default it can't use the type to distinguish between implicit and explicit syntax for groups other than 0x0002.

Steps to Reproduce

The following code snippet generates test images which I believe should be valid but which currently fail to parse when processed by identify or convert.

#!/usr/bin/env python3

"""Dicom test image generator.

When invoked, this script will print two DICOM (dcm) images to files
test1.dcm and test2.dcm in the current directory. The structure of the
data is based on actually observed files, but personal information has
been removed.
"""

import struct
import sys

UNDEFINED_LENGTH = 0xffffffff

def padEven(data):
  return data + (b'\x00' if len(data) & 1 else b'')

def explicitShort(grp, elt, vr, data, undeflen=False):
  return struct.pack(
    '<HH2sH', grp, elt, vr.encode(),
    UNDEFINED_LENGTH if undeflen else len(data)
  ) + padEven(data)

def explicitLong(grp, elt, vr, data, undeflen=False):
  return struct.pack(
    '<HH2sHI', grp, elt, vr.encode(), 0,
    UNDEFINED_LENGTH if undeflen else len(data)
  ) + padEven(data)

def implicit(grp, elt, data, undeflen=False):
  return struct.pack(
    '<HHI', grp, elt,
    UNDEFINED_LENGTH if undeflen else len(data)
  ) + padEven(data)

def sqGroup(children, undeflen=False):
  if undeflen:
    return b''.join(children) + implicit(0xfffe, 0xe0dd, b'')
  else:
    return b''.join(children)

def sqItem(children, undeflen=False):
  if undeflen:
    return (implicit(0xfffe, 0xe000, b'', undeflen=True) +
            b''.join(children) +
            implicit(0xfffe, 0xe00d, b''))
  else:
    return implicit(0xfffe, 0xe000, b''.join(children))

def datalen(fmt, children):
  data = b''.join(children)
  return fmt(len(data)) + data

def vrUL(*values):
  return b''.join(struct.pack('<I', val) for val in values)

def vrUS(*values):
  return b''.join(struct.pack('<H', val) for val in values)

pixels = bytes.fromhex(
  '0000000000000000000000000000000000000000000000000000'
  '0000000000000000000000000000000000000000000000000000'
  '0000000000000000000000000000000000000000000000000000'
  '0000000000000000000000000000000000000000000000000000'
  '0032ffffffffffffffa20000000000000000000000f37d000000'
  '0000000000f68d0000000000000000000000000000f37d000000'
  '0000000000f68d0055d5f7f1b20000bcf5f7cb35d6ffffffae00'
  '0000000000f68d0de4c8496eee947cf46a45a98f00f37d000000'
  '0000000000f68d77f81c0000add076f05600000000f37d000000'
  '0000000000f68d96fffefeffffdf00a3e2e4b93800f37d000000'
  '0000000000f68d79f30d000000000000003dd6c000f27d000000'
  '0000000000f68d16e5c5534fa7ad96b65a47d7c700e7a60d0000'
  '0000000000f68d0055d2f5f5cd4f35bff3f8d74f009af3ffae00'
  '0000000000000000000000000000000000000000000000000000'
  '0000000000000000000000000000000000000000000000000000'
  '0000000000000000000000000000000000000000000000000000'
  '0000000000000000000000000000000000000000000000000000'
)  # 26 x 17 x 8bit

image1 = b''.join([
  b'\x00'*128,
  b'DICM',
  # Head
  datalen(lambda l: explicitShort(0x0002, 0x0000, 'UL', vrUL(l)), [
    explicitLong(0x0002, 0x0001, 'OB', b'\x00\x01'),  # File Meta Information Version
    explicitShort(0x0002, 0x0002, 'UI', b'1.2.840.10008.5.1.4.1.1.7\x00'),  # Media Storage SOP Class UID
    explicitShort(0x0002, 0x0003, 'UI', b'1.3.12.2.1107.5.3.63.20011.12.202208190856103529'),  # Media Storage SOP Instance UID
    explicitShort(0x0002, 0x0010, 'UI', b'1.2.840.10008.1.2\x00'),  # Transfer Syntax UID
    explicitShort(0x0002, 0x0012, 'UI', b'1.2.826.0.1.3680043.2.135.1066.101'),  # Implementation Class UID
    explicitShort(0x0002, 0x0013, 'SH', b'1.5.0/WIN32\x00'),  # Implementation Version Name
  ]),
  # Body
  implicit(0x0008, 0x0005, b'ISO_IR 100'),  # Specific Character Set
  implicit(0x0008, 0x0008, b'ORIGINAL\\PRIMARY\\RAD'),  # Image Type
  implicit(0x0008, 0x0016, b'1.2.840.10008.5.1.4.1.1.7\x00'),  # SOP Class UID
  implicit(0x0008, 0x0018, b'1.3.12.2.1107.5.3.63.20011.12.202208190856103529'),  # SOP Instance UID
  implicit(0x0008, 0x0070, b'SIEMENS '),  # Manufacturer
  implicit(0x0008, 0x0080, b'ImageMagick Test'),  # Institution Name
  implicit(0x0008, 0x0090, b'DOE^JOHN'),  # Referring Physician's Name
  implicit(0x0008, 0x0096, sqGroup([  # Referring​ Physician's ​Identification​ Sequence
    sqItem([
      implicit(0x0040, 0x1101, sqGroup([  # Person Identification Code Sequence
        sqItem([
          implicit(0x0008, 0x0100, b'123456'),  # Code Value
        ], undeflen=True),
      ], undeflen=True), undeflen=True),
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  implicit(0x0008, 0x1010, b''),  # Station Name
  implicit(0x0008, 0x1030, b'IMAGEMAGICK TEST'),  # Study Description
  implicit(0x0008, 0x1032, sqGroup([  # Procedure Code Sequence
    sqItem([
      implicit(0x0008, 0x0100, b'XYZ '),  # Code Value
      implicit(0x0008, 0x0102, b''),  # Coding Scheme Designator
      implicit(0x0008, 0x0103, b''),  # Coding Scheme Version
      implicit(0x0008, 0x0104, b''),  # Code Meaning
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  implicit(0x0008, 0x103e, b'Radiation Dose Information'),  # Series Description
  implicit(0x0008, 0x1090, b'MULTIX Impact '),  # Manufacturer's Model Name
  implicit(0x0008, 0x1111, sqGroup([  # Referenced Study Component Sequence
  ])),
  implicit(0x0020, 0x0013, b'1 '),  # Instance (formerly Image) Number
  implicit(0x0028, 0x0002, vrUS(1)),  # Samples per Pixel
  implicit(0x0028, 0x0004, b'MONOCHROME1 '),  # Photometric Interpretation
  implicit(0x0028, 0x0008, b'1 '),  # Number of Frames
  implicit(0x0028, 0x0010, vrUS(17)),  # Rows
  implicit(0x0028, 0x0011, vrUS(26)),  # Columns
  implicit(0x0028, 0x0100, vrUS(8)),  # Bits Allocated
  implicit(0x0028, 0x0101, vrUS(8)),  # Bits Stored
  implicit(0x0028, 0x0102, vrUS(7)),  # High Bit
  implicit(0x0028, 0x0103, vrUS(0)),  # Pixel Representation
  implicit(0x0028, 0x0106, vrUS(0)),  # Smallest Image Pixel Value
  implicit(0x0028, 0x0107, vrUS(255)),  # Largest Image Pixel Value
  implicit(0x0028, 0x0301, b'YES '),  # Burned In Annotation
  implicit(0x0028, 0x1040, b'LIN '),  # Pixel Intensity Relationship
  implicit(0x0028, 0x1050, b'128 '),  # Window Center
  implicit(0x0028, 0x1051, b'256 '),  # Window Width
  implicit(0x0028, 0x1052, b'0 '),  # Rescale Intercept
  implicit(0x0028, 0x1053, b'1 '),  # Rescale Slope
  implicit(0x0028, 0x1054, b'US'),  # Rescale Type
  implicit(0x0028, 0x2110, b'00'),  # Lossy Image Compression
  implicit(0x0040, 0x0275, sqGroup([  # Request Attributes Sequence
    sqItem([
      implicit(0x0040, 0x1001, b'1234567 '),  # Requested Procedure ID
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  implicit(0x0400, 0x0561, sqGroup([  # Original Attributes Sequence
    sqItem([
      implicit(0x0400, 0x0550, sqGroup([  # Modified Attributes Sequence
        sqItem([
          implicit(0x0008, 0x0090, b'DOE^JOHN'),  # Referring Physician's Name
          implicit(0x0008, 0x0096, sqGroup([  # Referring​ Physician's ​Identification​ Sequence
            sqItem([
              implicit(0x0040, 0x1101, sqGroup([  # Person Identification Code Sequence
                sqItem([
                  implicit(0x0008, 0x0100, b''),  # Code Value
                ], undeflen=True),
              ], undeflen=True), undeflen=True),
            ], undeflen=True),
          ], undeflen=True), undeflen=True),
        ], undeflen=True),
      ], undeflen=True), undeflen=True),
      implicit(0x0400, 0x0565, b'CORRECT '),
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  implicit(0x7fe0, 0x0010, pixels),  # Pixel Data
])

image2 = b''.join([
  b'\x00'*128,
  b'DICM',
  # Head
  datalen(lambda l: explicitShort(0x0002, 0x0000, 'UL', vrUL(l)), [
    explicitLong(0x0002, 0x0001, 'OB', b'\x00\x01'),  # File Meta Information Version
    explicitShort(0x0002, 0x0002, 'UI', b'1.2.840.10008.5.1.4.1.1.1.1\x00'),  # Media Storage SOP Class UID
    explicitShort(0x0002, 0x0003, 'UI', b'1.3.51.0.7.1725979627.14904.4931.44329.55568.57304.7930\x00'),  # Media Storage SOP Instance UID
    explicitShort(0x0002, 0x0010, 'UI', b'1.2.840.10008.1.2.1\x00'),  # Transfer Syntax UID
    explicitShort(0x0002, 0x0012, 'UI', b'1.2.40.0.13.1.1.1\x00'),  # Implementation Class UID
    explicitShort(0x0002, 0x0013, 'SH', b'dcm4che-1.4.34'),  # Implementation Version Name
  ]),
  # Body
  explicitShort(0x0008, 0x0005, 'CS', b'ISO_IR 100'),  # Specific Character Set
  explicitShort(0x0008, 0x0008, 'CS', b'ORIGINAL\\PRIMARY'),  # Image Type
  explicitShort(0x0008, 0x0016, 'UI', b'1.2.840.10008.5.1.4.1.1.1.1\x00'),  # SOP Class UID
  explicitShort(0x0008, 0x0018, 'UI', b'1.3.51.0.7.1725979627.14904.4931.44329.55568.57304.7930\x00'),  # SOP Instance UID
  explicitLong(0x0008, 0x0051, 'SQ', sqGroup([  # Issuer of Accession Number Sequence
    sqItem([
      explicitLong(0x0040, 0x0031, 'UT', b'SOMEID'),  # Local Namespace Entity ID
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  explicitShort(0x0008, 0x0070, 'LO', b'Varian'),  # Manufacturer
  explicitShort(0x0008, 0x0090, 'PN', b'Doe^Jane'),  # Referring Physician's Name
  explicitShort(0x0008, 0x1090, 'LO', b'Varian_4343R'),  # Manufacturer's Model Name
  explicitLong(0x0008, 0x1111, 'SQ', sqGroup([  # Referenced Study Component Sequence
    sqItem([
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  explicitShort(0x0008, 0x2111, 'ST', b''),  # Derivation Description
  explicitLong(0x0008, 0x2112, 'SQ', sqGroup([  # Source Image Sequence
    sqItem([
      explicitShort(0x0028, 0x135a, 'CS', b'YES '),
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  explicitLong(0x0008, 0x2218, 'SQ', sqGroup([  # Anatomic Region Sequence
    sqItem([
      explicitShort(0x0008, 0x0100, 'SH', b'ABCD'),  # Code Value
      explicitShort(0x0008, 0x0104, 'LO', b'Test'),  # Code Meaning
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  explicitShort(0x0018, 0x1020, 'LO', b'Firmwareversion=0. 0\\7.4.1.0  '),  # Software Version(s)
  explicitShort(0x0028, 0x0002, 'US', vrUS(1)),  # Samples per Pixel
  explicitShort(0x0028, 0x0004, 'CS', b'MONOCHROME1 '),  # Photometric Interpretation
  explicitShort(0x0028, 0x0010, 'US', vrUS(17)),  # Rows
  explicitShort(0x0028, 0x0011, 'US', vrUS(26)),  # Columns
  explicitShort(0x0028, 0x0100, 'US', vrUS(8)),  # Bits Allocated
  explicitShort(0x0028, 0x0101, 'US', vrUS(8)),  # Bits Stored
  explicitShort(0x0028, 0x0102, 'US', vrUS(7)),  # High Bit
  explicitShort(0x0028, 0x0103, 'US', vrUS(0)),  # Pixel Representation
  explicitShort(0x0028, 0x0301, 'CS', b'NO'),  # Burned In Annotation
  explicitShort(0x0028, 0x1040, 'CS', b'LIN '),  # Pixel Intensity Relationship
  explicitShort(0x0028, 0x1052, 'DS', b'0 '),  # Rescale Intercept
  explicitShort(0x0028, 0x1053, 'DS', b'1 '),  # Rescale Slope
  explicitShort(0x0028, 0x1054, 'LO', b'PVAL'),  # Rescale Type
  explicitShort(0x0028, 0x2110, 'CS', b'00'),  # Lossy Image Compression
  explicitLong(0x0028, 0x3010, 'SQ', sqGroup([  # VOI LUT Sequence
    sqItem([
      explicitShort(0x0028, 0x3003, 'LO', b''),  # LUT Explanation
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  explicitLong(0x0040, 0x0275, 'SQ', sqGroup([  # Request Attributes Sequence
    sqItem([
      explicitShort(0x0032, 0x1032, 'PN', b'Dude^Some '),  # Requesting Physician
      explicitLong(0x0032, 0x1064, 'SQ', sqGroup([  # Requested Procedure Code Sequence
        sqItem([
          explicitShort(0x0008, 0x0104, 'LO', b'abcd'),  # Code Meaning
        ], undeflen=True),
      ], undeflen=True), undeflen=True),
      explicitShort(0x0040, 0x1002, 'LO', b'DICOM image parsing '),  # Reason For Requested Procedure
      explicitLong(0x0040, 0x100a, 'SQ', sqGroup([
      ], undeflen=True), undeflen=True),
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  explicitLong(0x0040, 0x0555, 'SQ', sqGroup([  # Acquisition Context Sequence
  ], undeflen=True), undeflen=True),
  explicitLong(0x0054, 0x0220, 'SQ', sqGroup([  # View Code Sequence
    sqItem([
      explicitShort(0x0008, 0x0104, 'LO', b'abcd'),  # Code Meaning
    ], undeflen=True),
  ], undeflen=True), undeflen=True),
  explicitLong(0x7fe0, 0x0010, 'OW', pixels),  # Pixel Data
])

if __name__ == '__main__':
  with open('test1.dcm', 'wb') as f:
    f.write(image1)
  with open('test2.dcm', 'wb') as f:
    f.write(image2)
  # Try to process these with identify or convert.

Images

No response

@gagern
Copy link
Author

gagern commented Sep 30, 2022

Looking at the example image from #2534 I see that it contains the secondary (smaller) image as (0x0088,0x0200) “Icon Image Sequence”. The primary image on the other hand is outside any SQ element. So one might get away with just skipping the SQ elements (as well as any elements of unknown size) and still get the main image. However https://dicom.nema.org/medical/dicom/current/output/html/part06.html#chapter_6 lists a number of things called “… Image Sequence” and I guess some of them might be quite important. For example (0x0022,0x0021) and (0x0022,0x0022) sound like the left and right image of some stereoscopic imaging process. So from that perspective, perhaps it is correct to treat any occurrence of sufficient amounts of image information as a separate image, even in unknown types.

Also note that my current checkout at 7.1.0-48-22-g7a3f3f1bd can't read the sample image from #2534 any more, either. Does ImageMagick have a test suite somewhere, a set of images that are expected to get parsed in a certain reproducible way? And speaking of development approach, is there a way on the identify command line to trigger the image_info->verbose flag and get the debug output that's gated by that?

@urban-warrior
Copy link
Member

Thanks for the problem report. We can reproduce it and will have a patch to fix it in the GIT main branch @ https://github.com/ImageMagick/ImageMagick later today. The patch will be available in the beta releases of ImageMagick @ https://imagemagick.org/archive/beta/ by sometime tomorrow.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Oct 13, 2022
Fix some pkglint.

Merged

    Fix DDS file DDPF_LUMINANCE type of data #5452

Commits

    beta release 9f4d7d5
    Removed default quality of 50. d0b8d6e
    Use the new api of jpeg-xl 0.7.0. 1246eab
    Set the minimum jpeg-xl version to 0.7.0 67e6c68
    Corrected setting the properties that should be set when the image has an alpha channels. 401f580
    Adjust num_color_channels when the image is grayscale. 1a2117e
    Use ReadStrip method when bit depth is higher than 8 (#5597) f95bf0b
    Added support for reading the resolution of an xcf file (#5549). 7f0348c
    Minor style change. c50602c
    Correct distance calculation. c3f5009
    Perform ChannelGeometry checks earlier. 7eb960d
    Corrected version format to be compatible with Ghostscript 10.00.0 (#5618) d5349ca
    Correct quotes around the password, the old way no longer works with version 10.00.0 of Ghostscript. 82bbf4c
    Read and use the offset instead of skipping it (#5604). bb4018a
    Corrected bounds calculation ($5623). 5118534
    fix incorrect handling of SQ groups in header @ ImageMagick/ImageMagick#5606 0bc1022
    support 1-bit pixels 740ac65
    release f032690
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants