Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for non-compliant DICOM and edge cases #327

Open
3 of 8 tasks
lnogueir opened this issue Jun 5, 2024 · 8 comments
Open
3 of 8 tasks

Improve support for non-compliant DICOM and edge cases #327

lnogueir opened this issue Jun 5, 2024 · 8 comments

Comments

@lnogueir
Copy link
Contributor

lnogueir commented Jun 5, 2024

I was playing around with the pydicom test files and noticed that parsing (with SkipProcessingPixelDataValue() option enabled) the following files result in errors:

These files were parseable by dcm4che.

For most of them, the issue is related to missing some or all meta tags.
Ideally, if the meta tags aren't included, we should ignore and try parsing the first data element. And we can guess the transfer syntax by peeking the first tag and VR of the first element. We try [implicit LE, explicit BE, explicit LE], whichever one decodes a known tag and VR is likely the correct transfer syntax and we should use that for the rest of the dataset.

@suyashkumar
Copy link
Owner

suyashkumar commented Jun 8, 2024

Thanks @lnogueir! I agree that we can attempt some enhanced fallback behavior, particularly if that's standard practice in other parsers. Our current fallback behavior for no transfer syntax found in the Metadata is to proceed with little endian implicit but it shouldn't be too difficult to try a couple different transfer syntaxes speculatively.

@suyashkumar
Copy link
Owner

For missing metadata group length we do hav a flag for that, but can consider automatic fallback behavior:

dicom/read.go

Line 184 in 65259e5

if !r.opts.allowMissingMetaElementGroupLength {

I think originally the idea was the safest option is to force callers to be as explicit as possible about what loosened restrictions they may want in case there are any safety issues with fallback behavior or in case the fact the dicom is not compliant is something the user would want raised to them (in case that led to other concerns). That being said, recently I've been thinking it's more reasonable to just have more automatic fallbacks, at least ones that are "standard" in the industry for dicom non-compliance.

@suyashkumar
Copy link
Owner

Going through these in a little more detail, and adding some more notes:

  1. Easy fix (have working locally)
  2. Easy fix (have working locally)
  3. This appears to parse fine, but iiuc the problem is the PixelData is intentionally truncated? So the vl says 8192 but there aren't that many bytes left in the dicom. We treat this as an error, which seems reasonable imo but we can discuss more.
  4. Bug/implementation needed with UN unknown sequences
  5. Needs more investigation, seems related to the UN sequences / tags possibly similar to 4.
  6. Also seems possible related to UN sequences / tag handling
  7. Same as 4/5/6 at first glance?
  8. Should parse fine with allowMissingMetaElementGroupLength but need to check

@suyashkumar
Copy link
Owner

The draft changes in #330 address 1 and 2.

@lnogueir
Copy link
Contributor Author

lnogueir commented Jun 9, 2024

For missing metadata group length we do hav a flag for that, but can consider automatic fallback behavior:

dicom/read.go

Line 184 in 65259e5

if !r.opts.allowMissingMetaElementGroupLength {

I think originally the idea was the safest option is to force callers to be as explicit as possible about what loosened restrictions they may want in case there are any safety issues with fallback behavior or in case the fact the dicom is not compliant is something the user would want raised to them (in case that led to other concerns). That being said, recently I've been thinking it's more reasonable to just have more automatic fallbacks, at least ones that are "standard" in the industry for dicom non-compliance.

I agree. I think it's better to have these automatic fallbacks and just log warnings if non-compliant things are found. That's the behavior most libraries choose I believe.

@suyashkumar
Copy link
Owner

suyashkumar commented Jun 9, 2024

Also, the changes in #331 address 6 (with SkipPixelData). They almost address 4 but some other changes are still needed for that.

suyashkumar added a commit that referenced this issue Jun 9, 2024
This addresses #220 (and one of the dicoms in #327) by treating elements with a VR=UN and undefined VL as Sequences.

This pulls from work that I and @jabillings did previously (#235).

Some follow on changes may be needed to support this part of [the spec CP](https://dicom.nema.org/dicom/cp/cp246_01.pdf):
> If at some point an application knows the actual VR for an Attribute of VR UN (e.g. has its own applicable data dictionary), it can assume that the Value Field of the Attribute is encoded in Little Endian byte ordering with implicit VR encoding, irrespective of the current TransferSyntax.

---------

Co-authored-by: jabillings <[email protected]>
suyashkumar added a commit that referenced this issue Jun 10, 2024
This fixes #328. This change will also help set the stage for roundtrip tests for transfer syntax uid inference in the case of missing transfer syntax UIDs (part of #327).

For the future:

should we give this option as something to always override whatever transfer syntax is present (or if not present, use the override as well)?
suyashkumar added a commit that referenced this issue Jun 10, 2024
…next 100 bytes. (#330)

This PR attempts to infer missing transfer syntax in dicoms during Parse.

Specifically: 
* When transfer syntax is missing in dicom metadata, attempt to infer the correct transfer syntax by peeking the next 100bytes and trying to read an element without an error. This isn't foolproof, but one option to start with.
* This also makes test updates to support testfiles/ that may not have PixelData.
* This also introduces a write option to write dicoms without transfer syntax elements, in order to write some "roundtrip" unit tests for this behavior on Parse.

I was able to successfully test using some test data from #327, but I need to do some more investigation to see if we can safely add those to our test files (licensing and otherwise).

Things to consider in the future:
* Try deflated little endian explicit as well. 
* Peek more/less than the initial 100 bytes, or move away from a fixed peek.
@suyashkumar
Copy link
Owner

Edited the original comment to update current status, link to PRs, etc.

@lnogueir
Copy link
Contributor Author

I noticed that the following files (also from pydicom) do not error, but drop values:

  1. rtdose_rle.dcm
  2. rtdose_rle_1frame.dcm (seems to be the same issue as the file above)

The issue seems to be that they change the transfer syntax when writing the SQ dataset. The top level dataset is encoded with explicit little endian, but the dataset of the Referenced RT Plan Sequence is encoded with implicit little endian.

These were the pydicom issues that handled that: pydicom/pydicom#1035, pydicom/pydicom#1067

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants