Add option for using a hashed Study UID / Series UID instead of newly generated UID #2

Open
tblock79 opened this issue Dec 21, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@tblock79
Member

No description provided.

@tblock79 added the enhancement (New feature or request) label Dec 21, 2023
@chrstphmr

chrstphmr commented Jan 2, 2024

Couple of thoughts on this:

What attributes to hash

UIDs: It'd be nice to have a global option to hash all UIDs that have not been explicitly set to 'keep' either in a preset or in the settings. This way, the various InstanceUIDs and, importantly, also the cross-references that point to an instance (ReferencedXXXUID) should be preserved in a relatively future-proof manner.
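A minimal sketch of the global "hash every UID not set to keep" option. It assumes a simplified dataset represented as a flat dict of keyword → (VR, value) and a hypothetical `keep` set. Real references such as ReferencedSOPInstanceUID live inside sequences, which a full implementation would have to walk recursively. The point it illustrates is that a deterministic, keyed mapping gives an instance UID and every reference to it the same new value, so cross-references survive without a lookup table. The keyed SHA-256 (HMAC) and the 2.25. mapping are assumptions along the lines of the sections below.

```python
import hashlib
import hmac


def hashed_uid(uid: str, salt: bytes) -> str:
    """Map a UID to '2.25.<int>' using the first 128 bits of HMAC-SHA-256."""
    digest = hmac.new(salt, uid.encode("ascii"), hashlib.sha256).digest()
    return "2.25." + str(int.from_bytes(digest[:16], "big"))


def hash_all_uids(dataset: dict, salt: bytes, keep: set) -> dict:
    """Return a copy of `dataset` with every UI value not listed in `keep` hashed.

    `dataset` is a simplified, flat stand-in for a DICOM dataset:
    keyword -> (VR, value). Nested sequences are not handled here.
    """
    result = {}
    for keyword, (vr, value) in dataset.items():
        if vr == "UI" and keyword not in keep:
            value = hashed_uid(value, salt)
        result[keyword] = (vr, value)
    return result


ds = {
    "SOPClassUID": ("UI", "1.2.840.10008.5.1.4.1.1.2"),  # CT Image Storage, a well-known UID
    "SOPInstanceUID": ("UI", "1.2.3.4.5.6"),
    "ReferencedSOPInstanceUID": ("UI", "1.2.3.4.5.6"),  # points at the instance above
    "Modality": ("CS", "CT"),
}
out = hash_all_uids(ds, salt=b"example-secret-salt", keep={"SOPClassUID"})
# The instance UID and the reference to it end up with the same hashed value.
assert out["SOPInstanceUID"] == out["ReferencedSOPInstanceUID"]
```

SOPClassUID is in the keep set because a class UID identifies a standard object type rather than anything patient-specific, and replacing it would break interpretation on the receiving side.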

Beyond UIDs, the hash function should also be applicable to other tags. A few that come to mind:
(0008,0050) SH Accession Number
(0010,0020) LO Patient ID
(0010,0010) PN Patient's Name

How to hash

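Compute a cryptographic hash of the original value of the DICOM attribute together with a secret salt. The salt protects against pre-image attacks and rainbow tables: identifiers such as patient IDs or accession numbers often come from a small, guessable value space, and without a secret an attacker could simply hash candidate values and compare the results.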
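A minimal sketch of the hashing step. It uses HMAC-SHA-256 as the way of combining the secret salt with the value; that choice and the helper name `keyed_digest` are illustrative assumptions, not anything Mercure currently provides (plain SHA-256 over salt and value concatenated, as described above, would be the simpler alternative). It also strips DICOM padding (trailing spaces, or a trailing NUL for UIDs) before hashing, on the assumption that padded and unpadded forms of the same value should hash identically.

```python
import hashlib
import hmac


def keyed_digest(value: str, salt: bytes) -> bytes:
    """Return the HMAC-SHA-256 digest (32 bytes) of an attribute value under a secret salt."""
    # DICOM pads values to even length; ignore that padding so it cannot change the hash.
    normalized = value.rstrip(" \x00")
    return hmac.new(salt, normalized.encode("utf-8"), hashlib.sha256).digest()


salt = b"example-secret-salt"
a = keyed_digest("PAT-0001", salt)
b = keyed_digest("PAT-0001 ", salt)  # same value with DICOM space padding
c = keyed_digest("PAT-0001", b"another-salt")
assert a == b  # deterministic: same input and salt give the same digest
assert a != c  # a different salt gives an unrelated digest
```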

The salt would need to be passed to the anonymizer as a module setting to ensure that the same input value results in the same output value. This way, multiple Mercure instances in one institution can also create reproducible hash values.
The module should fail if it's configured to hash but no salt has been passed to it.
I guess it's fair to leave the generation of the salt to the Mercure admin, maybe with some guidance in the documentation explaining how to obtain a random salt.
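A minimal sketch of the fail-fast check and of generating a random salt. The setting names `hash_enabled` and `salt` and the `ConfigurationError` class are hypothetical, not existing Mercure options; `secrets.token_hex` is from the Python standard library.

```python
import secrets
from typing import Optional


class ConfigurationError(Exception):
    """Raised when the anonymizer settings are inconsistent."""


def load_salt(settings: dict) -> Optional[bytes]:
    """Return the configured salt, or None if hashing is disabled.

    Fails early if hashing is enabled but no salt is set. The setting names
    "hash_enabled" and "salt" are hypothetical.
    """
    if not settings.get("hash_enabled", False):
        return None
    salt = settings.get("salt")
    if not salt:
        raise ConfigurationError("Hashing is enabled but no salt has been configured.")
    return salt.encode("utf-8")


# Generating a random salt once: 32 random bytes rendered as 64 hex characters.
new_salt = secrets.token_hex(32)
settings = {"hash_enabled": True, "salt": new_salt}
assert load_salt(settings) == new_salt.encode("utf-8")
```

The documentation could suggest running `python3 -c "import secrets; print(secrets.token_hex(32))"` once and pasting the result into the settings of every Mercure instance that should produce matching hash values.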

There are multiple suitable hash functions of course, e.g. SHA-256. Alternatively, a key derivation function such as argon2id might be considered, but that is probably overkill.

Mapping the digest

The digest of the hash function needs to be mapped to a value that is valid for the DICOM value representation of the respective tag.

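For SH and LO, this should be straightforward, e.g. hex or base64 encoding of the digest, truncated where needed to the maximum value length of the VR (16 characters for SH, 64 for LO).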
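A minimal sketch of the SH/LO mapping, assuming HMAC-SHA-256 and hex encoding; the helper name `hashed_string` is illustrative. The length limits are taken from the DICOM value representation table (ref 2). A SHA-256 hex digest is exactly 64 characters, so it fits LO unchanged but must be cut to 16 characters (64 bits) for SH.

```python
import hashlib
import hmac

# Maximum value lengths in characters, per DICOM PS3.5 Table 6.2-1 (ref 2).
MAX_LENGTH = {"SH": 16, "LO": 64}


def hashed_string(value: str, vr: str, salt: bytes) -> str:
    """Map a value to its hex HMAC-SHA-256 digest, truncated to the VR's maximum length."""
    hex_digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return hex_digest[: MAX_LENGTH[vr]]


salt = b"example-secret-salt"
accession = hashed_string("ACC12345", "SH", salt)  # 16 hex characters = 64 bits
patient_id = hashed_string("PAT-0001", "LO", salt)  # full 64-character hex digest
```

With 64 bits left for SH, collisions only become likely at around 2**32 distinct values. Base64 would pack 96 bits into the same 16 characters, at the cost of mixed case and the characters `+` and `/` in the value.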

For UI, it's a bit more tricky: UIDs require either an assigned org root or the 2.25. prefix followed by the decimal representation of a 128-bit UUID. Either way, a UID cannot be longer than 64 characters, including the dot separators. Written in decimal, the digests of many modern hash functions are longer than that (e.g., a 256-bit digest can be as large as 2**256 ≈ 1.2e77, i.e. 78 decimal digits). The easiest approach might be to truncate the digest to 128 bits and use the 2.25. prefix, which gives at most 5 + 39 = 44 characters. The loss in entropy should be acceptable: changing UIDs is required, but it adds little to the overall robustness of the anonymization as long as PixelData remains unchanged (see note 3 in ref 3).
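A minimal sketch of the truncate-to-128-bits approach, assuming HMAC-SHA-256 as the keyed hash; `hashed_uid` is an illustrative name. The largest 128-bit integer has 39 decimal digits, so the result never exceeds 44 characters. Strictly, a 2.25. UID is meant to carry a UUID; if formal conformance matters, the 6 version and variant bits could be set as in a name-based UUID, leaving 122 bits of the digest.

```python
import hashlib
import hmac


def hashed_uid(uid: str, salt: bytes) -> str:
    """Map a UID to '2.25.' plus a 128-bit integer taken from a keyed SHA-256 digest."""
    # A trailing NUL used as padding is not part of the UID value.
    digest = hmac.new(salt, uid.rstrip("\x00").encode("ascii"), hashlib.sha256).digest()
    value = int.from_bytes(digest[:16], "big")  # keep only the first 128 bits
    new_uid = f"2.25.{value}"
    assert len(new_uid) <= 64  # at most 5 + 39 = 44 characters
    return new_uid


salt = b"example-secret-salt"
study_uid = hashed_uid("1.2.840.113619.2.55.3.1234", salt)
```

Python's `int` conversion never produces leading zeros, so every component of the result is a valid UID component.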

The module should fail if no mapping function has been implemented for the VR of the attribute that is being processed.
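A minimal sketch of a per-VR dispatch that refuses to hash anything without a mapping. The registry `MAPPERS`, the mapping helpers and `UnsupportedVRError` are hypothetical names; the mappings mirror the SH, LO and UI approaches described above, using HMAC-SHA-256.

```python
import hashlib
import hmac
from typing import Callable, Dict


def _digest(value: str, salt: bytes) -> bytes:
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).digest()


def _map_ui(value: str, salt: bytes) -> str:
    return "2.25." + str(int.from_bytes(_digest(value, salt)[:16], "big"))


def _map_sh(value: str, salt: bytes) -> str:
    return _digest(value, salt).hex()[:16]  # SH: at most 16 characters


def _map_lo(value: str, salt: bytes) -> str:
    return _digest(value, salt).hex()  # LO: 64 hex characters fit exactly


# Hypothetical registry: only VRs with an explicit mapping can be hashed.
MAPPERS: Dict[str, Callable[[str, bytes], str]] = {
    "UI": _map_ui,
    "SH": _map_sh,
    "LO": _map_lo,
}


class UnsupportedVRError(Exception):
    """Raised when hashing is requested for a VR without a mapping function."""


def hash_value(value: str, vr: str, salt: bytes) -> str:
    mapper = MAPPERS.get(vr)
    if mapper is None:
        raise UnsupportedVRError(f"No hash mapping implemented for VR {vr!r}")
    return mapper(value, salt)


salt = b"example-secret-salt"
print(hash_value("1.2.3.4.5.6", "UI", salt))
try:
    hash_value("19700101", "DA", salt)  # DA (date) has no mapping, so this fails
except UnsupportedVRError as exc:
    print(exc)
```

Checking the VRs of all configured tags against the registry at startup would surface the error before any study is processed. Under this rule, Patient's Name (PN) would also be rejected until a PN mapping is added.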

Refs

  1. https://wiki.cancerimagingarchive.net/display/Public/Submission+and+De-identification+Overview
  2. https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_6.2.html
  3. https://dicom.nema.org/medical/dicom/current/output/chtml/part15/sect_e.3.9.html
