201 bugfix remove references to sites where data originated from #229

pcw24601 · 2022-04-07T11:46:33Z

Removes references to site locations from file/directory names and anonymises all DICOM files in the tests\ directory.

Previous commit kept original files as well as anonymised versions.

laurencejackson · 2022-04-14T14:52:39Z

Hi @pcw24601, thanks for this contribution - could you please change the target branch of this PR to release/0.7.0? This is the branch where we will stage releases and will merge the changes together into the main branch as each release is finalised.

pcw24601 · 2022-04-14T15:06:02Z

Hi @laurencejackson, I think I've done that now--let me know if it's OK (I'm off for a week now so there may be a delay in getting a reply).

github-actions · 2022-05-03T10:43:36Z

Coverage Report

File	Stmts	Miss	Cover	Missing
hazenlib
ACRObject.py	101	10	90%	59–61, 66–69, 107, 123–124
HazenTask.py	25	3	88%	32–34
__init__.py	54	17	69%	125–133, 145, 179–181, 184–186, 193–196, 207
exceptions.py	21	4	81%	17–21
relaxometry.py	285	86	70%	182–200, 377, 423–425, 492, 566–588, 606–621, 1037–1040, 1046–1052, 1085–1130
utils.py	189	43	77%	61, 65, 75, 80, 117, 124–129, 140, 143–150, 170–172, 190–192, 211–213, 222, 227, 233, 284, 287, 295–300, 303, 346, 355, 371
hazenlib/tasks
acr_geometric_accuracy.py	121	64	47%	41–73, 121–146, 160–194
acr_ghosting.py	109	43	61%	34–49, 75–77, 107–109, 145–186
acr_slice_position.py	135	47	65%	46–61, 187–233
acr_slice_thickness.py	134	58	57%	34–48, 157–216
acr_snr.py	135	63	53%	43–83, 93, 163–173, 207–222, 255–272
acr_spatial_resolution.py	209	71	66%	58–83, 128, 171, 184–193, 275–326
acr_uniformity.py	83	33	60%	35–51, 108–133
ghosting.py	152	53	65%	18–35, 50, 112–113, 117, 127–128, 154–156, 173–175, 221–259
relaxometry.py	7	7	0%	1–11
slice_position.py	117	23	80%	28, 37–38, 49, 103–104, 130, 210, 217–234
slice_width.py	357	53	85%	34–37, 41, 109, 168–188, 453, 458–459, 465, 470, 532–533, 782–823
snr.py	166	67	60%	51, 68–73, 167–185, 200–209, 227–237, 264–274, 279–289, 320–333, 338–346, 375–388
snr_map.py	104	1	99%	291
spatial_resolution.py	247	45	82%	36–39, 43, 64, 149, 208, 334–370
uniformity.py	79	20	75%	42–45, 51, 93–94, 101, 135–149
TOTAL	2855	811	72%

Tests	Skipped	Failures	Errors	Time
201	0 💤	0 ❌	0 🔥	2m 46s ⏱️

tomaroberts · 2022-05-06T10:02:24Z

@pcw24601

Thanks for this. Did you anonymise every DICOM in tests/data? What did you use to anonymise them and which specific tags did you alter?

pcw24601 · 2022-05-09T07:16:28Z

@tomaroberts

Yes--I anonymised every file using the algorithm below (very slightly adapted from Dicognito). A couple of points worth noting:

Files no longer strictly meet the DICOM standard as I created UI prefixes which I don't own and made no effort to maintain consistency with them--if two fields contained the same UI in the original files, they will have different UIs in the anonymised versions.
These original DICOM files are all currently in the public domain in the git history, so anonymising now is fairly weak, although better than nothing.

Algorithm

Files are anonymised by:
    
    1. Replacing [UI] fields (except 'ClassUID' and 'TransferSyntaxUID')
        with new values of the form '2.datetime.counter.random'.
        
    2. Setting all [PN] fields to 'Anon'
    
    3. Removing the following fields (if they exist): 
        "BranchOfService",
        "Occupation",
        "MedicalRecordLocator",
        "MilitaryRank",
        "PatientInsurancePlanCodeSequence",
        "PatientReligiousPreference",
        "PatientTelecomInformation",
        "PatientTelephoneNumbers",
        "ReferencedPatientPhotoSequence",
        "ResponsibleOrganization"
        
    4. Set the following fields to 'Anon.random_string' (if they exist):
        "AccessionNumber",
        "OtherPatientIDs",
        "FillerOrderNumberImagingServiceRequest",
        "FillerOrderNumberImagingServiceRequestRetired",
        "FillerOrderNumberProcedure",
        "PatientID",
        "PerformedProcedureStepID",
        "PlacerOrderNumberImagingServiceRequest",
        "PlacerOrderNumberImagingServiceRequestRetired",
        "PlacerOrderNumberProcedure",
        "RequestedProcedureID",
        "ScheduledProcedureStepID",
        "StudyID"
        
    5. Set the following fields to 'Anon' (if they exist):
        "RequestingService",
        "CurrentPatientLocation",
        "PatientAddress",
        "RegionOfResidence",
        "CountryOfResidence",
        "InstitutionName",
        "InstitutionAddress",
        "InstitutionalDepartmentName",
        "StationName",
        "DeviceSerialNumber",
        "ProtocolName",
        "StudyDescription",
        "PerformedProcedureStepDescription",
        "PerformedStationName",
        "PerformedStationAETitle"
        
    6. Set "DeidentificationMethod" to 'DICOGNITO_BHSCT'
        (or append if value exists).
        
    7. DICOMDIR files are skipped
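
For readers who want to see the rules above in one place, here is a minimal sketch. It is not the script used for this PR and not Dicognito's code. It assumes a simplified in-memory model where each file is a dict of keyword -> (VR, value); a real script would read and write the tags with a DICOM library such as pydicom. The names `anonymise`, `new_uid`, `random_suffix` and `should_skip` are hypothetical, and reading the 'ClassUID' exception as "any keyword ending in ClassUID" is an assumption.

```python
import random
import string
from datetime import datetime
from itertools import count
from pathlib import Path

REMOVE = {
    "BranchOfService", "Occupation", "MedicalRecordLocator", "MilitaryRank",
    "PatientInsurancePlanCodeSequence", "PatientReligiousPreference",
    "PatientTelecomInformation", "PatientTelephoneNumbers",
    "ReferencedPatientPhotoSequence", "ResponsibleOrganization",
}
SET_ANON_RANDOM = {
    "AccessionNumber", "OtherPatientIDs",
    "FillerOrderNumberImagingServiceRequest",
    "FillerOrderNumberImagingServiceRequestRetired",
    "FillerOrderNumberProcedure", "PatientID", "PerformedProcedureStepID",
    "PlacerOrderNumberImagingServiceRequest",
    "PlacerOrderNumberImagingServiceRequestRetired",
    "PlacerOrderNumberProcedure", "RequestedProcedureID",
    "ScheduledProcedureStepID", "StudyID",
}
SET_ANON = {
    "RequestingService", "CurrentPatientLocation", "PatientAddress",
    "RegionOfResidence", "CountryOfResidence", "InstitutionName",
    "InstitutionAddress", "InstitutionalDepartmentName", "StationName",
    "DeviceSerialNumber", "ProtocolName", "StudyDescription",
    "PerformedProcedureStepDescription", "PerformedStationName",
    "PerformedStationAETitle",
}
METHOD = "DICOGNITO_BHSCT"
_counter = count(1)


def new_uid():
    """Step 1: build a UID of the form '2.datetime.counter.random'."""
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    return f"2.{stamp}.{next(_counter)}.{random.randint(1, 10**6)}"


def random_suffix(length=8):
    """Random string used after 'Anon.' in step 4 (length is arbitrary)."""
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=length))


def should_skip(filename):
    """Step 7: DICOMDIR files are left untouched."""
    return Path(filename).name.upper() == "DICOMDIR"


def anonymise(fields):
    """Apply steps 1-6 to a dict of keyword -> (VR, value); returns a new dict."""
    out = {}
    for keyword, (vr, value) in fields.items():
        if keyword in REMOVE:  # step 3
            continue
        keep_uid = keyword.endswith("ClassUID") or keyword == "TransferSyntaxUID"
        if vr == "UI" and not keep_uid:  # step 1: fresh UID every time
            value = new_uid()
        elif vr == "PN" or keyword in SET_ANON:  # steps 2 and 5
            value = "Anon"
        elif keyword in SET_ANON_RANDOM:  # step 4
            value = f"Anon.{random_suffix()}"
        out[keyword] = (vr, value)
    # Step 6: set DeidentificationMethod, appending if a value already exists.
    _, methods = out.get("DeidentificationMethod", ("LO", []))
    out["DeidentificationMethod"] = ("LO", list(methods) + [METHOD])
    return out
```

Because `new_uid` is called once per field with no lookup table, two fields that shared a UID in the original receive different values, which matches the consistency caveat noted above.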

tomaroberts · 2022-05-09T12:42:15Z

Thanks for info Paul.

Nice, I've never seen dicognito before. Just used pydicom in the past when anonymising my own stuff.

True re: GitHub history, but definitely better to sort this before hazen becomes more widely adopted, so that fewer people have the files on their machines.

pcw24601 · 2022-05-10T07:17:27Z

There are definite advantages to anonymising by hand; for example, you can hunt through the private fields. However, it's a slow process, especially for phantom data, where it's unlikely that patient data will be exposed.

tomaroberts · 2023-07-17T13:40:28Z

Error on my part – merged main into here, forgetting that Paul had updated the files in this PR. Hence, the files he changed were overwritten with newer files. I've force-pushed back to his previous most recent commit.

tomaroberts · 2023-07-17T14:06:58Z

@tomaroberts – reminder to self, merge all the other PRs into main for the next release, then merge this one last. Should be easier to confirm that file changes are stable.

tomaroberts

Checked across repo to ensure any old filenames have been adjusted correctly. Tested locally with pytest. GHA tests passing. LGTM.

pcw24601 added 2 commits April 7, 2022 12:24

Refactor to remove site names.

7377cd8

Anonymises DICOM files

6634d1b

pcw24601 requested review from laurencejackson and Lucrezia-Cester April 7, 2022 11:46

pcw24601 self-assigned this Apr 7, 2022

pcw24601 linked an issue Apr 7, 2022 that may be closed by this pull request

Bugfix: remove references to sites where data originated from #201

Closed

Fix directory structure for test dicom files

992574c

Previous commit kept original files as well as anonymised versions.

pcw24601 changed the base branch from main to release/0.7.0 April 14, 2022 14:57

tomaroberts changed the base branch from release/0.7.0 to main July 17, 2023 13:13

tomaroberts force-pushed the 201-bugfix-remove-references-to-sites-where-data-originated-from branch from 820ffbe to 992574c Compare July 17, 2023 13:39

tomaroberts changed the base branch from main to release/0.7.0 July 17, 2023 13:42

Merge branch 'release/0.7.0' into 201-bugfix-remove-references-to-sit…

fc836a3

…es-where-data-originated-from

tomaroberts changed the base branch from release/0.7.0 to main July 28, 2023 13:54

tomaroberts added 3 commits July 28, 2023 16:42

Merge branch 'main' into 201-bugfix-remove-references-to-sites-where-…

99d56c0

…data-originated-from

Tweak references in test_utils

71ebb37

Fixes pathing error

d0994a1

tomaroberts requested review from tomaroberts and removed request for Lucrezia-Cester July 28, 2023 15:52

tomaroberts approved these changes Jul 28, 2023

tomaroberts merged commit 72aace8 into main Jul 28, 2023

tomaroberts deleted the 201-bugfix-remove-references-to-sites-where-data-originated-from branch July 28, 2023 15:55
