feat: support anonymization/pseudonymization for conditional references #51

jabberwoc · 2022-04-28T08:22:42Z

jabberwoc · 2022-05-03T08:38:18Z

Any comments on this?

jabberwoc · 2022-05-20T09:59:41Z

Now supports identifier search parameters in IfNoneExist properties of Bunde entry requests (Conditional creates).

chgl · 2022-10-10T14:58:12Z

src/FhirPseudonymizer/Microsoft.Health.Fhir.Anonymizer.Shared.Core/Utility/ReferenceUtility.cs

+            // Regex for conditional references (https://www.hl7.org/fhir/http.html#trules) or search parameters with identifier
+            new Regex(@"^(?<prefix>(("
+                      + string.Join("|", ModelInfo.SupportedResources)
+                      + @")\?)?identifier=((http|https)://([A-Za-z0-9\\\/\.\:\%\$\-])*\|)?)(?<id>[A-Za-z0-9\-\.]{1,64})$")


[A-Za-z0-9-.]{1,64}

while Resource.id is limited by this regex, an Identifier.value is not and can consist of arbitrary characters and also be of arbitrary length.

Yes, that was just stupid copy&paste on my part. 😬

chgl · 2022-10-10T15:16:44Z

src/FhirPseudonymizer/Microsoft.Health.Fhir.Anonymizer.Shared.Core/Utility/ReferenceUtility.cs

+            // Regex for conditional references (https://www.hl7.org/fhir/http.html#trules) or search parameters with identifier
+            new Regex(@"^(?<prefix>(("
+                      + string.Join("|", ModelInfo.SupportedResources)
+                      + @")\?)?identifier=((http|https)://([A-Za-z0-9\\\/\.\:\%\$\-])*\|)?)(?<id>[A-Za-z0-9\-\.]{1,64})$")


conditional references could also be more complicated by chaining multiple search parameters like so: ResearchSubject?patient.identifier=http://example.com/patient|123&study.identifier=http://example.com/study|456 which currently doesn't seem to be handled.

I didn't think of that, actually. This should be fixed along with the arbitrary Identifier.value length.

chgl · 2022-10-10T15:21:08Z

src/FhirPseudonymizer/Microsoft.Health.Fhir.Anonymizer.Shared.Core/Utility/ReferenceUtility.cs

+            // Regex for conditional references (https://www.hl7.org/fhir/http.html#trules) or search parameters with identifier
+            new Regex(@"^(?<prefix>(("
+                      + string.Join("|", ModelInfo.SupportedResources)
+                      + @")\?)?identifier=((http|https)://([A-Za-z0-9\\\/\.\:\%\$\-])*\|)?)(?<id>[A-Za-z0-9\-\.]{1,64})$")


identifier.system could also be an oid or uuid reference.

chgl · 2022-10-10T15:32:15Z

Thanks for the PR! There's only a conflict in the Dockerfile which you can address by rebasing on master.

Do you think there will be any breaking changes, ie. things that previously weren't pseudonymized now are or things that were pseudonymized now no longer are? (We should probably add some sort of snapshot testing to better protect against such regressions eventually). If no breaking changes are expected I'd be OK with merging despite my comments and simply iterating on the feature later on since the raised concerns may not actually affect the users (you) at this point. Putting this feature behind a feature flag to avoid any unintended changes might also be an option for peace of mind.

Off-topic, but I think it might be better to eventually get rid of the copy-pasted Microsoft.Health.Fhir.Anonymizer.Shared.Core code and simply use the upstream nuget package: microsoft/Tools-for-Health-Data-Anonymization#74. The solution implemented here could also be implemented by combining the generalize method which matches on an ifNoneExist identifier with a pseudonymize method. There's currently a PR open which allows extending the generalize method: microsoft/Tools-for-Health-Data-Anonymization#169.

jabberwoc · 2022-10-14T10:57:40Z

Do you think there will be any breaking changes, ie. things that previously weren't pseudonymized now are or things that were pseudonymized now no longer are? (We should probably add some sort of snapshot testing to better protect against such regressions eventually). If no breaking changes are expected I'd be OK with merging despite my comments and simply iterating on the feature later on since the raised concerns may not actually affect the users (you) at this point. Putting this feature behind a feature flag to avoid any unintended changes might also be an option for peace of mind.

I don't think this will be a breaking change. However, you made a valid point that this feature isn't quite ready as a general solution.
I was basically trying to solve a specific problem which is (possibly) the most common use case: Conditional references using identifiers and urls. At least, this is how we build our resources so that was my motivation behind this, obviously.

We use a separate fork of this project and have this feature in production for 5 months now. Along with other changes which are probably even more specific to our needs.
So there is currently no pressure to have this merged quickly from our side.

Off-topic, but I think it might be better to eventually get rid of the copy-pasted Microsoft.Health.Fhir.Anonymizer.Shared.Core code and simply use the upstream nuget package: microsoft/Tools-for-Health-Data-Anonymization#74. The solution implemented here could also be implemented by combining the generalize method which matches on an ifNoneExist identifier with a pseudonymize method. There's currently a PR open which allows extending the generalize method: microsoft/Tools-for-Health-Data-Anonymization#169.

That sounds promising. I will have a look at it when time permits. 😉

…es and identifier match urls

github-actions · 2023-01-19T21:24:51Z

🦙 MegaLinter status: ✅ SUCCESS

Descriptor	Linter	Files	Errors	Elapsed time
✅ CSHARP	dotnet-format	7	0	16.34s
✅ DOCKERFILE	hadolint	1	0	0.09s
✅ EDITORCONFIG	editorconfig-checker	9	0	0.01s
✅ JSON	eslint-plugin-jsonc	1	0	1.93s
✅ JSON	jsonlint	1	0	0.21s
✅ JSON	npm-package-json-lint	yes	no	0.38s
✅ JSON	prettier	1	0	0.4s
✅ JSON	v8r	1	0	2.46s
✅ REPOSITORY	checkov	yes	no	8.86s
✅ REPOSITORY	dustilock	yes	no	0.0s
✅ REPOSITORY	gitleaks	yes	no	0.27s
✅ REPOSITORY	git_diff	yes	no	0.01s
✅ REPOSITORY	secretlint	yes	no	0.75s
✅ REPOSITORY	syft	yes	no	0.52s
✅ REPOSITORY	trivy	yes	no	6.94s

See detailed report in MegaLinter reports
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by

…e flag

github-actions · 2023-01-20T10:50:56Z

Package	Line Rate	Branch Rate	Health
FhirPseudonymizer.Tests	99%	100%	✔
FhirPseudonymizer	78%	69%	✔
Summary	82% (526 / 641)	70% (74 / 106)	✔

Minimum allowed line rate is 50%

iter8 report

Experiment summary:
*******************

  Experiment completed: true
  No task failures: true
  Total number of tasks: 7
  Number of completed tasks: 7

Whether or not service level objectives (SLOs) are satisfied:
*************************************************************

  SLO Conditions                 | Satisfied
  --------------                 | ---------
  http/error-count <= 0          | true
  http/latency-mean (msec) <= 25 | true
  http/latency-p99 (msec) <= 50  | true
  

Latest observed values for metrics:
***********************************

  Metric                     | value
  -------                    | -----
  http/error-count           | 0.00
  http/error-rate            | 0.00
  http/latency-max (msec)    | 2014.22
  http/latency-mean (msec)   | 13.81
  http/latency-min (msec)    | 3.04
  http/latency-p50 (msec)    | 11.61
  http/latency-p75 (msec)    | 16.68
  http/latency-p90 (msec)    | 22.86
  http/latency-p95 (msec)    | 27.55
  http/latency-p99 (msec)    | 41.21
  http/latency-p99.9 (msec)  | 69.50
  http/latency-stddev (msec) | 26.72
  http/request-count         | 25000.00

chgl · 2023-01-20T10:51:55Z

@jabberwoc I've rebased and put the conditional reference pseudonymization behind a feature flag (arguably excessive given that the change shouldn't really cause anything backwards-incompatible...). You can enable it by setting Features__ConditionalReferencePseudonymization=true. You can address the comments above in a seperate PR should the need ever arise.

But this way you don't have to maintain your own fork and can benefit from the latest upstream features (e.g. https://github.com/miracum/fhir-pseudonymizer#usesystemtextjsonfhirserializer). Let me know if the changes look OK to you and I can go ahead and merge it.

jabberwoc · 2023-01-26T12:07:28Z

@chgl Yes, go ahead. Thanks!

miracum-bot · 2023-01-26T13:49:08Z

🎉 This PR is included in version 2.16.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

jabberwoc linked an issue Apr 28, 2022 that may be closed by this pull request

Conditional references #50

Closed

chgl reviewed Oct 10, 2022

View reviewed changes

jabberwoc marked this pull request as draft October 26, 2022 10:06

jabberwoc and others added 5 commits January 19, 2023 21:34

feat: support anonymization/pseudonymization for conditional referenc…

3492be1

…es and identifier match urls

fix whitespace formatting

0bf1a35

feat: support pseudonymization of Ids in Conditional create

a2285ed

fix whitespace formatting

60d9f73

refactor: use source generation for regexes

9bd3806

chgl force-pushed the 50-conditional-references branch from 92ac9e3 to f3a08d7 Compare January 19, 2023 21:21

chgl force-pushed the 50-conditional-references branch from f3a08d7 to 0a61eaa Compare January 19, 2023 21:25

refactor: hide conditional reference pseudonymization behind a featur…

bb5e061

…e flag

chgl force-pushed the 50-conditional-references branch from 0a61eaa to bb5e061 Compare January 20, 2023 10:47

jabberwoc marked this pull request as ready for review January 26, 2023 12:09

chgl approved these changes Jan 26, 2023

View reviewed changes

chgl merged commit a0d78d6 into master Jan 26, 2023

chgl deleted the 50-conditional-references branch January 26, 2023 13:24

miracum-bot added the released label Jan 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support anonymization/pseudonymization for conditional references #51

feat: support anonymization/pseudonymization for conditional references #51

jabberwoc commented Apr 28, 2022 •

edited

Loading

jabberwoc commented May 3, 2022

jabberwoc commented May 20, 2022

chgl Oct 10, 2022

jabberwoc Oct 14, 2022

chgl Oct 10, 2022

jabberwoc Oct 14, 2022

chgl Oct 10, 2022

chgl commented Oct 10, 2022

jabberwoc commented Oct 14, 2022

github-actions bot commented Jan 19, 2023 •

edited

Loading

github-actions bot commented Jan 20, 2023 •

edited

Loading

chgl commented Jan 20, 2023

jabberwoc commented Jan 26, 2023

miracum-bot commented Jan 26, 2023

feat: support anonymization/pseudonymization for conditional references #51

feat: support anonymization/pseudonymization for conditional references #51

Conversation

jabberwoc commented Apr 28, 2022 • edited Loading

jabberwoc commented May 3, 2022

jabberwoc commented May 20, 2022

chgl Oct 10, 2022

Choose a reason for hiding this comment

jabberwoc Oct 14, 2022

Choose a reason for hiding this comment

chgl Oct 10, 2022

Choose a reason for hiding this comment

jabberwoc Oct 14, 2022

Choose a reason for hiding this comment

chgl Oct 10, 2022

Choose a reason for hiding this comment

chgl commented Oct 10, 2022

jabberwoc commented Oct 14, 2022

github-actions bot commented Jan 19, 2023 • edited Loading

🦙 MegaLinter status: ✅ SUCCESS

github-actions bot commented Jan 20, 2023 • edited Loading

iter8 report

chgl commented Jan 20, 2023

jabberwoc commented Jan 26, 2023

miracum-bot commented Jan 26, 2023

jabberwoc commented Apr 28, 2022 •

edited

Loading

github-actions bot commented Jan 19, 2023 •

edited

Loading

github-actions bot commented Jan 20, 2023 •

edited

Loading