Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support anonymization/pseudonymization for conditional references #51

Merged
merged 6 commits into from
Jan 26, 2023

Conversation

jabberwoc
Copy link
Contributor

@jabberwoc jabberwoc commented Apr 28, 2022

@jabberwoc jabberwoc linked an issue Apr 28, 2022 that may be closed by this pull request
@jabberwoc
Copy link
Contributor Author

Any comments on this?

@jabberwoc
Copy link
Contributor Author

Now supports identifier search parameters in IfNoneExist properties of Bunde entry requests (Conditional creates).

// Regex for conditional references (https://www.hl7.org/fhir/http.html#trules) or search parameters with identifier
new Regex(@"^(?<prefix>(("
+ string.Join("|", ModelInfo.SupportedResources)
+ @")\?)?identifier=((http|https)://([A-Za-z0-9\\\/\.\:\%\$\-])*\|)?)(?<id>[A-Za-z0-9\-\.]{1,64})$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[A-Za-z0-9-.]{1,64}

while Resource.id is limited by this regex, an Identifier.value is not and can consist of arbitrary characters and also be of arbitrary length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was just stupid copy&paste on my part. 😬

// Regex for conditional references (https://www.hl7.org/fhir/http.html#trules) or search parameters with identifier
new Regex(@"^(?<prefix>(("
+ string.Join("|", ModelInfo.SupportedResources)
+ @")\?)?identifier=((http|https)://([A-Za-z0-9\\\/\.\:\%\$\-])*\|)?)(?<id>[A-Za-z0-9\-\.]{1,64})$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conditional references could also be more complicated by chaining multiple search parameters like so: ResearchSubject?patient.identifier=http://example.com/patient|123&study.identifier=http://example.com/study|456 which currently doesn't seem to be handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think of that, actually. This should be fixed along with the arbitrary Identifier.value length.

// Regex for conditional references (https://www.hl7.org/fhir/http.html#trules) or search parameters with identifier
new Regex(@"^(?<prefix>(("
+ string.Join("|", ModelInfo.SupportedResources)
+ @")\?)?identifier=((http|https)://([A-Za-z0-9\\\/\.\:\%\$\-])*\|)?)(?<id>[A-Za-z0-9\-\.]{1,64})$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

identifier.system could also be an oid or uuid reference.

@chgl
Copy link
Contributor

chgl commented Oct 10, 2022

Thanks for the PR! There's only a conflict in the Dockerfile which you can address by rebasing on master.

Do you think there will be any breaking changes, ie. things that previously weren't pseudonymized now are or things that were pseudonymized now no longer are? (We should probably add some sort of snapshot testing to better protect against such regressions eventually). If no breaking changes are expected I'd be OK with merging despite my comments and simply iterating on the feature later on since the raised concerns may not actually affect the users (you) at this point. Putting this feature behind a feature flag to avoid any unintended changes might also be an option for peace of mind.

Off-topic, but I think it might be better to eventually get rid of the copy-pasted Microsoft.Health.Fhir.Anonymizer.Shared.Core code and simply use the upstream nuget package: microsoft/Tools-for-Health-Data-Anonymization#74. The solution implemented here could also be implemented by combining the generalize method which matches on an ifNoneExist identifier with a pseudonymize method. There's currently a PR open which allows extending the generalize method: microsoft/Tools-for-Health-Data-Anonymization#169.

@jabberwoc
Copy link
Contributor Author

Do you think there will be any breaking changes, ie. things that previously weren't pseudonymized now are or things that were pseudonymized now no longer are? (We should probably add some sort of snapshot testing to better protect against such regressions eventually). If no breaking changes are expected I'd be OK with merging despite my comments and simply iterating on the feature later on since the raised concerns may not actually affect the users (you) at this point. Putting this feature behind a feature flag to avoid any unintended changes might also be an option for peace of mind.

I don't think this will be a breaking change. However, you made a valid point that this feature isn't quite ready as a general solution.
I was basically trying to solve a specific problem which is (possibly) the most common use case: Conditional references using identifiers and urls. At least, this is how we build our resources so that was my motivation behind this, obviously.

We use a separate fork of this project and have this feature in production for 5 months now. Along with other changes which are probably even more specific to our needs.
So there is currently no pressure to have this merged quickly from our side.

Off-topic, but I think it might be better to eventually get rid of the copy-pasted Microsoft.Health.Fhir.Anonymizer.Shared.Core code and simply use the upstream nuget package: microsoft/Tools-for-Health-Data-Anonymization#74. The solution implemented here could also be implemented by combining the generalize method which matches on an ifNoneExist identifier with a pseudonymize method. There's currently a PR open which allows extending the generalize method: microsoft/Tools-for-Health-Data-Anonymization#169.

That sounds promising. I will have a look at it when time permits. 😉

@jabberwoc jabberwoc marked this pull request as draft October 26, 2022 10:06
@chgl chgl force-pushed the 50-conditional-references branch from 92ac9e3 to f3a08d7 Compare January 19, 2023 21:21
@github-actions
Copy link

github-actions bot commented Jan 19, 2023

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ CSHARP dotnet-format 7 0 16.34s
✅ DOCKERFILE hadolint 1 0 0.09s
✅ EDITORCONFIG editorconfig-checker 9 0 0.01s
✅ JSON eslint-plugin-jsonc 1 0 1.93s
✅ JSON jsonlint 1 0 0.21s
✅ JSON npm-package-json-lint yes no 0.38s
✅ JSON prettier 1 0 0.4s
✅ JSON v8r 1 0 2.46s
✅ REPOSITORY checkov yes no 8.86s
✅ REPOSITORY dustilock yes no 0.0s
✅ REPOSITORY gitleaks yes no 0.27s
✅ REPOSITORY git_diff yes no 0.01s
✅ REPOSITORY secretlint yes no 0.75s
✅ REPOSITORY syft yes no 0.52s
✅ REPOSITORY trivy yes no 6.94s

See detailed report in MegaLinter reports
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security

@chgl chgl force-pushed the 50-conditional-references branch from f3a08d7 to 0a61eaa Compare January 19, 2023 21:25
@chgl chgl force-pushed the 50-conditional-references branch from 0a61eaa to bb5e061 Compare January 20, 2023 10:47
@github-actions
Copy link

github-actions bot commented Jan 20, 2023

Code Coverage

Package Line Rate Branch Rate Health
FhirPseudonymizer.Tests 99% 100%
FhirPseudonymizer 78% 69%
Summary 82% (526 / 641) 70% (74 / 106)

Minimum allowed line rate is 50%


iter8 report

Experiment summary:
*******************

  Experiment completed: true
  No task failures: true
  Total number of tasks: 7
  Number of completed tasks: 7

Whether or not service level objectives (SLOs) are satisfied:
*************************************************************

  SLO Conditions                 | Satisfied
  --------------                 | ---------
  http/error-count <= 0          | true
  http/latency-mean (msec) <= 25 | true
  http/latency-p99 (msec) <= 50  | true
  

Latest observed values for metrics:
***********************************

  Metric                     | value
  -------                    | -----
  http/error-count           | 0.00
  http/error-rate            | 0.00
  http/latency-max (msec)    | 2014.22
  http/latency-mean (msec)   | 13.81
  http/latency-min (msec)    | 3.04
  http/latency-p50 (msec)    | 11.61
  http/latency-p75 (msec)    | 16.68
  http/latency-p90 (msec)    | 22.86
  http/latency-p95 (msec)    | 27.55
  http/latency-p99 (msec)    | 41.21
  http/latency-p99.9 (msec)  | 69.50
  http/latency-stddev (msec) | 26.72
  http/request-count         | 25000.00
  

@chgl
Copy link
Contributor

chgl commented Jan 20, 2023

@jabberwoc I've rebased and put the conditional reference pseudonymization behind a feature flag (arguably excessive given that the change shouldn't really cause anything backwards-incompatible...). You can enable it by setting Features__ConditionalReferencePseudonymization=true. You can address the comments above in a seperate PR should the need ever arise.

But this way you don't have to maintain your own fork and can benefit from the latest upstream features (e.g. https://github.com/miracum/fhir-pseudonymizer#usesystemtextjsonfhirserializer). Let me know if the changes look OK to you and I can go ahead and merge it.

@jabberwoc
Copy link
Contributor Author

@chgl Yes, go ahead. Thanks!

@jabberwoc jabberwoc marked this pull request as ready for review January 26, 2023 12:09
@chgl chgl merged commit a0d78d6 into master Jan 26, 2023
@chgl chgl deleted the 50-conditional-references branch January 26, 2023 13:24
@miracum-bot
Copy link

🎉 This PR is included in version 2.16.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conditional references
3 participants