Start documenting design choices #574

Merged · 6 commits · Dec 23, 2024
28 changes: 28 additions & 0 deletions docs/design/decision-record/0001-multiservice-architecture.md
# Multi-service Architecture

* Status: accepted
* Deciders: Original PIXL team
* Date: 2023-11-01 (retrospectively)

## Context and Problem Statement

We want a software solution composed of services with distinct functionality. How should we structure those services?

## Decision Drivers <!-- optional -->

* Will need to allow for concurrent processing of tasks
* The team's lingua franca is Python

## Considered Options

* Multi-service architecture
* Monolith architecture

## Decision Outcome

Chosen option: "Multi-service architecture", because it allows us to:

- Break up our code into logical packages and services
- Get around a single Python process being blocked by the global interpreter lock, as each service runs in its own process
- Be deliberate about where code should go, and restrict services from using other services' code
- Fit well with UCLH's restriction to deploying services using Docker and Docker Compose, while remaining open to extending to Kubernetes should we want that
47 changes: 47 additions & 0 deletions docs/design/decision-record/0002-message-processing.md
# Message-based Processing of Images

* Status: accepted
* Deciders: Original PIXL team, most recent changes: Stef Piatek & Paul Smith
* Date: 2024-12-12

## Context and Problem Statement

We need a way to buffer the messages awaiting processing.
We expect hundreds to tens of thousands of imaging studies to be requested per project,
and want to find each study in the source systems individually.


## Decision Drivers <!-- optional -->

- Be able to process multiple research projects at the same time
- Should be persistent if services are taken down
- Allow for a secondary DICOM source to be used if study isn't found in the primary
- Limit the total number of images that are being processed for a given source system
- Studies that have already been successfully exported for a project should not get processed again

## Considered Options

* Use of a database alone to schedule runs, with the CLI driving a long-running job to process all studies in a research project
* Use of queues to buffer requests that the `imaging-api` processes, with a database to track the status of exported studies

## Decision Outcome

Chosen option: `Queue for buffering requests, database to track status of export`,
because it fulfils all requirements and lets us invest in generic, transferable technologies.
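
A minimal sketch of the chosen pattern using the `pika` client, assuming a local RabbitMQ broker; the queue names, message format, and prefetch value are illustrative rather than PIXL's actual configuration:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# One queue per imaging source; durable queues survive a broker restart,
# satisfying the "persistent if services are taken down" driver.
channel.queue_declare(queue="imaging-primary", durable=True)
channel.queue_declare(queue="imaging-secondary", durable=True)

# Cap unacknowledged deliveries per consumer, limiting how many studies
# are in flight against a given source system at once.
channel.basic_qos(prefetch_count=5)

def on_message(ch, method, properties, body):
    study_uid = body.decode()
    # ... find the study in the source system, then record its export
    # status in the database so it is not processed again ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="imaging-primary", on_message_callback=on_message)
channel.start_consuming()
```

If a study isn't found in the primary source, the consumer can republish its message to the secondary queue, matching the fallback driver above.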

## Pros and Cons of the Options <!-- optional -->

### Database alone


* Good, simple to set up and use. Single solution for buffering requests and tracking status
* Bad, a bespoke solution where we could instead use a technology that is transferable to other situations
* Bad, limiting total queries per source system would require a complex setup

### Queue with database for status


* Good, fulfils all requirements fairly easily, with a queue for the primary and another for the secondary imaging source
* Good, because we have previously invested in RabbitMQ as a technology
* Bad, extra services to manage and extra development
* Bad, because the original implementation was broken and required effort to fix, though we did learn more about the libraries we're using
44 changes: 44 additions & 0 deletions docs/design/decision-record/0003-dicom-processing.md
# DICOM server and processing

* Status: accepted
* Deciders: Original PIXL team
* Date: 2023-11-01 (retrospectively)

## Context and Problem Statement

We need a DICOM server to query for DICOM images, store them, anonymise them, and export them.


## Decision Drivers <!-- optional -->

* Will need a robust DICOM server with a track record of real-world use
* Keep original studies in a local cache to reduce load on clinical imaging systems if anonymisation or export fails
* The team's lingua franca is Python
* Per-project anonymisation profiles and custom hashing of fields will require plugins to be written for anonymisation
* UCLH infrastructure allows for running Docker, but we don't have admin accounts and cannot install software directly onto the machines.

## Considered Options

* XNAT server
* Orthanc server

## Decision Outcome

Chosen option: `Orthanc`,
because it's relatively lightweight, actively developed, and allows Python-based extensions to be written.
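
For illustration, a minimal plugin of the kind Orthanc supports, written against the `orthanc` module that Orthanc's Python plugin exposes inside its embedded interpreter; the callback body is a placeholder, not PIXL's actual processing logic:

```python
# Runs inside Orthanc's embedded Python interpreter, where the `orthanc`
# module is provided by the Python plugin (it is not installable from PyPI).
import orthanc

def on_change(change_type, level, resource_id):
    # React to server events, e.g. a study becoming stable once all of
    # its instances have arrived.
    if change_type == orthanc.ChangeType.STABLE_STUDY:
        orthanc.LogWarning(f"Study ready for processing: {resource_id}")

orthanc.RegisterOnChangeCallback(on_change)
```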

## Pros and Cons of the Options <!-- optional -->

### XNAT

* Good, ARC has a history of using this in the medical imaging subgroup.
* Good, widely used and well regarded
* Bad, heavyweight with many more features than we need; may take longer to learn and deploy
* Bad, does not allow Python-based plugins to be used for anonymisation without getting into running Docker in Docker

### Orthanc

* Good, has been battle-tested
* Good, has a DICOMWeb plugin to allow for export via that modality
* Good, allows Python-based plugins and running in Docker
* Bad, no previous usage within ARC, so there will be teething problems
68 changes: 68 additions & 0 deletions docs/design/decision-record/0004-multiple-project-configuration.md
# Multiple-project configuration

* Status: accepted
* Deciders: Milan Malfait, Peter Tsrunchev, Jeremy Stein, Stef Piatek
* Date: 2024-03-05

Technical Story: [PIXL can take multiple projects](https://github.com/SAFEHR-data/PIXL/issues/330)

## Context and Problem Statement

Each project should be able to define its own anonymisation profile and export destinations.
How can we store the secrets?

![pixl-multi-project-config.png](../diagrams/pixl-multi-project-config.png)

## Decision Drivers <!-- optional -->

* When hashing a given field for an imaging study for two different projects, each project's hash should be different.
This is to avoid inappropriate linkage of data between research projects (see the hashing sketch after this list).
* Secure storage of secrets, especially for connection to destinations and per-project hashing salts
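
To make the first driver concrete, a standard-library sketch of per-project salted hashing; PIXL's real hashing happens in its anonymisation plugins, and the field value and salts here are placeholders:

```python
import hashlib

def hash_field(value: str, project_salt: bytes) -> str:
    """Hash a DICOM field value with a per-project salt (BLAKE2 salts are <= 16 bytes)."""
    return hashlib.blake2b(value.encode(), salt=project_salt, digest_size=16).hexdigest()

# The same accession number yields unrelated digests under different salts,
# so the two projects' outputs cannot be linked.
assert hash_field("ACC123", b"salt-project-a") != hash_field("ACC123", b"salt-project-b")
```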

## Considered Options

* file-based: env files for docker or structured files for secrets
* self-hosted secret storage service
* Azure Key Vault

## Decision Outcome

Chosen option: "azure keyvault", because its low hassle and gives us good control of secret access.

### Positive Consequences <!-- optional -->

* Single place for secrets, which multiple deployments can access
* Can have secrets which expire, so even if compromised we limit the amount of secret leakage possible
* Per-project salts are stored in a separate keyvault from the export endpoints
* Only need to define the destination type, with the keyvault defining all the connection details

### Negative Consequences <!-- optional -->

* Requires other implementations to set up their own Azure storage accounts or develop new secret management
* Developers also have to update a `.env` file for running the system test
* Slight increase in cost, can be slightly offset by caching credentials
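
As an illustration of the consequences above, fetching a per-project secret with the `azure-identity` and `azure-keyvault-secrets` libraries; the vault URL and secret name are hypothetical:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# The credential can be constructed once and reused, which is the
# credential-caching mitigation mentioned above.
credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://example-pixl.vault.azure.net", credential=credential
)

# e.g. a per-project hashing salt, stored separately from export secrets
salt = client.get_secret("project-a-salt").value
```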

## Pros and Cons of the Options <!-- optional -->

### File-based

* Good, simple to do
* Bad, will keep expanding over time, which can be a pain to maintain
* Bad, no access control beyond Unix permissions

### Self-hosted secret storage

* Good, fine-grained access control possible
* Good, free in terms of upfront cost
* Bad, another service to maintain, with ongoing residual costs

### Azure Key Vault


* Good, fine-grained access control possible
* Bad, slight increase in cost

## Links <!-- optional -->

* Routing of studies based on projects in [ADR-0005](0005-project-based-study-routing.md)
50 changes: 50 additions & 0 deletions docs/design/decision-record/0005-project-based-study-routing.md
# Project-based study routing

* Status: accepted
* Deciders: Stef Piatek, Paul Smith
* Date: 2024-11-27

Technical Story: [PIXL can take multiple projects](https://github.com/SAFEHR-data/PIXL/issues/330)

## Context and Problem Statement

Each study sent to `orthanc-anon` needs to be de-identified using the project-specific configuration.
We need a way to pass this information along with the DICOM file.

## Considered Options

* DICOM tag: Adding a custom DICOM tag for the project name
* Custom REST API endpoint: Creating a custom REST API endpoint for `orthanc-anon` to pull the data from `orthanc-raw`

## Decision Outcome

Chosen option: "custom REST API endpoint", because the project tag updating was causing `orthanc-raw` to crash,
and we were no longer using study stability to control export.

## Pros and Cons of the Options <!-- optional -->

### DICOM tag

Add a private creator group to instances as they arrive, with a dummy value in the custom tag.
Once the study has been pulled from the DICOM source, update the tag with the filename stem of the project config.

* Good, because you can see which project a study was most recently exported for
* Bad, `orthanc-raw` started crashing when updating the DICOM tag via the Orthanc API on large studies
* Bad, because we had to update the DICOM tag without changing the instance UIDs,
so that we could check for missing instances in studies which already exist in `orthanc-raw`
* Bad, studies appeared in which instances didn't have their custom DICOM tag updated
* Bad, could have a race condition where two projects try to export the same study at once

### Custom REST API endpoint

Once the study has been pulled from the DICOM source, send a REST request to `orthanc-anon` with the study UID and project.
`orthanc-anon` then adds this job to a threadpool and returns a 200 status to the client.

* Good, keeping original instances unaltered
* Good, thread-pooling allows for faster de-identification
* Good, simpler flow
* Bad, we can't alter the queue of de-identification jobs short of taking down `orthanc-anon`
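
A minimal sketch of this receive-and-queue pattern, here using FastAPI and a `ThreadPoolExecutor`; the endpoint path, payload fields, and pool size are assumptions for illustration — the real endpoint lives inside `orthanc-anon`'s Orthanc plugin:

```python
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
executor = ThreadPoolExecutor(max_workers=4)  # pool size is illustrative

class StudyJob(BaseModel):
    study_uid: str
    project_name: str

def deidentify(study_uid: str, project_name: str) -> None:
    # ... pull the study from orthanc-raw, apply the project's
    # anonymisation profile, then hand it off for export ...
    pass

@app.post("/import-from-raw")
def import_from_raw(job: StudyJob) -> dict:
    # Submit to the pool and return immediately: the client gets a 200
    # while de-identification proceeds in the background.
    executor.submit(deidentify, job.study_uid, job.project_name)
    return {"status": "queued"}
```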

## Links <!-- optional -->

* Related to multiple project configuration [ADR-0004](0004-multiple-project-configuration.md)
45 changes: 45 additions & 0 deletions docs/design/decision-record/0006-data-export.md
# Export of parquet files and DICOM data

* Status: accepted
* Deciders: Haroon Chughtai, Jeremy Stein, Milan Malfait, Ruaridh Gollifer, Stef Piatek
* Date: 2024-02-26

## Context and Problem Statement

The pipeline needs to be able to export DICOM images and structured data files to different endpoints.

## Decision Drivers <!-- optional -->

* We expect that some projects will have more data than we can store locally, so will need a rolling export of images
* We will need to be able to export images and structured data via FTPS in an automated fashion
* We will need to be able to export images via DICOMWeb
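
For the FTPS driver, a standard-library sketch of an automated upload; host, credentials, and file paths are placeholders (in PIXL these would come from the keyvault, per ADR-0004):

```python
from ftplib import FTP_TLS
from pathlib import Path

def upload_via_ftps(host: str, user: str, password: str, local_file: Path) -> None:
    ftps = FTP_TLS(host)
    ftps.login(user, password)
    ftps.prot_p()  # encrypt the data channel as well as the control channel
    with local_file.open("rb") as handle:
        ftps.storbinary(f"STOR {local_file.name}", handle)
    ftps.quit()
```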


## Considered Options

* Shared Python library for exporting data, used in both `orthanc-anon` and the `pixl` CLI.
* `export-api` service, which can export both DICOM and structured data files.

## Decision Outcome

Chosen option: "`export-api` service", for clear separation of responsibilities.

## Pros and Cons of the Options <!-- optional -->

### Shared python library

A shared Python library would implement the export logic, imported by `orthanc-anon` for DICOM data and by the `pixl` CLI for parquet files.

* Good, one less service to maintain
* Good, export via DICOMWeb already goes through the Orthanc API
* Bad, duplication of implementation for export
* Bad, duplication of areas where secrets are used

### `export-api` service

Instead of a shared library, the export code would live in this single service.

* Good, single service that will access all secrets and orchestrate exports
* Good, allows caching of export secrets in a long-running service
* Bad, would require extra code for interacting with the service from the CLI for parquet export
19 changes: 19 additions & 0 deletions docs/design/decision-record/index.md
# Architectural Decision Log

This log lists the architectural decisions for PIXL.

Using the [Markdown Architectural Decision Records](https://adr.github.io/madr/) (MADR) format.

Regenerate the content by running `adr-log -i` from this directory.
You can install it via `npm install -g adr-log`; remove the template entry from the generated list.

<!-- adrlog -->

* [ADR-0001](0001-multiservice-architecture.md) - Multi-service Architecture
* [ADR-0002](0002-message-processing.md) - Message-based Processing of Images
* [ADR-0003](0003-dicom-processing.md) - DICOM server and processing
* [ADR-0004](0004-multiple-project-configuration.md) - Multiple-project configuration
* [ADR-0005](0005-project-based-study-routing.md) - Project-based study routing
* [ADR-0006](0006-data-export.md) - Export of parquet files and DICOM data

<!-- adrlogstop -->
72 changes: 72 additions & 0 deletions docs/design/decision-record/template.md
# [Template: short title of solved problem and solution]

* Status: [proposed | rejected | accepted | deprecated | … | superseded by [ADR-0105](0105-example.md)] <!-- optional -->
* Deciders: [list everyone involved in the decision] <!-- optional -->
* Date: [YYYY-MM-DD when the decision was last updated] <!-- optional -->

Technical Story: [description | ticket/issue URL] <!-- optional -->

## Context and Problem Statement

[Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to articulate the problem in form of a question.]

## Decision Drivers <!-- optional -->

* [driver 1, e.g., a force, facing concern, …]
* [driver 2, e.g., a force, facing concern, …]
* … <!-- numbers of drivers can vary -->

## Considered Options

* [option 1]
* [option 2]
* [option 3]
* … <!-- numbers of options can vary -->

## Decision Outcome

Chosen option: "[option 1]", because [justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force force | … | comes out best (see below)].

### Positive Consequences <!-- optional -->

* [e.g., improvement of quality attribute satisfaction, follow-up decisions required, …]
* …

### Negative Consequences <!-- optional -->

* [e.g., compromising quality attribute, follow-up decisions required, …]
* …

## Pros and Cons of the Options <!-- optional -->

### [option 1]

[example | description | pointer to more information | …] <!-- optional -->

* Good, because [argument a]
* Good, because [argument b]
* Bad, because [argument c]
* … <!-- numbers of pros and cons can vary -->

### [option 2]

[example | description | pointer to more information | …] <!-- optional -->

* Good, because [argument a]
* Good, because [argument b]
* Bad, because [argument c]
* … <!-- numbers of pros and cons can vary -->

### [option 3]

[example | description | pointer to more information | …] <!-- optional -->

* Good, because [argument a]
* Good, because [argument b]
* Bad, because [argument c]
* … <!-- numbers of pros and cons can vary -->

## Links <!-- optional -->

* [Link type] [Link to ADR] <!-- example: Refined by [ADR-0005](0005-example.md) -->
* … <!-- numbers of links can vary -->