Commit
Graham Thomas committed Jul 8, 2024
1 parent 93d7193 commit c8ea2a3
Showing 9 changed files with 441 additions and 3 deletions.
55 changes: 55 additions & 0 deletions
sdk/healthdataaiservices/azure-health-deidentification/samples/README.md
@@ -0,0 +1,55 @@
# Azure Health Deidentification client library for Python
Azure Health Deidentification is Microsoft's solution for anonymizing unstructured health text.

[Source code](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/healthdataaiservices/azure-health-deidentification/azure/health/deidentification)
| [Package (PyPI)](https://pypi.org/project/azure-health-deidentification/)
<!-- | [API reference documentation](https://aka.ms/azsdk-python-storage-blob-ref) -->
| [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/healthdataaiservices/azure-health-deidentification/samples)

## Getting started

### Prerequisites
* Python 3.8 or later is required to use this package. For more details, please read our page on [Azure SDK for Python version support policy](https://github.com/Azure/azure-sdk-for-python/wiki/Azure-SDKs-Python-version-support-policy).
* You must have an [Azure subscription](https://azure.microsoft.com/free/) and an
**Azure Deidentification Service** to use this package.

### Install the package
Install the Azure Health Deidentification client library for Python with [pip](https://pypi.org/project/pip/):

```bash
pip install azure-health-deidentification
```

### Create a Deidentification Service
The job-based samples read their input from, and write their output to, an Azure storage account.
If you wish to create a new storage account, you can use the
[Azure Portal](https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal).

### Create the client
To create a Deidentification client, you must obtain the **Service URL** from your Azure Deidentification Service.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.health.deidentification import DeidentificationClient

endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
# Strip the scheme so that only the host name remains.
endpoint = endpoint.replace("https://", "")
print(endpoint)
# example: fuf4h4bxg5b0d0dr.api.cac001.deid.azure.com

credential = DefaultAzureCredential()

client = DeidentificationClient(endpoint, credential)
```

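The `replace("https://", "")` call in the snippet above handles only one spelling of the scheme. As a minimal sketch, assuming the goal is simply to reduce the configured value to a host name (as the example comment suggests), a hypothetical helper could use `urllib.parse` so that `http://` values, trailing slashes, and already-bare host names are handled too. The name `normalize_endpoint` is invented for this example and is not part of the SDK.

```python
from urllib.parse import urlsplit


def normalize_endpoint(raw: str) -> str:
    """Return only the host (and port, if any) of an endpoint string.

    Hypothetical helper for illustration; not part of azure-health-deidentification.
    """
    raw = raw.strip()
    # urlsplit only fills in netloc when a scheme separator is present,
    # so add a placeholder scheme to bare host names.
    if "://" not in raw:
        raw = "https://" + raw
    return urlsplit(raw).netloc


print(normalize_endpoint("https://fuf4h4bxg5b0d0dr.api.cac001.deid.azure.com/"))
```

The result can then be passed to the client constructor in place of the manually stripped string.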
## Key concepts
Operation modes:
- Tag: Returns the offset, length, and PHI category of each detected text span.
- Redact: Returns the output text with each detected span replaced by a placeholder, e.g. `[name]`.
- Surrogate: Returns the output text with each detected span replaced by a synthetic value.
    - Input: `My name is John Smith`
    - Output: `My name is Tom Jones`

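To make the three modes concrete, here is a toy sketch in plain Python. It does not call the service: the `TaggedSpan` type, the hard-coded span list, and the replacement tables are all assumptions made up for illustration, whereas the real service detects the spans and chooses the surrogates itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaggedSpan:
    offset: int    # start index of the span in the input text
    length: int    # number of characters in the span
    category: str  # PHI category, e.g. "name"


def apply_spans(text: str, spans: List[TaggedSpan], replacements: Dict[str, str]) -> str:
    """Replace each span with replacements[category], working right to left
    so that earlier offsets stay valid while the text changes length."""
    for span in sorted(spans, key=lambda s: s.offset, reverse=True):
        end = span.offset + span.length
        text = text[: span.offset] + replacements[span.category] + text[end:]
    return text


text = "My name is John Smith"
# "Tag" mode in this toy model: where the PHI is and what kind it is.
spans = [TaggedSpan(offset=11, length=10, category="name")]

# "Redact" mode: substitute a placeholder for each span.
redacted = apply_spans(text, spans, {"name": "[name]"})
# "Surrogate" mode: substitute a synthetic value for each span.
surrogate = apply_spans(text, spans, {"name": "Tom Jones"})

print(redacted)   # My name is [name]
print(surrogate)  # My name is Tom Jones
```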
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
61 changes: 61 additions & 0 deletions
...e-health-deidentification/samples/async_samples/sample_realtime_deidentification_async.py
@@ -0,0 +1,61 @@
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------

"""
FILE: sample_realtime_deidentification_async.py
DESCRIPTION:
    This sample demonstrates the simplest deidentification scenario. It takes in a string of text and returns
    the deidentified text.
USAGE:
    python sample_realtime_deidentification_async.py
    Set the environment variables with your own values before running the sample:
    1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the endpoint to your Deidentification Service resource.
"""
import asyncio


async def sample_realtime_deidentification_async():
    # [START realtime_deidentification]
    import os
    # The awaitable client and credential live in the `aio` namespaces.
    from azure.identity.aio import DefaultAzureCredential
    from azure.health.deidentification.aio import DeidentificationClient
    from azure.health.deidentification.models import (
        DeidentificationResult,
        DeidentificationContent,
        OperationType,
        DocumentDataType,
    )

    endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
    # Strip the scheme so that only the host name remains.
    endpoint = endpoint.replace("https://", "")
    print(endpoint)

    credential = DefaultAzureCredential()

    client = DeidentificationClient(endpoint, credential)

    body = DeidentificationContent(
        input_text="Hello, my name is John Smith.",
        operation=OperationType.SURROGATE,
        data_type=DocumentDataType.PLAINTEXT,
    )

    # Close the client and credential once the call has completed.
    async with client, credential:
        result: DeidentificationResult = await client.deidentify(body)

    print(f'Original Text: "{body.input_text}"')
    print(f'Deidentified Text: "{result.output_text}"')
    # [END realtime_deidentification]


async def main():
    await sample_realtime_deidentification_async()


if __name__ == "__main__":
    asyncio.run(main())
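Because `deidentify` is awaited in this sample, several texts could in principle be processed concurrently. The sketch below shows that pattern with `asyncio.gather`. `FakeClient` is a stand-in invented here so the snippet runs without an Azure resource, and it is an assumption that the real async client can be driven the same way.

```python
import asyncio
from typing import List


class FakeClient:
    """Stand-in for an async deidentification client (illustration only)."""

    async def deidentify(self, text: str) -> str:
        await asyncio.sleep(0.01)  # simulate network latency
        return text.replace("John Smith", "[name]")


async def deidentify_all(client: FakeClient, texts: List[str]) -> List[str]:
    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*(client.deidentify(t) for t in texts))


results = asyncio.run(
    deidentify_all(FakeClient(), ["Hello, my name is John Smith.", "No PHI here."])
)
print(results)
```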
83 changes: 83 additions & 0 deletions
sdk/healthdataaiservices/azure-health-deidentification/samples/sample_create_and_wait_job.py
@@ -0,0 +1,83 @@
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------

"""
FILE: sample_create_and_wait_job.py
DESCRIPTION:
    This sample demonstrates the simplest job-based deidentification scenario.
    It takes a blob URI and an input prefix as input, creates a job, and waits for the job to complete.
USAGE:
    python sample_create_and_wait_job.py
    Set the environment variables with your own values before running the sample:
    1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the endpoint to your Deidentification Service resource.
    2) AZURE_STORAGE_ACCOUNT_LOCATION - the location of the storage account where the input and output files are stored.
        This can be either a URL (which is configured with Managed Identity) or a SAS URI.
    3) INPUT_PREFIX - the prefix of the input files in the storage account.
"""


import uuid


def sample_create_and_wait_job():
    # [START sample_create_and_wait_job]
    import os
    from azure.identity import DefaultAzureCredential
    from azure.health.deidentification import DeidentificationClient
    from azure.health.deidentification.models import (
        DeidentificationJob,
        SourceStorageLocation,
        TargetStorageLocation,
        OperationType,
        DocumentDataType,
    )
    from azure.core.polling import LROPoller

    endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
    # Strip the scheme so that only the host name remains.
    endpoint = endpoint.replace("https://", "")
    print(endpoint)

    storage_location = os.environ["AZURE_STORAGE_ACCOUNT_LOCATION"]
    inputPrefix = os.environ["INPUT_PREFIX"]
    outputPrefix = "_output"

    credential = DefaultAzureCredential()

    client = DeidentificationClient(
        endpoint,
        credential,
        # Skip TLS certificate verification only when targeting a local endpoint.
        connection_verify="localhost" not in endpoint,
    )

    # A random suffix keeps the job name unique across runs.
    jobname = f"sample-job-{uuid.uuid4().hex[:8]}"

    job = DeidentificationJob(
        source_location=SourceStorageLocation(
            location=storage_location,
            prefix=inputPrefix,
        ),
        target_location=TargetStorageLocation(
            location=storage_location, prefix=outputPrefix
        ),
        operation=OperationType.SURROGATE,
        data_type=DocumentDataType.PLAINTEXT,
    )

    lro: LROPoller = client.begin_create_job(jobname, job)
    # Wait up to 60 seconds; result() below then blocks until the job finishes.
    lro.wait(timeout=60)

    finished_job: DeidentificationJob = lro.result()
    print(f"Job Name: {finished_job.name}")
    print(f"Job Status: {finished_job.status}")
    print(f"File Count: {finished_job.summary.total}")
    # [END sample_create_and_wait_job]


if __name__ == "__main__":
    sample_create_and_wait_job()
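In this sample, `lro.wait(timeout=60)` returns after at most 60 seconds whether or not the job has finished, and the subsequent `lro.result()` call then waits for completion. A caller that would rather give up after the timeout could check `done()` before asking for the result. The sketch below shows that control flow only. `FakePoller` is a stand-in written for this example that mimics the `wait`, `done`, and `result` methods of `azure.core.polling.LROPoller`, and `wait_or_give_up` is a hypothetical helper.

```python
import time
from typing import Optional


class FakePoller:
    """Mimics the wait/done/result surface of a long-running-operation poller."""

    def __init__(self, seconds_to_finish: float) -> None:
        self._finish_at = time.monotonic() + seconds_to_finish

    def done(self) -> bool:
        return time.monotonic() >= self._finish_at

    def wait(self, timeout: Optional[float] = None) -> None:
        # Return when the operation is done or the timeout has elapsed.
        deadline = None if timeout is None else time.monotonic() + timeout
        while not self.done():
            if deadline is not None and time.monotonic() >= deadline:
                return
            time.sleep(0.01)

    def result(self) -> str:
        self.wait()
        return "Succeeded"


def wait_or_give_up(poller: FakePoller, timeout: float) -> Optional[str]:
    """Return the result if the operation finishes within `timeout`, else None."""
    poller.wait(timeout=timeout)
    if not poller.done():
        return None  # still running; the caller can retry or report a timeout
    return poller.result()


print(wait_or_give_up(FakePoller(seconds_to_finish=0.05), timeout=1.0))
print(wait_or_give_up(FakePoller(seconds_to_finish=5.0), timeout=0.05))
```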
93 changes: 93 additions & 0 deletions
sdk/healthdataaiservices/azure-health-deidentification/samples/sample_list_job_files.py
@@ -0,0 +1,93 @@
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------

"""
FILE: sample_list_job_files.py
DESCRIPTION:
    This sample demonstrates how to create a job, wait for it to finish, and then list the files associated with the job.
USAGE:
    python sample_list_job_files.py
    Set the environment variables with your own values before running the sample:
    1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the endpoint to your Deidentification Service resource.
    2) AZURE_STORAGE_ACCOUNT_LOCATION - the location of the storage account where the input and output files are stored.
        This can be either a URL (which is configured with Managed Identity) or a SAS URI.
    3) INPUT_PREFIX - the prefix of the input files in the storage account.
"""


import uuid


def sample_list_job_files():
    # [START sample_list_job_files]
    import os
    from azure.identity import DefaultAzureCredential
    from azure.health.deidentification import DeidentificationClient
    from azure.health.deidentification.models import (
        DeidentificationJob,
        SourceStorageLocation,
        TargetStorageLocation,
        OperationType,
        DocumentDataType,
    )
    from azure.core.polling import LROPoller

    endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
    # Strip the scheme so that only the host name remains.
    endpoint = endpoint.replace("https://", "")
    print(endpoint)

    storage_location = os.environ["AZURE_STORAGE_ACCOUNT_LOCATION"]
    inputPrefix = os.environ["INPUT_PREFIX"]
    outputPrefix = "_output"

    credential = DefaultAzureCredential()

    client = DeidentificationClient(
        endpoint,
        credential,
        # Skip TLS certificate verification only when targeting a local endpoint.
        connection_verify="localhost" not in endpoint,
    )

    # A random suffix keeps the job name unique across runs.
    jobname = f"sample-job-{uuid.uuid4().hex[:8]}"

    job = DeidentificationJob(
        source_location=SourceStorageLocation(
            location=storage_location,
            prefix=inputPrefix,
        ),
        target_location=TargetStorageLocation(
            location=storage_location, prefix=outputPrefix
        ),
        operation=OperationType.SURROGATE,
        data_type=DocumentDataType.PLAINTEXT,
    )

    print(f"Creating job with name: {jobname}")
    poller: LROPoller = client.begin_create_job(jobname, job)
    # Wait up to 60 seconds; result() below then blocks until the job finishes.
    poller.wait(timeout=60)

    job = poller.result()
    print(f"Job Status: {job.status}")

    files = client.list_job_files(job.name)

    print("Completed files (Max 10):")
    filesToLookThrough = 10
    for f in files:
        print(f"\t - {f.input.path}")

        # Stop after the first 10 files.
        filesToLookThrough -= 1
        if filesToLookThrough <= 0:
            break

    # [END sample_list_job_files]


if __name__ == "__main__":
    sample_list_job_files()
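The manual countdown in the file-listing loop can also be written with `itertools.islice`, which stops pulling items from an iterator after a fixed count. This sketch uses a plain generator, `fake_job_files`, as a stand-in for the iterable returned by `list_job_files`. It is an assumption here that the real iterable fetches results lazily, in which case stopping early also avoids requesting further pages.

```python
from itertools import islice
from typing import Iterator


def fake_job_files() -> Iterator[str]:
    """Stand-in for a lazily evaluated pager of job files (illustration only)."""
    n = 0
    while True:  # pretend the job has an unbounded number of files
        n += 1
        yield f"documents/file_{n}.txt"


max_files = 10
# islice consumes at most max_files items and then stops iterating.
first_files = list(islice(fake_job_files(), max_files))

print(f"Completed files (Max {max_files}):")
for path in first_files:
    print(f"\t - {path}")
```

The same approach applies to any loop that only needs the first few entries of a long listing, such as printing a handful of job names.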
90 changes: 90 additions & 0 deletions
sdk/healthdataaiservices/azure-health-deidentification/samples/sample_list_jobs.py
@@ -0,0 +1,90 @@
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------

"""
FILE: sample_list_jobs.py
DESCRIPTION:
    This sample demonstrates how to list the latest 5 jobs in the Deidentification Service resource.
    It creates a job and then lists it using the list_jobs method.
USAGE:
    python sample_list_jobs.py
    Set the environment variables with your own values before running the sample:
    1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the endpoint to your Deidentification Service resource.
    2) AZURE_STORAGE_ACCOUNT_LOCATION - the location of the storage account where the input and output files are stored.
        This can be either a URL (which is configured with Managed Identity) or a SAS URI.
    3) INPUT_PREFIX - the prefix of the input files in the storage account.
"""


import uuid


def sample_list_jobs():
    # [START sample_list_jobs]
    import os
    from azure.identity import DefaultAzureCredential
    from azure.health.deidentification import DeidentificationClient
    from azure.health.deidentification.models import (
        DeidentificationJob,
        SourceStorageLocation,
        TargetStorageLocation,
        OperationType,
        DocumentDataType,
    )
    from azure.core.polling import LROPoller

    endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
    # Strip the scheme so that only the host name remains.
    endpoint = endpoint.replace("https://", "")
    print(endpoint)

    storage_location = os.environ["AZURE_STORAGE_ACCOUNT_LOCATION"]
    inputPrefix = os.environ["INPUT_PREFIX"]
    outputPrefix = "_output"

    credential = DefaultAzureCredential()

    client = DeidentificationClient(
        endpoint,
        credential,
        # Skip TLS certificate verification only when targeting a local endpoint.
        connection_verify="localhost" not in endpoint,
    )

    # A random suffix keeps the job name unique across runs.
    jobname = f"sample-job-{uuid.uuid4().hex[:8]}"

    job = DeidentificationJob(
        source_location=SourceStorageLocation(
            location=storage_location,
            prefix=inputPrefix,
        ),
        target_location=TargetStorageLocation(
            location=storage_location, prefix=outputPrefix
        ),
        operation=OperationType.SURROGATE,
        data_type=DocumentDataType.PLAINTEXT,
    )

    print(f"Creating job with name: {jobname}")
    # Start the job without waiting for it to finish; it only needs to exist to be listed.
    client.begin_create_job(jobname, job)

    jobs = client.list_jobs()

    print("Listing latest 5 jobs:")
    jobsToLookThrough = 5
    for j in jobs:
        print(f"Job Name: {j.name}")

        # Stop after the first 5 jobs.
        jobsToLookThrough -= 1
        if jobsToLookThrough <= 0:
            break

    # [END sample_list_jobs]


if __name__ == "__main__":
    sample_list_jobs()