Skip to content

Commit

Permalink
Initial Creation of azure-health-deidentification Dataplane SDK (#36041)
Browse files Browse the repository at this point in the history
* Initial commit of Health.Deidentification dataplane

* Use MI instead of SAS

* Regenerates with Plaintext

* Adds rest of tests

* First attempt patch

* Patch Attempt #2

* Patch Attempt #3

* Creates base recordings

* Fixes sanitizers; Test replay functioning

* Creates all sync samples

* Creates all async samples

* Adds description in readme

* Adds tsplocation

* Checkpoint

* Executes test recording migration

* Adds pipeline yamls

* Updates ci.yml triggers

* Removes ArtifactName from ci.yaml

* Fixes analysis failures

* Fixes analysis failures 2

* Update sdk/healthdataaiservices/ci.yml

Co-authored-by: Scott Beddall <[email protected]>

* Updates test.yml

* Uniquifier default to false for pipelines

* Updates from feedback

* Updates from feedback 2

---------

Co-authored-by: Graham Thomas <[email protected]>
Co-authored-by: Scott Beddall <[email protected]>
  • Loading branch information
3 people authored Jul 24, 2024
1 parent cf6238a commit da60a42
Show file tree
Hide file tree
Showing 64 changed files with 8,935 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
Expand Up @@ -412,6 +412,10 @@
# PRLabel: %Cognitive - Text Analytics
/sdk/textanalytics/ @quentinRobinson @wangyuantao

# ServiceLabel: %Health Deidentification
# PRLabel: %Health Deidentification
/sdk/healthdataaiservices/ @GrahamMThomas @danielszaniszlo

# AzureSdkOwners: @YalinLi0312
# ServiceLabel: %Cognitive - Form Recognizer
# ServiceOwners: @bojunehsu @vkurpad
Expand Down
14 changes: 14 additions & 0 deletions .vscode/cspell.json
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,7 @@
"guids",
"hanaonazure",
"hdinsight",
"healthdataaiservices",
"heapq",
"hexlify",
"himds",
Expand Down Expand Up @@ -402,6 +403,7 @@
"unpad",
"unpadder",
"unpartial",
"uniquifier",
"unredacted",
"unseekable",
"unsubscriptable",
Expand Down Expand Up @@ -440,6 +442,7 @@
"BUILDID",
"documentdb",
"chdir",
"radiculopathy",
"reqs",
"rgpy",
"swaggertosdk",
Expand Down Expand Up @@ -1840,6 +1843,17 @@
"words": [
"dcid"
]
},
{
"filename": "sdk/healthdataaiservices/azure-health-deidentification/**",
"words": [
"deid",
"deidservices",
"deidentification",
"healthdataaiservices",
"deidentify",
"deidentified"
]
}
],
"allowCompoundWords": true
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# Release History

## 1.0.0b1 (1970-01-01)

- Initial version

### Features Added

- Initial Code
21 changes: 21 additions & 0 deletions sdk/healthdataaiservices/azure-health-deidentification/LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
Copyright (c) Microsoft Corporation.

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
include *.md
include LICENSE
include azure/health/deidentification/py.typed
recursive-include tests *.py
recursive-include samples *.py *.md
include azure/__init__.py
include azure/health/__init__.py
108 changes: 108 additions & 0 deletions sdk/healthdataaiservices/azure-health-deidentification/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@


# Azure Health Deidentification client library for Python
Azure.Health.Deidentification is a managed service that enables users to tag, redact, or surrogate health data.

## Getting started

### Install the package

```bash
python -m pip install azure-health-deidentification
```

#### Prequisites

- Python 3.8 or later is required to use this package.
- You need an [Azure subscription][azure_sub] to use this package.
- An existing Azure Health Deidentification instance.
#### Create with an Azure Active Directory Credential
To use an [Azure Active Directory (AAD) token credential][authenticate_with_token],
provide an instance of the desired credential type obtained from the
[azure-identity][azure_identity_credentials] library.

To authenticate with AAD, you must first [pip][pip] install [`azure-identity`][azure_identity_pip]

After setup, you can choose which type of [credential][azure_identity_credentials] from azure.identity to use.
As an example, [DefaultAzureCredential][default_azure_credential] can be used to authenticate the client:

Set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables:
`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`

Use the returned token credential to authenticate the client:

```python
>>> from azure.health.deidentification import DeidentificationClient
>>> from azure.identity import DefaultAzureCredential
>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential())
```

## Key concepts

**Operation Modes**
- Tag: Will return a structure of offset and length with the PHI category of the related text spans.
- Redact: Will return output text with placeholder stubbed text. ex. `[name]`
- Surrogate: Will return output text with synthetic replacements.
- `My name is John Smith`
- `My name is Tom Jones`

**Job Integration with Azure Storage**
Instead of sending text, you can send an Azure Storage Location to the service. We will asynchronously
process the list of files and output the deidentified files to a location of your choice.

Limitations:
- Maximum file count per job: 1000 documents
- Maximum file size per file: 2 MB

## Examples

```python
>>> from azure.health.deidentification import DeidentificationClient
>>> from azure.identity import DefaultAzureCredential
>>> from azure.core.exceptions import HttpResponseError

>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential())
>>> try:
<!-- write test code here -->
except HttpResponseError as e:
print('service responds error: {}'.format(e.response.json()))

```

## Next steps

- Find a bug, or have feedback? Raise an issue with "Health Deidentification" Label.


## Troubleshooting

- **Unabled to Access Source or Target Storage**
- Ensure you create your deid service with a system assigned managed identity
- Ensure your storage account has given permissions to that managed identity

## Contributing

This project welcomes contributions and suggestions. Most contributions require
you to agree to a Contributor License Agreement (CLA) declaring that you have
the right to, and actually do, grant us the rights to use your contribution.
For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether
you need to provide a CLA and decorate the PR appropriately (e.g., label,
comment). Simply follow the instructions provided by the bot. You will only
need to do this once across all repos using our CLA.

This project has adopted the
[Microsoft Open Source Code of Conduct][code_of_conduct]. For more information,
see the Code of Conduct FAQ or contact [email protected] with any
additional questions or comments.

<!-- LINKS -->
[code_of_conduct]: https://opensource.microsoft.com/codeofconduct/
[authenticate_with_token]: https://docs.microsoft.com/azure/cognitive-services/authentication?tabs=powershell#authenticate-with-an-authentication-token
[azure_identity_credentials]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#credentials
[azure_identity_pip]: https://pypi.org/project/azure-identity/
[default_azure_credential]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#defaultazurecredential
[pip]: https://pypi.org/project/pip/
[azure_sub]: https://azure.microsoft.com/free/

Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
{
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "python",
"TagPrefix": "python/healthdataaiservices/azure-health-deidentification",
"Tag": "python/healthdataaiservices/azure-health-deidentification_a8eed6d322"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# coding=utf-8
# --------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# Code generated by Microsoft (R) Python Code Generator.
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

from ._client import DeidentificationClient
from ._version import VERSION

__version__ = VERSION

try:
from ._patch import __all__ as _patch_all
from ._patch import * # pylint: disable=unused-wildcard-import
except ImportError:
_patch_all = []
from ._patch import patch_sdk as _patch_sdk

__all__ = [
"DeidentificationClient",
]
__all__.extend([p for p in _patch_all if p not in __all__])

_patch_sdk()
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# coding=utf-8
# --------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# Code generated by Microsoft (R) Python Code Generator.
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

from copy import deepcopy
from typing import Any, TYPE_CHECKING
from typing_extensions import Self

from azure.core import PipelineClient
from azure.core.pipeline import policies
from azure.core.rest import HttpRequest, HttpResponse

from ._configuration import DeidentificationClientConfiguration
from ._operations import DeidentificationClientOperationsMixin
from ._serialization import Deserializer, Serializer

if TYPE_CHECKING:
# pylint: disable=unused-import,ungrouped-imports
from azure.core.credentials import TokenCredential


class DeidentificationClient(
DeidentificationClientOperationsMixin
): # pylint: disable=client-accepts-api-version-keyword
"""DeidentificationClient.
:param endpoint: Url of your De-identification Service. Required.
:type endpoint: str
:param credential: Credential used to authenticate requests to the service. Required.
:type credential: ~azure.core.credentials.TokenCredential
:keyword api_version: The API version to use for this operation. Default value is
"2024-07-12-preview". Note that overriding this default value may result in unsupported
behavior.
:paramtype api_version: str
:keyword int polling_interval: Default waiting time between two polls for LRO operations if no
Retry-After header is present.
"""

def __init__(self, endpoint: str, credential: "TokenCredential", **kwargs: Any) -> None:
_endpoint = "https://{endpoint}"
self._config = DeidentificationClientConfiguration(endpoint=endpoint, credential=credential, **kwargs)
_policies = kwargs.pop("policies", None)
if _policies is None:
_policies = [
policies.RequestIdPolicy(**kwargs),
self._config.headers_policy,
self._config.user_agent_policy,
self._config.proxy_policy,
policies.ContentDecodePolicy(**kwargs),
self._config.redirect_policy,
self._config.retry_policy,
self._config.authentication_policy,
self._config.custom_hook_policy,
self._config.logging_policy,
policies.DistributedTracingPolicy(**kwargs),
policies.SensitiveHeaderCleanupPolicy(**kwargs) if self._config.redirect_policy else None,
self._config.http_logging_policy,
]
self._client: PipelineClient = PipelineClient(base_url=_endpoint, policies=_policies, **kwargs)

self._serialize = Serializer()
self._deserialize = Deserializer()
self._serialize.client_side_validation = False

def send_request(self, request: HttpRequest, *, stream: bool = False, **kwargs: Any) -> HttpResponse:
"""Runs the network request through the client's chained policies.
>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = client.send_request(request)
<HttpResponse: 200 OK>
For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request
:param request: The network request you want to make. Required.
:type request: ~azure.core.rest.HttpRequest
:keyword bool stream: Whether the response payload will be streamed. Defaults to False.
:return: The response of your network call. Does not do error handling on your response.
:rtype: ~azure.core.rest.HttpResponse
"""

request_copy = deepcopy(request)
path_format_arguments = {
"endpoint": self._serialize.url("self._config.endpoint", self._config.endpoint, "str"),
}

request_copy.url = self._client.format_url(request_copy.url, **path_format_arguments)
return self._client.send_request(request_copy, stream=stream, **kwargs) # type: ignore

def close(self) -> None:
self._client.close()

def __enter__(self) -> Self:
self._client.__enter__()
return self

def __exit__(self, *exc_details: Any) -> None:
self._client.__exit__(*exc_details)
Loading

0 comments on commit da60a42

Please sign in to comment.