Initial Creation of azure-health-deidentification Dataplane SDK (#36041)
* Initial commit of Health.Deidentification dataplane
* Use MI instead of SAS
* Regenerates with Plaintext
* Adds rest of tests
* First attempt patch
* Patch Attempt #2
* Patch Attempt #3
* Creates base recordings
* Fixes sanitizers; Test replay functioning
* Creates all sync samples
* Creates all async samples
* Adds description in readme
* Adds tsplocation
* Checkpoint
* Executes test recording migration
* Adds pipeline yamls
* Updates ci.yml triggers
* Removes ArtifactName from ci.yaml
* Fixes analysis failures
* Fixes analysis failures 2
* Update sdk/healthdataaiservices/ci.yml

  Co-authored-by: Scott Beddall <[email protected]>
* Updates test.yml
* Uniquifier default to false for pipelines
* Updates from feedback
* Updates from feedback 2

---------

Co-authored-by: Graham Thomas <[email protected]>
Co-authored-by: Scott Beddall <[email protected]>
1 parent cf6238a, commit da60a42
Showing 64 changed files with 8,935 additions and 0 deletions.
9 changes: 9 additions & 0 deletions
9
sdk/healthdataaiservices/azure-health-deidentification/CHANGELOG.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Release History | ||
|
||
## 1.0.0b1 (1970-01-01) | ||
|
||
- Initial version | ||
|
||
### Features Added | ||
|
||
- Initial Code |
21 changes: 21 additions & 0 deletions
21
sdk/healthdataaiservices/azure-health-deidentification/LICENSE
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
Copyright (c) Microsoft Corporation. | ||
|
||
MIT License | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
7 changes: 7 additions & 0 deletions
7
sdk/healthdataaiservices/azure-health-deidentification/MANIFEST.in
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
include *.md | ||
include LICENSE | ||
include azure/health/deidentification/py.typed | ||
recursive-include tests *.py | ||
recursive-include samples *.py *.md | ||
include azure/__init__.py | ||
include azure/health/__init__.py |
108 changes: 108 additions & 0 deletions
108
sdk/healthdataaiservices/azure-health-deidentification/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
|
||
|
||
# Azure Health Deidentification client library for Python | ||
Azure.Health.Deidentification is a managed service that enables users to tag, redact, or surrogate health data. | ||
|
||
## Getting started | ||
|
||
### Install the package | ||
|
||
```bash | ||
python -m pip install azure-health-deidentification | ||
``` | ||
|
||
#### Prerequisites | ||
|
||
- Python 3.8 or later is required to use this package. | ||
- You need an [Azure subscription][azure_sub] to use this package. | ||
- An existing Azure Health Deidentification instance. | ||
#### Create with an Azure Active Directory Credential | ||
To use an [Azure Active Directory (AAD) token credential][authenticate_with_token], | ||
provide an instance of the desired credential type obtained from the | ||
[azure-identity][azure_identity_credentials] library. | ||
|
||
To authenticate with AAD, you must first [pip][pip] install [`azure-identity`][azure_identity_pip]. | ||
|
||
After setup, you can choose which type of [credential][azure_identity_credentials] from azure.identity to use. | ||
As an example, [DefaultAzureCredential][default_azure_credential] can be used to authenticate the client: | ||
|
||
Set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: | ||
`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET` | ||
|
||
Use the returned token credential to authenticate the client: | ||
|
||
```python | ||
>>> from azure.health.deidentification import DeidentificationClient | ||
>>> from azure.identity import DefaultAzureCredential | ||
>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential()) | ||
``` | ||
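Before constructing the client, it can help to confirm that the three service principal variables listed above are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_credential_vars` is hypothetical and not part of this package. Note that `DefaultAzureCredential` also tries other mechanisms (for example a managed identity), so these variables matter only when you rely on the environment-based path.

```python
import os

# Variable names taken from the README text above.
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_credential_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credential_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All service principal variables are set.")
```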
|
||
## Key concepts | ||
|
||
**Operation Modes** | ||
- Tag: Returns the offset, length, and PHI category of each detected text span. | ||
- Redact: Returns the output text with each detected span replaced by a placeholder, e.g. `[name]`. | ||
- Surrogate: Returns the output text with each detected span replaced by a synthetic value. | ||
  - Input: `My name is John Smith` | ||
  - Output: `My name is Tom Jones` | ||
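To make the difference between the three modes concrete, here is a toy, purely local sketch. It does not call the service and is not how the service detects PHI: the span is hard-coded, and the names `Span`, `tag`, `redact`, and `surrogate` are hypothetical helpers invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A detected region of text (hypothetical shape, for illustration only)."""
    offset: int
    length: int
    category: str


def tag(spans):
    """Tag mode: report where the PHI is, without changing the text."""
    return [{"offset": s.offset, "length": s.length, "category": s.category} for s in spans]


def _replace_spans(text, spans, replacement_for):
    # Replace from the end of the text backwards so earlier offsets stay valid.
    for s in sorted(spans, key=lambda s: s.offset, reverse=True):
        text = text[:s.offset] + replacement_for(s) + text[s.offset + s.length:]
    return text


def redact(text, spans):
    """Redact mode: swap each span for a placeholder such as [name]."""
    return _replace_spans(text, spans, lambda s: f"[{s.category}]")


def surrogate(text, spans, substitutes):
    """Surrogate mode: swap each span for a synthetic value of the same category."""
    return _replace_spans(text, spans, lambda s: substitutes[s.category])


text = "My name is John Smith"
spans = [Span(offset=11, length=10, category="name")]  # assumed detection result

print(tag(spans))
print(redact(text, spans))
print(surrogate(text, spans, {"name": "Tom Jones"}))
```

Tag leaves the text untouched and only describes it, while Redact and Surrogate both return modified text; the choice between them depends on whether downstream consumers need realistic-looking values.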
|
||
**Job Integration with Azure Storage** | ||
Instead of sending text, you can send an Azure Storage Location to the service. We will asynchronously | ||
process the list of files and output the deidentified files to a location of your choice. | ||
|
||
Limitations: | ||
- Maximum file count per job: 1000 documents | ||
- Maximum file size per file: 2 MB | ||
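Because a job that exceeds these limits will presumably be rejected, a client-side pre-check can save a round trip. The sketch below is an assumption-laden illustration: `check_job_limits` is a hypothetical helper, file sizes are supplied by the caller rather than read from Azure Storage, and "2 MB" is interpreted as 2 * 1024 * 1024 bytes, which the README does not specify.

```python
MAX_FILES_PER_JOB = 1000
# Assumption: "2 MB" means 2 MiB. Adjust if the service counts 2,000,000 bytes.
MAX_FILE_SIZE_BYTES = 2 * 1024 * 1024


def check_job_limits(file_sizes):
    """Return a list of human-readable problems; an empty list means the job fits.

    `file_sizes` maps a file name to its size in bytes.
    """
    problems = []
    if len(file_sizes) > MAX_FILES_PER_JOB:
        problems.append(
            f"{len(file_sizes)} files exceeds the limit of {MAX_FILES_PER_JOB} per job"
        )
    for name, size in file_sizes.items():
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds {MAX_FILE_SIZE_BYTES} bytes")
    return problems


if __name__ == "__main__":
    print(check_job_limits({"note1.txt": 4096, "scan.txt": 3 * 1024 * 1024}))
```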
|
||
## Examples | ||
|
||
```python | ||
>>> from azure.health.deidentification import DeidentificationClient | ||
>>> from azure.identity import DefaultAzureCredential | ||
>>> from azure.core.exceptions import HttpResponseError | ||
|
||
>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential()) | ||
>>> try: | ||
...     pass  # write test code here | ||
... except HttpResponseError as e: | ||
...     print('service responds error: {}'.format(e.response.json())) | ||
|
||
``` | ||
|
||
## Next steps | ||
|
||
- Found a bug or have feedback? Raise an issue with the "Health Deidentification" label. | ||
|
||
|
||
## Troubleshooting | ||
|
||
- **Unable to Access Source or Target Storage** | ||
  - Ensure you create your deid service with a system assigned managed identity. | ||
  - Ensure your storage account has granted permissions to that managed identity. | ||
|
||
## Contributing | ||
|
||
This project welcomes contributions and suggestions. Most contributions require | ||
you to agree to a Contributor License Agreement (CLA) declaring that you have | ||
the right to, and actually do, grant us the rights to use your contribution. | ||
For details, visit https://cla.microsoft.com. | ||
|
||
When you submit a pull request, a CLA-bot will automatically determine whether | ||
you need to provide a CLA and decorate the PR appropriately (e.g., label, | ||
comment). Simply follow the instructions provided by the bot. You will only | ||
need to do this once across all repos using our CLA. | ||
|
||
This project has adopted the | ||
[Microsoft Open Source Code of Conduct][code_of_conduct]. For more information, | ||
see the Code of Conduct FAQ or contact opencode@microsoft.com with any | ||
additional questions or comments. | ||
|
||
<!-- LINKS --> | ||
[code_of_conduct]: https://opensource.microsoft.com/codeofconduct/ | ||
[authenticate_with_token]: https://docs.microsoft.com/azure/cognitive-services/authentication?tabs=powershell#authenticate-with-an-authentication-token | ||
[azure_identity_credentials]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#credentials | ||
[azure_identity_pip]: https://pypi.org/project/azure-identity/ | ||
[default_azure_credential]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#defaultazurecredential | ||
[pip]: https://pypi.org/project/pip/ | ||
[azure_sub]: https://azure.microsoft.com/free/ | ||
|
6 changes: 6 additions & 0 deletions
6
sdk/healthdataaiservices/azure-health-deidentification/assets.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
{ | ||
"AssetsRepo": "Azure/azure-sdk-assets", | ||
"AssetsRepoPrefixPath": "python", | ||
"TagPrefix": "python/healthdataaiservices/azure-health-deidentification", | ||
"Tag": "python/healthdataaiservices/azure-health-deidentification_a8eed6d322" | ||
} |
1 change: 1 addition & 0 deletions
1
sdk/healthdataaiservices/azure-health-deidentification/azure/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore |
1 change: 1 addition & 0 deletions
1
sdk/healthdataaiservices/azure-health-deidentification/azure/health/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore |
26 changes: 26 additions & 0 deletions
26
...lthdataaiservices/azure-health-deidentification/azure/health/deidentification/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# coding=utf-8 | ||
# -------------------------------------------------------------------------- | ||
# Copyright (c) Microsoft Corporation. All rights reserved. | ||
# Licensed under the MIT License. See License.txt in the project root for license information. | ||
# Code generated by Microsoft (R) Python Code Generator. | ||
# Changes may cause incorrect behavior and will be lost if the code is regenerated. | ||
# -------------------------------------------------------------------------- | ||
|
||
from ._client import DeidentificationClient | ||
from ._version import VERSION | ||
|
||
__version__ = VERSION | ||
|
||
try: | ||
    from ._patch import __all__ as _patch_all | ||
    from ._patch import *  # pylint: disable=unused-wildcard-import | ||
except ImportError: | ||
    _patch_all = [] | ||
from ._patch import patch_sdk as _patch_sdk | ||
|
||
__all__ = [ | ||
    "DeidentificationClient", | ||
] | ||
__all__.extend([p for p in _patch_all if p not in __all__]) | ||
|
||
_patch_sdk() |
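The `try`/`except ImportError` block above merges any names exported by the hand-written `_patch` module into the generated package's `__all__`. A standalone sketch of the same pattern, under the assumption that a missing module simply contributes no names, might look like this; `load_optional_exports` and the module name passed to it are hypothetical and exist only for this illustration.

```python
import importlib


def load_optional_exports(module_name):
    """Return the module's __all__ list, or [] if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []
    return list(getattr(module, "__all__", []))


# Names the generated code always exports.
public_names = ["DeidentificationClient"]

# Hypothetical customization module; it does not exist, so nothing is added.
extra = load_optional_exports("hypothetical_patch_module_for_illustration")
public_names.extend(name for name in extra if name not in public_names)

print(public_names)
```

The de-duplication step mirrors the `if p not in __all__` filter in the generated file, so a name listed in both places is exported once.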
103 changes: 103 additions & 0 deletions
103
...althdataaiservices/azure-health-deidentification/azure/health/deidentification/_client.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# coding=utf-8 | ||
# -------------------------------------------------------------------------- | ||
# Copyright (c) Microsoft Corporation. All rights reserved. | ||
# Licensed under the MIT License. See License.txt in the project root for license information. | ||
# Code generated by Microsoft (R) Python Code Generator. | ||
# Changes may cause incorrect behavior and will be lost if the code is regenerated. | ||
# -------------------------------------------------------------------------- | ||
|
||
from copy import deepcopy | ||
from typing import Any, TYPE_CHECKING | ||
from typing_extensions import Self | ||
|
||
from azure.core import PipelineClient | ||
from azure.core.pipeline import policies | ||
from azure.core.rest import HttpRequest, HttpResponse | ||
|
||
from ._configuration import DeidentificationClientConfiguration | ||
from ._operations import DeidentificationClientOperationsMixin | ||
from ._serialization import Deserializer, Serializer | ||
|
||
if TYPE_CHECKING: | ||
    # pylint: disable=unused-import,ungrouped-imports | ||
    from azure.core.credentials import TokenCredential | ||
|
||
|
||
class DeidentificationClient( | ||
    DeidentificationClientOperationsMixin | ||
):  # pylint: disable=client-accepts-api-version-keyword | ||
    """DeidentificationClient. | ||
    :param endpoint: Url of your De-identification Service. Required. | ||
    :type endpoint: str | ||
    :param credential: Credential used to authenticate requests to the service. Required. | ||
    :type credential: ~azure.core.credentials.TokenCredential | ||
    :keyword api_version: The API version to use for this operation. Default value is | ||
     "2024-07-12-preview". Note that overriding this default value may result in unsupported | ||
     behavior. | ||
    :paramtype api_version: str | ||
    :keyword int polling_interval: Default waiting time between two polls for LRO operations if no | ||
     Retry-After header is present. | ||
    """ | ||
|
||
    def __init__(self, endpoint: str, credential: "TokenCredential", **kwargs: Any) -> None: | ||
        _endpoint = "https://{endpoint}" | ||
        self._config = DeidentificationClientConfiguration(endpoint=endpoint, credential=credential, **kwargs) | ||
        _policies = kwargs.pop("policies", None) | ||
        if _policies is None: | ||
            _policies = [ | ||
                policies.RequestIdPolicy(**kwargs), | ||
                self._config.headers_policy, | ||
                self._config.user_agent_policy, | ||
                self._config.proxy_policy, | ||
                policies.ContentDecodePolicy(**kwargs), | ||
                self._config.redirect_policy, | ||
                self._config.retry_policy, | ||
                self._config.authentication_policy, | ||
                self._config.custom_hook_policy, | ||
                self._config.logging_policy, | ||
                policies.DistributedTracingPolicy(**kwargs), | ||
                policies.SensitiveHeaderCleanupPolicy(**kwargs) if self._config.redirect_policy else None, | ||
                self._config.http_logging_policy, | ||
            ] | ||
        self._client: PipelineClient = PipelineClient(base_url=_endpoint, policies=_policies, **kwargs) | ||
|
||
        self._serialize = Serializer() | ||
        self._deserialize = Deserializer() | ||
        self._serialize.client_side_validation = False | ||
|
||
    def send_request(self, request: HttpRequest, *, stream: bool = False, **kwargs: Any) -> HttpResponse: | ||
        """Runs the network request through the client's chained policies. | ||
        >>> from azure.core.rest import HttpRequest | ||
        >>> request = HttpRequest("GET", "https://www.example.org/") | ||
        <HttpRequest [GET], url: 'https://www.example.org/'> | ||
        >>> response = client.send_request(request) | ||
        <HttpResponse: 200 OK> | ||
        For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request | ||
        :param request: The network request you want to make. Required. | ||
        :type request: ~azure.core.rest.HttpRequest | ||
        :keyword bool stream: Whether the response payload will be streamed. Defaults to False. | ||
        :return: The response of your network call. Does not do error handling on your response. | ||
        :rtype: ~azure.core.rest.HttpResponse | ||
        """ | ||
|
||
        request_copy = deepcopy(request) | ||
        path_format_arguments = { | ||
            "endpoint": self._serialize.url("self._config.endpoint", self._config.endpoint, "str"), | ||
        } | ||
|
||
        request_copy.url = self._client.format_url(request_copy.url, **path_format_arguments) | ||
        return self._client.send_request(request_copy, stream=stream, **kwargs)  # type: ignore | ||
|
||
    def close(self) -> None: | ||
        self._client.close() | ||
|
||
    def __enter__(self) -> Self: | ||
        self._client.__enter__() | ||
        return self | ||
|
||
    def __exit__(self, *exc_details: Any) -> None: | ||
        self._client.__exit__(*exc_details) |
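In `send_request`, the request URL is passed through `format_url` together with the `endpoint` value, and the client's base URL is the template `"https://{endpoint}"`. The sketch below is a simplified, standalone approximation of that idea and is not the `azure.core` implementation: `resolve_url`, the `/jobs` path, and the host name are all hypothetical. It also reflects an observation rather than a documented rule: since the template already contains `https://`, the endpoint appears to be expected as a host name without a scheme.

```python
from urllib.parse import urlparse

# Same template string as in the generated client above.
BASE_URL_TEMPLATE = "https://{endpoint}"


def resolve_url(request_url, endpoint):
    """Resolve a request URL against the endpoint-based base URL.

    Absolute URLs (scheme and host present) are returned unchanged;
    relative paths are appended to the formatted base URL.
    """
    parsed = urlparse(request_url)
    if parsed.scheme and parsed.netloc:
        return request_url
    base = BASE_URL_TEMPLATE.format(endpoint=endpoint).rstrip("/")
    return base + "/" + request_url.lstrip("/")


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    print(resolve_url("/jobs", "example.contoso.test"))
    print(resolve_url("https://www.example.org/", "example.contoso.test"))
```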