[text analytics] Add redacted_text #13449
Merged
6 commits
596be4d add redacted_text property (iscai-msft)
6be20d7 add tests (iscai-msft)
f77b69f fix copy paste mistakes in tests (iscai-msft)
318b112 Merge branch 'master' of https://github.com/Azure/azure-sdk-for-pytho… (iscai-msft)
224caa9 tiny docstring fixes (iscai-msft)
afd535c revert incorrect capitalization (iscai-msft)
@@ -1,4 +1,4 @@
-# coding=utf-8
+# coding=utf-8 pylint: disable=too-many-lines
 # ------------------------------------
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
@@ -141,6 +141,8 @@ class RecognizePiiEntitiesResult(DictMixin):
     :ivar entities: Recognized PII entities in the document.
     :vartype entities:
         list[~azure.ai.textanalytics.PiiEntity]
+    :ivar str redacted_text: Returns the text of the input document with all of the PII information
+        redacted out. Only returned for api versions v3.1-preview.2 and up.
     :ivar warnings: Warnings encountered while processing document. Results will still be returned
         if there are warnings, but they may not be fully accurate.
     :vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
@@ -150,18 +152,28 @@ class RecognizePiiEntitiesResult(DictMixin):
         ~azure.ai.textanalytics.TextDocumentStatistics
     :ivar bool is_error: Boolean check for error item when iterating over list of
         results. Always False for an instance of a RecognizePiiEntitiesResult.
+    .. versionadded:: v3.1-preview.2
+        The *redacted_text* parameter.
     """

     def __init__(self, **kwargs):
         self.id = kwargs.get("id", None)
         self.entities = kwargs.get("entities", None)
+        self.redacted_text = kwargs.get("redacted_text", None)
         self.warnings = kwargs.get("warnings", [])
         self.statistics = kwargs.get("statistics", None)
         self.is_error = False

     def __repr__(self):
-        return "RecognizePiiEntitiesResult(id={}, entities={}, warnings={}, statistics={}, is_error={})" \
-            .format(self.id, repr(self.entities), repr(self.warnings), repr(self.statistics), self.is_error)[:1024]
+        return "RecognizePiiEntitiesResult(id={}, entities={}, redacted_text={}, warnings={}, " \
+            "statistics={}, is_error={})" .format(
+                self.id,
+                repr(self.entities),
+                self.redacted_text,
+                repr(self.warnings),
+                repr(self.statistics),
+                self.is_error
+            )[:1024]


 class DetectLanguageResult(DictMixin):
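To illustrate how the new attribute behaves, here is a minimal sketch using a hypothetical stand-in class (PiiResultSketch is not part of azure-ai-textanalytics). It copies the kwargs-based __init__ and the truncated __repr__ from the diff above, under the assumption that the deserializer simply omits redacted_text when the service does not return it, so the attribute falls back to None.

```python
# Hypothetical stand-in mirroring the merged RecognizePiiEntitiesResult
# __init__/__repr__ pattern; NOT the SDK class itself.
class PiiResultSketch(object):
    def __init__(self, **kwargs):
        self.id = kwargs.get("id", None)
        self.entities = kwargs.get("entities", None)
        # Assumed absent for API versions before v3.1-preview.2 -> None.
        self.redacted_text = kwargs.get("redacted_text", None)
        self.warnings = kwargs.get("warnings", [])
        self.statistics = kwargs.get("statistics", None)
        self.is_error = False

    def __repr__(self):
        # Adjacent string literals are joined before .format() is applied;
        # the final string is capped at 1024 characters, as in the diff.
        return (
            "PiiResultSketch(id={}, entities={}, redacted_text={}, warnings={}, "
            "statistics={}, is_error={})".format(
                self.id,
                repr(self.entities),
                self.redacted_text,  # formatted with str(), not repr()
                repr(self.warnings),
                repr(self.statistics),
                self.is_error,
            )[:1024]
        )


new = PiiResultSketch(id="0", entities=[], redacted_text="My SSN is ***********.")
old = PiiResultSketch(id="0", entities=[])
print(new)
print(old)
```

One detail worth noting: because the merged __repr__ passes self.redacted_text without repr(), the redacted string shows up unquoted, while entities, warnings, and statistics go through repr().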
44 changes: 44 additions & 0 deletions
...ure-ai-textanalytics/tests/recordings/test_recognize_pii_entities.test_redacted_text.yaml
@@ -0,0 +1,44 @@
interactions:
- request:
    body: '{"documents": [{"id": "0", "text": "My SSN is 859-98-0987.", "language":
      "en"}]}'
    headers:
      Accept:
      - application/json, text/json
      Accept-Encoding:
      - gzip, deflate
      Connection:
      - keep-alive
      Content-Length:
      - '80'
      Content-Type:
      - application/json
      User-Agent:
      - azsdk-python-ai-textanalytics/5.0.1 Python/3.8.5 (macOS-10.13.6-x86_64-i386-64bit)
    method: POST
    uri: https://cognitiveusw2dev.azure-api.net/text/analytics/v3.1-preview.2/entities/recognition/pii?showStats=false&stringIndexType=UnicodeCodePoint
  response:
    body:
      string: '{"documents":[{"redactedText":"My SSN is ***********.","id":"0","entities":[{"text":"859-98-0987","category":"U.S.
        Social Security Number (SSN)","offset":10,"length":11,"confidenceScore":0.65}],"warnings":[]}],"errors":[],"modelVersion":"2020-07-01"}'
    headers:
      apim-request-id:
      - c5ba8c84-0e46-471a-b4c8-f02c411c20ec
      content-type:
      - application/json; charset=utf-8
      csp-billing-usage:
      - CognitiveServices.TextAnalytics.BatchScoring=1
      date:
      - Mon, 31 Aug 2020 20:15:43 GMT
      strict-transport-security:
      - max-age=31536000; includeSubDomains; preload
      transfer-encoding:
      - chunked
      x-content-type-options:
      - nosniff
      x-envoy-upstream-service-time:
      - '78'
    status:
      code: 200
      message: OK
version: 1
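In this v3.1-preview.2 recording, redactedText matches the input with the entity span (offset 10, length 11) replaced by asterisks. The sketch below reproduces that one recorded case with a hypothetical helper, mask_entities. It is not the service's redaction algorithm, which this PR does not describe. It assumes offsets are counted in Unicode code points, as requested by stringIndexType=UnicodeCodePoint, which also matches how Python str indexing works.

```python
import json

# Response body copied from the v3.1-preview.2 recording above
# (the YAML line break inside "U.S. Social ..." folds to a single space).
RECORDED_BODY = (
    '{"documents":[{"redactedText":"My SSN is ***********.","id":"0","entities":'
    '[{"text":"859-98-0987","category":"U.S. Social Security Number (SSN)",'
    '"offset":10,"length":11,"confidenceScore":0.65}],"warnings":[]}],'
    '"errors":[],"modelVersion":"2020-07-01"}'
)


def mask_entities(text, entities, mask_char="*"):
    """Replace each entity's (offset, length) span with mask_char.

    Hypothetical helper; assumes code-point offsets that lie within text.
    """
    chars = list(text)
    for entity in entities:
        start = entity["offset"]
        for i in range(start, start + entity["length"]):
            chars[i] = mask_char
    return "".join(chars)


doc = json.loads(RECORDED_BODY)["documents"][0]
original_text = "My SSN is 859-98-0987."  # text from the recorded request body
print(mask_entities(original_text, doc["entities"]))
print(mask_entities(original_text, doc["entities"]) == doc["redactedText"])
```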
44 changes: 44 additions & 0 deletions
...ytics/tests/recordings/test_recognize_pii_entities.test_redacted_text_v3_1_preview_1.yaml
@@ -0,0 +1,44 @@
interactions:
- request:
    body: '{"documents": [{"id": "0", "text": "My SSN is 859-98-0987.", "language":
      "en"}]}'
    headers:
      Accept:
      - application/json, text/json
      Accept-Encoding:
      - gzip, deflate
      Connection:
      - keep-alive
      Content-Length:
      - '80'
      Content-Type:
      - application/json
      User-Agent:
      - azsdk-python-ai-textanalytics/5.0.1 Python/3.8.5 (macOS-10.13.6-x86_64-i386-64bit)
    method: POST
    uri: https://westus2.api.cognitive.microsoft.com/text/analytics/v3.1-preview.1/entities/recognition/pii?showStats=false&stringIndexType=UnicodeCodePoint
  response:
    body:
      string: '{"documents":[{"id":"0","entities":[{"text":"859-98-0987","category":"U.S.
        Social Security Number (SSN)","offset":10,"length":11,"confidenceScore":0.65}],"warnings":[]}],"errors":[],"modelVersion":"2020-07-01"}'
    headers:
      apim-request-id:
      - 4ae026d1-15d1-4d77-8913-46922e72d7cb
      content-type:
      - application/json; charset=utf-8
      csp-billing-usage:
      - CognitiveServices.TextAnalytics.BatchScoring=1
      date:
      - Mon, 31 Aug 2020 19:58:17 GMT
      strict-transport-security:
      - max-age=31536000; includeSubDomains; preload
      transfer-encoding:
      - chunked
      x-content-type-options:
      - nosniff
      x-envoy-upstream-service-time:
      - '68'
    status:
      code: 200
      message: OK
version: 1
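Apart from the endpoint and header values, the v3.1-preview.1 response body differs from the v3.1-preview.2 one only in that it has no redactedText key. The sketch below shows, with a hypothetical mapping function, why a get-with-default lookup (the same pattern as kwargs.get("redacted_text", None) in the diff) leaves redacted_text as None for the older API version. The document objects are trimmed copies of the recorded ones, with entities emptied for brevity. This is an illustration, not the SDK's actual deserialization path.

```python
import json

# Trimmed document objects from the two recorded responses (entities emptied).
PREVIEW_2_DOC = '{"redactedText":"My SSN is ***********.","id":"0","entities":[],"warnings":[]}'
PREVIEW_1_DOC = '{"id":"0","entities":[],"warnings":[]}'


def to_result_kwargs(document):
    """Map a raw camelCase document dict to snake_case kwargs (hypothetical).

    dict.get returns None for a missing key, so a v3.1-preview.1 document,
    which has no "redactedText", yields redacted_text=None.
    """
    return {
        "id": document.get("id"),
        "entities": document.get("entities"),
        "redacted_text": document.get("redactedText"),
        "warnings": document.get("warnings", []),
    }


for label, raw in (("v3.1-preview.2", PREVIEW_2_DOC), ("v3.1-preview.1", PREVIEW_1_DOC)):
    print(label, to_result_kwargs(json.loads(raw))["redacted_text"])
```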
33 changes: 33 additions & 0 deletions
...-textanalytics/tests/recordings/test_recognize_pii_entities_async.test_redacted_text.yaml
@@ -0,0 +1,33 @@
interactions:
- request:
    body: '{"documents": [{"id": "0", "text": "My SSN is 859-98-0987.", "language":
      "en"}]}'
    headers:
      Accept:
      - application/json, text/json
      Content-Length:
      - '80'
      Content-Type:
      - application/json
      User-Agent:
      - azsdk-python-ai-textanalytics/5.0.1 Python/3.8.5 (macOS-10.13.6-x86_64-i386-64bit)
    method: POST
    uri: https://cognitiveusw2dev.azure-api.net/text/analytics/v3.1-preview.2/entities/recognition/pii?showStats=false&stringIndexType=UnicodeCodePoint
  response:
    body:
      string: '{"documents":[{"redactedText":"My SSN is ***********.","id":"0","entities":[{"text":"859-98-0987","category":"U.S.
        Social Security Number (SSN)","offset":10,"length":11,"confidenceScore":0.65}],"warnings":[]}],"errors":[],"modelVersion":"2020-07-01"}'
    headers:
      apim-request-id: dc638432-dc71-4f52-aadb-829c2dfd1935
      content-type: application/json; charset=utf-8
      csp-billing-usage: CognitiveServices.TextAnalytics.BatchScoring=1
      date: Mon, 31 Aug 2020 20:15:43 GMT
      strict-transport-security: max-age=31536000; includeSubDomains; preload
      transfer-encoding: chunked
      x-content-type-options: nosniff
      x-envoy-upstream-service-time: '80'
    status:
      code: 200
      message: OK
    url: https://cognitiveusw2dev.azure-api.net//text/analytics/v3.1-preview.2/entities/recognition/pii?showStats=false&stringIndexType=UnicodeCodePoint
version: 1
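The async recordings store each response header as a single string, while the sync recordings store it as a one-item list. They also have fewer request headers and an extra url field. If you ever want to compare a sync and an async recording by hand, a small normalization step helps. The sketch below is a hypothetical helper. It does not describe how the test framework matches recordings, and joining multi-valued headers with ", " is an assumed convention.

```python
# A subset of response headers as they appear in the sync and async recordings above.
SYNC_HEADERS = {
    "content-type": ["application/json; charset=utf-8"],
    "csp-billing-usage": ["CognitiveServices.TextAnalytics.BatchScoring=1"],
    "transfer-encoding": ["chunked"],
}
ASYNC_HEADERS = {
    "content-type": "application/json; charset=utf-8",
    "csp-billing-usage": "CognitiveServices.TextAnalytics.BatchScoring=1",
    "transfer-encoding": "chunked",
}


def normalize_headers(headers):
    """Return {lowercased name: str}, joining list values with ", " (hypothetical)."""
    normalized = {}
    for name, value in headers.items():
        if isinstance(value, list):
            value = ", ".join(value)
        normalized[name.lower()] = value
    return normalized


print(normalize_headers(SYNC_HEADERS) == normalize_headers(ASYNC_HEADERS))
```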
33 changes: 33 additions & 0 deletions
...tests/recordings/test_recognize_pii_entities_async.test_redacted_text_v3_1_preview_1.yaml
@@ -0,0 +1,33 @@
interactions:
- request:
    body: '{"documents": [{"id": "0", "text": "My SSN is 859-98-0987.", "language":
      "en"}]}'
    headers:
      Accept:
      - application/json, text/json
      Content-Length:
      - '80'
      Content-Type:
      - application/json
      User-Agent:
      - azsdk-python-ai-textanalytics/5.0.1 Python/3.8.5 (macOS-10.13.6-x86_64-i386-64bit)
    method: POST
    uri: https://westus2.api.cognitive.microsoft.com/text/analytics/v3.1-preview.1/entities/recognition/pii?showStats=false&stringIndexType=UnicodeCodePoint
  response:
    body:
      string: '{"documents":[{"id":"0","entities":[{"text":"859-98-0987","category":"U.S.
        Social Security Number (SSN)","offset":10,"length":11,"confidenceScore":0.65}],"warnings":[]}],"errors":[],"modelVersion":"2020-07-01"}'
    headers:
      apim-request-id: eeda4dd4-74dd-4e54-88cb-5a0352f065cf
      content-type: application/json; charset=utf-8
      csp-billing-usage: CognitiveServices.TextAnalytics.BatchScoring=1
      date: Mon, 31 Aug 2020 19:58:17 GMT
      strict-transport-security: max-age=31536000; includeSubDomains; preload
      transfer-encoding: chunked
      x-content-type-options: nosniff
      x-envoy-upstream-service-time: '106'
    status:
      code: 200
      message: OK
    url: https://westus2.api.cognitive.microsoft.com//text/analytics/v3.1-preview.1/entities/recognition/pii?showStats=false&stringIndexType=UnicodeCodePoint
version: 1
This is only for PII, right? Should we specify the top-level result object type?
I can. I didn't think it was necessary, since this entry actually shows up indented under the previous entry about the introduction of recognize_pii_entities. I'll add the top-level result object, though, since there's one other docstring comment and I might as well fix both.