-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[text analytics] PII updates for v5.1.0b6 #17038
Conversation
TODO: update tests |
@@ -7,6 +7,12 @@ | |||
- Renamed properties `aspect` and `opinions` to `target` and `assessments` respectively in class `MinedOpinion`. | |||
- Renamed classes `AspectSentiment` and `OpinionSentiment` to `TargetSentiment` and `AssessmentSentiment` respectively. | |||
|
|||
**New Features** | |||
|
|||
- Added parameter `pii_entity_categories` to the `recognize_pii_entities` client method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @abhahn I actually made a gist for this design, here it is, sorry I should've 100% clarified this with you. js also did a pii entity pr that aligns with this (idt they added an enum though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've update things to align with the gist above.
@@ -829,7 +835,7 @@ def begin_analyze_batch_actions( # type: ignore | |||
except ValueError as error: | |||
if "API version v3.0 does not have operation 'begin_analyze'" in str(error): | |||
raise ValueError( | |||
"'begin_analyze_batch_actions' endpoint is only available for API version v3.1-preview.3" | |||
"'begin_analyze_batch_actions' endpoint is only available for API version v3.1-preview.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test for pii categories too? You can also add it to samples if you think it's necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a couple of tests. One thing I'm still unsure about is why using domain=phi
seems to recognize all entities, even the ones that, according to the official docs, are not usually supposed to be returned as PHI. Since the interaction between domain and pii_categories is an intersection, I'd eventually like to write a test to show that there is at least one combination that will cause a list of zero entities to be returned, but I need to know which entities are not supposed to be classified as PHI.
For samples, we may want to consider adding a new one to show this interaction between domain and categories, but seems like it doesn't fit well with the existing sample. Let me know if this is something we should do for this release or if we should wait to get more info from the service team about the issue I mentioned above about PHI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good, thanks so much @abhahn !
@@ -10,6 +10,9 @@ | |||
|
|||
**New Features** | |||
|
|||
- Added parameter `categories_filter` to the `recognize_pii_entities` client method. | |||
- Added `categries_filter` property to class `RecognizePiiEntitiesAction`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v tiny nit: categories_filter
@@ -1410,13 +1410,21 @@ class RecognizePiiEntitiesAction(DictMixin): | |||
:keyword str model_version: The model version to use for the analysis. | |||
:keyword str domain_filter: An optional string to set the PII domain to include only a | |||
subset of the PII entity categories. Possible values include 'phi' or None. | |||
:keyword categories_filter: A list of specific PII entity categories to return. If the value of `domain_filter` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk if it's worth to mention domain_filter
in this docstring. Following UX studies I've been trying to put more examples in the docstrings, so the docstring could be something like
:keyword categories_filter: Instead of filtering over all PII entity categories, you can pass in a list of the specific PII entity categories you want to filter out. For example, if you only want to filter out U.S. social security numbers in a document, you can pass in `[PiiEntityCategoryType. US_SOCIAL_SECURITY_NUMBER]` for this kwarg.
|
||
self.assertEqual(len(result[0].entities), 1) | ||
entity = result[0].entities[0] | ||
self.assertEqual(entity.category, PiiEntityCategoryType.US_SOCIAL_SECURITY_NUMBER.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test look great, and I think you've already mentioned this offline. But just for posterity, can you also add tests for pii-categories
to the analyze endpoint? thanks!
sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py
Outdated
Show resolved
Hide resolved
...tanalytics/azure-ai-textanalytics/azure/ai/textanalytics/aio/_text_analytics_client_async.py
Outdated
Show resolved
Hide resolved
sdk/textanalytics/azure-ai-textanalytics/tests/test_recognize_pii_entities.py
Show resolved
Hide resolved
…into ta-v5.1.0b6-analyze * 'master' of https://github.com/Azure/azure-sdk-for-python: (24 commits) [text analytics] PII updates for v5.1.0b6 (Azure#17038) Fix bug where imported matrix parameter duplicates are not overrided (Azure#17126) Add NULL to readme (Azure#17076) Sas batching (Azure#17133) dropping py3.5 (Azure#17127) [EventHubs] 5.3.1 update changelog (Azure#17132) [text analytics] assertions (Azure#17098) add recordings (Azure#17125) [core] add HttpRequest and HttpResponse reprs (Azure#16972) [text analytics] add actual normalized_text tests (Azure#17123) update samples to use actual role names (Azure#17124) Sync eng/common directory with azure-sdk-tools for PR 1448 (Azure#17085) Enable APIView status check (Azure#17107) Fix PackageName typo (Azure#17109) Move SetTestPipelineVersion.ps1 to eng/common (Azure#17103) Fix paths for non-windows agents (Azure#17096) [Communication] - Identity - Update README (Azure#17091) Rename CertificateCredential's certificate_bytes -> certificate_data (Azure#17090) fix shared reqs (Azure#17095) [translation] initial library (Azure#16837) ...
fixes #16174