add initial pii detection microservice #153

Merged: 21 commits, Jun 25, 2024

xuechendi
Collaborator

Description

Enable PII detection microservice

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

List the newly introduced 3rd party dependency if exists.

  • phonenumbers
  • detect_secrets
  • gibberish-detector
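
As a rough illustration of the kind of rule-based check a PII detection service can build on, here is a minimal, standard-library-only Python sketch. It does not use the libraries listed above and is not the implementation in this PR; the function name `detect_pii` and the regex patterns are assumptions made purely for illustration.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately simple patterns (assumptions for illustration).
# Real detectors need far more careful rules and validation.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_pii(text: str) -> Dict[str, List[str]]:
    """Return the matches found per category, omitting empty categories."""
    found: Dict[str, List[str]] = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


if __name__ == "__main__":
    print(detect_pii("Call +1 650-253-0000 or mail jane@example.com"))
```

A dictionary of named patterns keeps each strategy independent, so further detectors could be added without touching the loop.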

Tests

Describe the tests that you ran to verify your changes.

python test.py --test_text --batch_size 100
python test.py --test_pdf --batch_size 100
python test.py --test_html --batch_size 100
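
The commands above pass a mode flag and a `--batch_size` value. How test.py handles them is not shown in this thread, so the sketch below is only one hypothetical way to parse such flags and split inputs into batches. The names `parse_args` and `make_batches`, the default batch size, and the reading of `--batch_size` as "items per batch" are all assumptions, not taken from the PR.

```python
import argparse
from typing import List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse flags shaped like the commands above (hypothetical driver)."""
    parser = argparse.ArgumentParser(description="Hypothetical PII detection test driver")
    parser.add_argument("--test_text", action="store_true", help="use plain-text samples")
    parser.add_argument("--test_pdf", action="store_true", help="use PDF samples")
    parser.add_argument("--test_html", action="store_true", help="use HTML samples")
    # Assumed meaning: number of items grouped into one batch; default is arbitrary.
    parser.add_argument("--batch_size", type=int, default=1, help="items per batch")
    return parser.parse_args(argv)


def make_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    args = parse_args()
    print(make_batches(["doc1", "doc2", "doc3"], args.batch_size))
```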

xuechendi force-pushed the pii_detection branch 4 times, most recently from d14a6bc to 6248ca4, on June 11, 2024 22:23
@xuechendi
Collaborator Author

The request response will be:
[image]

The system log will show the detected PII:
[image]

@xuechendi
Collaborator Author

@minmin-intel @ftian1, please help review this PR.
I had a discussion with @minmin-intel: once this PR is merged, minmin can continue adding other strategies for PII detection.

@chensuyue
Collaborator

Please fill in the new deps into the BoM list, and add an e2e test for the new microservice, e.g. https://github.com/opea-project/GenAIComps/blob/main/tests/test_embeddings_langchain.sh

@xuechendi
Collaborator Author

> Please fill in the new deps into the BoM list, and add an e2e test for the new microservice, e.g. https://github.com/opea-project/GenAIComps/blob/main/tests/test_embeddings_langchain.sh

Thanks, @chensuyue. I've added this new component to the BoM and added test_pii_detection.sh to tests.

@xuechendi
Collaborator Author

@letonghan, please help review this PR, thanks.

@xuechendi
Collaborator Author

@ftian1 @letonghan @lvliang-intel
I have resolved all the comments except the one about updating the OPEA spec.
Please check and help merge, thanks.

chensuyue added this to the v0.7 milestone on Jun 24, 2024
@xuechendi
Collaborator Author

@chensuyue, the UT now passes.

xuechendi merged commit e380417 into opea-project:main on Jun 25, 2024
7 checks passed
sharanshirodkar7 pushed a commit to sharanshirodkar7/GenAIComps that referenced this pull request Jul 9, 2024
* add initial framework for pii detection

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add e2e test to tests

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update README per comments

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove model specification in README

Signed-off-by: Chendi Xue <[email protected]>

* Remove big_model and update README

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable debug mode in test bash

Signed-off-by: Chendi Xue <[email protected]>

* rename test file

Signed-off-by: Chendi Xue <[email protected]>

* mv pandas import into test

Signed-off-by: Chendi Xue <[email protected]>

* add new requirement for prometheus and except for user didn't provide
hg_token

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* mv pandas import to function

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove ip_addr hardcode

Signed-off-by: Chendi Xue <[email protected]>

---------

Signed-off-by: Chendi Xue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <[email protected]>
Signed-off-by: sharanshirodkar7 <[email protected]>
yogeshmpandey pushed a commit to yogeshmpandey/GenAIComps that referenced this pull request Jul 10, 2024
Signed-off-by: Yogesh Pandey <[email protected]>
dwhitena pushed a commit to predictionguard/GenAIComps that referenced this pull request Jul 24, 2024
Signed-off-by: Daniel Whitenack <[email protected]>
5 participants