add initial pii detection microservice (#153)
* add initial framework for pii detection Signed-off-by: Chendi Xue <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add e2e test to tests Signed-off-by: Chendi Xue <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update README per comments Signed-off-by: Chendi Xue <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* remove model specification in README Signed-off-by: Chendi Xue <[email protected]>
* Remove big_model and update README Signed-off-by: Chendi Xue <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* enable debug mode in test bash Signed-off-by: Chendi Xue <[email protected]>
* rename test file Signed-off-by: Chendi Xue <[email protected]>
* mv pandas import into test Signed-off-by: Chendi Xue <[email protected]>
* add new requirement for prometheus and except for user didn't provide hg_token Signed-off-by: Chendi Xue <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* mv pandas import to function Signed-off-by: Chendi Xue <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* remove ip_addr hardcode Signed-off-by: Chendi Xue <[email protected]>

---------

Signed-off-by: Chendi Xue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <[email protected]>
1 parent a58ca4a, commit e380417
Showing 22 changed files with 1,723 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
**/*csv | ||
**/*log | ||
**/*pyc |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# PII Detection Microservice | ||
|
||
PII detection is a method for finding Personally Identifiable Information (PII) in text. This microservice provides a unified API that accepts either uploaded files or a list of text strings and returns a list of labels, in the same order as the input, marking whether each item contains PII. | ||
|
||
# 🚀1. Start Microservice with Python (Option 1) | ||
|
||
## 1.1 Install Requirements | ||
|
||
```bash | ||
pip install -r requirements.txt | ||
``` | ||
|
||
## 1.2 Start PII Detection Microservice with Python Script | ||
|
||
Start the PII detection microservice with the command below. | ||
|
||
```bash | ||
python pii_detection.py | ||
``` | ||
|
||
# 🚀2. Start Microservice with Docker (Option 2) | ||
|
||
## 2.1 Prepare PII detection model | ||
|
||
export HUGGINGFACEHUB_API_TOKEN=${HP_TOKEN} | ||
|
||
## 2.1.1 use LLM endpoint (will add later) | ||
|
||
intro placeholder | ||
|
||
## 2.2 Build Docker Image | ||
|
||
```bash | ||
cd ../../../ # back to GenAIComps/ folder | ||
docker build -t opea/guardrails-pii-detection:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/pii_detection/docker/Dockerfile . | ||
``` | ||
|
||
## 2.3 Run Docker with CLI | ||
|
||
```bash | ||
docker run -d --rm --runtime=runc --name="guardrails-pii-detection-endpoint" -p 6357:6357 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-pii-detection:latest | ||
``` | ||
|
||
> debug mode | ||
```bash | ||
docker run --rm --runtime=runc --name="guardrails-pii-detection-endpoint" -p 6357:6357 -v ./comps/guardrails/pii_detection/:/home/user/comps/guardrails/pii_detection/ --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-pii-detection:latest | ||
``` | ||
|
||
# 🚀3. Get Status of Microservice | ||
|
||
```bash | ||
docker container logs -f guardrails-pii-detection-endpoint | ||
``` | ||
|
||
# 🚀4. Consume Microservice | ||
|
||
Once the microservice has started, use the script below to invoke it for PII detection. | ||
|
||
```python | ||
import requests | ||
import json | ||
|
||
proxies = {"http": ""} | ||
url = "http://localhost:6357/v1/dataprep" | ||
urls = [ | ||
"https://towardsdatascience.com/no-gpu-no-party-fine-tune-bert-for-sentiment-analysis-with-vertex-ai-custom-jobs-d8fc410e908b?source=rss----7f60cf5620c9---4" | ||
] | ||
payload = {"link_list": json.dumps(urls)} | ||
|
||
try: | ||
resp = requests.post(url=url, data=payload, proxies=proxies) | ||
print(resp.text) | ||
resp.raise_for_status() # Raise an exception for unsuccessful HTTP status codes | ||
print("Request successful!") | ||
except requests.exceptions.RequestException as e: | ||
print("An error occurred:", e) | ||
``` |
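The request in the README example above can be wrapped in a small helper so that the payload construction is testable without a running service. This is only a sketch: `build_payload` and `detect_pii` are hypothetical names, the 30-second timeout is an assumption, and the endpoint and `link_list` field are copied from the example rather than verified against the service. It uses the standard library's `urllib` in place of `requests` to stay dependency-free.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the README example; adjust if the service differs.
PII_URL = "http://localhost:6357/v1/dataprep"


def build_payload(urls):
    """Encode a list of links as the form field used in the README example."""
    return {"link_list": json.dumps(list(urls))}


def detect_pii(urls, url=PII_URL, timeout=30):
    """POST the links as form data and return the raw response text.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, which
    plays the role of raise_for_status() in the requests-based example.
    """
    body = urllib.parse.urlencode(build_payload(urls)).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the container from section 2.3 running, `detect_pii(["https://example.com/article"])` would send the same kind of request as the README script; the URL here is a placeholder.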
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# Copyright (C) 2024 Intel Corporation | ||
# SPDX-License-Identifier: Apache-2.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
# Copyright (C) 2024 Intel Corporation | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
import os | ||
import pathlib | ||
|
||
# Embedding model | ||
|
||
EMBED_MODEL = os.getenv("EMBED_MODEL", "BAAI/bge-base-en-v1.5") | ||
|
||
# Redis Connection Information | ||
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") | ||
REDIS_PORT = int(os.getenv("REDIS_PORT", 6379)) | ||
|
||
|
||
def get_boolean_env_var(var_name, default_value=False): | ||
"""Retrieve the boolean value of an environment variable. | ||
Args: | ||
var_name (str): The name of the environment variable to retrieve. | ||
default_value (bool): The default value to return if the variable | ||
is not found. | ||
Returns: | ||
bool: The value of the environment variable, interpreted as a boolean. | ||
""" | ||
true_values = {"true", "1", "t", "y", "yes"} | ||
false_values = {"false", "0", "f", "n", "no"} | ||
|
||
# Retrieve the environment variable's value | ||
value = os.getenv(var_name, "").lower() | ||
|
||
# Decide the boolean value based on the content of the string | ||
if value in true_values: | ||
return True | ||
elif value in false_values: | ||
return False | ||
else: | ||
return default_value | ||
|
||
|
||
LLM_URL = os.getenv("LLM_ENDPOINT_URL", None) | ||
|
||
current_file_path = pathlib.Path(__file__).parent.resolve() | ||
comps_path = os.path.join(current_file_path, "../../../") |
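To illustrate how `get_boolean_env_var` treats different inputs, the sketch below restates the helper from the diff above in compact form and exercises it. `PII_DEBUG` is a hypothetical variable name used only for illustration; it does not appear in the commit.

```python
import os


def get_boolean_env_var(var_name, default_value=False):
    """Compact restatement of the helper in the diff (same behavior)."""
    true_values = {"true", "1", "t", "y", "yes"}
    false_values = {"false", "0", "f", "n", "no"}
    value = os.getenv(var_name, "").lower()
    if value in true_values:
        return True
    if value in false_values:
        return False
    return default_value


# PII_DEBUG is a made-up variable name for this demonstration.
os.environ["PII_DEBUG"] = "Yes"    # matching is case-insensitive
print(get_boolean_env_var("PII_DEBUG"))                      # → True

os.environ["PII_DEBUG"] = "maybe"  # unrecognized value falls back to the default
print(get_boolean_env_var("PII_DEBUG", default_value=True))  # → True

del os.environ["PII_DEBUG"]        # unset variable also falls back to the default
print(get_boolean_env_var("PII_DEBUG"))                      # → False
```

Note that an unrecognized value is silently treated as the default rather than raising an error, so a typo such as `ture` goes unnoticed.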