add initial pii detection microservice #153

Merged: 21 commits, Jun 25, 2024

Commits (21)
a6b2efe
add initial framework for pii detection
xuechendi Jun 11, 2024
20e2e57
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 11, 2024
0f5b920
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 11, 2024
e5c6c1c
add e2e test to tests
xuechendi Jun 12, 2024
07aa1d6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 12, 2024
03ea7f0
update README per comments
xuechendi Jun 14, 2024
b9e5030
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 14, 2024
a2b63f0
remove model specification in README
xuechendi Jun 20, 2024
1d7147c
Merge branch 'main' into pii_detection
xuechendi Jun 20, 2024
ad93a5f
Merge branch 'main' into pii_detection
xuechendi Jun 21, 2024
7a44a97
Remove big_model and update README
xuechendi Jun 21, 2024
1d7c2f8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 21, 2024
1894d81
Merge branch 'main' into pii_detection
chensuyue Jun 24, 2024
9f14ab0
enable debug mode in test bash
xuechendi Jun 24, 2024
5235e39
rename test file
xuechendi Jun 24, 2024
ffa09f9
mv pandas import into test
xuechendi Jun 25, 2024
d37195d
add new requirement for prometheus and except for user didn't provide
xuechendi Jun 25, 2024
c52b67d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 25, 2024
cf034c1
mv pandas import to function
xuechendi Jun 25, 2024
42a80e0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 25, 2024
57939ec
remove ip_addr hardcode
xuechendi Jun 25, 2024
4 changes: 4 additions & 0 deletions comps/guardrails/pii_detection/.gitignore
@@ -0,0 +1,4 @@
**/*pdf
**/*csv
**/*log
**/*pyc
78 changes: 78 additions & 0 deletions comps/guardrails/pii_detection/README.md
@@ -0,0 +1,78 @@
# PII Detection Microservice

PII detection is a method for detecting Personally Identifiable Information (PII) in text. This microservice provides a unified API: users can either upload files or send a list of texts, and the service returns a list of labels, in the original order, marking whether each item contains PII.

# 🚀1. Start Microservice with Python (Option 1)

## 1.1 Install Requirements

```bash
pip install -r requirements.txt
```

## 1.2 Start PII Detection Microservice with Python Script

Start the PII detection microservice with the command below.

```bash
python pii_detection.py
```

# 🚀2. Start Microservice with Docker (Option 2)

## 2.1 Prepare the PII detection model

```bash
export HUGGINGFACEHUB_API_TOKEN=${HP_TOKEN}
```

### 2.1.1 Use an LLM endpoint (will be added later)

Intro placeholder.
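The LLM-endpoint path is not implemented in this PR yet, but `config.py` (further down in this diff) already reads an `LLM_ENDPOINT_URL` environment variable, so the eventual configuration will presumably look roughly like the sketch below. The host and port are placeholders, not a real endpoint.

```bash
# Hedged sketch only: config.py reads LLM_ENDPOINT_URL, so an external LLM
# endpoint would likely be supplied this way once the feature is wired up.
export LLM_ENDPOINT_URL="http://<llm-host>:<llm-port>"  # placeholder values
```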

xuechendi marked this conversation as resolved.
Show resolved Hide resolved
## 2.2 Build Docker Image

```bash
cd ../../../ # back to GenAIComps/ folder
docker build -t opea/guardrails-pii-detection:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/pii_detection/docker/Dockerfile .
```

## 2.3 Run Docker with CLI

```bash
docker run -d --rm --runtime=runc --name="guardrails-pii-detection-endpoint" -p 6357:6357 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-pii-detection:latest
```

> Debug mode (mounts the local source into the container):

```bash
docker run --rm --runtime=runc --name="guardrails-pii-detection-endpoint" -p 6357:6357 -v ./comps/guardrails/pii_detection/:/home/user/comps/guardrails/pii_detection/ --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-pii-detection:latest
```

# 🚀3. Get Status of Microservice

```bash
docker container logs -f guardrails-pii-detection-endpoint
```

# 🚀4. Consume Microservice

Once the microservice is running, use the script below to invoke it for PII detection.

```python
import requests
import json

proxies = {"http": ""}
url = "http://localhost:6357/v1/dataprep"
urls = [
    "https://towardsdatascience.com/no-gpu-no-party-fine-tune-bert-for-sentiment-analysis-with-vertex-ai-custom-jobs-d8fc410e908b?source=rss----7f60cf5620c9---4"
]
payload = {"link_list": json.dumps(urls)}

try:
    resp = requests.post(url=url, data=payload, proxies=proxies)
    print(resp.text)
    resp.raise_for_status()  # Raise an exception for unsuccessful HTTP status codes
    print("Request successful!")
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)
```
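
For a quick check from the shell, here is a rough curl equivalent of the Python call above. It assumes the `/v1/dataprep` endpoint accepts the same form-encoded `link_list` field that `requests.post(..., data=payload)` sends; adjust the port if you mapped a different one.

```bash
# Mirrors the Python example: form-encoded POST with a JSON-encoded link_list field.
curl --noproxy localhost -X POST http://localhost:6357/v1/dataprep \
  --data-urlencode 'link_list=["https://towardsdatascience.com/no-gpu-no-party-fine-tune-bert-for-sentiment-analysis-with-vertex-ai-custom-jobs-d8fc410e908b?source=rss----7f60cf5620c9---4"]'
```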
2 changes: 2 additions & 0 deletions comps/guardrails/pii_detection/__init__.py
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
45 changes: 45 additions & 0 deletions comps/guardrails/pii_detection/config.py
@@ -0,0 +1,45 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os
import pathlib

# Embedding model

EMBED_MODEL = os.getenv("EMBED_MODEL", "BAAI/bge-base-en-v1.5")

# Redis Connection Information
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", 6379))


def get_boolean_env_var(var_name, default_value=False):
    """Retrieve the boolean value of an environment variable.

    Args:
        var_name (str): The name of the environment variable to retrieve.
        default_value (bool): The default value to return if the variable
            is not found.

    Returns:
        bool: The value of the environment variable, interpreted as a boolean.
    """
    true_values = {"true", "1", "t", "y", "yes"}
    false_values = {"false", "0", "f", "n", "no"}

    # Retrieve the environment variable's value
    value = os.getenv(var_name, "").lower()

    # Decide the boolean value based on the content of the string
    if value in true_values:
        return True
    elif value in false_values:
        return False
    else:
        return default_value


LLM_URL = os.getenv("LLM_ENDPOINT_URL", None)

current_file_path = pathlib.Path(__file__).parent.resolve()
comps_path = os.path.join(current_file_path, "../../../")
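
A minimal usage sketch for `get_boolean_env_var`, showing how string values map to booleans. The variable names below are hypothetical, and the import path assumes the GenAIComps repository root is on `PYTHONPATH`.

```python
import os

# Assumption: run from the GenAIComps repo root so the comps package is importable.
from comps.guardrails.pii_detection.config import get_boolean_env_var

os.environ["PII_DEBUG"] = "yes"                # hypothetical flag, for illustration only
print(get_boolean_env_var("PII_DEBUG"))        # True  -> "yes" is in the true_values set
print(get_boolean_env_var("PII_UNSET"))        # False -> unset variables fall back to default_value
print(get_boolean_env_var("PII_UNSET", True))  # True  -> caller-supplied default is returned
```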