Skip to content

Commit

Permalink
add initial pii detection microservice (#153)
Browse files Browse the repository at this point in the history
* add initial framework for pii detection

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add e2e test to tests

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update README per comments

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove model specification in README

Signed-off-by: Chendi Xue <[email protected]>

* Remove big_model and update README

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable debug mode in test bash

Signed-off-by: Chendi Xue <[email protected]>

* rename test file

Signed-off-by: Chendi Xue <[email protected]>

* mv pandas import into test

Signed-off-by: Chendi Xue <[email protected]>

* add new requirement for prometheus and except for user didn't provide
hg_token

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* mv pandas import to function

Signed-off-by: Chendi Xue <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove ip_addr hardcode

Signed-off-by: Chendi Xue <[email protected]>

---------

Signed-off-by: Chendi Xue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <[email protected]>
  • Loading branch information
3 people authored Jun 25, 2024
1 parent a58ca4a commit e380417
Show file tree
Hide file tree
Showing 22 changed files with 1,723 additions and 0 deletions.
4 changes: 4 additions & 0 deletions comps/guardrails/pii_detection/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
**/*pdf
**/*csv
**/*log
**/*pyc
78 changes: 78 additions & 0 deletions comps/guardrails/pii_detection/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# PII Detection Microservice

PII Detection a method to detect Personal Identifiable Information in text. This microservice provides users a unified API to either upload your files or send a list of text, and return with a list following original sequence of labels marking if it contains PII or not.

# 🚀1. Start Microservice with Python(Option 1)

## 1.1 Install Requirements

```bash
pip install -r requirements.txt
```

## 1.2 Start PII Detection Microservice with Python Script

Start pii detection microservice with below command.

```bash
python pii_detection.py
```

# 🚀2. Start Microservice with Docker (Option 2)

## 2.1 Prepare PII detection model

export HUGGINGFACEHUB_API_TOKEN=${HP_TOKEN}

## 2.1.1 use LLM endpoint (will add later)

intro placeholder

## 2.2 Build Docker Image

```bash
cd ../../../ # back to GenAIComps/ folder
docker build -t opea/guardrails-pii-detection:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/pii_detection/docker/Dockerfile .
```

## 2.3 Run Docker with CLI

```bash
docker run -d --rm --runtime=runc --name="guardrails-pii-detection-endpoint" -p 6357:6357 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-pii-detection:latest
```

> debug mode
```bash
docker run --rm --runtime=runc --name="guardrails-pii-detection-endpoint" -p 6357:6357 -v ./comps/guardrails/pii_detection/:/home/user/comps/guardrails/pii_detection/ --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-pii-detection:latest
```

# 🚀3. Get Status of Microservice

```bash
docker container logs -f guardrails-pii-detection-endpoint
```

# 🚀4. Consume Microservice

Once microservice starts, user can use below script to invoke the microservice for pii detection.

```python
import requests
import json

proxies = {"http": ""}
url = "http://localhost:6357/v1/dataprep"
urls = [
"https://towardsdatascience.com/no-gpu-no-party-fine-tune-bert-for-sentiment-analysis-with-vertex-ai-custom-jobs-d8fc410e908b?source=rss----7f60cf5620c9---4"
]
payload = {"link_list": json.dumps(urls)}

try:
resp = requests.post(url=url, data=payload, proxies=proxies)
print(resp.text)
resp.raise_for_status() # Raise an exception for unsuccessful HTTP status codes
print("Request successful!")
except requests.exceptions.RequestException as e:
print("An error occurred:", e)
```
2 changes: 2 additions & 0 deletions comps/guardrails/pii_detection/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
45 changes: 45 additions & 0 deletions comps/guardrails/pii_detection/config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os
import pathlib

# Embedding model

EMBED_MODEL = os.getenv("EMBED_MODEL", "BAAI/bge-base-en-v1.5")

# Redis Connection Information
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", 6379))


def get_boolean_env_var(var_name, default_value=False):
"""Retrieve the boolean value of an environment variable.
Args:
var_name (str): The name of the environment variable to retrieve.
default_value (bool): The default value to return if the variable
is not found.
Returns:
bool: The value of the environment variable, interpreted as a boolean.
"""
true_values = {"true", "1", "t", "y", "yes"}
false_values = {"false", "0", "f", "n", "no"}

# Retrieve the environment variable's value
value = os.getenv(var_name, "").lower()

# Decide the boolean value based on the content of the string
if value in true_values:
return True
elif value in false_values:
return False
else:
return default_value


LLM_URL = os.getenv("LLM_ENDPOINT_URL", None)

current_file_path = pathlib.Path(__file__).parent.resolve()
comps_path = os.path.join(current_file_path, "../../../")
Loading

0 comments on commit e380417

Please sign in to comment.