Add Intel/toxic-prompt-roberta to toxicity detection microservice #749

Merged
8 changes: 3 additions & 5 deletions comps/guardrails/toxicity_detection/README.md
@@ -4,11 +4,9 @@

Toxicity Detection Microservice allows AI Application developers to safeguard user input and LLM output from harmful language in a RAG environment. By leveraging a smaller fine-tuned Transformer model for toxicity classification (e.g., DistilBERT, RoBERTa, etc.), we maintain a lightweight guardrails microservice without significantly sacrificing performance, making it readily deployable on both Intel Gaudi and Xeon.

-Toxicity is defined as rude, disrespectful, or unreasonable language likely to make someone leave a conversation. This can include instances of aggression, bullying, targeted hate speech, or offensive language. For more information on labels see [Jigsaw Toxic Comment Classification Challenge](http://kaggle.com/c/jigsaw-toxic-comment-classification-challenge).
-
-## Future Development
+This microservice uses [`Intel/toxic-prompt-roberta`](https://huggingface.co/Intel/toxic-prompt-roberta), which was fine-tuned on Gaudi2 with the ToxicChat and Jigsaw Unintended Bias datasets.

-- Add a RoBERTa (125M params) toxicity model fine-tuned on Gaudi2 with ToxicChat and Jigsaw dataset in an optimized serving framework.
+Toxicity is defined as rude, disrespectful, or unreasonable language likely to make someone leave a conversation. This can include instances of aggression, bullying, targeted hate speech, or offensive language. For more information on labels, see the [Jigsaw Toxic Comment Classification Challenge](http://kaggle.com/c/jigsaw-toxic-comment-classification-challenge).

## 🚀1. Start Microservice with Python(Option 1)
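For reference, a minimal sketch of exercising the new model directly with the same `transformers` pipeline call the service makes; the sample input and the printed label/score are illustrative only and not taken from this PR:

```python
from transformers import pipeline

# Same call as in toxicity_detection.py, pointed at the model this PR introduces.
model_id = "Intel/toxic-prompt-roberta"
classifier = pipeline("text-classification", model=model_id, tokenizer=model_id)

# Illustrative input; the exact label string and score depend on the model.
result = classifier("You are worthless and everyone hates you.")
print(result)  # e.g. [{'label': 'toxic', 'score': 0.97}]
```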

@@ -65,7 +63,7 @@ curl localhost:9091/v1/toxicity
Example Output:

```bash
-"\nI'm sorry, but your query or LLM's response is TOXIC with an score of 0.97 (0-1)!!!\n"
+"Violated policies: toxicity, please check your input."
```

**Python Script:**
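A minimal sketch of such a script, assuming the service accepts a JSON body with a `text` field (mirroring the `TextDoc` input in `toxicity_detection.py` below); the script shipped in the repository may differ:

```python
import requests

# Hypothetical payload shape: a JSON body with a "text" field, mirroring TextDoc.
url = "http://localhost:9091/v1/toxicity"
payload = {"text": "He is a disgusting person and deserves to be insulted."}

response = requests.post(url, json=payload)
print(response.json())
# A toxic input is expected to yield the guardrail message:
# "Violated policies: toxicity, please check your input."
```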
4 changes: 2 additions & 2 deletions comps/guardrails/toxicity_detection/toxicity_detection.py
@@ -19,13 +19,13 @@ def llm_generate(input: TextDoc):
    input_text = input.text
    toxic = toxicity_pipeline(input_text)
    print("done")
-   if toxic[0]["label"] == "toxic":
+   if toxic[0]["label"].lower() == "toxic":
        return TextDoc(text="Violated policies: toxicity, please check your input.", downstream_black_list=[".*"])
    else:
        return TextDoc(text=input_text)


if __name__ == "__main__":
-   model = "citizenlab/distilbert-base-multilingual-cased-toxicity"
+   model = "Intel/toxic-prompt-roberta"
    toxicity_pipeline = pipeline("text-classification", model=model, tokenizer=model)
    opea_microservices["opea_service@toxicity_detection"].start()
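For context, a small illustrative sketch of why the label is now lower-cased before comparison: `text-classification` pipelines return a list of `{"label": ..., "score": ...}` dicts, and label casing can vary between models; the exact labels emitted by either model are not shown in this diff:

```python
# Two hypothetical pipeline outputs differing only in label casing.
outputs = [
    [{"label": "toxic", "score": 0.97}],
    [{"label": "TOXIC", "score": 0.97}],
]

for toxic in outputs:
    # .lower() makes the guardrail check robust to either casing.
    print(toxic[0]["label"].lower() == "toxic")  # True for both
```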