Add guardrails gateway proposal

Signed-off-by: Xie Zhihao <[email protected]>
opea-project · Jun 21, 2024 · f93fd86 · f93fd86
1 parent f1fec03
commit f93fd86
Showing 1 changed file with 176 additions and 0 deletions.
diff --git a/community/rfcs/24-06-21-OPEA-001-Guardrails-Gateway.md b/community/rfcs/24-06-21-OPEA-001-Guardrails-Gateway.md
@@ -0,0 +1,176 @@
+## RFC Title
+
+Guardrails Gateway
+
+## RFC Content
+
+### Author
+
+[zhxie](https://github.com/zhxie), [Forrest-zhao](https://github.com/Forrest-zhao), [ruijin-intel](https://github.com/ruijin-intel)
+
+### Status
+
+Under Review
+
+### Objective
+
+Deploy opt-in guardrails in gateway on deployment environment.
+
+### Motivation
+
+- Reduce latency in network transmission and protocol encoding/decoding.
+- Support stateful guardrails.
+- Enhance Observability.
+- Leverage OpenVINO for AI acceleration instructions including AVX, AVX512 and AMX.
+
+### Design Proposal
+
+#### Inference In Place
+
+The LangChain-like workflow is presented below.
+
+```mermaid
+graph LR
+  Entry(Entry)-->Gateway
+  Gateway-->Embedding
+  Embedding-->Gateway
+  Gateway-->Retrieve
+  Retrieve-->Gateway
+  Gateway-->Rerank
+  Rerank-->Gateway
+  Gateway-->LLM
+  LLM-->Guardrails
+  Guardrails-->LLM
+  LLM-->Gateway
+```
+
+All services use RESTful API calling to communicate. There is overhead in network transmission and protocol encoding/decoding. Early studies have shown that each hop adds a 3ms of latency, which can be even longer when mTLS is turned on for security reason in inter-nodes deployment.
+
+The opt-in guardrails in gateway works in the architecture given below.
+
+```mermaid
+graph LR
+  Entry(Entry)-->Gateway["Gateway\nGuardrails"]
+  Gateway-->Embedding
+  Embedding-->Gateway
+  Gateway-->Retrieve
+  Retrieve-->Gateway
+  Gateway-->Rerank
+  Rerank-->Gateway
+  Gateway-->LLM
+  LLM-->Gateway
+```
+
+The gateway can host multiple guardrails without extra network transmission or protocol encoding/decoding. In the real world deployment, there may be many guardrails in all perspectives, and the gateway is the best place to provide guardrails for the system.
+
+The gateway consists of 2 basic components, inference runtime and guardrails.
+
+```mermaid
+graph TD
+  Gateway---Runtime[Inference Runtime API]
+  Runtime---OpenVINO
+  Runtime---PyTorch
+  Runtime---Others[...]
+  Gateway---Guardrails
+  Guardrails---Load[Load Model]
+  Guardrails---Inference
+  Guardrails---Access[Access Control]
+```
+
+A unified inference runtime API provides a general interface for inference runtimes. Any inference runtime can be integrated into the system including OpenVINO. The guardrails leverages the inferece runtime and decides if the request/reponse is valid.
+
+#### Stateful Guardrails
+
+The traditional workflow from ingress to egress is presented below.
+
+```mermaid
+flowchart LR
+  Entry(Entry)-->GuardrailsA
+  GuardrailsA["Guardrails\nAnti-Jailbreaking"]-->Embedding
+  Embedding-->Retrieve
+  Retrieve-->Rerank
+  Rerank-->LLM
+  LLM-->GuardrailsB["Guardrails\nAnti-Profanity"]
+```
+
+Guardrails service provides certain protection for LLM, such as anti-jailbreaking, anti-poisoning for the input side, anti-toxicity, factuality check for the output side, and PII detection for both input and output side.
+
+Guardrails can also be spliited into 2 types, stateless and stateful. Guardrails including anti-jailbreaking, anti-toxicity and PII detection are considered as stateless guards, since they do not rely on both prompt input and response output, while anti-hallucination is regarded as a stateful guard, it needs both input and ouput for the relativity between.
+
+[Guardrails Microservice](https://github.com/xuechendi/GenAIComps/tree/pii_detection/comps/guardrails) provides certain guardrails as microservice, but due to the limitation microservice, it is not able to track requests for responses, leading to difficulty in providing stateless guard ability.
+
+The opt-in guardrails in gateway works in the architecture given below.
+
+```mermaid
+flowchart LR
+  Entry(Entry)-->GuardrailsA
+  subgraph Gateway
+    GuardrailsA["Guardrails\nAnti-Jailbreaking"]-->GuardrailsC
+    GuardrailsB-->GuardrailsC
+  end
+  GuardrailsC["Guardrails\nAnti-Hallucination"]-->Embedding
+  Embedding-->Retrieve
+  Retrieve-->Rerank
+  Rerank-->LLM
+  LLM-->GuardrailsB["Guardrails\nAnti-Profanity"]
+```
+
+As a alternative choice, the gateway will also provide guardrails ability, no matter stateful or stateless.
+
+#### Observability
+
+Envoy is the most popular proxy in cloud native, which contains out-of-box access log, stats and metrics, and can be integrated into observability platform including OpenTelemetry and Prometheus naturally.
+
+Guardrails in gateway will leverages these abilities about observability to meet potential regulartory and compliance needs.
+
+#### Multi-Services Deployment
+
+Let's say the embedding and LLM services are AI-powered and require guardrails protection.
+
+The opt-in gateway can be deployed as a gateway or sidecar services.
+
+```mermaid
+graph LR
+  Entry(Entry)-->Embedding
+  subgraph SidecarA[Sidecar]
+    Embedding
+  end
+  Embedding-->Retrieve
+  Retrieve-->Rerank
+  Rerank-->LLM
+  subgraph SidecarB[Sidecar]
+    LLM
+  end
+```
+
+The gateway can also work with guardrails microservices.
+
+```mermaid
+graph LR
+  Entry(Entry)-->GuardrailsC["Guardrails\nAnti-Hallucination"]
+  GuardrailsC["Guardrails\nAnti-Hallucination"]-->GuardrailsA["Guardrails\nAnti-Jailbreaking"]
+  GuardrailsA-->Embedding
+  Embedding-->Retrieve
+  Retrieve-->Rerank
+  Rerank-->GuardrailsB["Guardrails\nAnti-Jailbreaking"]
+  GuardrailsB-->LLM
+  LLM-->GuardrailsD["Guardrails\nAnti-Profanity"]
+  subgraph Gateway
+    GuardrailsD-->GuardrailsC
+  end
+```
+
+### Alternatives Considered
+
+[Guardrails Microservice](https://github.com/xuechendi/GenAIComps/tree/pii_detection/comps/guardrails): has provided certain guardrails, however it only supports stateless guardrails.
+
+### Compatibility
+
+N/A
+
+### Miscs
+
+- TODO
+
+  - [ ] API definitions for meta service deployment and Kubernetes deployment
+  - [ ] Envoy inference framework and guardrails HTTP filter